JP7077356B2

JP7077356B2 - Peripheral monitoring system for work machines

Info

Publication number: JP7077356B2
Application number: JP2020075442A
Authority: JP
Inventors: 晋相澤
Original assignee: Sumitomo Heavy Industries Ltd
Current assignee: Sumitomo Heavy Industries Ltd
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2022-05-30
Anticipated expiration: 2036-03-01
Also published as: JP2020127225A

Description

The present invention relates to a peripheral monitoring system for a work machine that monitors the surroundings of the work machine.

A human body detection device is known that has an image sensor and a heat-sensing thermopile array whose imaging range and heat detection range overlap. The device limits the face extraction range to the region that the thermopile array output indicates is likely to be a human body, thereby reducing unnecessary computation during image identification processing (see Patent Document 1).

Japanese Unexamined Patent Publication No. 2006-059015

On the other hand, it is desirable to provide a peripheral monitoring system for a work machine that can issue a caution against relying excessively on the person detection result when the captured image is in a state unsuitable for person detection.

A peripheral monitoring system for a work machine according to an embodiment of the present invention detects a person present in the vicinity of the work machine by using an image captured by an imaging device attached to the work machine. The system has an extraction unit that extracts person candidate images from the captured image. It determines whether the captured image is in a state unsuitable for person detection based on the number of person candidate images extracted by the extraction unit and, when it determines that the captured image is in such a state, issues a notification to that effect.
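The count-based determination described above can be sketched as follows. This passage does not state the decision rule, so the sketch assumes, purely for illustration, that an unusually large number of candidates indicates an unreliable frame; the function names, the threshold `max_plausible`, and the message text are hypothetical.

```python
from typing import Optional


def is_image_unsuitable(candidate_count: int, max_plausible: int = 20) -> bool:
    """Return True when the candidate count suggests an unreliable frame.

    Assumption: more candidates than `max_plausible` means the captured
    image is unsuitable. The source only says the decision is based on
    the number of person candidate images.
    """
    return candidate_count > max_plausible


def suitability_notice(candidate_count: int, max_plausible: int = 20) -> Optional[str]:
    """Return a notification message, or None when the frame looks usable."""
    if is_image_unsuitable(candidate_count, max_plausible):
        return "Captured image is unsuitable for person detection"
    return None
```

In a real system the returned message would be routed to a display, buzzer, or lamp rather than returned as a string.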

The above means provide a peripheral monitoring system for a work machine that can issue a caution against relying excessively on the person detection result when the captured image is in a state unsuitable for person detection.

A side view of an excavator equipped with a peripheral monitoring system according to an embodiment of the present invention.
A functional block diagram showing a configuration example of the peripheral monitoring system.
Examples of images captured by the rear camera.
A schematic diagram showing an example of the geometric relationship used when cutting out an identification processing target image from a captured image.
A top view of the real space behind the excavator.
A diagram showing the flow of processing for generating a normalized image from a captured image.
A diagram showing the relationship between a captured image, an identification processing target image region, and a normalized image.
A diagram showing the relationship between an identification processing target image region and an identification processing unsuitable region.
A diagram showing examples of normalized images.
A diagram explaining the relationship between the rear horizontal distance from a virtual plane region in real space to the rear camera and the size of the head image portion in a normalized image.
A schematic diagram showing another example of the geometric relationship used when cutting out an identification processing target image from a captured image.
A diagram showing an example of a feature image in a captured image.
A flowchart showing the flow of an example of image extraction processing.
A flowchart showing the flow of another example of image extraction processing.
A flowchart showing the flow of yet another example of image extraction processing.
A flowchart showing the flow of yet another example of image extraction processing.
A flowchart showing the flow of an example of person detection suitability determination processing.
A frequency distribution chart of the helmet degree.

FIG. 1 is a side view of an excavator (shovel) as a work machine (construction machine) on which the peripheral monitoring system 100 according to an embodiment of the present invention is mounted. An upper swing body 3 is mounted on the lower traveling body 1 of the excavator via a swing mechanism 2. A boom 4 is attached to the upper swing body 3. An arm 5 is attached to the tip of the boom 4, and a bucket 6 is attached to the tip of the arm 5. The boom 4, the arm 5, and the bucket 6 constitute an excavation attachment and are hydraulically driven by a boom cylinder 7, an arm cylinder 8, and a bucket cylinder 9, respectively. The upper swing body 3 is provided with a cabin 10 and carries a power source such as an engine. An imaging device 40 is attached to the upper part of the upper swing body 3. Specifically, a rear camera 40B, a left side camera 40L, and a right side camera 40R are attached to the upper rear end, upper left end, and upper right end of the upper swing body 3, respectively. A controller 30 and an output device 50 are installed in the cabin 10.

FIG. 2 is a functional block diagram showing a configuration example of the peripheral monitoring system 100. The peripheral monitoring system 100 mainly includes the controller 30, the imaging device 40, an input device 42, the output device 50, and a machine control device 51. In this embodiment, the imaging device 40, the input device 42, the output device 50, and the machine control device 51 are connected to the controller 30 via CAN.

The controller 30 is a control device that performs drive control of the excavator. In this embodiment, the controller 30 is an arithmetic processing unit that includes a CPU and an internal memory, and it realizes various functions by causing the CPU to execute a drive control program stored in the internal memory.

The controller 30 also determines, based on the outputs of various devices, whether a person is present around the excavator, and controls various devices according to the result. Specifically, the controller 30 receives the outputs of the imaging device 40 and the input device 42 and executes software programs corresponding to the extraction unit 31, the identification unit 32, the tracking unit 33, and the control unit 35. Depending on the execution results, it outputs a control command to the machine control device 51 to perform drive control of the excavator, or causes the output device 50 to output various information. The controller 30 may be a control device dedicated to image processing.

The imaging device 40 captures images of the surroundings of the excavator and outputs the captured images to the controller 30. In this embodiment, the imaging device 40 is a wide-angle camera that uses an imaging element such as a CCD, and it is attached to the upper part of the upper swing body 3 with its optical axis directed obliquely downward.

The input device 42 receives input from the operator. In this embodiment, the input device 42 includes operation devices (operation levers, operation pedals, and the like), a gate lock lever, buttons installed at the tips of the operation devices, buttons attached to an in-vehicle display, a touch panel, and the like.

The output device 50 outputs various information and includes, for example, an in-vehicle display that shows image information, an in-vehicle speaker that outputs audio information, an alarm buzzer, an alarm lamp, and the like. In this embodiment, the output device 50 outputs various information in response to control commands from the controller 30.

The machine control device 51 controls the movement of the excavator and includes, for example, a control valve that controls the flow of hydraulic oil in the hydraulic system, a gate lock valve, an engine control device, and the like.

The extraction unit 31 is a functional element that extracts identification processing target images from the image captured by the imaging device 40. Specifically, the extraction unit 31 extracts identification processing target images by image processing with a relatively small amount of computation (hereinafter "front-stage image recognition processing"). This processing extracts simple features based on local luminance gradients or edges, geometric features obtained by a Hough transform or the like, features relating to the area or aspect ratio of regions segmented by luminance, and so on. An identification processing target image is an image portion (a part of the captured image) that is subjected to subsequent image processing, and it includes a person candidate image. A person candidate image is an image portion (a part of the captured image) considered highly likely to be a person image. The captured image may be a color image or a grayscale image. The extraction unit 31 may have several kinds of functions for converting a color image to grayscale. Each person candidate image may be regarded as differing in its degree of person-likeness, or in a level indicating that degree. The degree, or the level indicating it, can also be treated as an evaluation value. The extraction unit 31 may also be configured in multiple stages, each of which performs its own narrowing-down. For example, it may be configured as a first extraction unit in a front stage and a second extraction unit in a rear stage connected in series.
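The serial, multi-stage narrowing described above can be sketched as a chain of filters applied in order. This is a minimal illustration, not the patent's implementation: the `Candidate` tuple layout, the `extract` function, and the use of a single evaluation value per candidate are assumptions.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate record: (x, y, evaluation value for person-likeness).
Candidate = Tuple[int, int, float]
Stage = Callable[[Candidate], bool]


def extract(candidates: Iterable[Candidate], stages: Iterable[Stage]) -> List[Candidate]:
    """Pass candidates through each stage in series, keeping only survivors."""
    remaining = list(candidates)
    for stage in stages:
        remaining = [c for c in remaining if stage(c)]
    return remaining


# Usage: a loose first stage followed by a stricter second stage.
first_stage = lambda c: c[2] > 0.2
second_stage = lambda c: c[2] > 0.5
survivors = extract([(0, 0, 0.1), (1, 1, 0.3), (2, 2, 0.9)], [first_stage, second_stage])
```

Placing the cheaper test first means the more expensive later stages see fewer candidates, which is the usual motivation for a cascade.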

The identification unit 32 is a functional element that identifies whether a person candidate image included in an identification processing target image extracted by the extraction unit 31 is a person image. Specifically, the identification unit 32 makes this identification by image processing with a relatively large amount of computation (hereinafter "rear-stage image recognition processing"), such as image recognition processing that uses an image feature description typified by HOG (Histograms of Oriented Gradients) features together with a classifier generated by machine learning. The proportion of person candidate images that the identification unit 32 identifies as person images increases as the extraction unit 31 extracts identification processing target images more accurately. When a captured image of the desired quality cannot be obtained, for example in an environment unsuited to imaging such as at night or in bad weather, the identification unit 32 may identify all person candidate images in the identification processing target images extracted by the extraction unit 31 as persons. This is to prevent missed detections of people.
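To make the HOG feature mentioned above more concrete, the following sketch computes the gradient-orientation histogram of a single cell, which is the building block of a HOG descriptor. It is a simplification under stated assumptions: block normalization and bin interpolation are omitted, unsigned gradients over 0 to 180 degrees are used, and the function name and nine-bin default are illustrative choices rather than details from the source.

```python
import math
from typing import List, Sequence


def cell_orientation_histogram(cell: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """Accumulate gradient magnitude into unsigned orientation bins (0..180 deg).

    `cell` is a 2D grid of grey levels. Border pixels are skipped because
    central differences need a neighbour on both sides.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The trailing modulo guards against angle rounding up to 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Usage: an 8x8 cell whose brightness increases from left to right.
ramp = [[10 * x for x in range(8)] for _ in range(8)]
ramp_hist = cell_orientation_histogram(ramp)
```

A full descriptor would concatenate such histograms over many cells and feed them to a trained classifier.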

Next, with reference to FIG. 3, how a person image appears in an image of the area behind the excavator captured by the rear camera 40B will be described. The two captured images in FIG. 3 are examples of images captured by the rear camera 40B. The dotted circles in FIG. 3 indicate the presence of person images and are not displayed in the actual captured images.

The rear camera 40B is a wide-angle camera and is mounted at a height from which it looks down on a person obliquely from above. The appearance of a person image in the captured image therefore varies greatly with the direction in which the person is located as seen from the rear camera 40B. For example, a person image is displayed more tilted the closer it is to the left or right edge of the captured image. This is due to image tilt caused by the wide-angle lens. In addition, the closer the person is to the rear camera 40B, the larger the head appears, and the legs fall into the blind spot of the excavator's body and become invisible. These effects result from the installation position of the rear camera 40B. It is therefore difficult to identify a person image in the captured image by image processing without first processing the captured image in some way.

The peripheral monitoring system 100 according to the embodiment of the present invention therefore facilitates identification of the person image included in an identification processing target image by normalizing that image. Here, "normalization" means converting the identification processing target image into an image of a predetermined size and a predetermined shape. In this embodiment, an identification processing target image, which can take various shapes in the captured image, is converted into a rectangular image of a predetermined size by projective transformation. For example, a projective transformation matrix with eight variables is used.
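An eight-variable projective transformation can be determined from four point correspondences, which is one common way to map a quadrilateral onto a rectangle. The sketch below fixes the ninth matrix entry to 1 and solves the resulting 8x8 linear system by Gauss-Jordan elimination. The function names and the sample quadrilateral are hypothetical, and the source does not say how its matrix is obtained.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 matrix (last entry fixed to 1) mapping 4 src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], x: float, y: float) -> Point:
    """Map one point, including the perspective division."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Usage: map an arbitrary quadrilateral onto a 32 x 64 rectangle.
quad = [(10.0, 20.0), (50.0, 18.0), (60.0, 120.0), (5.0, 110.0)]
rect = [(0.0, 0.0), (32.0, 0.0), (32.0, 64.0), (0.0, 64.0)]
H = homography(quad, rect)
```

To produce the normalized image itself, one would typically compute the inverse mapping and sample the captured image for every output pixel.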

Here, with reference to FIGS. 4 to 6, an example of the processing by which the peripheral monitoring system 100 normalizes an identification processing target image (hereinafter "normalization processing") will be described. FIG. 4 is a schematic diagram showing an example of the geometric relationship that the extraction unit 31 uses when cutting out an identification processing target image from a captured image.

The box BX in FIG. 4 is a virtual three-dimensional object in real space and, in this embodiment, is a virtual rectangular parallelepiped defined by eight vertices A to H. The point Pr is a reference point set in advance for referencing an identification processing target image. In this embodiment, the reference point Pr is a point set in advance as the assumed standing position of a person and is located at the center of the quadrangle ABCD defined by the four vertices A to D. The size of the box BX is set based on a person's orientation, stride, height, and the like. In this embodiment, the quadrangles ABCD and EFGH are squares with a side length of, for example, 800 mm, and the height of the rectangular parallelepiped is, for example, 1800 mm. That is, the box BX is a rectangular parallelepiped 800 mm wide, 800 mm deep, and 1800 mm high.

The quadrangle ABGH defined by the four vertices A, B, G, and H forms a virtual plane region TR that corresponds to the region of the identification processing target image in the captured image. The quadrangle ABGH, as the virtual plane region TR, is inclined with respect to the virtual ground, which is a horizontal plane.

In this embodiment, the box BX as a virtual rectangular parallelepiped is used to define the relationship between the reference point Pr and the virtual plane region TR. However, as long as a virtual plane region TR that faces the imaging device 40 and is inclined with respect to the virtual ground can be defined in association with an arbitrary reference point Pr, other geometric relationships, such as one using a different virtual three-dimensional object, may be used, and other mathematical relationships, such as a function or a conversion table, may also be used.

FIG. 5 is a top view of the real space behind the excavator and shows the positional relationship between the rear camera 40B and the virtual plane regions TR1 and TR2 when those regions are referenced using the reference points Pr1 and Pr2. In this embodiment, a reference point Pr can be placed at each grid point of a virtual grid on the virtual ground. However, the reference points Pr may instead be placed irregularly on the virtual ground, or at equal intervals on line segments extending radially from the projection point of the rear camera 40B onto the virtual ground. For example, the line segments extend radially at 1-degree increments, and the reference points Pr are placed on each line segment at 100 mm intervals.
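The radial layout in the last example (line segments at 1-degree increments, points every 100 mm) can be sketched as follows. The coordinate convention is an assumption: the camera's ground projection is the origin, Y points rearward, and angles are measured from the Y axis in top view. The angular span and maximum range are hypothetical parameters.

```python
import math
from typing import List, Tuple


def radial_reference_points(start_deg: float, end_deg: float,
                            step_deg: float = 1.0,
                            spacing_mm: float = 100.0,
                            max_range_mm: float = 5000.0) -> List[Tuple[float, float]]:
    """Generate reference points on rays fanning out from the origin.

    Each ray is sampled every `spacing_mm` up to `max_range_mm`.
    Returned points are (x, y) positions on the virtual ground in mm.
    """
    n_angles = int(round((end_deg - start_deg) / step_deg)) + 1
    n_steps = int(max_range_mm // spacing_mm)
    points = []
    for i in range(n_angles):
        theta = math.radians(start_deg + i * step_deg)
        for k in range(1, n_steps + 1):
            r = k * spacing_mm
            points.append((r * math.sin(theta), r * math.cos(theta)))
    return points


# Usage: three rays (-1, 0, +1 degrees), three points per ray.
pts = radial_reference_points(-1.0, 1.0, step_deg=1.0, spacing_mm=100.0, max_range_mm=300.0)
```

Compared with a square grid, a radial layout is denser near the camera and sparser far away for the same angular and radial step.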

As shown in FIGS. 4 and 5, the first face of the box BX, defined by the quadrangle ABFE (see FIG. 4), is arranged to squarely face the rear camera 40B when the virtual plane region TR1 is referenced using the reference point Pr1. That is, the line segment connecting the rear camera 40B and the reference point Pr1 is orthogonal, in top view, to the first face of the box BX arranged in relation to the reference point Pr1. Similarly, the first face of the box BX is arranged to squarely face the rear camera 40B when the virtual plane region TR2 is referenced using the reference point Pr2. That is, the line segment connecting the rear camera 40B and the reference point Pr2 is orthogonal, in top view, to the first face of the box BX arranged in relation to the reference point Pr2. This relationship holds no matter which grid point the reference point Pr is placed on. In other words, the box BX is always arranged so that its first face squarely faces the rear camera 40B.
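The placement rule above (reference point at the center of ABCD, first face ABFE perpendicular in top view to the camera-to-Pr line) can be sketched as below, using the example dimensions of 800 mm x 800 mm x 1800 mm. The function names are hypothetical, the left/right assignment of A and B is an arbitrary choice, and it is assumed that E, F, G, H lie directly above A, B, C, D.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def place_box(pr: Tuple[float, float], camera_xy: Tuple[float, float] = (0.0, 0.0),
              width: float = 800.0, depth: float = 800.0,
              height: float = 1800.0) -> Dict[str, Vec3]:
    """Return the eight vertices of box BX for reference point `pr` (mm)."""
    px, py = pr
    dist = math.hypot(px - camera_xy[0], py - camera_xy[1])
    if dist == 0.0:
        raise ValueError("reference point coincides with the camera projection")
    ux, uy = (px - camera_xy[0]) / dist, (py - camera_xy[1]) / dist  # away from camera
    vx, vy = -uy, ux                                                  # along the first face

    def corner(along: float, across: float, z: float) -> Vec3:
        return (px + along * ux + across * vx, py + along * uy + across * vy, z)

    hd, hw = depth / 2.0, width / 2.0
    box = {"A": corner(-hd, -hw, 0.0), "B": corner(-hd, hw, 0.0),   # near edge
           "C": corner(hd, hw, 0.0), "D": corner(hd, -hw, 0.0)}     # far edge
    for low, high in zip("ABCD", "EFGH"):
        x, y, _ = box[low]
        box[high] = (x, y, height)
    return box


def virtual_plane_region(box: Dict[str, Vec3]) -> List[Vec3]:
    """Quadrangle ABGH: near bottom edge to far top edge, so it is inclined."""
    return [box[k] for k in "ABGH"]


# Usage: a reference point 5 m from the camera's ground projection.
bx = place_box((3000.0, 4000.0))
tr = virtual_plane_region(bx)
```

Because the box is rebuilt from the camera-to-Pr direction each time, the first face stays square to the camera wherever the reference point is placed.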

FIG. 6 is a diagram showing the flow of processing for generating a normalized image from a captured image. Specifically, FIG. 6(A) is an example of an image captured by the rear camera 40B and shows the box BX arranged in relation to a reference point Pr in real space. FIG. 6(B) shows the region of the identification processing target image cut out from the captured image (hereinafter "identification processing target image region TRg"), which corresponds to the virtual plane region TR shown in the captured image of FIG. 6(A). FIG. 6(C) shows a normalized image TRgt obtained by normalizing the identification processing target image having the identification processing target image region TRg.

As shown in FIG. 6(A), the box BX arranged in real space in relation to the reference point Pr1 determines the position of the virtual plane region TR in real space, which in turn determines the identification processing target image region TRg on the captured image corresponding to the virtual plane region TR.

Thus, once the position of the reference point Pr in real space is determined, the position of the virtual plane region TR in real space is uniquely determined, and so is the identification processing target image region TRg in the captured image. The extraction unit 31 can then normalize the identification processing target image having the identification processing target image region TRg to generate a normalized image TRgt of a predetermined size. In this embodiment, the size of the normalized image TRgt is, for example, 64 pixels high by 32 pixels wide.

FIG. 7 is a diagram showing the relationship between a captured image, an identification processing target image region, and a normalized image. Specifically, FIG. 7(A1) shows an identification processing target image region TRg3 in the captured image, and FIG. 7(A2) shows a normalized image TRgt3 of the identification processing target image having the region TRg3. FIG. 7(B1) shows an identification processing target image region TRg4 in the captured image, and FIG. 7(B2) shows a normalized image TRgt4 of the identification processing target image having the region TRg4. Similarly, FIG. 7(C1) shows an identification processing target image region TRg5 in the captured image, and FIG. 7(C2) shows a normalized image TRgt5 of the identification processing target image having the region TRg5.

As shown in FIG. 7, the identification processing target image region TRg5 in the captured image is larger than the region TRg4. This is because the distance between the rear camera 40B and the virtual plane region corresponding to TRg5 is smaller than the distance between the rear camera 40B and the virtual plane region corresponding to TRg4. Similarly, the region TRg4 in the captured image is larger than the region TRg3, because the virtual plane region corresponding to TRg4 is closer to the rear camera 40B than the one corresponding to TRg3. That is, the greater the distance between the corresponding virtual plane region and the rear camera 40B, the smaller the identification processing target image region in the captured image. The normalized images TRgt3, TRgt4, and TRgt5, on the other hand, are all rectangular images of the same size.
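The shrinking of the image region with distance can be illustrated with an ideal pinhole projection. This sketch is only an approximation under stated assumptions: it ignores the wide-angle lens distortion that the actual camera has, and the camera height, downward pitch, and focal length are hypothetical values. World axes are X right, Y rearward, Z up, in millimetres.

```python
import math
from typing import List, Sequence, Tuple


def project(point: Sequence[float], cam_height: float = 2500.0,
            pitch_deg: float = 30.0, focal_px: float = 500.0) -> Tuple[float, float]:
    """Project a world point to image coordinates (u, v) in pixels."""
    p = math.radians(pitch_deg)
    forward = (0.0, math.cos(p), -math.sin(p))  # optical axis, tilted downward
    up = (0.0, math.sin(p), math.cos(p))        # camera up, orthogonal to forward
    d = (point[0], point[1], point[2] - cam_height)
    depth = sum(a * b for a, b in zip(d, forward))
    if depth <= 0.0:
        raise ValueError("point is behind the camera")
    u = focal_px * d[0] / depth
    v = -focal_px * sum(a * b for a, b in zip(d, up)) / depth  # v grows downward
    return (u, v)


def polygon_area(pts: List[Tuple[float, float]]) -> float:
    """Shoelace formula for a simple polygon with vertices given in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def region_area(distance_mm: float) -> float:
    """Image area of quadrangle ABGH for a box centred straight behind the camera."""
    near, far = distance_mm - 400.0, distance_mm + 400.0
    tr = [(-400.0, near, 0.0), (400.0, near, 0.0),
          (400.0, far, 1800.0), (-400.0, far, 1800.0)]
    return polygon_area([project(p) for p in tr])


# Usage: compare the projected region at 2 m, 3.5 m, and 5 m.
areas = [region_area(d) for d in (2000.0, 3500.0, 5000.0)]
```

Under these assumptions the projected area falls as the distance grows, which is why normalization to a fixed size is needed before classification.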

In this way, the extraction unit 31 can normalize an identification processing target image, which can take various shapes and sizes in the captured image, into a rectangular image of a predetermined size, and can thereby normalize a person candidate image that includes a person image. Specifically, the extraction unit 31 places the image portion presumed to be the head of the person candidate image (hereinafter "head image portion") in a predetermined region of the normalized image. It places the image portion presumed to be the torso (hereinafter "torso image portion") in another predetermined region of the normalized image, and the image portion presumed to be the legs (hereinafter "leg image portion") in yet another predetermined region. The extraction unit 31 can also obtain the normalized image with the tilt of the person candidate image relative to the shape of the normalized image (image tilt) suppressed.

Next, with reference to FIG. 8, normalization processing will be described for the case where an identification processing target image region includes an image region that is unsuitable for identification and adversely affects the identification of person images (hereinafter "identification processing unsuitable region"). An identification processing unsuitable region is a known region in which a person image cannot exist. It includes, for example, a region in which the excavator's vehicle body appears (hereinafter "vehicle body reflection region") and a region extending beyond the captured image (hereinafter "protruding region"). FIG. 8 is a diagram showing the relationship between an identification processing target image region and identification processing unsuitable regions, and it corresponds to FIGS. 7(C1) and 7(C2). In the left part of FIG. 8, the area hatched with lines sloping down to the right corresponds to the protruding region R1, and the area hatched with lines sloping down to the left corresponds to the vehicle body reflection region R2.

In this embodiment, when the identification processing target image region TRg5 includes the protruding region R1 and part of the vehicle body reflection region R2, the extraction unit 31 masks those identification processing unsuitable regions and then generates the normalized image TRgt5 of the identification processing target image having the region TRg5. Alternatively, the extraction unit 31 may generate the normalized image TRgt5 first and then mask the portions of TRgt5 that correspond to the identification processing unsuitable regions.

The right part of FIG. 8 shows the normalized image TRgt5. In it, the area hatched with lines sloping down to the right represents the mask region M1 corresponding to the protruding region R1, and the area hatched with lines sloping down to the left represents the mask region M2 corresponding to part of the vehicle body reflection region R2.

このようにして、抽出部３１は、識別処理不適領域の画像をマスク処理することで、識別処理不適領域の画像が識別部３２による識別処理に影響を及ぼすのを防止する。このマスク処理により、識別部３２は、識別処理不適領域の画像の影響を受けることなく、正規化画像におけるマスク領域以外の領域の画像を用いて人画像であるかを識別できる。なお、抽出部３１は、マスク処理以外の他の任意の公知方法で、識別処理不適領域の画像が識別部３２による識別処理に影響を及ぼさないようにしてもよい。 In this way, the extraction unit 31 masks the image of the region unsuitable for the identification process to prevent the image of the region unsuitable for the identification process from affecting the identification process by the identification unit 32. By this mask processing, the identification unit 32 can identify whether the image is a human image by using an image in a region other than the mask region in the normalized image without being affected by the image in the region unsuitable for the identification processing. The extraction unit 31 may use any known method other than the mask processing so that the image in the region unsuitable for the identification process does not affect the identification process by the identification unit 32.
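The masking step described above can be illustrated with a minimal Python sketch. It assumes the normalized image is a small grid of grayscale values and that the unsuitable regions (mask regions M1 and M2) are given as a boolean grid of the same size; the function name `mask_unsuitable` and the fill value 0 are hypothetical choices for illustration, not details taken from the embodiment.

```python
def mask_unsuitable(image, unsuitable, fill=0):
    """Return a copy of `image` in which every pixel flagged in
    `unsuitable` is replaced by `fill` (assumed mask value)."""
    return [
        [fill if bad else pixel for pixel, bad in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, unsuitable)
    ]

# Toy 3x3 normalized image (grayscale values).
normalized = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
# True marks pixels that belong to a mask region (hypothetical layout).
unsuitable = [
    [False, False, True],
    [False, False, True],
    [True, True, True],
]

masked = mask_unsuitable(normalized, unsuitable)
```

Only the unmasked pixels of `masked` would then be passed to the identification stage; the original image is left unchanged so that either masking order described in the text can be tried.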

次に、図９を参照し、抽出部３１が生成する正規化画像の特徴について説明する。なお、図９は、正規化画像の例を示す図である。また、図９に示す１４枚の正規化画像は、図の左端に近い正規化画像ほど、後方カメラ４０Ｂから近い位置に存在する人候補の画像を含み、図の右端に近い正規化画像ほど、後方カメラ４０Ｂから遠い位置に存在する人候補の画像を含む。 Next, with reference to FIG. 9, the features of the normalized image generated by the extraction unit 31 will be described. Note that FIG. 9 is a diagram showing examples of normalized images. Of the 14 normalized images shown in FIG. 9, a normalized image closer to the left end of the figure includes an image of a person candidate located closer to the rear camera 40B, and a normalized image closer to the right end of the figure includes an image of a person candidate located farther from the rear camera 40B.

図９に示すように、抽出部３１は、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の後方水平距離（図５に示すＹ軸方向の水平距離）に関係なく、何れの正規化画像内においてもほぼ同じ割合で頭部画像部分、胴体部画像部分、脚部画像部分等を配置できる。そのため、抽出部３１は、識別部３２が識別処理を実行する際の演算負荷を低減でき、且つ、その識別結果の信頼性を向上できる。なお、上述の後方水平距離は、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の位置関係に関する情報の一例であり、抽出部３１は、抽出した識別処理対象画像にその情報を付加する。また、上述の位置関係に関する情報は、仮想平面領域ＴＲに対応する参照点Ｐｒと後方カメラ４０Ｂとを結ぶ線分の後方カメラ４０Ｂの光軸に対する上面視角度等を含む。 As shown in FIG. 9, the extraction unit 31 can arrange the head image portion, the body image portion, the leg image portion, and the like at almost the same ratio in every normalized image, regardless of the rear horizontal distance (the horizontal distance in the Y-axis direction shown in FIG. 5) between the virtual plane region TR and the rear camera 40B in the real space. Therefore, the extraction unit 31 can reduce the calculation load when the identification unit 32 executes the identification process, and can improve the reliability of the identification result. The above-mentioned rear horizontal distance is an example of information regarding the positional relationship between the virtual plane region TR and the rear camera 40B in the real space, and the extraction unit 31 adds that information to the extracted identification processing target image. The information regarding the positional relationship also includes, for example, the top-view angle, with respect to the optical axis of the rear camera 40B, of the line segment connecting the rear camera 40B and the reference point Pr corresponding to the virtual plane region TR.

次に、図１０を参照し、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の後方水平距離と、正規化画像における頭部画像部分の大きさとの関係について説明する。なお、図１０上図は、後方カメラ４０Ｂからの後方水平距離がそれぞれ異なる３つの参照点Ｐｒ１０、Ｐｒ１１、Ｐｒ１２のところに人が存在する場合の頭部画像部分の大きさＬ１０、Ｌ１１、Ｌ１２を示す図であり、横軸が後方水平距離に対応する。また、図１０下図は、後方水平距離と頭部画像部分の大きさの関係を示すグラフであり、縦軸が頭部画像部分の大きさに対応し、横軸が後方水平距離に対応する。なお、図１０上図及び図１０下図の横軸は共通である。また、本実施例は、カメラ高さを２１００ｍｍとし、頭部ＨＤの中心の地面からの高さを１６００ｍｍとし、頭部の直径を２５０ｍｍとする。 Next, with reference to FIG. 10, the relationship between the rear horizontal distance between the virtual plane region TR and the rear camera 40B in the real space and the size of the head image portion in the normalized image will be described. The upper figure of FIG. 10 shows the sizes L10, L11, and L12 of the head image portion when a person is present at each of three reference points Pr10, Pr11, and Pr12 that have different rear horizontal distances from the rear camera 40B; the horizontal axis corresponds to the rear horizontal distance. The lower figure of FIG. 10 is a graph showing the relationship between the rear horizontal distance and the size of the head image portion; the vertical axis corresponds to the size of the head image portion, and the horizontal axis corresponds to the rear horizontal distance. The upper and lower figures of FIG. 10 share the same horizontal axis. In this embodiment, the height of the camera is 2100 mm, the height of the center of the head HD from the ground is 1600 mm, and the diameter of the head is 250 mm.

図１０上図に示すように、参照点Ｐｒ１０で示す位置に人が存在する場合、頭部画像部分の大きさＬ１０は、後方カメラ４０Ｂから見た頭部ＨＤの仮想平面領域ＴＲ１０への投影像の大きさに相当する。同様に、参照点Ｐｒ１１、Ｐｒ１２で示す位置に人が存在する場合、頭部画像部分の大きさＬ１１、Ｌ１２は、後方カメラ４０Ｂから見た頭部ＨＤの仮想平面領域ＴＲ１１、ＴＲ１２への投影像の大きさに相当する。なお、正規化画像における頭部画像部分の大きさは投影像の大きさに伴って変化する。 As shown in the upper figure of FIG. 10, when a person is present at the position indicated by the reference point Pr10, the size L10 of the head image portion is a projection image of the head HD seen from the rear camera 40B onto the virtual plane region TR10. Corresponds to the size of. Similarly, when a person is present at the positions indicated by the reference points Pr11 and Pr12, the sizes L11 and L12 of the head image portion are projected images of the head HD on the virtual plane regions TR11 and TR12 as seen from the rear camera 40B. Corresponds to the size of. The size of the head image portion in the normalized image changes with the size of the projected image.

そして、図１０下図に示すように、正規化画像における頭部画像部分の大きさは、後方水平距離がＤ１（例えば７００ｍｍ）以上ではほぼ同じ大きさを維持するが、後方水平距離がＤ１を下回ったところで急激に増大する。 Then, as shown in the lower figure of FIG. 10, the size of the head image portion in the normalized image stays almost constant when the rear horizontal distance is D1 (for example, 700 mm) or more, but increases sharply once the rear horizontal distance falls below D1.

そこで、識別部３２は、後方水平距離に応じて識別処理の内容を変更する。例えば、識別部３２は、教師あり学習（機械学習）の手法を用いる場合、所定の後方水平距離（例えば６５０ｍｍ）を境に、識別処理で用いる学習サンプルをグループ分けする。具体的には、近距離用グループと遠距離用グループに学習サンプルを分けるようにする。この構成により、識別部３２は、より高精度に人画像を識別できる。 Therefore, the identification unit 32 changes the content of the identification process according to the rear horizontal distance. For example, when the method of supervised learning (machine learning) is used, the identification unit 32 groups learning samples used in the identification process at a predetermined rear horizontal distance (for example, 650 mm). Specifically, the learning sample is divided into a short-distance group and a long-distance group. With this configuration, the identification unit 32 can identify a human image with higher accuracy.
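The distance-based grouping can be sketched as follows in Python. This is only an illustration under stated assumptions: samples are represented as `(rear_horizontal_distance_mm, image)` pairs, the names `split_samples` and `select_group` are hypothetical, and assigning a sample exactly at the 650 mm boundary to the long-distance group is an arbitrary choice that the text does not specify.

```python
NEAR_FAR_BOUNDARY_MM = 650  # example boundary given in the text

def split_samples(samples, boundary_mm=NEAR_FAR_BOUNDARY_MM):
    """Split (distance_mm, image) pairs into short- and long-distance groups."""
    near, far = [], []
    for distance_mm, image in samples:
        # Assumption: the boundary value itself goes to the long-distance group.
        (near if distance_mm < boundary_mm else far).append((distance_mm, image))
    return near, far

def select_group(distance_mm, boundary_mm=NEAR_FAR_BOUNDARY_MM):
    """Pick which group (and hence which trained classifier) to use at run time."""
    return "near" if distance_mm < boundary_mm else "far"

# Toy learning samples: the image payloads are placeholders.
samples = [(300, "img_a"), (650, "img_b"), (2000, "img_c")]
near_group, far_group = split_samples(samples)
```

At identification time, the rear horizontal distance attached to an identification processing target image would be passed to `select_group` to choose the classifier trained on the matching group.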

以上の構成により、周辺監視システム１００は、撮像装置４０の方向を向き且つ水平面である仮想地面に対して傾斜する仮想平面領域ＴＲに対応する識別処理対象画像領域ＴＲｇから正規化画像ＴＲｇｔを生成する。そのため、人の高さ方向及び奥行き方向の見え方を考慮した正規化を実現できる。その結果、人を斜め上から撮像するように建設機械に取り付けられる撮像装置４０の撮像画像を用いた場合であっても建設機械の周囲に存在する人をより確実に検知できる。特に、人が撮像装置４０に接近した場合であっても、撮像画像上の十分な大きさの領域を占める識別処理対象画像から正規化画像を生成できるため、その人を確実に検知できる。 With the above configuration, the peripheral monitoring system 100 generates a normalized image TRgt from the identification processing target image region TRg corresponding to the virtual plane region TR, which faces the image pickup device 40 and is inclined with respect to the virtual ground, which is a horizontal plane. Therefore, it is possible to realize normalization that takes into account how a person appears in both the height direction and the depth direction. As a result, even when using an image captured by the image pickup device 40 attached to the construction machine so as to image a person from diagonally above, a person present around the construction machine can be detected more reliably. In particular, even when a person approaches the image pickup device 40, the normalized image can be generated from an identification processing target image that occupies a sufficiently large region of the captured image, so that the person can be reliably detected.

また、周辺監視システム１００は、実空間における仮想直方体であるボックスＢＸの４つの頂点Ａ、Ｂ、Ｇ、Ｈで形成される矩形領域として仮想平面領域ＴＲを定義する。そのため、実空間における参照点Ｐｒと仮想平面領域ＴＲとを幾何学的に対応付けることができ、さらには、実空間における仮想平面領域ＴＲと撮像画像における識別処理対象画像領域ＴＲｇとを幾何学的に対応付けることができる。 Further, the peripheral monitoring system 100 defines the virtual plane region TR as a rectangular region formed by the four vertices A, B, G, and H of the box BX, which is a virtual rectangular parallelepiped in the real space. Therefore, the reference point Pr in the real space can be geometrically associated with the virtual plane region TR, and furthermore, the virtual plane region TR in the real space can be geometrically associated with the identification processing target image region TRg in the captured image.

また、抽出部３１は、識別処理対象画像領域ＴＲｇに含まれる識別処理不適領域の画像をマスク処理する。そのため、識別部３２は、車体映り込み領域Ｒ２を含む識別処理不適領域の画像の影響を受けることなく、正規化画像におけるマスク領域以外の領域の画像を用いて人画像であるかを識別できる。 Further, the extraction unit 31 masks the image of the identification processing unsuitable region included in the identification processing target image region TRg. Therefore, the identification unit 32 can identify whether the image is a human image by using an image in a region other than the mask region in the normalized image without being affected by the image in the region unsuitable for the identification process including the vehicle body reflection region R2.

また、抽出部３１は、識別処理対象画像を抽出した場合、仮想平面領域ＴＲと撮像装置４０との位置関係に関する情報として両者間の後方水平距離をその識別処理対象画像に付加する。そして、識別部３２は、その後方水平距離に応じて識別処理の内容を変更する。具体的には、識別部３２は、所定の後方水平距離（例えば６５０ｍｍ）を境に、識別処理で用いる学習サンプルをグループ分けする。この構成により、識別部３２は、より高精度に人画像を識別できる。 Further, when the identification processing target image is extracted, the extraction unit 31 adds the rear horizontal distance between the virtual plane region TR and the image pickup apparatus 40 to the identification processing target image as information regarding the positional relationship between the two. Then, the identification unit 32 changes the content of the identification process according to the rear horizontal distance thereof. Specifically, the identification unit 32 groups the learning samples used in the identification process at a predetermined rear horizontal distance (for example, 650 mm). With this configuration, the identification unit 32 can identify a human image with higher accuracy.

また、抽出部３１は、参照点Ｐｒ毎に識別処理対象画像を抽出可能である。また、識別処理対象画像領域ＴＲｇのそれぞれは、対応する仮想平面領域ＴＲを介して、人の想定立ち位置として予め設定される参照点Ｐｒの１つに関連付けられる。そのため、周辺監視システム１００は、人が存在する可能性が高い参照点Ｐｒを任意の方法で抽出することで、人候補画像を含む可能性が高い識別処理対象画像を抽出できる。この場合、人候補画像を含む可能性が低い識別処理対象画像に対して、比較的演算量の多い画像処理による識別処理が施されてしまうのを防止でき、人検知処理の高速化を実現できる。 Further, the extraction unit 31 can extract an identification processing target image for each reference point Pr. Each of the identification processing target image regions TRg is associated, via the corresponding virtual plane region TR, with one of the reference points Pr preset as assumed standing positions of a person. Therefore, by extracting a reference point Pr at which a person is likely to be present by an arbitrary method, the peripheral monitoring system 100 can extract an identification processing target image that is likely to include a person candidate image. In this case, identification processing target images that are unlikely to include a person candidate image are not subjected to identification processing that relies on computationally expensive image processing, so the person detection process can be sped up.

次に、図１１及び図１２を参照し、人候補画像を含む可能性が高い識別処理対象画像を抽出部３１が抽出する処理の一例について説明する。なお、図１１は、抽出部３１が撮像画像から識別処理対象画像を切り出す際に用いる幾何学的関係の一例を示す概略図であり、図４に対応する。また、図１２は、撮像画像における特徴画像の一例を示す図である。なお、特徴画像は、人の特徴的な部分を表す画像であり、望ましくは、実空間における地面からの高さが変化し難い部分を表す画像である。そのため、特徴画像は、例えば、ヘルメットの画像、肩の画像、頭の画像、人に取り付けられる反射板若しくはマーカの画像等を含む。 Next, with reference to FIGS. 11 and 12, an example of a process in which the extraction unit 31 extracts an image to be identified, which is likely to include a human candidate image, will be described. Note that FIG. 11 is a schematic diagram showing an example of the geometric relationship used when the extraction unit 31 cuts out the identification processing target image from the captured image, and corresponds to FIG. 4. Further, FIG. 12 is a diagram showing an example of a feature image in the captured image. The feature image is an image showing a characteristic part of a person, and is preferably an image showing a part where the height from the ground in the real space is hard to change. Therefore, the feature image includes, for example, an image of a helmet, an image of a shoulder, an image of a head, an image of a reflector or a marker attached to a person, and the like.

本実施例では、抽出部３１は、前段画像認識処理によって、撮像画像におけるヘルメット画像（厳密にはヘルメットであると推定できる画像）を見つけ出す。ショベルの周囲で作業する人はヘルメットを着用していると考えられるためである。そして、抽出部３１は、見つけ出したヘルメット画像の位置から最も関連性の高い参照点Ｐｒを導き出す。その上で、抽出部３１は、その参照点Ｐｒに対応する識別処理対象画像を抽出する。 In this embodiment, the extraction unit 31 finds a helmet image (strictly speaking, an image that can be presumed to be a helmet) in the captured image by the pre-stage image recognition process. This is because people working around the excavator are believed to be wearing helmets. Then, the extraction unit 31 derives the most relevant reference point Pr from the position of the found helmet image. Then, the extraction unit 31 extracts the identification processing target image corresponding to the reference point Pr.

具体的には、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像におけるヘルメット画像の位置から関連性の高い参照点Ｐｒを導き出す。なお、図１１の幾何学的関係は、実空間における仮想頭部位置ＨＰを定める点で図４の幾何学的関係と相違するが、その他の点で共通する。 Specifically, the extraction unit 31 uses the geometric relationship shown in FIG. 11 to derive a highly relevant reference point Pr from the position of the helmet image in the captured image. The geometrical relationship of FIG. 11 differs from the geometrical relationship of FIG. 4 in that the virtual head position HP in the real space is determined, but is common in other respects.

仮想頭部位置ＨＰは、参照点Ｐｒ上に存在すると想定される人の頭部位置を表し、参照点Ｐｒの真上に配置される。本実施例では、参照点Ｐｒ上の高さ１７００ｍｍのところに配置される。そのため、実空間における仮想頭部位置ＨＰが決まれば、実空間における参照点Ｐｒの位置が一意に決まり、実空間における仮想平面領域ＴＲの位置も一意に決まる。また、撮像画像における識別処理対象画像領域ＴＲｇも一意に決まる。そして、抽出部３１は、識別処理対象画像領域ＴＲｇを有する識別処理対象画像を正規化して所定サイズの正規化画像ＴＲｇｔを生成できる。 The virtual head position HP represents the head position of a person who is assumed to exist on the reference point Pr, and is arranged directly above the reference point Pr. In this embodiment, it is arranged at a height of 1700 mm on the reference point Pr. Therefore, if the virtual head position HP in the real space is determined, the position of the reference point Pr in the real space is uniquely determined, and the position of the virtual plane region TR in the real space is also uniquely determined. Further, the identification processing target image area TRg in the captured image is also uniquely determined. Then, the extraction unit 31 can normalize the identification processing target image having the identification processing target image region TRg to generate a normalized image TRgt of a predetermined size.

逆に、実空間における参照点Ｐｒの位置が決まれば、実空間における仮想頭部位置ＨＰが一意に決まり、実空間における仮想頭部位置ＨＰに対応する撮像画像上の頭部画像位置ＡＰも一意に決まる。そのため、頭部画像位置ＡＰは、予め設定されている参照点Ｐｒのそれぞれに対応付けて予め設定され得る。なお、頭部画像位置ＡＰは、参照点Ｐｒからリアルタイムに導き出されてもよい。 Conversely, if the position of the reference point Pr in the real space is determined, the virtual head position HP in the real space is uniquely determined, and the head image position AP on the captured image corresponding to the virtual head position HP in the real space is also uniquely determined. Therefore, the head image position AP can be preset in association with each of the preset reference points Pr. The head image position AP may instead be derived from the reference point Pr in real time.
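To make the chain from a reference point Pr to a head image position AP concrete, the following Python sketch projects the virtual head position HP into the image. It is a rough illustration only: it assumes an ideal pinhole camera with no lens distortion, a camera that looks rearward and is tilted downward by a pitch angle with no roll or yaw, and hypothetical values for the focal length, principal point, and pitch. The 1700 mm head height comes from the text; the 2100 mm camera height is borrowed from the FIG. 10 discussion. The actual geometric relationship of FIG. 11 is not reproduced here.

```python
import math

CAMERA_HEIGHT_MM = 2100.0  # camera height used in the FIG. 10 discussion
HEAD_HEIGHT_MM = 1700.0    # height of the virtual head position HP above Pr

def head_image_position(ref_point, pitch_rad, focal_px, principal_point):
    """Project the virtual head position HP above `ref_point` into the image.

    `ref_point` is (x, y) on the ground in mm: x to the right of the camera,
    y rearward from the camera. Returns the pixel (u, v) of the head image
    position AP under the assumed pinhole model.
    """
    x, y = ref_point
    dz = HEAD_HEIGHT_MM - CAMERA_HEIGHT_MM  # HP height relative to the camera
    # Components of the camera-to-HP vector along the assumed camera axes.
    depth = y * math.cos(pitch_rad) - dz * math.sin(pitch_rad)
    if depth <= 0:
        raise ValueError("reference point is not in front of the camera")
    right = x
    down = -y * math.sin(pitch_rad) - dz * math.cos(pitch_rad)
    cx, cy = principal_point
    return (cx + focal_px * right / depth, cy + focal_px * down / depth)

# Precompute AP for each preset reference point (all values hypothetical).
reference_points = [(0.0, 2000.0), (500.0, 3000.0)]
head_positions = {
    pr: head_image_position(pr, math.radians(30), 500.0, (320.0, 240.0))
    for pr in reference_points
}
```

Building `head_positions` once corresponds to presetting AP for each Pr; calling `head_image_position` on demand corresponds to deriving AP in real time.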

そこで、抽出部３１は、前段画像認識処理により後方カメラ４０Ｂの撮像画像内でヘルメット画像を探索する。図１２上図は、抽出部３１がヘルメット画像ＨＲｇを見つけ出した状態を示す。そして、抽出部３１は、ヘルメット画像ＨＲｇを見つけ出した場合、その代表位置ＲＰを決定する。なお、代表位置ＲＰは、ヘルメット画像ＨＲｇの大きさ、形状等から導き出される位置である。本実施例では、代表位置ＲＰは、ヘルメット画像ＨＲｇを含むヘルメット画像領域の中心画素の位置である。図１２下図は、図１２上図における白線で区切られた矩形画像領域であるヘルメット画像領域の拡大図であり、そのヘルメット画像領域の中心画素の位置が代表位置ＲＰであることを示す。 Therefore, the extraction unit 31 searches for a helmet image in the captured image of the rear camera 40B by the pre-stage image recognition process. The upper figure of FIG. 12 shows a state in which the extraction unit 31 has found the helmet image HRg. When the extraction unit 31 finds the helmet image HRg, it determines the representative position RP of that image. The representative position RP is a position derived from the size, shape, and the like of the helmet image HRg. In this embodiment, the representative position RP is the position of the central pixel of the helmet image region that includes the helmet image HRg. The lower figure of FIG. 12 is an enlarged view of the helmet image region, which is the rectangular image region delimited by white lines in the upper figure of FIG. 12, and shows that the position of the central pixel of the helmet image region is the representative position RP.

その後、抽出部３１は、例えば最近傍探索アルゴリズムを用いて代表位置ＲＰの最も近傍にある頭部画像位置ＡＰを導き出す。図１２下図は、代表位置ＲＰの近くに６つの頭部画像位置ＡＰ１～ＡＰ６が予め設定されており、そのうちの頭部画像位置ＡＰ５が代表位置ＲＰの最も近傍にある頭部画像位置ＡＰであることを示す。 After that, the extraction unit 31 derives the head image position AP closest to the representative position RP by using, for example, a nearest neighbor search algorithm. The lower figure of FIG. 12 shows that six head image positions AP1 to AP6 are preset near the representative position RP, and that, among them, the head image position AP5 is the one closest to the representative position RP.
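A brute-force nearest neighbor search over the preset head image positions can be sketched as follows. The pixel coordinates of AP1 to AP6 and of RP are invented for illustration (FIG. 12 does not give numeric values), the Euclidean pixel distance is an assumed metric, and `nearest_head_position` is a hypothetical helper name.

```python
import math

def nearest_head_position(rp, head_positions):
    """Return the id of the head image position AP closest to `rp`."""
    return min(head_positions, key=lambda ap_id: math.dist(rp, head_positions[ap_id]))

# Hypothetical pixel coordinates for the six preset positions in FIG. 12.
head_positions = {
    "AP1": (80.0, 40.0), "AP2": (100.0, 40.0), "AP3": (120.0, 40.0),
    "AP4": (80.0, 60.0), "AP5": (100.0, 60.0), "AP6": (120.0, 60.0),
}
representative_position = (103.0, 57.0)  # hypothetical RP of the helmet image

closest = nearest_head_position(representative_position, head_positions)
```

A linear scan is adequate for a few preset positions; a spatial index such as a k-d tree would be one option if the number of preset positions were large.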

そして、抽出部３１は、図１１に示す幾何学的関係を利用し、導き出した最近傍の頭部画像位置ＡＰから、仮想頭部位置ＨＰ、参照点Ｐｒ、仮想平面領域ＴＲを辿って、対応する識別処理対象画像領域ＴＲｇを抽出する。その後、抽出部３１は、抽出した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を正規化して正規化画像ＴＲｇｔを生成する。 Then, using the geometric relationship shown in FIG. 11, the extraction unit 31 traces from the derived nearest head image position AP through the virtual head position HP, the reference point Pr, and the virtual plane region TR, and extracts the corresponding identification processing target image region TRg. After that, the extraction unit 31 normalizes the identification processing target image having the extracted identification processing target image region TRg to generate a normalized image TRgt.

このようにして、抽出部３１は、撮像画像における人の特徴画像の位置であるヘルメット画像ＨＲｇの代表位置ＲＰと、予め設定された頭部画像位置ＡＰの１つ（頭部画像位置ＡＰ５）とを対応付けることで識別処理対象画像を抽出する。 In this way, the extraction unit 31 extracts the identification processing target image by associating the representative position RP of the helmet image HRg, which is the position of the human feature image in the captured image, with one of the preset head image positions AP (head image position AP5).

なお、抽出部３１は、図１１に示す幾何学的関係を利用する代わりに、頭部画像位置ＡＰと参照点Ｐｒ、仮想平面領域ＴＲ、又は識別処理対象画像領域ＴＲｇとを直接的に対応付ける参照テーブルを利用し、頭部画像位置ＡＰに対応する識別処理対象画像を抽出してもよい。 Instead of using the geometric relationship shown in FIG. 11, the extraction unit 31 may use a reference table that directly associates the head image position AP with the reference point Pr, the virtual plane region TR, or the identification processing target image region TRg, and thereby extract the identification processing target image corresponding to the head image position AP.
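Such a reference table can be as simple as a dictionary keyed by head image position. In the sketch below, each entry stores the four corner pixels of the identification processing target image region TRg; the ids, the coordinates, and the helper `region_for` are hypothetical and serve only to show the direct lookup that replaces the geometric chain.

```python
# Hypothetical table: head image position id -> corner pixels of TRg
# (top-left, top-right, bottom-right, bottom-left).
REGION_TABLE = {
    "AP4": ((100, 80), (160, 82), (165, 200), (95, 198)),
    "AP5": ((150, 78), (210, 80), (216, 198), (146, 196)),
}

def region_for(ap_id, table=REGION_TABLE):
    """Return the corner pixels of TRg for `ap_id`, or None if not tabulated."""
    return table.get(ap_id)

corners = region_for("AP5")
```

The table trades memory for run-time computation: the geometry is evaluated once offline, and each lookup at run time is a constant-time dictionary access.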

また、抽出部３１は、山登り法、Mean-shift法等の最近傍探索アルゴリズム以外の他の公知のアルゴリズムを用いて代表位置ＲＰから参照点Ｐｒを導き出してもよい。例えば、山登り法を用いる場合、抽出部３１は、代表位置ＲＰの近傍にある複数の頭部画像位置ＡＰを導き出し、代表位置ＲＰとそれら複数の頭部画像位置ＡＰのそれぞれに対応する参照点Ｐｒとを紐付ける。このとき、抽出部３１は、代表位置ＲＰと頭部画像位置ＡＰが近いほど重みが大きくなるように参照点Ｐｒに重みを付ける。そして、複数の参照点Ｐｒの重みの分布を山登りし、重みの極大点に最も近い重みを有する参照点Ｐｒから識別処理対象画像領域ＴＲｇを抽出する。 Further, the extraction unit 31 may derive the reference point Pr from the representative position RP by using a known algorithm other than the nearest neighbor search algorithm, such as the hill climbing method or the mean-shift method. For example, when the hill climbing method is used, the extraction unit 31 derives a plurality of head image positions AP in the vicinity of the representative position RP, and links the representative position RP to the reference points Pr corresponding to each of those head image positions AP. At this time, the extraction unit 31 weights the reference points Pr so that the closer the head image position AP is to the representative position RP, the larger the weight. Then, the extraction unit 31 climbs the distribution of the weights of the plurality of reference points Pr, and extracts the identification processing target image region TRg from the reference point Pr whose weight is closest to the local maximum of the weights.
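One possible reading of the hill climbing variant is sketched below. The text does not specify the weight function, the neighborhood structure, or the starting point, so all three are assumptions here: reference points are indexed on a grid `(i, j)`, the weight is `1 / (1 + pixel distance)` between RP and the corresponding AP, neighbors are the eight adjacent grid cells, and the names `weights_for` and `hill_climb` are hypothetical.

```python
import math

def weights_for(rp, ap_by_ref):
    """Weight each reference point: larger when its AP is closer to RP."""
    return {ref: 1.0 / (1.0 + math.dist(rp, ap)) for ref, ap in ap_by_ref.items()}

def hill_climb(weights, start):
    """Move to the best-weighted adjacent grid cell until no neighbor improves."""
    current = start
    while True:
        i, j = current
        neighbours = [
            (i + di, j + dj)
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0) and (i + di, j + dj) in weights
        ]
        best = max(neighbours, key=weights.get, default=current)
        if weights[best] <= weights[current]:
            return current  # local maximum of the weight distribution
        current = best

# Hypothetical 3x3 grid of reference points and their AP pixel positions.
ap_by_ref = {(i, j): (100.0 + 20.0 * i, 100.0 + 20.0 * j)
             for i in range(3) for j in range(3)}
rp = (138.0, 121.0)  # hypothetical representative position

weights = weights_for(rp, ap_by_ref)
selected_ref = hill_climb(weights, start=(0, 0))
```

The identification processing target image region would then be taken from `selected_ref`. With a single helmet and a smooth weight distribution this gives the same answer as a nearest neighbor search; the two approaches can differ when the weights have several local maxima.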

次に、図１３を参照し、コントローラ３０の抽出部３１が識別処理対象画像を抽出する処理（以下、「画像抽出処理」とする。）の一例について説明する。なお、図１３は、画像抽出処理の一例の流れを示すフローチャートである。抽出部３１は、例えば、撮像画像を取得する度にこの画像抽出処理を実行する。 Next, with reference to FIG. 13, an example of a process (hereinafter referred to as “image extraction process”) in which the extraction unit 31 of the controller 30 extracts the image to be identified will be described. Note that FIG. 13 is a flowchart showing the flow of an example of the image extraction process. The extraction unit 31 executes this image extraction process every time a captured image is acquired, for example.

最初に、抽出部３１は、撮像画像内でヘルメット画像を探索する（ステップＳＴ１）。本実施例では、抽出部３１は、前段画像認識処理により後方カメラ４０Ｂの撮像画像をラスタスキャンしてヘルメット画像を見つけ出す。 First, the extraction unit 31 searches for a helmet image in the captured image (step ST1). In this embodiment, the extraction unit 31 finds out the helmet image by raster-scanning the captured image of the rear camera 40B by the front-stage image recognition process.

撮像画像でヘルメット画像ＨＲｇを見つけ出した場合（ステップＳＴ１のＹＥＳ）、抽出部３１は、ヘルメット画像ＨＲｇの代表位置ＲＰを取得する（ステップＳＴ２）。 When the helmet image HRg is found in the captured image (YES in step ST1), the extraction unit 31 acquires the representative position RP of the helmet image HRg (step ST2).

その後、抽出部３１は、取得した代表位置ＲＰの最近傍にある頭部画像位置ＡＰを取得する（ステップＳＴ３）。 After that, the extraction unit 31 acquires the head image position AP closest to the acquired representative position RP (step ST3).

その後、抽出部３１は、取得した頭部画像位置ＡＰに対応する識別処理対象画像を抽出する（ステップＳＴ４）。本実施例では、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像における頭部画像位置ＡＰ、実空間における仮想頭部位置ＨＰ、実空間における人の想定立ち位置としての参照点Ｐｒ、及び、実空間における仮想平面領域ＴＲの対応関係を辿って識別処理対象画像を抽出する。 After that, the extraction unit 31 extracts the identification processing target image corresponding to the acquired head image position AP (step ST4). In this embodiment, the extraction unit 31 utilizes the geometric relationship shown in FIG. 11 as a head image position AP in the captured image, a virtual head position HP in the real space, and an assumed standing position of a person in the real space. The identification processing target image is extracted by tracing the correspondence between the reference point Pr and the virtual plane region TR in the real space.

なお、抽出部３１は、撮像画像でヘルメット画像ＨＲｇを見つけ出さなかった場合には（ステップＳＴ１のＮＯ）、識別処理対象画像を抽出することなく、処理をステップＳＴ５に移行させる。 If the helmet image HRg is not found in the captured image (NO in step ST1), the extraction unit 31 shifts the processing to step ST5 without extracting the identification processing target image.

その後、抽出部３１は、撮像画像の全体にわたってヘルメット画像を探索したかを判定する（ステップＳＴ５）。 After that, the extraction unit 31 determines whether or not the helmet image has been searched for the entire captured image (step ST5).

撮像画像の全体を未だ探索していないと判定した場合（ステップＳＴ５のＮＯ）、抽出部３１は、撮像画像の別の領域に対し、ステップＳＴ１～ステップＳＴ４の処理を実行する。 When it is determined that the entire captured image has not been searched yet (NO in step ST5), the extraction unit 31 executes the processes of steps ST1 to ST4 for another region of the captured image.

一方、撮像画像の全体にわたるヘルメット画像の探索を完了したと判定した場合（ステップＳＴ５のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 On the other hand, when it is determined that the search for the helmet image over the entire captured image is completed (YES in step ST5), the extraction unit 31 ends the image extraction process this time.

このように、抽出部３１は、最初にヘルメット画像ＨＲｇを見つけ出し、見つけ出したヘルメット画像ＨＲｇの代表位置ＲＰから、頭部画像位置ＡＰ、仮想頭部位置ＨＰ、参照点（想定立ち位置）Ｐｒ、仮想平面領域ＴＲを経て識別処理対象画像領域ＴＲｇを特定する。そして、特定した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を抽出して正規化することで、所定サイズの正規化画像ＴＲｇｔを生成できる。 In this way, the extraction unit 31 first finds the helmet image HRg and then, starting from the representative position RP of the found helmet image HRg, specifies the identification processing target image region TRg via the head image position AP, the virtual head position HP, the reference point (assumed standing position) Pr, and the virtual plane region TR. Then, by extracting and normalizing the identification processing target image having the specified identification processing target image region TRg, a normalized image TRgt of a predetermined size can be generated.
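The flow of FIG. 13 (steps ST1 to ST5) can be summarized in a short Python sketch. The helmet detector is passed in as a callable because the pre-stage image recognition process is not specified here; the window list, the stub detector, and the table that maps a head image position to its region are all hypothetical stand-ins.

```python
import math

def extract_targets(image, windows, detect_helmet, head_positions, region_table):
    """Helmet-first extraction, loosely following steps ST1 to ST5 of FIG. 13."""
    targets = []
    for window in windows:  # the loop ends once the whole image is searched (ST5)
        rp = detect_helmet(image, window)  # ST1: search this part of the image
        if rp is None:
            continue  # NO at ST1: nothing to extract for this window
        # ST2: `rp` is the representative position RP of the helmet image.
        # ST3: head image position AP nearest to RP.
        ap = min(head_positions, key=lambda k: math.dist(rp, head_positions[k]))
        # ST4: extract the identification processing target for that AP.
        targets.append((ap, region_table[ap]))
    return targets

def fake_detector(image, window):
    """Stub detector: the 'image' is a dict from window id to a helmet RP."""
    return image.get(window)

image = {"w1": (101.0, 52.0)}
windows = ["w0", "w1", "w2"]
head_positions = {"AP1": (60.0, 50.0), "AP2": (100.0, 50.0)}
region_table = {"AP1": "TRg1", "AP2": "TRg2"}

result = extract_targets(image, windows, fake_detector, head_positions, region_table)
```

In this sketch the cost is dominated by scanning the whole captured image for helmets, while the lookup from RP to the target region is cheap.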

次に、図１４を参照し、画像抽出処理の別の一例について説明する。なお、図１４は、画像抽出処理の別の一例の流れを示すフローチャートである。 Next, another example of the image extraction process will be described with reference to FIG. Note that FIG. 14 is a flowchart showing the flow of another example of the image extraction process.

最初に、抽出部３１は、頭部画像位置ＡＰの１つを取得する（ステップＳＴ１１）。その後、抽出部３１は、その頭部画像位置ＡＰに対応するヘルメット画像領域を取得する（ステップＳＴ１２）。本実施例では、ヘルメット画像領域は、頭部画像位置ＡＰのそれぞれについて予め設定された所定サイズの画像領域である。 First, the extraction unit 31 acquires one of the head image position APs (step ST11). After that, the extraction unit 31 acquires a helmet image area corresponding to the head image position AP (step ST12). In this embodiment, the helmet image area is an image area of a predetermined size preset for each of the head image position APs.

その後、抽出部３１は、ヘルメット画像領域内でヘルメット画像を探索する（ステップＳＴ１３）。本実施例では、抽出部３１は、前段画像認識処理によりヘルメット画像領域内をラスタスキャンしてヘルメット画像を見つけ出す。 After that, the extraction unit 31 searches for a helmet image in the helmet image area (step ST13). In this embodiment, the extraction unit 31 finds the helmet image by raster scanning the inside of the helmet image area by the pre-stage image recognition process.

ヘルメット画像領域内でヘルメット画像ＨＲｇを見つけ出した場合（ステップＳＴ１３のＹＥＳ）、抽出部３１は、そのときの頭部画像位置ＡＰに対応する識別処理対象画像を抽出する（ステップＳＴ１４）。本実施例では、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像における頭部画像位置ＡＰ、実空間における仮想頭部位置ＨＰ、実空間における人の想定立ち位置としての参照点Ｐｒ、及び、実空間における仮想平面領域ＴＲの対応関係を辿って識別処理対象画像を抽出する。 When the helmet image HRg is found in the helmet image area (YES in step ST13), the extraction unit 31 extracts the identification processing target image corresponding to the head image position AP at that time (step ST14). In this embodiment, the extraction unit 31 utilizes the geometric relationship shown in FIG. 11 as a head image position AP in the captured image, a virtual head position HP in the real space, and an assumed standing position of a person in the real space. The identification processing target image is extracted by tracing the correspondence between the reference point Pr and the virtual plane region TR in the real space.

なお、抽出部３１は、ヘルメット画像領域内でヘルメット画像ＨＲｇを見つけ出さなかった場合には（ステップＳＴ１３のＮＯ）、識別処理対象画像を抽出することなく、処理をステップＳＴ１５に移行させる。 If the helmet image HRg is not found in the helmet image area (NO in step ST13), the extraction unit 31 shifts the processing to step ST15 without extracting the identification processing target image.

その後、抽出部３１は、全ての頭部画像位置ＡＰを取得したかを判定する（ステップＳＴ１５）。そして、全ての頭部画像位置ＡＰを未だ取得していないと判定した場合（ステップＳＴ１５のＮＯ）、抽出部３１は、未取得の別の頭部画像位置ＡＰを取得し、ステップＳＴ１１～ステップＳＴ１４の処理を実行する。一方、全ての頭部画像位置ＡＰを取得し終わったと判定した場合（ステップＳＴ１５のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 After that, the extraction unit 31 determines whether or not all the head image positions AP have been acquired (step ST15). When it determines that not all the head image positions AP have been acquired yet (NO in step ST15), the extraction unit 31 acquires another head image position AP that has not yet been acquired and executes the processes of steps ST11 to ST14. On the other hand, when it determines that all the head image positions AP have been acquired (YES in step ST15), the extraction unit 31 ends the current image extraction process.

このように、抽出部３１は、最初に頭部画像位置ＡＰの１つを取得し、取得した頭部画像位置ＡＰに対応するヘルメット画像領域でヘルメット画像ＨＲｇを見つけ出した場合に、そのときの頭部画像位置ＡＰから、仮想頭部位置ＨＰ、参照点（想定立ち位置）Ｐｒ、仮想平面領域ＴＲを経て、識別処理対象画像領域ＴＲｇを特定する。そして、特定した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を抽出して正規化することで、所定サイズの正規化画像ＴＲｇｔを生成できる。 In this way, the extraction unit 31 first acquires one of the head image positions AP and, when it finds the helmet image HRg in the helmet image region corresponding to the acquired head image position AP, specifies the identification processing target image region TRg from that head image position AP via the virtual head position HP, the reference point (assumed standing position) Pr, and the virtual plane region TR. Then, by extracting and normalizing the identification processing target image having the specified identification processing target image region TRg, a normalized image TRgt of a predetermined size can be generated.
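For comparison, the position-first flow of FIG. 14 (steps ST11 to ST15) can be sketched as follows. Again the helmet check is a caller-supplied callable, and the rectangle format `(x0, y0, x1, y1)`, the stub check, and the region table are hypothetical placeholders rather than details of the embodiment.

```python
def extract_targets_by_position(image, helmet_regions, contains_helmet, region_table):
    """Position-first extraction, loosely following steps ST11 to ST15 of FIG. 14."""
    targets = []
    # ST11/ST12: take each preset head image position AP and its helmet image
    # region in turn; finishing the loop corresponds to YES at ST15.
    for ap, helmet_region in helmet_regions.items():
        if contains_helmet(image, helmet_region):  # ST13: search inside the region
            targets.append((ap, region_table[ap]))  # ST14: extract for this AP
    return targets

def fake_contains(image, region):
    """Stub check: the 'image' is a list of pixel positions where helmets are."""
    x0, y0, x1, y1 = region
    return any(x0 <= x < x1 and y0 <= y < y1 for x, y in image)

image = [(105, 55)]
helmet_regions = {"AP1": (40, 30, 80, 70), "AP2": (80, 30, 120, 70)}
region_table = {"AP1": "TRg1", "AP2": "TRg2"}

result = extract_targets_by_position(image, helmet_regions, fake_contains, region_table)
```

Compared with the helmet-first flow, this version never needs a nearest neighbor step, because each search is already tied to one head image position; the trade-off is one small search per preset position.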

以上の構成により、周辺監視システム１００の抽出部３１は、撮像画像における特徴画像としてのヘルメット画像を見つけ出し、そのヘルメット画像の代表位置ＲＰと所定画像位置としての頭部画像位置ＡＰの１つとを対応付けることで識別処理対象画像を抽出する。そのため、簡易なシステム構成で後段画像認識処理の対象となる画像部分を絞り込むことができる。 With the above configuration, the extraction unit 31 of the peripheral monitoring system 100 finds a helmet image as a feature image in the captured image, and associates the representative position RP of the helmet image with one of the head image position APs as a predetermined image position. By doing so, the image to be identified is extracted. Therefore, it is possible to narrow down the image portion to be the target of the subsequent image recognition processing with a simple system configuration.

なお、抽出部３１は、最初に撮像画像からヘルメット画像ＨＲｇを見つけ出し、そのヘルメット画像ＨＲｇの代表位置ＲＰに対応する頭部画像位置ＡＰの１つを導き出し、その頭部画像位置ＡＰの１つに対応する識別処理対象画像を抽出してもよい。或いは、抽出部３１は、最初に頭部画像位置ＡＰの１つを取得し、その頭部画像位置ＡＰの１つに対応する特徴画像の位置を含む所定領域であるヘルメット画像領域内にヘルメット画像が存在する場合に、その頭部画像位置ＡＰの１つに対応する識別処理対象画像を抽出してもよい。 The extraction unit 31 first finds the helmet image HRg from the captured image, derives one of the head image position APs corresponding to the representative position RP of the helmet image HRg, and assigns it to one of the head image position APs. The corresponding identification processing target image may be extracted. Alternatively, the extraction unit 31 first acquires one of the head image position APs, and the helmet image is within the helmet image area which is a predetermined area including the position of the feature image corresponding to one of the head image position APs. If is present, the identification processing target image corresponding to one of the head image position APs may be extracted.

また、抽出部３１は、図１１に示すような所定の幾何学的関係を利用し、撮像画像におけるヘルメット画像の代表位置ＲＰから識別処理対象画像を抽出してもよい。この場合、所定の幾何学的関係は、撮像画像における識別処理対象画像領域ＴＲｇと、識別処理対象画像領域ＴＲｇに対応する実空間における仮想平面領域ＴＲと、仮想平面領域ＴＲに対応する実空間における参照点Ｐｒ（人の想定立ち位置）と、参照点Ｐｒに対応する仮想頭部位置ＨＰ（人の想定立ち位置に対応する人の特徴的な部分の実空間における位置である仮想特徴位置）と、仮想頭部位置ＨＰに対応する撮像画像における頭部画像位置ＡＰ（仮想特徴位置に対応する撮像画像における所定画像位置）との幾何学的関係を表す。 Further, the extraction unit 31 may extract the identification processing target image from the representative position RP of the helmet image in the captured image by utilizing a predetermined geometric relationship as shown in FIG. In this case, the predetermined geometric relationship is the identification processing target image region TRg in the captured image, the virtual plane region TR in the real space corresponding to the identification processing target image region TRg, and the real space corresponding to the virtual plane region TR. Reference point Pr (assumed standing position of a person) and virtual head position HP corresponding to reference point Pr (virtual feature position which is a position in real space of a characteristic part of a person corresponding to a person's assumed standing position) , Represents the geometric relationship with the head image position AP (predetermined image position in the captured image corresponding to the virtual feature position) in the captured image corresponding to the virtual head position HP.

次に、図１５を参照し、画像抽出処理の更に別の一例について説明する。なお、図１５は、画像抽出処理の更に別の一例の流れを示すフローチャートである。図１５の画像抽出処理は、正規化画像を生成した後でその正規化画像に特徴画像としてのヘルメット画像が含まれるか否かを判定する点で図１３及び図１４のそれぞれにおける画像抽出処理と異なる。図１３及び図１４のそれぞれにおける画像抽出処理はヘルメット画像を見つけた後でそのヘルメット画像を含む画像部分を正規化するためである。 Next, still another example of the image extraction process will be described with reference to FIG. 15. Note that FIG. 15 is a flowchart showing the flow of still another example of the image extraction process. The image extraction process of FIG. 15 differs from the image extraction processes of FIGS. 13 and 14 in that it first generates a normalized image and then determines whether or not that normalized image includes a helmet image as a feature image. In contrast, the image extraction processes of FIGS. 13 and 14 first find a helmet image and then normalize the image portion that contains it.

本実施例では、抽出部３１は、撮像画像における複数の所定画像部分のそれぞれを正規化して複数の正規化画像を生成し、それら正規化画像のうちヘルメット画像を含む正規化画像を識別処理対象画像として抽出する。複数の所定画像部分は、例えば、撮像画像上に予め定められた複数の識別処理対象画像領域ＴＲｇである。識別処理対象画像領域ＴＲｇ（図６参照。）は、実空間における仮想平面領域ＴＲに対応し、仮想平面領域ＴＲは実空間における参照点Ｐｒに対応する。そして、識別部３２は、抽出部３１が抽出した識別処理対象画像に含まれる人候補画像が人画像であるかを識別する。 In this embodiment, the extraction unit 31 normalizes each of a plurality of predetermined image portions in the captured image to generate a plurality of normalized images, and extracts, from among those normalized images, each normalized image that includes a helmet image as an identification processing target image. The plurality of predetermined image portions are, for example, a plurality of identification processing target image regions TRg predetermined on the captured image. Each identification processing target image region TRg (see FIG. 6) corresponds to a virtual plane region TR in real space, and the virtual plane region TR corresponds to a reference point Pr in real space. Then, the identification unit 32 identifies whether the person candidate image included in the identification processing target image extracted by the extraction unit 31 is a human image.

最初に、抽出部３１は所定画像部分の１つから正規化画像を生成する（ステップＳＴ２１）。本実施例では、後方カメラ４０Ｂの撮像画像における予め設定された参照点Ｐｒの１つに対応する識別処理対象画像領域ＴＲｇの１つを所定画像部分とし、その所定画像部分を射影変換によって所定サイズの長方形画像に変換する。 First, the extraction unit 31 generates a normalized image from one of the predetermined image portions (step ST21). In this embodiment, one of the identification processing target image regions TRg, corresponding to one of the preset reference points Pr in the captured image of the rear camera 40B, is taken as the predetermined image portion, and that predetermined image portion is converted into a rectangular image of a predetermined size by projective transformation.

その後、抽出部３１は、生成した正規化画像にヘルメット画像が含まれるか否かを判定する（ステップＳＴ２２）。本実施例では、抽出部３１は、前段画像認識処理によりその正規化画像をラスタスキャンしてヘルメット画像を見つけ出す。 After that, the extraction unit 31 determines whether or not the generated normalized image includes the helmet image (step ST22). In this embodiment, the extraction unit 31 raster scans the normalized image by the pre-stage image recognition process to find the helmet image.

正規化画像にヘルメット画像が含まれると判定した場合（ステップＳＴ２２のＹＥＳ）、抽出部３１は、その正規化画像を識別処理対象画像として抽出する（ステップＳＴ２３）。このとき、識別部３２は、抽出部３１が抽出した識別処理対象画像に含まれる人候補画像が人画像であるかを識別する。すなわち、識別部３２は、抽出部３１による次の正規化が行われる前に、ヘルメット画像を含むと判定された正規化画像に対し、その正規化画像に含まれる画像が人画像であるかを識別する。そのため、周辺監視システム１００は、１つの正規化画像を記憶できるメモリ容量を有していれば撮像画像全体に対する画像抽出処理及び識別処理を実行できる。但し、識別部３２は、抽出部３１が複数の正規化画像を識別処理対象画像として抽出した後で、それら複数の識別処理対象画像のそれぞれに含まれる人候補画像が人画像であるかを識別してもよい。 When it is determined that the normalized image includes a helmet image (YES in step ST22), the extraction unit 31 extracts that normalized image as an identification processing target image (step ST23). At this time, the identification unit 32 identifies whether the person candidate image included in the identification processing target image extracted by the extraction unit 31 is a human image. That is, before the extraction unit 31 performs the next normalization, the identification unit 32 identifies, for the normalized image determined to include a helmet image, whether the image included in that normalized image is a human image. Therefore, the peripheral monitoring system 100 can execute the image extraction process and the identification process for the entire captured image as long as it has enough memory capacity to store one normalized image. However, the identification unit 32 may instead wait until the extraction unit 31 has extracted a plurality of normalized images as identification processing target images, and then identify whether the person candidate image included in each of those identification processing target images is a human image.

正規化画像にヘルメット画像が含まれないと判定した場合（ステップＳＴ２２のＮＯ）、抽出部３１は、その正規化画像を識別処理対象画像として抽出することなく処理を進める。 When it is determined that the normalized image does not include the helmet image (NO in step ST22), the extraction unit 31 proceeds with the process without extracting the normalized image as the identification processing target image.

その後、抽出部３１は、全ての所定画像部分から正規化画像を生成したか否かを判定する（ステップＳＴ２４）。本実施例では、後方カメラ４０Ｂの撮像画像における参照点Ｐｒのそれぞれに対応する識別処理対象画像領域ＴＲｇである所定画像部分の全てから正規化画像を生成したか否かを判定する。 After that, the extraction unit 31 determines whether or not the normalized image is generated from all the predetermined image portions (step ST24). In this embodiment, it is determined whether or not the normalized image is generated from all of the predetermined image portions which are the identification processing target image regions TRg corresponding to each of the reference points Pr in the captured image of the rear camera 40B.

全ての所定画像部分から正規化画像を生成していないと判定した場合（ステップＳＴ２４のＮＯ）、抽出部３１は、別の所定画像部分に対し、ステップＳＴ２１～ステップＳＴ２３の処理を実行する。 When it is determined that normalized images have not yet been generated from all the predetermined image portions (NO in step ST24), the extraction unit 31 executes the processes of steps ST21 to ST23 on another predetermined image portion.

一方、全ての所定画像部分から正規化画像を生成したと判定した場合（ステップＳＴ２４のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 On the other hand, when it is determined that normalized images have been generated from all the predetermined image portions (YES in step ST24), the extraction unit 31 ends the current image extraction process.
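
The flow of steps ST21 to ST24 can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the function name and the three callables (`normalize`, `contains_helmet`, `identify`) are hypothetical stand-ins for the projective transformation, the pre-stage image recognition process, and the identification unit 32, respectively. Identification is performed immediately after each extraction, which corresponds to the variant in which only one normalized image needs to be held at a time.

```python
from typing import Callable, List, Sequence, TypeVar

Region = TypeVar("Region")  # a predetermined image portion (e.g. a region TRg)
Image = TypeVar("Image")    # a normalized image


def extract_per_region(
    regions: Sequence[Region],
    normalize: Callable[[Region], Image],      # hypothetical: projective transform to a fixed size
    contains_helmet: Callable[[Image], bool],  # hypothetical: pre-stage image recognition
    identify: Callable[[Image], bool],         # hypothetical: identification unit 32
) -> List[bool]:
    """Run steps ST21 to ST24, identifying each extracted image right away."""
    results = []
    for region in regions:               # ST24: repeat until every region is processed
        normalized = normalize(region)   # ST21: generate one normalized image
        if contains_helmet(normalized):  # ST22: does it include a helmet image?
            # ST23: the normalized image becomes an identification target image.
            results.append(identify(normalized))
        # On NO in ST22 the normalized image is simply not extracted.
    return results
```

Because `normalized` is overwritten on each iteration, the sketch never keeps more than one normalized image alive, which mirrors the memory argument made for this example.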

図１５の例では、周辺監視システム１００は、１つの正規化画像を生成した段階でその正規化画像にヘルメット画像が含まれるか否かを判定する。但し、複数の正規化画像を生成した段階でそれら複数の正規化画像のそれぞれにヘルメット画像が含まれるか否かを纏めて判定してもよい。また、全ての正規化画像を生成した段階でそれら全ての正規化画像のそれぞれにヘルメット画像が含まれるか否かを纏めて判定してもよい。 In the example of FIG. 15, the peripheral monitoring system 100 determines whether or not the helmet image is included in the normalized image at the stage when one normalized image is generated. However, at the stage when a plurality of normalized images are generated, it may be collectively determined whether or not each of the plurality of normalized images includes a helmet image. Further, at the stage where all the normalized images are generated, it may be collectively determined whether or not the helmet image is included in each of all the normalized images.

このような画像抽出処理により、周辺監視システム１００は、人検知処理の際のメモリアクセス回数を低減させることができる。ヘルメット画像を見つけ出す処理と識別処理対象画像を抽出する処理を同時に実行できるためである。 By such an image extraction process, the peripheral monitoring system 100 can reduce the number of memory accesses during the human detection process. This is because the process of finding the helmet image and the process of extracting the identification process target image can be executed at the same time.

また、図１５の画像抽出処理を採用する周辺監視システム１００は、図１３又は図１４の画像抽出処理を採用する場合に比べ、メモリ容量を低減させることができる。図１３又は図１４の画像抽出処理は、少なくとも、正規化される前の画像部分（正規化画像より大きい画像）を記憶できるだけのメモリ容量が必要であるが、図１５の画像抽出処理は正規化画像を最初に生成するため正規化画像を記憶できるだけのメモリ容量があれば実行可能なためである。 Further, the peripheral monitoring system 100 that employs the image extraction process of FIG. 15 can reduce the required memory capacity compared with the case of employing the image extraction process of FIG. 13 or FIG. 14. The image extraction process of FIG. 13 or FIG. 14 requires at least enough memory capacity to store the image portion before normalization (an image larger than the normalized image), whereas the image extraction process of FIG. 15 generates the normalized image first and can therefore be executed with only enough memory capacity to store the normalized image.

また、図１５の画像抽出処理を採用する周辺監視システム１００は、正規化画像にヘルメット画像が含まれるか否かを判定する。そのため、図１３又は図１４の画像抽出処理のように正規化される前の画像にヘルメット画像が含まれるか否かを判定する構成に比べ、ヘルメット画像が含まれるか否かの判定に要する処理時間を短縮できる。探索対象となるヘルメット画像のサイズ及び歪みのばらつきを小さくできるためである。その結果、識別処理対象画像の抽出に要する時間を短縮できる。また、探索対象となるヘルメット画像のサイズ及び歪みのばらつきを小さくできるため、ヘルメット画像が含まれるか否かの判定の精度を高めることができる。 Further, the peripheral monitoring system 100 that employs the image extraction process of FIG. 15 determines whether or not the normalized image includes a helmet image. Therefore, compared with a configuration that determines whether or not a helmet image is included in the image before normalization, as in the image extraction process of FIG. 13 or FIG. 14, the processing time required to determine whether or not a helmet image is included can be shortened. This is because the variation in the size and distortion of the helmet image to be searched for can be reduced. As a result, the time required to extract the identification processing target image can be shortened. Further, since the variation in the size and distortion of the helmet image to be searched for can be reduced, the accuracy of determining whether or not a helmet image is included can be improved.

また、図１５の画像抽出処理を採用する周辺監視システム１００は、人の足位置に対する頭部位置のバラツキが大きい場合（例えば屈んでいる人の画像が撮像画像に含まれる場合）であっても、より確実な人検知を実現できる。正規化画像にヘルメット画像が含まれるか否かを判定する構成を採用しているためである。すなわち、図１３又は図１４の画像抽出処理のようにヘルメット画像の位置に関連する頭部画像位置を用いて識別処理対象画像領域を特定する構成ではなく、正規化の対象となる所定画像部分としての識別処理対象画像領域が頭部画像位置とは無関係に設定されるためである。 Further, the peripheral monitoring system 100 that employs the image extraction process of FIG. 15 can achieve more reliable human detection even when the head position varies greatly relative to the foot position of a person (for example, when the captured image includes an image of a person who is crouching). This is because it adopts a configuration that determines whether or not the normalized image includes a helmet image. That is, unlike the image extraction process of FIG. 13 or FIG. 14, which specifies the identification processing target image region using the head image position related to the position of the helmet image, the identification processing target image region serving as the predetermined image portion to be normalized is set independently of the head image position.

次に、図１６を参照し、画像抽出処理の更に別の一例について説明する。なお、図１６は画像抽出処理の更に別の一例の流れを示すフローチャートである。図１６の画像抽出処理は、所定画像部分の一部を正規化した段階でその部分的に正規化された画像（以下、「部分正規化画像」とする。）に特徴画像としてのヘルメット画像が含まれるか否かを判定する点で図１５の画像抽出処理と異なる。図１５の画像抽出処理は、１つの所定画像部分の全部を正規化した段階でその正規化画像にヘルメット画像が含まれるか否かを判定するためである。 Next, still another example of the image extraction process will be described with reference to FIG. 16. FIG. 16 is a flowchart showing the flow of still another example of the image extraction process. The image extraction process of FIG. 16 differs from the image extraction process of FIG. 15 in that, at the stage where only a part of a predetermined image portion has been normalized, it determines whether or not the partially normalized image (hereinafter referred to as the "partially normalized image") includes a helmet image as a feature image. This is because the image extraction process of FIG. 15 determines whether or not the normalized image includes a helmet image only after the whole of one predetermined image portion has been normalized.

最初に、抽出部３１は、所定画像部分の一部から部分正規化画像を生成する（ステップＳＴ３１）。そして、抽出部３１は、部分正規化画像を生成した段階でその部分正規化画像にヘルメット画像が含まれるか否かを判定する（ステップＳＴ３２）。本実施例では、抽出部３１は、正規化画像の上半分に対応する所定画像部分の一部を正規化して部分正規化画像を生成した段階で前段画像認識処理によりその部分正規化画像をラスタスキャンしてヘルメット画像を見つけ出す。 First, the extraction unit 31 generates a partially normalized image from a part of a predetermined image portion (step ST31). Then, at the stage where the partially normalized image has been generated, the extraction unit 31 determines whether or not the partially normalized image includes a helmet image (step ST32). In this embodiment, once the extraction unit 31 has generated the partially normalized image by normalizing the part of the predetermined image portion that corresponds to the upper half of the normalized image, it raster-scans the partially normalized image by the pre-stage image recognition process to find a helmet image.

部分正規化画像にヘルメット画像が含まれると判定した場合（ステップＳＴ３２のＹＥＳ）、抽出部３１は、所定画像部分の残りの部分を正規化して正規化画像を生成する（ステップＳＴ３３）。本実施例では、抽出部３１は、正規化画像の下半分に対応する所定画像部分の残りの部分を正規化して正規化画像を生成する。そして、抽出部３１は、その正規化画像を識別処理対象画像として抽出する（ステップＳＴ３４）。このとき、識別部３２は、抽出部３１が抽出した識別処理対象画像に含まれる人候補画像が人画像であるかを識別する。但し、識別部３２は、抽出部３１が複数の正規化画像を識別処理対象画像として抽出した後で、それら複数の識別処理対象画像のそれぞれに含まれる人候補画像が人画像であるかを識別してもよい。 When it is determined that the partially normalized image includes a helmet image (YES in step ST32), the extraction unit 31 normalizes the remaining part of the predetermined image portion to generate a normalized image (step ST33). In this embodiment, the extraction unit 31 normalizes the remaining part of the predetermined image portion, which corresponds to the lower half of the normalized image, to generate the normalized image. Then, the extraction unit 31 extracts that normalized image as an identification processing target image (step ST34). At this time, the identification unit 32 identifies whether the person candidate image included in the identification processing target image extracted by the extraction unit 31 is a human image. However, the identification unit 32 may instead wait until the extraction unit 31 has extracted a plurality of normalized images as identification processing target images, and then identify whether the person candidate image included in each of those identification processing target images is a human image.

部分正規化画像にヘルメット画像が含まれないと判定した場合（ステップＳＴ３２のＮＯ）、抽出部３１は、所定画像部分の残りの部分を正規化して正規化画像を生成することなく処理を進める。 When it is determined that the partially normalized image does not include a helmet image (NO in step ST32), the extraction unit 31 proceeds with the process without normalizing the remaining part of the predetermined image portion to generate a normalized image.

その後、抽出部３１は、全ての所定画像部分から部分正規化画像を生成したか否かを判定する（ステップＳＴ３５）。本実施例では、後方カメラ４０Ｂの撮像画像における参照点Ｐｒのそれぞれに対応する識別処理対象画像領域ＴＲｇである所定画像部分の全てから部分正規化画像を生成したか否かを判定する。 After that, the extraction unit 31 determines whether or not a partially normalized image is generated from all the predetermined image portions (step ST35). In this embodiment, it is determined whether or not a partially normalized image is generated from all of the predetermined image portions which are the identification processing target image regions TRg corresponding to each of the reference points Pr in the captured image of the rear camera 40B.

全ての所定画像部分から部分正規化画像を生成していないと判定した場合（ステップＳＴ３５のＮＯ）、抽出部３１は、別の所定画像部分に対し、ステップＳＴ３１～ステップＳＴ３４の処理を実行する。 When it is determined that partially normalized images have not yet been generated from all the predetermined image portions (NO in step ST35), the extraction unit 31 executes the processes of steps ST31 to ST34 on another predetermined image portion.

一方、全ての所定画像部分から部分正規化画像を生成したと判定した場合（ステップＳＴ３５のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 On the other hand, when it is determined that partially normalized images have been generated from all the predetermined image portions (YES in step ST35), the extraction unit 31 ends the current image extraction process.
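
The early-exit behaviour of steps ST31 to ST35 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the callables `normalize_upper`, `normalize_lower`, and `contains_helmet` are hypothetical, and the normalized image is represented simply as a pair of its upper and lower parts.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")  # a predetermined image portion
Part = TypeVar("Part")      # one normalized part (upper or lower)


def extract_with_partial_normalization(
    regions: Sequence[Region],
    normalize_upper: Callable[[Region], Part],  # hypothetical: normalizes the upper part only
    normalize_lower: Callable[[Region], Part],  # hypothetical: normalizes the remaining part
    contains_helmet: Callable[[Part], bool],    # hypothetical: pre-stage image recognition
) -> List[Tuple[Part, Part]]:
    """Run steps ST31 to ST35 and return the extracted normalized images."""
    extracted = []
    for region in regions:                 # ST35: repeat until every region is processed
        upper = normalize_upper(region)    # ST31: partially normalized image
        if not contains_helmet(upper):     # ST32: NO
            continue                       # skip the remaining part entirely
        lower = normalize_lower(region)    # ST33: normalize the remaining part
        extracted.append((upper, lower))   # ST34: extract as identification target
    return extracted
```

In this sketch `normalize_lower` is never called for a region whose upper part contains no helmet image, which is where the additional saving in processing time over the per-region flow of FIG. 15 would come from.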

図１６の例では、周辺監視システム１００は、正規化画像の上半分に対応する所定画像部分の一部を正規化して部分正規化画像を生成する。ヘルメット画像は正規化画像の上半分（人の上半身に対応する部分）に含まれる蓋然性が高いためである。但し、正規化画像の上半分より小さい部分に対応する所定画像部分の一部を正規化して部分正規化画像を生成してもよい。例えば、正規化画像の上の１／３にあたる部分に対応する所定画像部分の一部を正規化して部分正規化画像を生成してもよい。或いは、周辺監視システム１００は、正規化画像の別の一部を正規化して部分正規化画像を生成してもよい。例えば、特徴画像としての肩の画像を探索する場合、正規化画像の右半分に対応する所定画像部分の一部を正規化して部分正規化画像を生成してもよく、特徴画像としてのマーカの画像を探索する場合、正規化画像の中央半分に対応する所定画像部分の一部を正規化して部分正規化画像を生成してもよい。 In the example of FIG. 16, the peripheral monitoring system 100 normalizes the part of the predetermined image portion that corresponds to the upper half of the normalized image to generate the partially normalized image. This is because a helmet image is highly likely to be included in the upper half of the normalized image (the part corresponding to the upper body of a person). However, the partially normalized image may be generated by normalizing a part of the predetermined image portion that corresponds to a part smaller than the upper half of the normalized image. For example, the partially normalized image may be generated by normalizing the part of the predetermined image portion that corresponds to the upper one third of the normalized image. Alternatively, the peripheral monitoring system 100 may generate the partially normalized image by normalizing a different part of the normalized image. For example, when searching for an image of a shoulder as the feature image, the partially normalized image may be generated by normalizing the part of the predetermined image portion that corresponds to the right half of the normalized image, and when searching for an image of a marker as the feature image, the partially normalized image may be generated by normalizing the part of the predetermined image portion that corresponds to the central half of the normalized image.

このような画像抽出処理により、図１６の画像抽出処理を採用する周辺監視システム１００は、図１５の画像抽出処理を採用する場合と同様の効果を実現できる。 With such an image extraction process, the peripheral monitoring system 100 that employs the image extraction process of FIG. 16 can achieve the same effects as when the image extraction process of FIG. 15 is employed.

また、図１６の画像抽出処理を採用する周辺監視システム１００は、図１５の画像抽出処理を採用する場合に比べ、人検知処理に要する処理時間を更に短縮できる。部分正規化画像にヘルメット画像が含まれないと判定した場合、その所定画像部分の残りの部分を正規化することなく、次の所定画像部分の正規化を開始できるためである。 Further, the peripheral monitoring system 100 that employs the image extraction process of FIG. 16 can further shorten the processing time required for the human detection process compared with the case of employing the image extraction process of FIG. 15. This is because, when it is determined that the partially normalized image does not include a helmet image, normalization of the next predetermined image portion can be started without normalizing the remaining part of the current predetermined image portion.

また、図１３～図１６のそれぞれにおける画像抽出処理において、抽出部３１は、前段画像認識処理によって、撮像画像の所定画像部分におけるヘルメット画像（厳密にはヘルメットであると推定できる画像）を見つけ出す。そして、抽出部３１は、見つけ出したヘルメット画像に対応する識別処理対象画像を抽出する。 Further, in the image extraction processing in each of FIGS. 13 to 16, the extraction unit 31 finds a helmet image (strictly speaking, an image that can be presumed to be a helmet) in a predetermined image portion of the captured image by the pre-stage image recognition processing. Then, the extraction unit 31 extracts the identification processing target image corresponding to the found helmet image.

しかしながら、背景画像が複雑な場合、抽出部３１は、ヘルメットの画像ではない画像をヘルメット画像であると誤って認識してしまうことがある。また、ヘルメットの画像をヘルメット画像として認識できずに見逃してしまうことがある。すなわち、抽出部３１は、撮像画像の内容によっては識別処理対象画像の抽出結果の信頼性を低下させてしまう場合がある。ここでのヘルメット画像は、人候補画像として考えることができる。すなわち、人検知結果の信頼性を低下させる場面である。抽出結果の信頼性を低下させてしまう撮像画像は、例えば、夜間環境、逆光環境、日陰環境等で撮像された撮像画像、カメラのレンズが汚れた状態で撮像された撮像画像、フレア、被写体ブレ等のある撮像画像を含む。 However, when the background image is complicated, the extraction unit 31 may mistakenly recognize an image that is not an image of a helmet as a helmet image. It may also fail to recognize an image of a helmet as a helmet image and overlook it. That is, depending on the content of the captured image, the extraction unit 31 may reduce the reliability of the extraction result of the identification processing target image. The helmet image here can be regarded as a person candidate image. In other words, these are situations in which the reliability of the human detection result is lowered. Captured images that reduce the reliability of the extraction result include, for example, captured images taken in a night environment, a backlit environment, a shaded environment, or the like, captured images taken with a dirty camera lens, and captured images with flare, subject blur, or the like.

そこで、コントローラ３０は、抽出部３１による識別処理対象画像の抽出結果の信頼性が低い状態を人検知に不適な状態（以下、「人検知不適状態」とする。）として検知できるように構成されてもよい。人検知不適状態の検知に応じて種々の対応をとることができるようにするためである。例えば、人検知不適状態が発生していることをショベルの操作者に通知できるようにするためである。或いは、人検知不適状態が発生しているときの画像特徴等を記録して周辺監視システムの改良に有効なデータを選択的に収集できるようにするためである。或いは、人検知不適状態の検知に応じて周辺監視システム１００の動作内容を変更できるようにするためである。周辺監視システム１００の動作内容の変更は、例えば、前段画像認識処理で用いられる各種パラメータの変更、後段画像認識処理で用いられる各種パラメータの変更等を含む。 Therefore, the controller 30 may be configured to detect a state in which the reliability of the extraction result of the identification processing target image by the extraction unit 31 is low as a state unsuitable for human detection (hereinafter referred to as the "human detection unsuitable state"). This is so that various measures can be taken in response to detection of the human detection unsuitable state. For example, it allows the shovel operator to be notified that the human detection unsuitable state has occurred. Alternatively, it allows image features and the like at the time the human detection unsuitable state occurs to be recorded, so that data useful for improving the peripheral monitoring system can be selectively collected. Alternatively, it allows the operation of the peripheral monitoring system 100 to be changed in response to detection of the human detection unsuitable state. Changes to the operation of the peripheral monitoring system 100 include, for example, changes to the various parameters used in the pre-stage image recognition process and changes to the various parameters used in the post-stage image recognition process.

例えば、コントローラ３０は、所定の評価式を用いて各所定画像部分に関する評価値を算出し、その評価値の分布に基づいて撮像画像が人検知不適状態であるか否かを判定してもよい。具体的には、コントローラ３０は、ヘルメット度の分布に基づいて人検知不適状態が発生しているか否かを判定してもよい。ここでの評価値は、ヘルメット画像、すなわち人候補画像における人らしさの度合い又はその度合いを示すレベルとして捉えることもできる。この評価値には大小の差異があると考えてもよい。 For example, the controller 30 may calculate an evaluation value for each predetermined image portion using a predetermined evaluation formula, and determine whether or not the captured image is in the human detection unsuitable state based on the distribution of the evaluation values. Specifically, the controller 30 may determine whether or not the human detection unsuitable state has occurred based on the distribution of the helmet degree. The evaluation value here can also be regarded as the degree of person-likeness of a helmet image, that is, of a person candidate image, or as a level indicating that degree. The evaluation values may be considered to differ in magnitude.

「ヘルメット度」は、評価値の一例であり、撮像画像の所定画像部分における判定対象画像のヘルメットらしさを表す値である。探索対象がヘルメットではなく頭である場合には頭らしさを表す頭度が評価値の別の一例として用いられる。 The "helmet degree" is an example of an evaluation value, and is a value representing the helmet-likeness of the determination target image in a predetermined image portion of the captured image. When the search target is a head rather than a helmet, a head degree representing head-likeness is used as another example of the evaluation value.

「判定対象画像」は、特徴画像（ヘルメット画像）であるか否かの判定対象となった画像を意味する。図１３～図１６のそれぞれにおける画像抽出処理での「ヘルメット画像」は、判定対象画像のうち、所定の採否判定値（例えばゼロである。）以上のヘルメット度を有するものを意味する。或いは、所定の採否判定値以上のヘルメット度を有する判定対象画像をヘルメット度が大きい順に並べたときの上位の所定数のものであってもよい。なお、判定対象画像は、採否判定値未満のヘルメット度を有する画像（ヘルメット画像以外の画像）を含む。仕様として選択可能な所定ヘルメット度が採否判定値として設定される。採否判定値は、抽出部３１が取得した判定対象画像のヘルメット度と比較される。 The “judgment target image” means an image that has been determined as to whether or not it is a feature image (helmet image). The “helmet image” in the image extraction process in each of FIGS. 13 to 16 means an image having a helmet degree equal to or higher than a predetermined acceptance / rejection determination value (for example, zero) among the images to be determined. Alternatively, it may be a higher predetermined number when the determination target images having a helmet degree equal to or higher than the predetermined acceptance / rejection determination value are arranged in descending order of the helmet degree. The determination target image includes an image having a helmet degree less than the acceptance / rejection determination value (an image other than the helmet image). A predetermined helmet degree that can be selected as a specification is set as an acceptance / rejection determination value. The acceptance / rejection determination value is compared with the helmet degree of the determination target image acquired by the extraction unit 31.

採否判定値は、判定対象画像をヘルメット画像として採用するか否かを判定するための値である。判定対象画像は、例えば、評価値としてのヘルメット度が採否判定値以上であればヘルメット画像として採用される。採否判定値は、例えば、機械学習、統計分析等により予め設定される。採否判定値は、夜間環境、逆光環境、日陰環境等のショベルの作業環境毎に別々に設定されていてもよい。ショベルの作業環境は、例えば、キャビン１０内の入力装置４２を用いて入力されてもよい。 The acceptance / rejection determination value is a value for determining whether or not the determination target image is adopted as the helmet image. The determination target image is adopted as a helmet image, for example, if the helmet degree as an evaluation value is equal to or higher than the acceptance / rejection determination value. The acceptance / rejection determination value is set in advance by, for example, machine learning, statistical analysis, or the like. The acceptance / rejection determination value may be set separately for each excavator work environment such as a night environment, a backlight environment, and a shade environment. The working environment of the excavator may be input using, for example, the input device 42 in the cabin 10.
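
The adoption rule described above can be sketched as follows. This is a minimal illustration, assuming helmet degrees are already available as a list of floats; the function name `select_helmet_images` and the parameter `top_k` are hypothetical, and the default acceptance value of zero is only the example value mentioned in the text.

```python
from typing import List, Optional, Sequence


def select_helmet_images(
    helmet_degrees: Sequence[float],
    acceptance_value: float = 0.0,  # example value from the text; environment-specific in practice
    top_k: Optional[int] = None,    # hypothetical: keep only the top-ranked candidates
) -> List[int]:
    """Return indices of determination target images adopted as helmet images."""
    # Adopt every image whose helmet degree is at or above the acceptance value.
    adopted = [i for i, d in enumerate(helmet_degrees) if d >= acceptance_value]
    # Order the adopted images by helmet degree, highest first.
    adopted.sort(key=lambda i: helmet_degrees[i], reverse=True)
    # Optionally keep only a predetermined number of top-ranked images.
    return adopted if top_k is None else adopted[:top_k]
```

For example, with helmet degrees `[0.9, -0.3, 0.0, 0.4]` and the default acceptance value, the images at indices 0, 3, and 2 are adopted in that order, and passing `top_k=2` keeps only indices 0 and 3.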

ヘルメット度は、例えば、後段画像認識処理で用いられるＨＯＧ特徴量の算出よりも低い演算コストで算出される画像特徴量に基づいて導き出される。例えば、ヘルメット度は、ＨＯＧ特徴量の次元数よりも低い次元数の画像特徴量をランダムフォレスト等の機械学習アルゴリズムに入力して重回帰分析を行うことで導き出される。ヘルメット度は、例えば、－１～＋１の間の実数として導き出され、＋１に近いほどヘルメットらしさが強く、－１に近いほどヘルメットらしさが弱いことを表す。「頭画像」に関する頭らしさを表す「頭度」等の他の特徴画像に関する特徴画像らしさを表す評価値についても同様である。 The helmet degree is derived, for example, based on the image feature amount calculated at a lower calculation cost than the calculation of the HOG feature amount used in the subsequent image recognition process. For example, the helmet degree is derived by inputting an image feature quantity having a dimension number lower than the dimension number of the HOG feature quantity into a machine learning algorithm such as Random Forest and performing multiple regression analysis. The helmet degree is derived, for example, as a real number between -1 and +1. The closer it is to +1 the stronger the helmet-likeness, and the closer it is to -1, the weaker the helmet-likeness. The same applies to the evaluation value representing the characteristic image-likeness of other characteristic images such as "head degree" indicating the head-likeness of the "head image".

次に図１７を参照し、ヘルメット度の分布に基づいて撮像画像が人検知不適状態であるか否かをコントローラ３０が判定する処理（以下、「人検知適否判定処理」とする。）について説明する。図１７は人検知適否判定処理の一例の流れを示すフローチャートである。 Next, with reference to FIG. 17, a process in which the controller 30 determines whether or not the captured image is in the human detection unsuitable state based on the distribution of the helmet degree (hereinafter referred to as the "human detection suitability determination process") will be described. FIG. 17 is a flowchart showing the flow of an example of the human detection suitability determination process.

コントローラ３０は、各所定画像部分における判定対象画像のヘルメット度を画像抽出処理の際に算出して内部メモリ等に記憶する。ヘルメット度は、例えば、所定順（降順）で並べ替えできるように記憶されてもよい。そして、コントローラ３０は、例えば、所定数の所定画像部分における判定対象画像のヘルメット度を記憶した時点で人検知適否判定処理の実行を開始し、その後はヘルメット度を新たに算出する度に人検知適否判定処理の実行を繰り返す。 The controller 30 calculates the helmet degree of the determination target image in each predetermined image portion during the image extraction process and stores it in an internal memory or the like. The helmet degrees may be stored, for example, so that they can be sorted in a predetermined order (descending order). Then, for example, the controller 30 starts executing the human detection suitability determination process once it has stored the helmet degrees of the determination target images in a predetermined number of predetermined image portions, and thereafter repeats the human detection suitability determination process each time a helmet degree is newly calculated.
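
One way to keep the helmet degrees ready for a rank lookup is to insert each new value into an already sorted list. The following is a minimal sketch under stated assumptions: the class name `HelmetDegreeStore` and the parameter `start_count` are hypothetical, and the list is kept in ascending order internally purely as an implementation convenience (the n-th highest value is then the n-th element from the end).

```python
import bisect
from typing import List


class HelmetDegreeStore:
    """Keeps helmet degrees sorted so a rank lookup needs no re-sorting."""

    def __init__(self, start_count: int) -> None:
        # Hypothetical: number of degrees to collect before the first check.
        self.start_count = start_count
        self._ascending: List[float] = []

    def add(self, degree: float) -> bool:
        """Store one degree; return True when a suitability check should run."""
        bisect.insort(self._ascending, degree)  # keeps the list sorted
        return len(self._ascending) >= self.start_count

    def nth_highest(self, n: int) -> float:
        """Return the n-th highest stored degree (n starts at 1)."""
        if not 1 <= n <= len(self._ascending):
            raise ValueError("n is outside the range of stored degrees")
        return self._ascending[-n]
```

Here `add` returns `True` from the moment `start_count` degrees have been stored, and on every later call, which corresponds to starting the determination once and then repeating it for each newly calculated helmet degree.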

また、コントローラ３０は、画像抽出処理と人検知適否判定処理とを並列的に実行する。画像抽出処理で算出されるヘルメット度を人検知適否判定処理で利用するためである。但し、画像抽出処理が完了した後で人検知適否判定処理を実行してもよい。画像抽出処理は、例えば、図１３～図１６に示す画像抽出処理の何れかが採用される。 Further, the controller 30 executes the image extraction process and the human detection suitability determination process in parallel. This is because the helmet degree calculated in the image extraction process is used in the human detection suitability determination process. However, the human detection suitability determination process may be executed after the image extraction process is completed. As the image extraction process, for example, any of the image extraction processes shown in FIGS. 13 to 16 is adopted.

図１７に示すように、コントローラ３０はまずｎ番目に高いヘルメット度を取得する（ステップＳＴ４１）。コントローラ３０は、例えば、内部メモリに記憶されたヘルメット度を参照し、所定順位のヘルメット度であるｎ番目に高いヘルメット度を取得する。「ｎ」は、適宜設定される１以上の整数であり、例えば、所定画像部分（判定対象画像）の総数の２％に相当する値が採用される。内部メモリには、例えば、判定対象画像のヘルメット度が降順に並べられて記憶されている。 As shown in FIG. 17, the controller 30 first acquires the nth highest helmet degree (step ST41). For example, the controller 30 refers to the helmet degrees stored in the internal memory and acquires the nth highest helmet degree, that is, the helmet degree of a predetermined rank. "n" is an integer of 1 or more that is set as appropriate; for example, a value corresponding to 2% of the total number of predetermined image portions (determination target images) is adopted. In the internal memory, for example, the helmet degrees of the determination target images are stored in descending order.

そして、ｎ番目に高いヘルメット度が所定の適否判定値以上であるか否かを判定する（ステップＳＴ４２）。仕様として選択可能な所定ヘルメット度が適否判定値として設定される。ｎ番目に高いヘルメット度は、適否判定値以上であるか判定される。 Then, it is determined whether or not the nth highest helmet degree is equal to or higher than a predetermined suitability determination value (step ST42). A predetermined helmet degree that can be selected as a specification is set as a suitability judgment value. The nth highest helmet degree is determined to be equal to or higher than the suitability determination value.

適否判定値は、撮像画像が人検知不適状態であるか人検知適合状態であるかを判定するための値である。採否判定値と同様に、適否判定値は、例えば、機械学習、統計分析等により予め設定される。適否判定値は、夜間環境、逆光環境、日陰環境等のショベルの作業環境毎に別々に設定されていてもよい。 The suitability determination value is a value for determining whether the captured image is in a human detection unsuitable state or a human detection conforming state. Similar to the acceptance / rejection determination value, the suitability determination value is preset by, for example, machine learning, statistical analysis, or the like. The suitability determination value may be set separately for each excavator work environment such as a night environment, a backlight environment, and a shade environment.

ｎ番目に高いヘルメット度が適否判定値以上であると判定した場合（ステップＳＴ４２のＹＥＳ）、コントローラ３０は、撮像画像が人検知不適状態であると判定する（ステップＳＴ４３）。適否判定値が採否判定値以上であれば、「ｎ番目に高いヘルメット度が適否判定値以上である」という条件が満たされる状態は、適否判定値以上のヘルメット度を有するヘルメット画像がｎ個以上存在する状態を意味する。図１７の例では、ヘルメット画像がｎ個以上存在する撮像画像の状態を人検知不適状態と定めている。「ｎ」は、抽出部３１から識別部３２に送られる人候補画像数の基準と考えることができる。すなわち、識別部３２で処理すべき人候補画像が多いか少ないかの判断基準となる。 When it is determined that the nth highest helmet degree is equal to or higher than the suitability determination value (YES in step ST42), the controller 30 determines that the captured image is in the human detection unsuitable state (step ST43). Provided that the suitability determination value is equal to or higher than the acceptance / rejection determination value, satisfying the condition "the nth highest helmet degree is equal to or higher than the suitability determination value" means that there are n or more helmet images having a helmet degree equal to or higher than the suitability determination value. In the example of FIG. 17, the state of a captured image in which n or more helmet images are present is defined as the human detection unsuitable state. "n" can be regarded as a reference for the number of person candidate images sent from the extraction unit 31 to the identification unit 32. That is, it serves as a criterion for judging whether the number of person candidate images to be processed by the identification unit 32 is large or small.

撮像画像が人検知不適状態であると判定した場合、コントローラ３０は撮像画像が人検知不適状態である旨をショベルの操作者に通知してもよい。例えば、コントローラ３０は、出力装置５０としての車載ディスプレイに制御指令を出力してその旨を伝えるテキストメッセージを表示させてもよい。或いは、コントローラ３０は、出力装置５０としての車載スピーカに制御指令を出力してその旨を伝える音声メッセージを音声出力させてもよい。 When it is determined that the captured image is in an unsuitable state for human detection, the controller 30 may notify the shovel operator that the captured image is in an unsuitable state for human detection. For example, the controller 30 may output a control command to the vehicle-mounted display as the output device 50 and display a text message to that effect. Alternatively, the controller 30 may output a control command to the vehicle-mounted speaker as the output device 50 to output a voice message to that effect.

また、撮像画像が人検知不適状態であると判定した場合、コントローラ３０は、周辺監視システムによる人検知を中止させてもよい。信頼性の低い抽出結果に基づく人検知結果をショベルの操作者に提示しないようにするためである。 Further, when it is determined that the captured image is in the human detection unsuitable state, the controller 30 may stop human detection by the peripheral monitoring system. This is to avoid presenting the shovel operator with a human detection result based on an unreliable extraction result.

また、撮像画像が人検知不適状態であると判定した場合、コントローラ３０は、その撮像画像をＮＶＲＡＭ等に記録してもよい。周辺監視システムの改良に利用できるようにするためである。 Further, when it is determined that the captured image is in an unsuitable state for human detection, the controller 30 may record the captured image in NVRAM or the like. This is so that it can be used to improve the peripheral monitoring system.

ｎ番目に高いヘルメット度が適否判定値未満であると判定した場合（ステップＳＴ４２のＮＯ）、コントローラ３０は、撮像画像が人検知に適した状態（以下、「人検知適合状態」とする。）であると判定する（ステップＳＴ４４）。適否判定値が採否判定値以上であれば、「ｎ番目に高いヘルメット度が適否判定値未満である」という条件が満たされる状態は、適否判定値以上のヘルメット度を有するヘルメット画像がｎ個未満である状態を意味する。図１７の例から考えると、この場合は人検知不適状態とはならない。識別器で処理すべき人候補画像の数が、基準である「ｎ」未満だからである。 When it determines that the n-th highest helmet degree is less than the suitability determination value (NO in step ST42), the controller 30 determines that the captured image is in a state suitable for human detection (hereinafter referred to as the "human detection conforming state") (step ST44). If the suitability determination value is equal to or higher than the acceptance/rejection determination value, satisfying the condition "the n-th highest helmet degree is less than the suitability determination value" means that fewer than n helmet images have a helmet degree equal to or higher than the suitability determination value. In terms of the example of FIG. 17, this case is not the unsuitable state for human detection, because the number of person candidate images to be processed by the classifier is below the reference "n".
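
The decision in steps ST42 to ST44 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the function names, the plain list-of-floats input, and the handling of images with fewer than n portions are hypothetical. The default threshold of 0.5 is borrowed from the FIG. 18 example.

```python
from typing import Sequence


def is_unsuitable_for_detection(
    helmet_degrees: Sequence[float],
    n: int,
    suitability_threshold: float = 0.5,
) -> bool:
    """Return True when the n-th highest helmet degree is >= the threshold."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: if there are fewer than n image portions, no n-th highest
    # value exists, so the image is treated as suitable.
    if len(helmet_degrees) < n:
        return False
    ranked = sorted(helmet_degrees, reverse=True)  # highest degree first
    return ranked[n - 1] >= suitability_threshold


def count_at_or_above(helmet_degrees: Sequence[float], threshold: float) -> int:
    """Count the image portions whose helmet degree is >= threshold."""
    return sum(1 for degree in helmet_degrees if degree >= threshold)
```

The rank test and the count test are two views of the same condition: the n-th highest value reaches the threshold exactly when at least n values reach it, which is why the text can describe the state either way.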

次に図１８を参照し、人検知適合状態及び人検知不適状態のそれぞれの特徴について説明する。図１８は、ある環境下で撮像された撮像画像における全ての所定画像部分のそれぞれから導き出された－１～＋１の間の実数値を有するヘルメット度の度数分布図（ヒストグラム）である。縦軸の度数は、ヘルメット度に対応する検出数と考えることができる。図１８（Ａ）は人検知適合状態のときの度数分布図を示し、図１８（Ｂ）は人検知不適状態のときの度数分布図を示す。図１８（Ａ）及び図１８（Ｂ）は、明瞭性を維持するため、各ビンの図示を省略して度数曲線のみを示す。度数曲線と横軸とで囲まれた範囲は、撮像画像における所定画像部分の総数を表す。 Next, with reference to FIG. 18, the characteristics of the human detection conforming state and the human detection unsuitable state will be described. FIG. 18 is a frequency distribution diagram (histogram) of a helmet degree having a real value between -1 and +1 derived from each of all predetermined image portions in a captured image captured under a certain environment. The frequency on the vertical axis can be considered as the detection frequency corresponding to the helmet degree. FIG. 18A shows a frequency distribution map in a human detection conforming state, and FIG. 18B shows a frequency distribution map in a human detection unsuitable state. 18 (A) and 18 (B) show only the frequency curve by omitting the illustration of each bin in order to maintain clarity. The range surrounded by the frequency curve and the horizontal axis represents the total number of predetermined image portions in the captured image.

図１８（Ａ）及び図１８（Ｂ）において、ヘルメット度が＋０．５（第１基準）のところに引かれた一点鎖線は適否判定値の位置を示し、ヘルメット度が＋０．２（第２基準）のところに引かれた二点鎖線は採否判定値の位置を示す。したがって、ヘルメット度が＋０．２以上の判定対象画像はヘルメット画像として採用され、ヘルメット度が＋０．２未満の判定対象画像はヘルメット画像として採用されない。以下では、判定対象画像のうちヘルメット画像として採用されなかったものを非ヘルメット画像と称する。なお、第１基準及び第２基準はそれぞれ変更され得る。図１８（Ａ）のヒストグラムのうち、＋０．２未満のヘルメット度を有する判定対象画像は人候補画像としては抽出されない。＋０．２以上のヘルメット度を有する判定対象画像は抽出部３１により人候補画像として抽出されたといえる。抽出された人候補画像は、識別部によって処理される識別処理対象画像として取り扱われる。なお、抽出部３１による識別処理対象画像の抽出は複数の段階で行われてもよい。そして、識別部３２は、抽出部３１が抽出した識別処理対象画像の全てに対して画像認識処理を施す。 In FIGS. 18 (A) and 18 (B), the alternate long and short dash line drawn at the place where the helmet degree is +0.5 (first reference) indicates the position of the suitability judgment value, and the helmet degree is +0.2 (second reference). The two-dot chain line drawn at (reference) indicates the position of the acceptance / rejection judgment value. Therefore, the determination target image having a helmet degree of +0.2 or more is adopted as a helmet image, and the determination target image having a helmet degree of less than +0.2 is not adopted as a helmet image. In the following, among the images to be determined, those not adopted as the helmet image will be referred to as a non-helmet image. The first criterion and the second criterion can be changed respectively. From the histogram of FIG. 18A, the determination target image having a helmet degree of less than +0.2 is not extracted as a human candidate image. It can be said that the determination target image having a helmet degree of +0.2 or more was extracted as a human candidate image by the extraction unit 31. The extracted person candidate image is treated as an identification processing target image processed by the identification unit. The image to be identified by the extraction unit 31 may be extracted in a plurality of stages. Then, the identification unit 32 performs image recognition processing on all the identification processing target images extracted by the extraction unit 31.
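
The role of the acceptance/rejection determination value can be illustrated with a short sketch. The function name and the index-based return value are assumptions made for illustration; only the threshold of +0.2 and the range of -1 to +1 come from the FIG. 18 description.

```python
from typing import List, Sequence, Tuple

ACCEPTANCE_THRESHOLD = 0.2  # second reference in the FIG. 18 example


def split_by_acceptance(
    helmet_degrees: Sequence[float],
    acceptance_threshold: float = ACCEPTANCE_THRESHOLD,
) -> Tuple[List[int], List[int]]:
    """Return (indices of helmet images, indices of non-helmet images)."""
    accepted: List[int] = []
    rejected: List[int] = []
    for index, degree in enumerate(helmet_degrees):
        if not -1.0 <= degree <= 1.0:
            raise ValueError("helmet degree must lie between -1 and +1")
        if degree >= acceptance_threshold:
            accepted.append(index)  # passed on as a person candidate image
        else:
            rejected.append(index)  # non-helmet image, not passed on
    return accepted, rejected
```

In this sketch the accepted indices stand for the identification processing target images that the identification unit would then process.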

図１８（Ａ）では、ｎ番目に高いヘルメット度は－０．３であり、適否判定値（＋０．５）より小さい。すなわち、「ｎ番目に高いヘルメット度が適否判定値未満である」という条件が満たされる。適否判定値より大きいヘルメット度を有するヘルメット画像がｎ個未満である状態を意味する。基準「ｎ」の観点から考えると、識別部３２で処理すべき識別処理対象画像の数が少ない人検知適合状態ということができる。なお、ｎ番目に高いヘルメット度を有する判定対象画像が非ヘルメット画像である場合に人検知適合状態であるということもできる。したがって、コントローラ３０は、現に取得した撮像画像が人検知適合状態であると判定する。 In FIG. 18A, the nth highest helmet degree is −0.3, which is smaller than the suitability determination value (+0.5). That is, the condition that "the nth highest helmet degree is less than the suitability determination value" is satisfied. It means that the number of helmet images having a helmet degree larger than the suitability judgment value is less than n. From the viewpoint of the reference "n", it can be said that the number of the identification processing target images to be processed by the identification unit 32 is small, and it is a human detection conforming state. It can also be said that the human detection conforming state is obtained when the determination target image having the nth highest helmet degree is a non-helmet image. Therefore, the controller 30 determines that the captured image actually acquired is in the human detection conforming state.

このように、図１８（Ａ）に示すようなヘルメット度の度数分布をもたらす撮像画像は、見つけ出されたヘルメット画像の数が少なく、非ヘルメット画像の数が多いという特徴を有する。そのため識別部３２で処理する処理対象画像は相対的に少ない。 As described above, the captured image having the frequency distribution of the helmet degree as shown in FIG. 18A has a feature that the number of helmet images found is small and the number of non-helmet images is large. Therefore, the number of images to be processed by the identification unit 32 is relatively small.

図１８（Ｂ）では、ｎ番目に高いヘルメット度は＋０．６であり、適否判定値（＋０．５）より大きい。すなわち、「ｎ番目に高いヘルメット度が適否判定値以上である」という条件が満たされる。適否判定値より大きいヘルメット度を有するヘルメット画像がｎ個以上である状態を意味する。基準「ｎ」の観点から考えると、識別部３２で処理すべき識別処理対象画像の数が多い、人検知不適状態ということができる。なお、ｎ番目に高いヘルメット度を有する判定対象画像がヘルメット画像である場合に人検知不適状態であるということもできる。したがって、コントローラ３０は、現に取得した撮像画像が人検知不適状態であると判定する。 In FIG. 18B, the nth highest helmet degree is +0.6, which is larger than the suitability determination value (+0.5). That is, the condition that "the nth highest helmet degree is equal to or higher than the suitability determination value" is satisfied. It means that the number of helmet images having a helmet degree larger than the suitability judgment value is n or more. From the viewpoint of the reference "n", it can be said that the number of images to be processed for identification to be processed by the identification unit 32 is large, which is unsuitable for human detection. It can also be said that the human detection unsuitable state is obtained when the determination target image having the nth highest helmet degree is the helmet image. Therefore, the controller 30 determines that the actually acquired image is in an unsuitable state for human detection.

このように、図１８（Ｂ）に示すようなヘルメット度の度数分布をもたらす撮像画像は、見つけ出されたヘルメット画像の数が多く、非ヘルメット画像の数が少ないという特徴を有する。すなわち、誤認識されたヘルメット画像の数が多いという特徴を有する。そのため識別部３２で処理する処理対象画像は相対的に多い。 As described above, the captured image having the frequency distribution of the helmet degree as shown in FIG. 18B has a feature that the number of helmet images found is large and the number of non-helmet images is small. That is, it has a feature that the number of erroneously recognized helmet images is large. Therefore, the number of images to be processed by the identification unit 32 is relatively large.
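
The contrast between FIG. 18(A) and FIG. 18(B) can be reproduced with synthetic data. The two score lists, the value n = 5, and the four-bin histogram below are invented for illustration; only the thresholds (+0.5 and +0.2) and the n-th highest values (-0.3 and +0.6) follow the figures as described.

```python
from typing import Dict, List, Sequence


def histogram(degrees: Sequence[float], bins: int = 4) -> List[int]:
    """Count helmet degrees in equal-width bins covering -1 to +1."""
    width = 2.0 / bins
    counts = [0] * bins
    for degree in degrees:
        index = min(int((degree + 1.0) / width), bins - 1)
        counts[index] += 1
    return counts


def summarize(
    degrees: Sequence[float],
    n: int,
    suitability: float = 0.5,
    acceptance: float = 0.2,
) -> Dict[str, object]:
    """Report the n-th highest degree, the candidate count, and the verdict."""
    nth = sorted(degrees, reverse=True)[n - 1]
    return {
        "nth_highest": nth,
        "candidates": sum(1 for d in degrees if d >= acceptance),
        "unsuitable": nth >= suitability,
    }


N = 5  # hypothetical rank used for the suitability check
fig_18a_like = [0.7, 0.4, 0.1, -0.1, -0.3, -0.5, -0.6, -0.8, -0.9, -0.9]
fig_18b_like = [0.9, 0.8, 0.8, 0.7, 0.6, 0.6, 0.3, 0.1, -0.2, -0.7]

summary_a = summarize(fig_18a_like, N)
summary_b = summarize(fig_18b_like, N)
```

With these synthetic values the first list yields few candidates and a suitable verdict, while the second yields many candidates and an unsuitable verdict. The histogram counts always sum to the total number of predetermined image portions, matching the area under the frequency curve.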

上述の通り、周辺監視システム１００は、作業機械の周辺の人を信頼性高く検知できる。また、コントローラ３０は、撮像装置４０の撮像画像のみに基づいて撮像画像が人検知不適状態であるか人検知適合状態であるかを判定できる。具体的には、抽出部３１が抽出した識別処理対象画像の数に基づいて撮像画像が人検知に不適な状態であるか否かを判定できる。この特徴によれば、人検知部は前段（抽出部３１）と後段（識別部３２）とを含み、後段の演算コストのほうが高いことは後段の負担を減らす契機となる。また、後段の処理に時間が掛かる場面であるかを判断することができる。 As described above, the peripheral monitoring system 100 can reliably detect people around the work machine. Further, the controller 30 can determine, based only on the image captured by the image pickup device 40, whether the captured image is in the human detection unsuitable state or the human detection conforming state. Specifically, it can determine whether or not the captured image is unsuitable for human detection based on the number of identification processing target images extracted by the extraction unit 31. The human detection unit includes a front stage (extraction unit 31) and a rear stage (identification unit 32), and because the rear stage has the higher computational cost, this feature provides an opportunity to reduce the load on the rear stage. It also makes it possible to judge whether the current scene is one in which the rear-stage processing will take a long time.

また、周辺監視システム１００は、撮像画像が人検知不適状態であると判定した場合にその旨をショベルの操作者に通知することで、現在の人検知結果に過度に依存しないよう注意を喚起できる。 Further, by notifying the excavator operator when it determines that the captured image is in an unsuitable state for human detection, the peripheral monitoring system 100 can caution the operator not to rely excessively on the current human detection result.

或いは、周辺監視システム１００は、撮像画像が人検知不適状態であると判定した場合に人検知を中止することで、信頼性の低い抽出結果に基づく人検知結果の提示を防止できる。 Alternatively, the peripheral monitoring system 100 can prevent the presentation of the human detection result based on the unreliable extraction result by stopping the human detection when it is determined that the captured image is in an unsuitable state for human detection.

或いは、周辺監視システム１００は、撮像画像が人検知不適状態であると判定した場合にその撮像画像をＮＶＲＡＭ等に記録することで、その撮像画像を周辺監視システムの改良に利用できる。 Alternatively, the peripheral monitoring system 100 can use the captured image for improving the peripheral monitoring system by recording the captured image in NVRAM or the like when it is determined that the captured image is in an unsuitable state for human detection.
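
The three responses described above (notifying the operator, stopping human detection, and recording the image) might be wired together as in the following sketch. The class name, the callback interface, the in-memory list standing in for NVRAM, and the choice to resume detection once the image is suitable again are all assumptions. The specification presents the responses as alternatives, whereas this sketch combines them for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UnsuitableStateHandler:
    """Hypothetical controller-side reaction to the suitability verdict."""

    notify: Callable[[str], None]  # e.g. display text or voice output
    stop_detection_when_unsuitable: bool = True
    detection_enabled: bool = True
    recorded_images: List[bytes] = field(default_factory=list)  # NVRAM stand-in

    def handle(self, image: bytes, unsuitable: bool) -> None:
        if not unsuitable:
            # Assumption: detection resumes once the image is suitable again.
            self.detection_enabled = True
            return
        self.notify("Captured image is unsuitable for human detection")
        if self.stop_detection_when_unsuitable:
            self.detection_enabled = False
        # Keep the image so it can be used to improve the system later.
        self.recorded_images.append(image)
```

A caller could pass `print` or a display driver function as `notify`; any callable that accepts a string works in this sketch.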

以上、本発明の好ましい実施例について詳説したが、本発明は、上述した実施例に制限されることはなく、本発明の範囲を逸脱することなしに上述した実施例に種々の変形及び置換を加えることができる。 Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to those embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

例えば、上述の実施例では、周辺監視システム１００は、人の一部を表すヘルメットの画像を特徴画像として利用しながら識別処理対象画像を抽出したが、人の全身の画像を特徴画像として利用しながら識別処理対象画像を抽出してもよい。 For example, in the above embodiment, the peripheral monitoring system 100 extracts the image to be identified while using the image of the helmet representing a part of the person as the feature image, but uses the image of the whole body of the person as the feature image. However, the image to be identified may be extracted.

また、評価値は、撮像画像の一部が人の画像であるかの指標として広く用いられてもよい。評価値としてのヘルメット度は人らしさを表す値と言うこともできる。 Further, the evaluation value may be used broadly as an index of whether a part of the captured image is an image of a person. The helmet degree, as an evaluation value, can also be described as a value representing human-likeness.

また、上述の実施例では、ショベルの上部旋回体３の上に取り付けられる撮像装置４０の撮像画像を用いて人を検知する場合を想定するが、本発明はこの構成に限定されるものではない。移動式クレーン、固定式クレーン、リフマグ機、フォークリフト等の他の作業機械の本体部に取り付けられる撮像装置の撮像画像を用いる構成にも適用され得る。 Further, the above-described embodiment assumes that a person is detected by using an image captured by the image pickup device 40 mounted on the upper swivel body 3 of the excavator, but the present invention is not limited to this configuration. It can also be applied to configurations that use images captured by an image pickup device attached to the main body of another work machine such as a mobile crane, a fixed crane, a lifting magnet machine, or a forklift.

また、上述の実施例では、３つのカメラを用いてショベルの死角領域を撮像するが、１つ、２つ、又は４つ以上のカメラを用いてショベルの死角領域を撮像してもよい。 Further, in the above-described embodiment, the blind spot area of the excavator is imaged using three cameras, but the blind spot area of the excavator may be imaged using one, two, or four or more cameras.

また、上述の実施例では、複数の撮像画像のそれぞれに対して個別に人検知処理が適用されるが、複数の撮像画像から生成される１つの合成画像に対して人検知処理が適用されてもよい。 Further, in the above-described embodiment, the human detection process is applied individually to each of the plurality of captured images, but it may instead be applied to a single composite image generated from the plurality of captured images.

本発明の別の実施形態に係る作業機械用周辺監視システムは、作業機械に取り付けられる撮像装置の撮像画像を用いて前記作業機械の周辺に存在する人を検知する作業機械用周辺監視システムであって、前記撮像画像から複数の所定画像部分を識別処理対象画像として抽出する抽出部と、前記識別処理対象画像に含まれる画像が人の画像であるかを画像認識処理によって識別する識別部と、を有し、前記抽出部が抽出した前記識別処理対象画像の数が所定の基準より多いときに前記撮像画像が人検知に不適な状態であると判定するように構成されていてもよい。 The peripheral monitoring system for a work machine according to another embodiment of the present invention is a peripheral monitoring system for a work machine that detects a person existing in the vicinity of the work machine by using an image captured by an image pickup device attached to the work machine. It has an extraction unit that extracts a plurality of predetermined image portions from the captured image as identification processing target images, and an identification unit that identifies, by image recognition processing, whether an image included in an identification processing target image is an image of a person. It may be configured to determine that the captured image is in an unsuitable state for human detection when the number of identification processing target images extracted by the extraction unit exceeds a predetermined reference.

また、作業機械用周辺監視システムは、前記複数の所定画像部分のそれぞれに関して算出される評価値を高い順に並べ、所定順位の前記評価値が所定の適否判定値以上の場合に前記撮像画像が人検知に不適な状態であると判定するように構成されていてもよい。この場合、前記評価値は、各所定画像部分の画像特徴量に基づいて算出され、且つ、各所定画像部分の人らしさを表していてもよく、前記抽出部は、前記評価値が所定の採否判定値以上の所定画像部分を前記識別処理対象画像として抽出するように構成されていてもよい。また、前記適否判定値は、前記採否判定値以上であってもよい。なお、各所定画像部分の画像特徴量の算出に要する演算コストは、典型的には、前記識別部が実行する画像認識処理で用いる画像特徴量の算出に要する演算コストよりも低い。 Further, the peripheral monitoring system for a work machine may be configured to sort the evaluation values calculated for each of the plurality of predetermined image portions in descending order and to determine that the captured image is in an unsuitable state for human detection when the evaluation value at a predetermined rank is equal to or higher than a predetermined suitability determination value. In this case, the evaluation value may be calculated based on the image feature amount of each predetermined image portion and may represent the human-likeness of each predetermined image portion, and the extraction unit may be configured to extract, as the identification processing target images, the predetermined image portions whose evaluation value is equal to or higher than a predetermined acceptance/rejection determination value. Further, the suitability determination value may be equal to or higher than the acceptance/rejection determination value. The computational cost of calculating the image feature amount of each predetermined image portion is typically lower than the computational cost of calculating the image feature amount used in the image recognition processing executed by the identification unit.

また、作業機械用周辺監視システムは、前記撮像画像が人検知に不適な状態であると判定した場合にその旨を通知するように構成されていてもよい。 Further, the peripheral monitoring system for a work machine may be configured to give notification when it determines that the captured image is in an unsuitable state for human detection.

また、作業機械用周辺監視システムは、前記撮像画像が人検知に不適な状態であると判定した場合に前記撮像画像を記録するように構成されていてもよい。 Further, the peripheral monitoring system for a work machine may be configured to record the captured image when it determines that the captured image is in an unsuitable state for human detection.

１・・・下部走行体２・・・旋回機構３・・・上部旋回体４・・・ブーム５・・・アーム６・・・バケット７・・・ブームシリンダ８・・・アームシリンダ９・・・バケットシリンダ１０・・・キャビン３０・・・コントローラ３１・・・抽出部３２・・・識別部３３・・・追跡部３４・・・人検知部３５・・・制御部４０・・・撮像装置４０Ｂ・・・後方カメラ４０Ｌ・・・左側方カメラ４０Ｒ・・・右側方カメラ４２・・・入力装置５０・・・出力装置５１・・・機械制御装置１００・・・周辺監視システムＡＰ、ＡＰ１～ＡＰ６・・・頭部画像位置ＢＸ・・・ボックスＨＤ・・・頭部ＨＰ・・・仮想頭部位置ＨＲｇ・・・ヘルメット画像Ｍ１、Ｍ２・・・マスク領域Ｐｒ、Ｐｒ１、Ｐｒ２、Ｐｒ１０～Ｐｒ１２・・・参照点Ｒ１・・・はみ出し領域Ｒ２・・・車体映り込み領域ＲＰ・・・代表位置ＴＲ、ＴＲ１、ＴＲ２、ＴＲ１０～ＴＲ１２・・・仮想平面領域ＴＲｇ、ＴＲｇ３、ＴＲｇ４、ＴＲｇ５・・・識別処理対象画像領域ＴＲｇｔ、ＴＲｇｔ３、ＴＲｇｔ４、ＴＲｇｔ５・・・正規化画像
1 ... Lower traveling body
2 ... Swivel mechanism
3 ... Upper swivel body
4 ... Boom
5 ... Arm
6 ... Bucket
7 ... Boom cylinder
8 ... Arm cylinder
9 ... Bucket cylinder
10 ... Cabin
30 ... Controller
31 ... Extraction unit
32 ... Identification unit
33 ... Tracking unit
34 ... Human detection unit
35 ... Control unit
40 ... Image pickup device
40B ... Rear camera
40L ... Left side camera
40R ... Right side camera
42 ... Input device
50 ... Output device
51 ... Machine control device
100 ... Peripheral monitoring system
AP, AP1 to AP6 ... Head image position
BX ... Box
HD ... Head
HP ... Virtual head position
HRg ... Helmet image
M1, M2 ... Mask area
Pr, Pr1, Pr2, Pr10 to Pr12 ... Reference point
R1 ... Overhang area
R2 ... Vehicle body reflection area
RP ... Representative position
TR, TR1, TR2, TR10 to TR12 ... Virtual plane area
TRg, TRg3, TRg4, TRg5 ... Identification processing target image area
TRgt, TRgt3, TRgt4, TRgt5 ... Normalized image

Claims

作業機械に取り付けられる撮像装置の撮像画像を用いて前記作業機械の周辺に存在する人を検知する作業機械用周辺監視システムであって、
前記撮像画像から人候補画像を抽出する抽出部を有し、
前記抽出部が抽出した人候補画像の数に基づいて前記撮像画像が人検知に不適な状態であるか否かを判定し、
前記撮像画像が人検知に不適な状態であると判定した場合、前記撮像画像が人検知に不適な状態である旨を通知する、
作業機械用周辺監視システム。
A peripheral monitoring system for a work machine that detects a person existing in the vicinity of the work machine by using an image captured by an image pickup device attached to the work machine, wherein the system
has an extraction unit that extracts person candidate images from the captured image,
determines, based on the number of person candidate images extracted by the extraction unit, whether or not the captured image is in a state unsuitable for human detection, and
when it determines that the captured image is in a state unsuitable for human detection, gives notification that the captured image is in a state unsuitable for human detection.

作業機械に取り付けられる撮像装置の撮像画像を用いて前記作業機械の周辺に存在する人を検知する作業機械用周辺監視システムであって、
前記撮像画像に基づいて、人らしさを表す値である評価値を算出し、算出された評価値に基づいて前記撮像画像が人検知に不適な状態であるか否かを判定し、
前記撮像画像が人検知に不適な状態である場合、前記撮像画像が人検知に不適な状態である旨を通知する、
作業機械用周辺監視システム。
A peripheral monitoring system for a work machine that detects a person existing in the vicinity of the work machine by using an image captured by an image pickup device attached to the work machine, wherein the system
calculates, based on the captured image, an evaluation value that represents human-likeness, and determines, based on the calculated evaluation value, whether or not the captured image is in a state unsuitable for human detection, and
when the captured image is in a state unsuitable for human detection, gives notification that the captured image is in a state unsuitable for human detection.

前記撮像画像が人検知に不適な状態である場合に前記撮像画像を記録する、
請求項１又は２に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to claim 1 or 2, wherein the captured image is recorded when the captured image is in a state unsuitable for human detection.

前記撮像画像が人検知に不適な状態である場合、人検知を中止する、
請求項１乃至３の何れか一項に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to any one of claims 1 to 3, wherein human detection is stopped when the captured image is in a state unsuitable for human detection.

前記撮像画像が人検知に不適な状態であると判定した場合に前記作業機械の操作者に前記撮像画像が人検知に不適な状態である旨を通知する、
請求項１乃至４の何れか一項に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to any one of claims 1 to 4, wherein, when it is determined that the captured image is in a state unsuitable for human detection, the operator of the work machine is notified that the captured image is in a state unsuitable for human detection.

前記作業機械の操作者に前記撮像画像が人検知に不適な状態である旨の通知として、人検知に不適な状態である旨を示すテキストメッセージを表示させる、
請求項５に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to claim 5, wherein a text message indicating the state unsuitable for human detection is displayed as the notification to the operator of the work machine that the captured image is in a state unsuitable for human detection.

前記作業機械の操作者に前記撮像画像が人検知に不適な状態である旨の通知として、人検知に不適な状態である旨を示す音声メッセージを出力させる、
請求項５又は６に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to claim 5 or 6, wherein a voice message indicating the state unsuitable for human detection is output as the notification to the operator of the work machine that the captured image is in a state unsuitable for human detection.

前記撮像画像が人検知に不適な状態であると判定した場合に前記作業機械用周辺監視システムの動作内容を変更する、
請求項１乃至７の何れか一項に記載の作業機械用周辺監視システム。
The peripheral monitoring system for a work machine according to any one of claims 1 to 7, wherein the operation of the peripheral monitoring system for a work machine is changed when it is determined that the captured image is in a state unsuitable for human detection.