JP7266008B2

JP7266008B2 - LEARNING IMAGE JUDGMENT DEVICE, PROGRAM AND LEARNING IMAGE JUDGMENT METHOD

Info

Publication number: JP7266008B2
Application number: JP2020052255A
Authority: JP
Inventors: 博章三沢; 博基古川; 一則和久井
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2023-04-27
Anticipated expiration: 2040-03-24
Also published as: JP2021152691A

Description

The present invention relates to a learning image determination device, a program, and a learning image determination method for additional learning of a learning model for image recognition.

In recent years, systems and services that use machine learning, chiefly deep learning, have been increasing in the field of image recognition. Deep learning includes supervised learning, in which the image data for learning is given correct labels before learning, and unsupervised learning, in which learning proceeds without correct labels and feature values are extracted from the image data for learning. Supervised learning is often used when deep learning is applied to, for example, image classification problems that identify what an object in an input image is, fault diagnosis that judges whether an object in an image is normal or abnormal, and regression problems such as estimating age from an image of a person.

In supervised learning, generally, the more image data is used for learning, the higher the performance of the learning model (recognition accuracy, generalization performance, and so on). However, the image data for learning and the actual image data in the operating environment after learning differ in the angle of view, lighting conditions, background, and the like, so the performance of the learning model may degrade. For this reason, before a system that uses deep learning technology is put into operation, image data must be collected in the operating environment and re-learning (additional learning) must be performed.

Conventionally, for additional learning, video data was acquired for a certain period with a camera installed in the operating environment, and correct labels were assigned (annotation) to all of the acquired image data. Constructing a learning data set for deep learning (image data for learning with correct labels assigned) therefore requires many man-hours. However, the image data acquired in the operating environment includes many images with a low learning effect and many noise images. An image with a low learning effect is one that the learning model can already recognize correctly before additional learning. A noise image is, for example, an image with coarse image quality or an image in which the recognition target was not captured. A method is therefore needed that efficiently extracts images with a high additional learning effect from the acquired image data and reduces the man-hours for constructing the learning data set.

To address this problem, the invention described in Patent Document 1 reduces the man-hours for constructing the learning data set by using as additional learning data only the image data for which the likelihood value calculated by the learning model does not satisfy a predetermined criterion. The likelihood value is the probability that image data belongs to a class into which it is classified (identified, estimated), and it is calculated for each class that the learning model classifies. In Patent Document 1, image data whose likelihood values are roughly even across all classes (image data for which the class is ambiguous) is selected as additional learning data.
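The likelihood-based selection attributed to Patent Document 1 can be illustrated with a minimal sketch. The criterion used here (no class probability reaches a threshold) and the names `is_ambiguous`, `select_for_additional_learning`, and `threshold` are assumptions for illustration only; the text does not state the actual criterion.

```python
def is_ambiguous(probs, threshold=0.6):
    """Return True when no class probability reaches the threshold.

    The threshold value is a hypothetical criterion, not one given in the text.
    """
    return max(probs) < threshold


def select_for_additional_learning(predictions, threshold=0.6):
    """predictions maps an image name to its per-class probabilities."""
    return [name for name, probs in predictions.items()
            if is_ambiguous(probs, threshold)]


predictions = {
    "a.png": [0.95, 0.03, 0.02],  # one class clearly dominates
    "b.png": [0.40, 0.35, 0.25],  # probabilities are roughly even
}
selected = select_for_additional_learning(predictions)
```

Under this sketch only `b.png` is selected. A noise image would typically also produce roughly even probabilities, so a selection rule of this kind cannot tell it apart from a genuinely useful image.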

Patent Document 1: JP 2019-152948 A

When image data for additional learning is selected on the basis of the likelihood value calculated by the learning model, image data that is not represented in the original learning data, such as a noise image, has even likelihood values and may therefore be selected as additional learning data. In addition, an image that was estimated (recognized) incorrectly needs additional learning and is considered to have the highest additional learning effect, yet such an image is not necessarily selected.

The present invention has been made in view of this background, and its object is to provide a learning image determination device, a program, and a learning image determination method that make it possible to determine images that are highly effective as images for additional learning.

To solve the above problem, a learning image determination device according to the present invention determines the additional learning effect of an image with respect to additional learning of a learning model that recognizes a recognition target, the recognition target being a person included in the image. The device comprises an additional learning effect determination unit and a noise image determination unit. The additional learning effect determination unit compares a gaze region, which is the region of the image that was subject to recognition in recognition processing using the learning model, with a target existence region, which is defined in advance as the region in which the recognition target exists, and determines that the greater the degree to which the two do not overlap, the greater the effect as an image for additional learning. The noise image determination unit determines that an image in which the person region in the image and the joint points of the person differ is a noise image. The additional learning effect determination unit evaluates the effect as an image for additional learning only for images other than noise images.

According to the present invention, it is possible to provide a learning image determination device, a program, and a learning image determination method that make it possible to determine images that are highly effective as images for additional learning.

FIG. 1 is a functional block diagram of the learning image determination device according to this embodiment.
FIG. 2 is a data configuration diagram of the image database according to this embodiment.
FIG. 3 is a data configuration diagram of the region setting database according to this embodiment.
FIG. 4 is a diagram showing a person region detected from an image by the person detection unit according to this embodiment.
FIG. 5 is a diagram showing joint points detected from an image by the skeleton detection unit according to this embodiment.
FIG. 6 is a diagram (1) showing an image in which the person detection unit according to this embodiment could not detect a person.
FIG. 7 is a diagram (2) showing an image in which the person detection unit according to this embodiment could not detect a person.
FIG. 8 is a screen configuration diagram (1) of the existence region setting screen for setting the existence region of a recognition target according to this embodiment.
FIG. 9 is a screen configuration diagram (2) of the existence region setting screen for setting the existence region of a recognition target according to this embodiment.
FIG. 10 is a screen configuration diagram (3) of the existence region setting screen for setting the existence region of a recognition target according to this embodiment.
FIG. 11 is a screen configuration diagram (4) of the existence region setting screen for setting the existence region of a recognition target according to this embodiment.
FIG. 12 is a diagram (1) for explaining the operation of the additional learning effect determination unit according to this embodiment.
FIG. 13 is a diagram (2) for explaining the operation of the additional learning effect determination unit according to this embodiment.
FIG. 14 is a flowchart of the learning image determination processing according to this embodiment.

Next, a learning image determination device according to a mode for carrying out the present invention (an embodiment) is described. The learning image determination device performs noise image determination and learning effect determination on images captured by a camera, and selects, as images for additional learning, images that are not noise images and that have a high additional learning effect.

In noise image determination, person detection processing and skeleton detection processing (joint point detection processing) are performed, and an image is determined to be a noise image if the detected person region and the positions of the joint points do not match. An image in which no person can be detected because the image is coarse or has poor contrast, and an image in which part of the person is missing because of an occluding object, are also determined to be noise images.

In additional learning effect determination, the region that the learning model gazes at during recognition processing (estimation processing, classification processing, identification processing), called the gaze region, is compared with the region in which the recognition target exists (the target existence region, or simply the existence region). The region in which the recognition target exists is, for example, a region determined by joint points extracted in the skeleton detection processing. For example, the upper-body region is the region determined by the shoulder and elbow joint points.

For an image in which the gaze region and the target existence region do not match, the learning model is considered to have gazed at and recognized a target (region) other than the intended recognition target, and thus to have misrecognized the image. Additional learning with that image is therefore necessary, and the image is determined to have a large learning effect. For an image in which the two regions match, the model is considered to have recognized the image correctly, so additional learning with that image is unnecessary and the image is determined to have a small learning effect.

Collecting images with a large learning effect, assigning correct labels to them, and performing additional learning improves the performance of the learning model. Because correct labels are assigned (annotation) only to the images with a high additional learning effect that remain after noise images and images with a small learning effect are removed, the man-hours for constructing the learning data set can be reduced.

The following describes a learning image determination device that selects and collects images for additional learning of a learning model used in processing that recognizes the upper-body clothing and the lower-body clothing of a person included in an image captured by a camera. A learning model is also called a machine learning model, or simply a model. The statement that a processor executes processing to recognize clothing using the learning model is also expressed as the learning model recognizing the clothing, for example, "the learning model recognized the upper-body clothing of the person in the image as a jacket."

<<Configuration of the learning image determination device>>
FIG. 1 is a functional block diagram of the learning image determination device 100 according to this embodiment. A camera 380 and a learning device 300 are connected to the learning image determination device 100. The learning image determination device 100 determines the additional learning effect of images captured by the camera 380, and selects and accumulates images with a high effect. The accumulated learning images are sent to the learning device 300, where correct labels are assigned manually, and become the teacher data (learning data set) for additional learning of the model.

The learning image determination device 100 includes a control unit 110, a storage unit 120, and an input/output unit 180. The input/output unit 180 consists of user interface devices such as a display, a keyboard, and a mouse, a communication interface, and the like. The input/output unit 180 receives images captured by the camera 380 and outputs them to the control unit 110. The input/output unit 180 also exchanges data with the learning device 300.

The storage unit 120 consists of a ROM (Read Only Memory), a RAM (Random Access Memory), an SSD (Solid State Drive), and the like. The storage unit 120 stores a program 121, a learning model 122, an image database 130, and a region setting database 150. The program 121 contains the processing procedure of the learning image determination processing (see FIG. 14, described later) executed by the CPU (Central Processing Unit) that constitutes the control unit 110. The learning model 122 is a learning model for machine learning processing that recognizes the upper-body clothing and the lower-body clothing of a person included in an image. The images that the learning image determination device 100 determines to have a large additional learning effect and selects are images for additional learning of the learning model 122.

<<Configuration of the learning image determination device: image database>>
FIG. 2 is a data configuration diagram of the image database 130 according to this embodiment. The image database 130 is, for example, tabular data in which one row (record) represents the image of one person. A record includes the columns (attributes) path name 131, type 132, person region 133, joint points 134, upper-body clothing 135, upper-body gaze region 136, lower-body clothing 137, and lower-body gaze region 138.

The path name 131 is the path name (identification information) of the image file containing the person. In the following description, an image file and the image (image data) contained in it are treated as the same thing unless they need to be distinguished.
The type 132 is the type of the image as determined by the noise image determination unit 113 and the additional learning effect determination unit 117, both described later. The types are "noise", indicating a noise image; "high effect", indicating an image with a large learning effect; and "small effect", indicating an image with a small learning effect.

The person region 133 indicates the region in which the person appears in the image indicated by the path name 131. If the region is, for example, a rectangle, the person region 133 holds the coordinates of the two points at the ends of a diagonal of that rectangle. When the person region cannot be identified, the person region 133 is "ND (Not Detected)" and the type 132 is "noise".
The joint points 134 are the joint points of the person. The joint points are both ears, both eyes, both shoulders, both elbows, both wrists, the left and right hips, both knees, and both ankles. The joint points 134 hold the coordinates of these joint points in the image. Each coordinate may be stored as a pair with its joint point name (for example, right eye).

The upper-body clothing 135 is the upper-body clothing as recognized by the learning model 122. The lower-body clothing 137 is the lower-body clothing as recognized by the learning model 122.
The upper-body gaze region 136 is the region that the learning model 122 gazed at (the gaze region) when it recognized the upper-body clothing 135. The lower-body gaze region 138 is the region that the learning model 122 gazed at when it recognized the lower-body clothing 137. The region that the learning model 122 gazed at is the region of the image that influenced the output of a given recognition result for the upper-body clothing 135 or the lower-body clothing 137 in recognition processing using the learning model 122. For example, with a gaze point visualization technique for deep learning, the feature portions of the image that influenced recognition can be identified by means of an influence function, and the degree of influence, that is, which region of the image was gazed at, can be calculated pixel by pixel. The gaze point visualization technique is described in the following document: Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, 2016, <https://arxiv.org/abs/1610.02391>.
Records 146 to 149 are described later.
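One record of the image database described above could be modeled as in the following sketch. The class name `ImageRecord`, the field names, the helper `mark_undetected`, and the example path are hypothetical. Storing each gaze region as a rectangle is also an assumption, since the text only says that the degree of influence can be calculated pixel by pixel.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[Point, Point]  # the two ends of a rectangle's diagonal


@dataclass
class ImageRecord:
    path_name: str                                   # path name 131
    kind: Optional[str] = None                       # type 132: "noise", "high effect", "small effect"
    person_region: Optional[Box] = None              # person region 133; None stands for "ND"
    joint_points: Dict[str, Point] = field(default_factory=dict)  # joint points 134
    upper_body_clothing: Optional[str] = None        # upper-body clothing 135
    upper_body_gaze_region: Optional[Box] = None     # upper-body gaze region 136 (assumed to be a box)
    lower_body_clothing: Optional[str] = None        # lower-body clothing 137
    lower_body_gaze_region: Optional[Box] = None     # lower-body gaze region 138 (assumed to be a box)


def mark_undetected(record: ImageRecord) -> ImageRecord:
    """Record that no person was detected: region "ND", type "noise"."""
    record.person_region = None
    record.kind = "noise"
    return record


rec = mark_undetected(ImageRecord(path_name="images/0001.png"))  # hypothetical path
```

Keeping the joint points as a name-to-coordinate mapping follows the remark that coordinates may be paired with joint point names.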

<<Configuration of the learning image determination device: region setting database>>
FIG. 3 is a data configuration diagram of the region setting database 150 according to this embodiment. The region setting database 150 is, for example, tabular data in which one row (record) defines the region in which a recognition target of one person exists (the target existence region). In this embodiment, the region is defined using joint points. Record 159 indicates that the region in which the lower-body clothing, a recognition target of the learning model 122, exists is the region in which the left knee, right knee, left ankle, and right ankle joint points exist. The target existence region (existence region) is set by the user of the learning image determination device 100 on the existence region setting screen (see FIGS. 8 to 11, described later).
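A region setting database of this kind can be sketched as a mapping from a recognition target to the joint points that define its existence region. The joint lists follow the description (knees and ankles for lower-body clothing; shoulders and elbows for upper-body clothing, as set on the screen of FIG. 8). The identifiers `EXISTENCE_REGIONS` and `existence_box`, the joint name strings, and the choice of a bounding rectangle as the region shape are assumptions for illustration.

```python
# Recognition target -> joint points that define its existence region.
EXISTENCE_REGIONS = {
    "upper_body_clothing": ["left_shoulder", "right_shoulder",
                            "left_elbow", "right_elbow"],
    "lower_body_clothing": ["left_knee", "right_knee",
                            "left_ankle", "right_ankle"],
}


def existence_box(target, joint_points):
    """Return the rectangle enclosing the detected joints of the target.

    joint_points maps a joint name to its (x, y) image coordinates.
    Returns None when none of the defining joints was detected.
    """
    pts = [joint_points[name] for name in EXISTENCE_REGIONS[target]
           if name in joint_points]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys)), (max(xs), max(ys))


joints = {
    "left_knee": (40, 200), "right_knee": (80, 205),
    "left_ankle": (42, 300), "right_ankle": (78, 298),
}
box = existence_box("lower_body_clothing", joints)
```

For these made-up coordinates the enclosing rectangle runs from (40, 200) to (80, 300).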

<<Configuration of the learning image determination device: control unit>>
Returning to FIG. 1, the description of the control unit 110 continues. The control unit 110 includes a person detection unit 111, a skeleton detection unit 112, a noise image determination unit 113, a recognition unit 114, a gaze point visualization unit 115, a target existence region setting unit 116, an additional learning effect determination unit 117, and a learning image selection unit 118.

The person detection unit 111 detects the region of an image in which a person appears. The person detection unit 111 may, for example, detect the person region using a deep learning technique, or may extract gradient strength using HOG (Histogram of Oriented Gradients) features and perform detection with an SVM (Support Vector Machine). FIG. 4 is a diagram showing a person region 411 detected from an image 410 by the person detection unit 111 according to this embodiment.

Returning to FIG. 1, the skeleton detection unit 112 detects the joint points of a person in an image. The joint points are both ears, both eyes, both shoulders, both elbows, both wrists, the left and right hips, both knees, and both ankles. FIG. 5 is a diagram showing the joint points detected from an image 420 by the skeleton detection unit 112 according to this embodiment. Joint point 421 is the joint point corresponding to the right knee.

Returning to FIG. 1, the noise image determination unit 113 determines whether an image is a noise image. Specifically, if the person detection unit 111 cannot detect a person, the noise image determination unit 113 determines that the image is a noise image and records this in the image database 130 (see FIG. 2). As record 146 shows, the noise image determination unit 113 sets the person region 133 to "ND" to record that no person was detected, and sets the type 132 to "noise" to record that the image is a noise image.

FIG. 6 is a diagram (1) showing an image 430 in which the person detection unit 111 according to this embodiment could not detect a person. The image 430 has poor contrast, so the person (person region) is not detected.
FIG. 7 is a diagram (2) showing an image 440 in which the person detection unit 111 according to this embodiment could not detect a person. In the image 440, the person (person region) is not detected because of an obstacle 441. Note that the skeleton detection unit 112 has detected some of the joint points.

Returning to FIG. 1, when a person region has been detected, the noise image determination unit 113 determines that the image is a non-noise image if the joint points detected by the skeleton detection unit 112 correspond to the person region. In the image 420 (see FIG. 5), all of the joint points are contained in the person region 422, so the noise image determination unit 113 determines that the joint points and the person region correspond and that the image is a non-noise image. The noise image determination unit 113 determines that an image is a noise image if, for example, the number of joint points contained in the person region is smaller than a predetermined number. The noise image determination unit 113 may instead determine that an image is a non-noise image if a predetermined ratio of the joint points detected by the skeleton detection unit 112 is contained in the person image. The noise image determination unit 113 may also determine that an image is a noise image if the joint points are concentrated in one part of the person region.
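The ratio-based variant of the noise image determination can be sketched as follows. The function names `inside` and `is_noise_image`, the default ratio `min_ratio=0.8`, and the handling of an image with no detected joint points are assumptions; the text only speaks of a "predetermined" number or ratio.

```python
def inside(point, box):
    """Return True if the point lies in the rectangle given by two diagonal ends."""
    (x1, y1), (x2, y2) = box
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    x, y = point
    return left <= x <= right and top <= y <= bottom


def is_noise_image(person_region, joint_points, min_ratio=0.8):
    """Judge an image as noise when person region and joint points disagree.

    person_region: rectangle as two diagonal ends, or None if not detected.
    joint_points: mapping of joint name to (x, y).
    """
    if person_region is None:
        return True  # no person detected ("ND")
    if not joint_points:
        return True  # assumption: no joints at all is treated as a mismatch
    n_inside = sum(inside(p, person_region) for p in joint_points.values())
    return n_inside / len(joint_points) < min_ratio


person_region = ((10, 10), (110, 310))
joints_ok = {"left_shoulder": (40, 60), "right_shoulder": (80, 60),
             "left_knee": (45, 220), "right_knee": (75, 220)}
joints_bad = {"left_shoulder": (40, 60), "right_shoulder": (200, 60),
              "left_knee": (220, 220), "right_knee": (250, 220)}
```

With these made-up coordinates, `joints_ok` lies entirely inside the person region and is judged non-noise, while only one of four joints in `joints_bad` lies inside, so that image is judged noise. The check for joint points concentrated in one part of the region is not covered by this sketch.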

If the person region and the joint points do not correspond, the noise image determination unit 113 determines that the image is a noise image, stores the image in the storage unit 120, and records the determination result in the image database 130 (see FIG. 2). As record 147 shows, the noise image determination unit 113 stores the region detected by the person detection unit 111 in the person region 133 and the joint points detected by the skeleton detection unit 112 in the joint points 134, and sets the type 132 to "noise" to record that the image is a noise image.
The recognition unit 114 executes recognition processing (estimation processing) using the learning model 122. Specifically, the recognition unit 114 uses the learning model 122 to recognize the upper-body clothing and the lower-body clothing of the person included in the image.

The gaze point visualization unit 115 extracts the region of the image that the learning model 122 gazed at during recognition processing (the gaze region). For example, using a gaze point visualization technique, the gaze point visualization unit 115 identifies, by means of an influence function, the feature portions of the image that influenced the recognition processing when the recognition unit 114 performed recognition using the learning model 122, and extracts the gaze region.
The target existence region setting unit 116 sets the region in which the recognition target exists in accordance with instructions from the user of the learning image determination device 100. The region that is set is also the region that a person gazes at when recognizing the recognition target. The existence regions of the upper-body clothing and the lower-body clothing, which are the recognition targets, are set with reference to joint points. The settings are stored in the region setting database 150 (see FIG. 3).
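The step from a per-pixel degree of influence to a gaze region is not spelled out in the description. One plausible approach, shown here purely as an assumption, is to threshold the influence map and take the rectangle enclosing the pixels above the threshold. The function name `gaze_region`, the value range of the map, and the threshold are hypothetical, and the influence map itself (for example, a Grad-CAM heat map) is taken as given.

```python
def gaze_region(heatmap, threshold=0.5):
    """Return the rectangle enclosing all pixels with influence >= threshold.

    heatmap: 2D list of influence values (rows are y, columns are x),
    assumed to be normalized to the range 0.0 to 1.0.
    Returns ((x_min, y_min), (x_max, y_max)), or None if no pixel qualifies.
    """
    xs, ys = [], []
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys)), (max(xs), max(ys))


heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.7, 0.0],
    [0.0, 0.2, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
region = gaze_region(heatmap)
```

For this toy map the strongly influencing pixels are at (1, 1), (2, 1), and (2, 2), so the region runs from (1, 1) to (2, 2).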

FIG. 8 is a screen configuration diagram (1) of the existence region setting screen 210 for setting the existence region of a recognition target according to this embodiment. Upper-body clothing is selected with the radio button 211 on the left side of the screen, which shows that the existence region setting screen 210 is the screen for setting the existence region of the upper-body clothing. Joint points are displayed in the setting area 212 on the right side of the screen, where the existence region 213 of the upper-body clothing can be set. On the existence region setting screen 210, an existence region 213 containing both shoulders and both elbows has been set, which means that the region in which the upper-body clothing exists is set to contain both shoulders and both elbows. When the OK button 214 is clicked, the target existence region setting unit 116 stores the settings on the screen in record 158 of the region setting database 150 (see FIG. 3).

FIG. 9 is a screen configuration diagram (2) of the existence region setting screen 220 for setting the existence region of a recognition target according to this embodiment. Lower-body clothing is selected with the radio button 221 on the left side of the screen, which shows that the existence region setting screen 220 is the screen for setting the existence region of the lower-body clothing. In the setting area 222 on the right side of the screen, an existence region 223 containing both knees and both ankles has been set, which means that the region in which the lower-body clothing exists is set to contain both knees and both ankles. When the OK button 224 is clicked, the target existence region setting unit 116 stores the settings on the screen in record 159 (see FIG. 3).
On the existence region setting screens 210 and 220, the existence region is set by enclosing the displayed joint points with the existence regions 213 and 223. Alternatively, the existence region of the recognition target may be set by selecting joint points one at a time.

FIG. 10 is a screen configuration diagram (3) of the existence area setting screen 230 for setting the existence area of a recognition target according to the present embodiment. FIG. 11 is a screen configuration diagram (4) of the existence area setting screen 240 for setting the existence area of a recognition target according to the present embodiment. By ticking the joint point check boxes on the left side of the existence area setting screens 230 and 240, the user can select the joint points included in the existence area and thereby set the existence area of the recognition target.
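The settings made on these screens can be pictured as a mapping from each recognition target to the set of joint points in its existence area. The following is a minimal sketch under that assumption; the joint-point names, the `EXISTENCE_AREAS` mapping, and the `set_existence_area` helper are hypothetical and do not appear in the specification.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical joint-point names; the specification only refers to
# "both shoulders", "both elbows", "both knees", and "both ankles".
EXISTENCE_AREAS: Dict[str, FrozenSet[str]] = {
    "upper_body_clothing": frozenset(
        {"left_shoulder", "right_shoulder", "left_elbow", "right_elbow"}
    ),
    "lower_body_clothing": frozenset(
        {"left_knee", "right_knee", "left_ankle", "right_ankle"}
    ),
}


def set_existence_area(
    settings: Dict[str, FrozenSet[str]], target: str, checked_joints: Iterable[str]
) -> Dict[str, FrozenSet[str]]:
    """Return a copy of the settings with `target` bound to the checked joint points."""
    joints = frozenset(checked_joints)
    if not joints:
        raise ValueError("an existence area needs at least one joint point")
    updated = dict(settings)  # leave the caller's mapping untouched
    updated[target] = joints
    return updated
```

Clicking the OK button would then correspond to persisting one entry of such a mapping as a record of the area setting database 150.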

Returning to FIG. 1, the additional learning effect determination unit 117 determines the effect of additionally training the learning model 122 on an image. If, for an image, the gaze area extracted by the gaze point visualization unit 115 corresponds to the target existence area set by the target existence area setting unit 116, the additional learning effect determination unit 117 determines that the learning model 122 recognizes the image correctly, that learning with the image is unnecessary, and that the learning effect is small. Conversely, if the gaze area and the target existence area do not correspond, it determines that the learning model 122 fails to recognize the image, that learning with the image is necessary, and that the learning effect is large.

FIG. 12 is a diagram (1) for explaining the operation of the additional learning effect determination unit 117 according to the present embodiment. The gaze area 451 of the image 450 is the gaze area, extracted by the gaze point visualization unit 115, that the learning model 122 used when recognizing upper-body clothing. The existence area of upper-body clothing is set to include both shoulders and both elbows (see record 158 in FIG. 3, and FIGS. 8 and 10). The gaze area 451 includes both shoulders and both elbows, so the gaze area and the existence area correspond. The additional learning effect determination unit 117 therefore determines that the image 450 has a small learning effect.

FIG. 13 is a diagram (2) for explaining the operation of the additional learning effect determination unit 117 according to the present embodiment. The gaze area 461 of the image 460 is the gaze area, extracted by the gaze point visualization unit 115, that the learning model 122 used when recognizing lower-body clothing. The existence area of lower-body clothing is set to include both knees and both ankles (see record 159 in FIG. 3, and FIGS. 9 and 11). The gaze area 461 includes neither the knees nor the ankles, so the gaze area and the existence area do not correspond. The additional learning effect determination unit 117 therefore determines that the image 460 has a large learning effect.

In the above examples, the additional learning effect determination unit 117 determines that the gaze area and the target existence area correspond when all joint points included in the existence area are included in the gaze area. Alternatively, the additional learning effect determination unit 117 may determine that the gaze area and the existence area correspond when a predetermined number or a predetermined ratio of the joint points included in the existence area are included in the gaze area.
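The correspondence rule described above can be sketched as follows. This is only an illustration under stated assumptions: the gaze area is approximated by an axis-aligned bounding box, joint points are given as name-to-coordinate pairs, and the names `joints_in_gaze_ratio`, `regions_correspond`, and `min_ratio` are hypothetical rather than taken from the specification.

```python
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); assumed representation
Point = Tuple[float, float]


def joints_in_gaze_ratio(gaze: Box, joints: Dict[str, Point], required: Iterable[str]) -> float:
    """Fraction of the existence-area joint points that fall inside the gaze box."""
    required = list(required)
    if not required:
        raise ValueError("the existence area must name at least one joint point")
    x0, y0, x1, y1 = gaze
    hits = 0
    for name in required:
        point = joints.get(name)  # an undetected joint point counts as not covered
        if point is not None and x0 <= point[0] <= x1 and y0 <= point[1] <= y1:
            hits += 1
    return hits / len(required)


def regions_correspond(
    gaze: Box, joints: Dict[str, Point], required: Iterable[str], min_ratio: float = 1.0
) -> bool:
    """min_ratio=1.0 gives the 'all joint points' rule; lower values relax it."""
    return joints_in_gaze_ratio(gaze, joints, required) >= min_ratio
```

With `min_ratio=1.0` the function mirrors the behavior illustrated in FIGS. 12 and 13; a value such as 0.75 corresponds to the "predetermined ratio" variant.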

Returning to FIG. 1, the additional learning effect determination unit 117 stores the image (image file) in the storage unit 120 and records the determination result in the image database 130. When the learning effect is determined to be small, the additional learning effect determination unit 117 records the type 132 as "small effect", as shown in the record 148. When the learning effect is determined to be large, it records the type 132 as "large effect", as shown in the record 149.

The additional learning effect determination unit 117 also stores, in the upper-body gaze area 136 and the lower-body gaze area 138, the gaze areas used when the learning model 122 recognizes upper-body clothing and lower-body clothing, respectively. It further stores the recognition results for upper-body clothing and lower-body clothing in the upper-body clothing 135 and the lower-body clothing 137, respectively. When the gaze area and the existence area do not correspond, "(NG)" may be appended to the recognition result stored in the upper-body clothing 135 or the lower-body clothing 137 (see the upper-body clothing 135 of the record 149 in FIG. 2).
The learning image selection unit 118 selects the records in the image database 130 whose type 132 is "large effect", together with the corresponding images (image files), and transmits them to the learning device 300.
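As a rough illustration of this selection step, the sketch below models a reduced image-database record and filters on the type field. The `ImageRecord` class, its field names, and the literal type strings are assumptions made for the example; the specification defines the fields only by their reference numerals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageRecord:
    """Reduced, hypothetical view of one record of the image database 130."""
    path_name: str       # path name 131
    kind: str            # type 132: "noise", "small effect", or "large effect"
    upper_clothing: str  # upper-body clothing 135 (may carry an "(NG)" suffix)
    lower_clothing: str  # lower-body clothing 137 (may carry an "(NG)" suffix)


def select_for_learning(records: List[ImageRecord]) -> List[ImageRecord]:
    """Keep only the records judged to have a large additional-learning effect."""
    return [record for record in records if record.kind == "large effect"]
```

The image files named by `path_name` in the selected records would be sent along with the records themselves.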

≪Configuration of the learning device≫
The learning device 300 includes a functional unit, a storage unit, and an input/output unit. The input/output unit includes a user interface device and a communication interface.
The storage unit stores a learning model 321, a learning image database 330, and image files. The learning model 321 is the learning model to be additionally trained and is equivalent to the learning model 122. The learning image database 330 has the same data structure as the image database 130. The records of the image database 130 transmitted by the learning image selection unit 118 are stored in the learning image database 330, and the image files are stored in the storage unit. The records transmitted by the learning image determination device 100 are records with a large learning effect, and they contain the recognition results that the learning model 122 output while gazing at the wrong area (see the upper-body clothing 135 and the lower-body clothing 137 in FIG. 2). At the time the received data is stored, therefore, the recognition results contain errors.

The learning device 300 includes a label assigning unit 311 and a learning unit 312 as functional units.
The label assigning unit 311 assigns a correct label to each record in the learning image database 330. Specifically, for each record in the learning image database 330, the label assigning unit 311 displays the corresponding image, acquires the upper-body clothing and lower-body clothing entered by the user as correct labels, and stores them in the corresponding record. The images in the storage unit and the correct labels together form the learning data set (teacher data).
The learning unit 312 performs additional learning processing on the learning model 321 using the learning data set.

≪Learning image determination processing≫
FIG. 14 is a flowchart of the learning image determination processing according to the present embodiment.
In step S11, the control unit 110 starts a loop that repeats steps S12 to S22 until a predetermined number of images have been collected.
In step S12, the control unit 110 acquires an image from the input/output unit 180.
In step S13, the person detection unit 111 extracts a person area from the image.
In step S14, the skeleton detection unit 112 extracts joint points from the image.

In step S15, the noise image determination unit 113 determines whether the image is a noise image. If it is a noise image (step S15→YES), the process proceeds to step S16; if it is not (step S15→NO), the process proceeds to step S17.
In step S16, the noise image determination unit 113 stores the determination result in the image database 130, and the process returns to step S12. Specifically, the noise image determination unit 113 stores the image in the storage unit 120. It then adds a record to the image database 130, sets the path name 131 to the path name of the image (image file), and sets the type 132 to "noise". The noise image determination unit 113 stores the person area extracted in step S13 in the person area 133 and the joint points extracted in step S14 in the joint points 134. If the person area or the joint points could not be detected, it stores "ND" instead. The noise image determination unit 113 also stores "ND" in the upper-body clothing 135, the upper-body gaze area 136, the lower-body clothing 137, and the lower-body gaze area 138.

In step S17, the recognition unit 114 recognizes (estimates, determines) upper-body clothing and lower-body clothing using the learning model 122.
In step S18, the gaze point visualization unit 115 extracts the gaze areas used when the recognition unit 114 recognized upper-body clothing and lower-body clothing, respectively.

In step S19, the additional learning effect determination unit 117 compares the gaze areas with the existence areas set by the target existence area setting unit 116 (see FIGS. 3, 8, and 9). Specifically, it determines whether the gaze area used when the recognition unit 114 recognized upper-body clothing corresponds to the existence area of upper-body clothing (see record 158 in FIG. 3, and FIG. 8). It likewise determines whether the gaze area used when the recognition unit 114 recognized lower-body clothing corresponds to the existence area of lower-body clothing (see record 159 in FIG. 3, and FIG. 9).
In step S20, if either of the two determinations in step S19 found no correspondence (step S20→NO), the additional learning effect determination unit 117 proceeds to step S21; if both found correspondence (step S20→YES), it proceeds to step S22.

In step S21, the additional learning effect determination unit 117 stores the determination result in the storage unit 120. Specifically, it stores the image in the storage unit 120. It then adds a record to the image database 130, sets the path name 131 to the path name of the image (image file), and sets the type 132 to "large effect". Next, it stores the person area extracted in step S13 in the person area 133 and the joint points extracted in step S14 in the joint points 134. It then stores the recognition results of step S17 in the upper-body clothing 135 and the lower-body clothing 137. However, when the gaze area and the existence area do not correspond, "(NG)" is appended to the recognition result (see the upper-body clothing 135 of the record 149 in FIG. 2). The additional learning effect determination unit 117 also stores the gaze areas extracted in step S18 for the recognition of upper-body clothing and lower-body clothing in the upper-body gaze area 136 and the lower-body gaze area 138, respectively.
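The decision in step S20 and the record written in step S21 (or step S22 when both parts correspond) can be sketched as below. The dictionary-based record, the `build_record` helper, the part keys, and the exact spacing of the "(NG)" suffix are assumptions for illustration only; the person area, joint points, and gaze areas are omitted for brevity.

```python
from typing import Dict


def build_record(
    path_name: str, recognized: Dict[str, str], corresponds: Dict[str, bool]
) -> Dict[str, str]:
    """Build a reduced image-database record from per-part recognition results.

    `recognized` maps a part (e.g. "upper", "lower") to the recognized clothing,
    and `corresponds` maps the same part to the outcome of the step S19 comparison.
    """
    record = {"path_name": path_name}
    for part, label in recognized.items():
        # Mark results whose gaze area missed the existence area.
        record[part] = label if corresponds[part] else f"{label} (NG)"
    # Step S20: a mismatch on any part makes the image a "large effect" candidate.
    all_match = all(corresponds[part] for part in recognized)
    record["type"] = "small effect" if all_match else "large effect"
    return record
```

For example, an image whose upper-body gaze area misses the shoulders and elbows would yield a record of type "large effect" with "(NG)" appended only to the upper-body result.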

Step S22 is the same as step S21 except that the type 132 is set to "small effect" (see the record 148 in FIG. 2).
In step S23, if the control unit 110 has collected (recorded in the image database 130) the predetermined number of images, the process proceeds to step S24; otherwise, the process returns to step S12. Having collected the predetermined number of images means that the number of records in the image database 130 whose type 132 is "large effect" has reached the predetermined number. Alternatively, the control unit 110 may judge that the predetermined number of images has been collected when the number of records containing "(NG)" in the upper-body clothing 135 and the number of records containing "(NG)" in the lower-body clothing 137 have each reached a predetermined number.
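The two stopping criteria for step S23 can be illustrated as follows. This is a sketch, not the claimed implementation: records are assumed to be dictionaries with hypothetical keys "type", "upper", and "lower", and the same threshold is assumed for both body parts in the per-part variant.

```python
from typing import Dict, List


def collected_enough(
    records: List[Dict[str, str]], required: int, per_part: bool = False
) -> bool:
    """Return True once enough useful images have been recorded.

    per_part=False: count records whose type is "large effect".
    per_part=True:  require `required` "(NG)" results for the upper body
                    and, separately, for the lower body.
    """
    if not per_part:
        large = sum(1 for r in records if r.get("type") == "large effect")
        return large >= required
    upper_ng = sum(1 for r in records if "(NG)" in r.get("upper", ""))
    lower_ng = sum(1 for r in records if "(NG)" in r.get("lower", ""))
    return upper_ng >= required and lower_ng >= required
```

The per-part variant keeps collecting until both recognition targets have enough misrecognized examples, which avoids a data set dominated by only one of them.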

In step S24, the learning image selection unit 118 transmits to the learning device 300 the records of the image database 130 whose type 132 is "large effect", together with the image files corresponding to the path name 131 of those records.

≪Features of the learning image determination device≫
The learning image determination device 100 determines, for each image, the effect of additionally training the learning model 122 on it, and selects the images with a large effect. An image with a large additional learning effect is a non-noise image in which the area the recognition unit 114 attended to during recognition (the gaze area) does not correspond to the existence area of the recognition target. The existence area of the recognition target is the area a person would attend to when recognizing the target, defined using joint points (see FIGS. 8 to 11). An image in which the gaze area and the existence area do not correspond is an image that the recognition unit 114 using the learning model 122 misrecognizes, and is therefore an image that the learning model 122 should be trained on so that it can be recognized correctly.

In this way, the learning image determination device 100 selects the images that the learning model 122 misrecognizes and transmits them to the learning device 300. The learning device 300 assigns the correct labels specified by the user to the images and performs additional learning of the learning model 321 (equivalent to the learning model 122). Because the user only needs to assign correct labels to the images that the learning model 122 (the recognition unit 114 using the learning model 122) misrecognizes, the labeling effort is reduced and a learning data set for additional learning can be created efficiently.

≪Modification: learning effect determination≫
The additional learning effect determination unit 117 of the above embodiment compares the gaze area and the existence area for both upper-body clothing and lower-body clothing, and determines that the learning effect is large if either one does not correspond (see steps S19 to S20→NO in FIG. 14). Instead, the learning effect may be determined to be large when neither corresponds and medium when only one does not correspond. Alternatively, rather than grading the learning effect as large, medium, or small, the degree of overlap between the gaze area and the existence area may be expressed as a number that grows as the overlap shrinks (the more the gaze area and the target existence area differ, the greater the effect as an image for additional learning). The learning image selection unit 118 may give priority to images with a large learning effect when transmitting to the learning device 300. By transmitting images with a large learning effect for additional learning, an improvement in the performance of the learning model can be expected even when additional learning is performed with a small number of images.
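Both variants of this modification can be sketched briefly. The three-level grading follows the text directly. For the numeric score, the specification does not give a formula, so the example assumes axis-aligned boxes and uses one minus the intersection-over-union as one possible "larger when less overlap" measure; `graded_effect` and `non_overlap_score` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); assumed representation


def graded_effect(upper_corresponds: bool, lower_corresponds: bool) -> str:
    """Grade the learning effect by how many of the two parts fail to correspond."""
    mismatches = int(not upper_corresponds) + int(not lower_corresponds)
    return ("small", "medium", "large")[mismatches]


def non_overlap_score(gaze: Box, existence: Box) -> float:
    """Assumed score: 1 - IoU, so 0.0 means identical boxes and 1.0 means disjoint boxes."""
    inter_w = max(0.0, min(gaze[2], existence[2]) - max(gaze[0], existence[0]))
    inter_h = max(0.0, min(gaze[3], existence[3]) - max(gaze[1], existence[1]))
    intersection = inter_w * inter_h
    area_gaze = (gaze[2] - gaze[0]) * (gaze[3] - gaze[1])
    area_existence = (existence[2] - existence[0]) * (existence[3] - existence[1])
    union = area_gaze + area_existence - intersection
    if union <= 0:
        raise ValueError("boxes must have positive area")
    return 1.0 - intersection / union
```

Sorting candidate images by such a score in descending order would be one way to transmit the images with the largest expected effect first.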

≪Modification: recognition based on other than joint points≫
In the above embodiment, the existence area of the recognition target is defined with reference to joint points (see FIG. 3 and FIGS. 8 to 11), but it may be defined with reference to something other than joint points. For example, it may be defined with reference to body parts such as the head, shoulders, chest, abdomen, waist, buttocks, upper arms, forearms, hands, thighs, lower legs, and feet. The eyes, nose, mouth, and the thumb, index finger, middle finger, ring finger, and little finger of each hand may also be added.

≪Modification: recognition of targets other than clothing≫
In the above embodiment, upper-body and lower-body clothing are recognized, but other recognition targets may be used. For example, age (age group), gender, or belongings may be the recognition target. The existence area for age may be defined as an area including both eyes and both ears.

≪Modification: article recognition≫
In the above embodiment, the learning model recognizes the clothing of a person, but the recognition target may be something other than a person. For example, the learning image determination device may be used to collect images for additional learning of a learning model that identifies the model name of an automobile. The target existence area for such a learning model may be the front, the grille, or the front part of the hood, as areas where an emblem is located.

Not only for automobiles but for any learning model whose recognition target is an article, the target existence area setting unit 116 may define the target existence area using parts of the article, and the additional learning effect determination unit 117 may determine the effect according to the degree of correspondence (degree of overlap) between the defined target existence area and the gaze area. Instead of parts of the article, the target existence area may be defined using relative positions such as the upper part, lower part, front part, rear part, left part, right part, top surface, bottom surface, front surface, rear surface, left surface (left side), and right surface (right side). Here, "front" refers to the direction of travel when the article moves horizontally.
When the recognition target is an animal, the target existence area may be defined using parts of the animal (head, forelegs, hind legs, tail, and so on). Likewise, when the recognition target is a plant, the target existence area may be defined using parts of the plant (flower, petal, calyx, stem, branch, and so on).

≪Modification: noise image determination≫
In the above embodiment, the noise image determination unit 113 determines that an image is a noise image when the whole body of the person cannot be detected (see FIG. 7), but the determination need not be limited to this. For example, an image may be determined to be a non-noise image if only the upper body or only the lower body can be detected. In that case, the additional learning effect determination unit 117 determines the learning effect based on whether the gaze area and the existence area correspond for upper-body clothing only or for lower-body clothing only.
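The difference between the embodiment and this modification can be sketched as a function that returns which parts are to be judged, or nothing for a noise image. How "detected" is decided for each half of the body is outside the scope of this example; the `parts_to_judge` name, its flags, and the part labels are hypothetical.

```python
from typing import List, Optional


def parts_to_judge(
    upper_detected: bool, lower_detected: bool, allow_partial: bool = False
) -> Optional[List[str]]:
    """Return the body parts whose gaze/existence areas should be compared.

    Returns None when the image is to be treated as a noise image.
    allow_partial=False reproduces the embodiment (whole body required);
    allow_partial=True reproduces the modification (one half is enough).
    """
    detected = [
        name
        for name, ok in (("upper", upper_detected), ("lower", lower_detected))
        if ok
    ]
    if len(detected) == 2:
        return detected
    if allow_partial and detected:
        return detected
    return None
```

With `allow_partial=True`, an image showing only the upper body is kept and judged on upper-body clothing alone.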

≪Other modifications≫
Several embodiments of the present invention have been described above, but these embodiments are merely examples and do not limit the technical scope of the present invention. The present invention can take various other forms, and various changes such as omissions and substitutions can be made without departing from the gist of the present invention. These embodiments and their modifications are included in the scope and gist of the invention described in this specification and the like, and are included in the invention described in the claims and its equivalents.

100 Learning image determination device
110 Control unit
111 Person detection unit
112 Skeleton detection unit
113 Noise image determination unit
114 Recognition unit
115 Gaze point visualization unit
116 Target existence area setting unit
117 Additional learning effect determination unit
118 Learning image selection unit
121 Program
122 Learning model
130 Image database
150 Area setting database
210, 220, 230, 240 Existence area setting screens
213, 223 Existence areas (target existence areas)
300 Learning device
311 Label assigning unit
312 Learning unit
321 Learning model
330 Learning image database
430, 440 Images (noise images)
450 Image (image for additional learning with a small learning effect)
460 Image (image for additional learning with a large learning effect)
451, 461 Gaze areas

Claims

A learning image determination device that determines the effect of additional learning with an image, in connection with additional learning of a learning model that recognizes a recognition target that is a person included in an image, the device comprising:
an additional learning effect determination unit that compares a gaze area, which is the area of the image that was subject to recognition in recognition processing using the learning model, with a target existence area, which is predefined as the area in which the recognition target exists, and determines that the greater the degree to which they do not overlap, the greater the effect as an image for the additional learning; and
a noise image determination unit that determines, as a noise image, an image in which a person area in the image and the joint points of the person differ,
wherein the additional learning effect determination unit takes images other than the noise image as the targets for determining the effect as an image for the additional learning.

The learning image determination device according to claim 1, wherein the target existence area is defined based on joint points or body parts of the person.

The learning image determination device according to claim 1, further comprising a learning image selection unit that selects, as an image for the additional learning, an image that the additional learning effect determination unit has determined to have a large effect as an image for additional learning.

A program for causing a computer to function as the learning image determination device according to any one of claims 1 to 3.

A learning image determination method of a learning image determination device that determines the effect of additional learning with an image, in connection with additional learning of a learning model that recognizes a recognition target that is a person included in an image, the method executing:
a step of comparing a gaze area, which is the area of the image that was subject to recognition in recognition processing using the learning model, with a target existence area, which is predefined as the area in which the recognition target exists, and determining that the greater the degree to which they do not overlap, the greater the effect as an image for the additional learning; and
a step of determining, as a noise image, an image in which a person area in the image and the joint points of the person differ,
wherein, in the step of determining that the effect is greater, images other than the noise image are taken as the targets for determining the effect as an image for the additional learning.