JP2010102396A

JP2010102396A - Person detection device, person detection method, and program

Info

Publication number: JP2010102396A
Application number: JP2008271097A
Authority: JP
Inventors: Kazuya Ueki; 一也植木
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2008-10-21
Filing date: 2008-10-21
Publication date: 2010-05-06
Anticipated expiration: 2028-10-21
Also published as: JP5231159B2

Abstract

PROBLEM TO BE SOLVED: To provide a person detection device that detects a person in an image at high speed by simplifying the processing.
SOLUTION: The person detection device comprises: a means that crops an image patch from the image data to be examined, inputs it to a previously created learning model, calculates a first value indicating how much the image patch overlaps a person region, and determines whether the first value is equal to or greater than a preset first threshold; a means that, when the first value is equal to or greater than the first threshold, uses the learning model to calculate a second value indicating how far the image patch is displaced from the person region, changes the image patch based on the second value, and records the resulting region; and a means that assigns the same score to every pixel in each recorded region, adds the scores where recorded regions overlap, calculates the mean score for each column and each row, takes the columns and rows whose mean is equal to or greater than a preset second threshold as border lines, and defines the region enclosed by the border lines as the person detection region.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a person detection device, a person detection method, and a program for detecting a person in an image.

Techniques that use pattern recognition (regression) to detect a person in an image work, for example, by detecting parts of the face (such as the eyes, nose, and mouth), the head, or skin color (see, for example, Patent Document 1 and Non-Patent Document 1). Such person detection generally performs a raster scan over the image data, shifting an image patch of a predetermined size in small steps and judging at each position whether the patch contains a face. In addition, the image data is resized in multiple stages and a raster scan is performed at each size. The detection process therefore takes a long time.

Patent Document 1: Japanese Patent No. 3810943
Non-Patent Document 1: Ming Hsuan, David J. Kriegman, Narendra Ahuja, "Detecting Faces in Images: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, 2002

The present invention has been made in view of the above circumstances. Its object is to provide a person detection device, a person detection method, and a program that detect a person in an image at high speed by reducing the number of image-resizing stages and increasing the step by which the image patch is shifted, thereby shortening the processing time.

To achieve this object, a person detection device according to a first aspect of the present invention detects a person region in image data using a trained model, and comprises: first value calculation and determination means for cropping a predetermined region of the image data to be examined as an image patch, inputting the patch to a previously created learning model to calculate a first value indicating how much the image patch overlaps a person region, and determining whether the first value is equal to or greater than a preset first threshold; region recording means for, when the first value is equal to or greater than the first threshold, using the learning model to calculate a second value indicating how far the image patch is displaced from the person region, changing the image patch based on the second value, and recording the changed region; and score calculation and determination means for assigning the same score to the pixels of each recorded region, adding the scores where recorded regions overlap, calculating the mean score for each vertical column and each horizontal row, taking the columns and rows whose mean is equal to or greater than a preset second threshold as border lines, and setting the region enclosed by the border lines as the person detection region.

As a second aspect, a person detection device according to the present invention comprises: learning sample generation means for, when image data showing at least one person is input and a person region in the image data is designated, generating learning samples for each of a plurality of regions in the image data based on the designated person region; and model creation means for creating a learning model by training, based on the learning samples, so as to reduce the error of the values output as calculation results.

A person detection method according to a first aspect of the present invention detects a person region in image data using a trained model, and comprises: a first value calculation and determination step of cropping a predetermined region of the image data to be examined as an image patch, inputting the patch to a previously created learning model to calculate a first value indicating how much the image patch overlaps a person region, and determining whether the first value is equal to or greater than a preset first threshold; a region recording step of, when the first value is equal to or greater than the first threshold, using the learning model to calculate a second value indicating how far the image patch is displaced from the person region, changing the image patch based on the second value, and recording the changed region; and a score calculation and determination step of assigning the same score to the pixels of each recorded region, adding the scores where recorded regions overlap, calculating the mean score for each vertical column and each horizontal row, taking the columns and rows whose mean is equal to or greater than a preset second threshold as border lines, and setting the region enclosed by the border lines as the person detection region.

As a second aspect, a person detection method according to the present invention comprises: a learning sample generation step of, when image data showing at least one person is input and a person region in the image data is designated, generating learning samples for each of a plurality of regions in the image data based on the designated person region; and a model creation step of creating a learning model by training, based on the learning samples, so as to reduce the error of the values output as calculation results.

A program according to a first aspect of the present invention detects a person region in image data using a trained model, and causes a computer to execute: first value calculation and determination processing of cropping a predetermined region of the image data to be examined as an image patch, inputting the patch to a previously created learning model to calculate a first value indicating how much the image patch overlaps a person region, and determining whether the first value is equal to or greater than a preset first threshold; region recording processing of, when the first value is equal to or greater than the first threshold, using the learning model to calculate a second value indicating how far the image patch is displaced from the person region, changing the image patch based on the second value, and recording the changed region; and score calculation and determination processing of assigning the same score to the pixels of each recorded region, adding the scores where recorded regions overlap, calculating the mean score for each vertical column and each horizontal row, taking the columns and rows whose mean is equal to or greater than a preset second threshold as border lines, and setting the region enclosed by the border lines as the person detection region.

As a second aspect, a program according to the present invention causes a computer to execute: learning sample generation processing of, when image data showing at least one person is input and a person region in the image data is designated, generating learning samples for each of a plurality of regions in the image data based on the designated person region; and model creation processing of creating a learning model by training, based on the learning samples, so as to reduce the error of the values output as calculation results.

According to the present invention, the processing time for detecting a person in an image is reduced, so a person in the image can be detected at high speed.

The best mode for carrying out the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows the configuration of a PC (Personal Computer) as one embodiment of the person detection device of the present invention. As shown in FIG. 1, the PC 1 is an information processing apparatus comprising at least a CPU (Central Processing Unit) 11, a memory 12, a storage unit 13, a communication unit 14, an output unit 15, and an operation unit 16. These units 11 to 16 are connected by a bus line, which serves as the data transmission path.

The CPU 11 executes the arithmetic processing of programs, sends instructions to each unit of the PC 1, and controls their operation. These programs include one for executing the person detection method according to an embodiment of the present invention, described later.

The memory 12 consists of, for example, RAM (Random Access Memory), which can be read and written at any specified address, and ROM (Read Only Memory), which can only be read. When the CPU 11 executes arithmetic processing, various data and the above programs are temporarily loaded from the storage unit 13 (or the ROM) into the RAM, which serves as the workspace for that processing. The ROM is a storage device that holds, among other things, the BIOS (Basic Input Output System) for the units that make up the hardware.

The storage unit 13 is a device that stores and holds various data and the above programs; examples include a hard disk and a portable recording medium. Because this embodiment handles a large amount of electronic data, the storage unit 13 preferably has a large capacity and high speed.

The communication unit 14 is an interface device that connects the PC 1 to external devices (devices and equipment other than the PC 1, such as a scanner, a camera, or a USB memory) and communicates with them. The communication unit 14 also includes, for example, devices such as routers and switching hubs used for connecting to a backbone network.

The output unit 15 is a device for outputting various data (for example, image data) to the operator (user) of the PC 1. Examples include a screen display unit (graphics board and display), an audio output unit (speaker), and a printing unit (printer).

The operation unit 16 is a device with which the operator of the PC 1 inputs various data (for example, parameters). Examples include a keyboard, a mouse, and a touch panel.

The recording medium reading unit 17 reads information (for example, image data) recorded on a portable recording medium such as an IC card, a floppy (registered trademark) disk, a CD (Compact Disc), or a DVD (Digital Versatile Disc).

Next, the operation of the PC 1 configured as above (one embodiment of the person detection method of the present invention) is described. The operation of the PC 1 is divided into two processes: a learning process and a test (recognition/detection) process. The learning process must be performed in advance, as a precondition for the test process.

First, the learning process is described. FIG. 2 schematically shows an overview of the learning process. Through the operator's operations, a plurality of image data and correct answer data for each of them are input to the PC 1. Each piece of image data contains the upper body of one or more persons. The correct answer data is the region around the head of each person in the image data; this region includes not only the head and face but at least the neck or shoulders as well.

Using a learning data creation program, the PC 1 generates a large number of learning samples for each piece of image data based on its correct answer data (learning sample generation function). A learning sample consists of a value indicating how far a given region of the image data is displaced from the correct answer data and a value indicating how much that region overlaps the correct answer data.

Using a learning program, the PC 1 inputs the large number of learning samples to an initial model whose parameters are set to initial values (random values or the like), and tunes (trains) the model so that the error of the estimation results (estimated values) it outputs decreases, thereby creating a learning model (model creation function). The created learning model is used in the test process.

For this tuning, the PC 1 calculates the error between the estimation result output by the model and the correct answer data, and changes the parameters in the direction that reduces the error. For a neural network, for example, the steepest descent method can be used to change the parameters automatically in the direction that reduces the error.
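The parameter update described above can be illustrated with a minimal sketch. It is not the model of this embodiment: it assumes a simple linear model y ≈ w·x + b and a mean-squared-error loss, and the function name `steepest_descent_fit` and the values of `lr` and `epochs` are hypothetical. At each iteration the gradient of the error is computed and every parameter is moved a small step against it, which is the steepest descent rule.

```python
from typing import List, Tuple


def steepest_descent_fit(
    xs: List[List[float]],
    ys: List[float],
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float, List[float]]:
    """Fit y ~ w.x + b by full-batch steepest descent on mean squared error.

    Hypothetical helper: returns the weights, the bias, and the loss
    recorded before each update.
    """
    n, d = len(xs), len(xs[0])
    # Initial parameters; the text mentions random values, zeros keep this deterministic.
    w = [0.0] * d
    b = 0.0
    history: List[float] = []
    for _ in range(epochs):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
        errs = [p - y for p, y in zip(preds, ys)]
        history.append(sum(e * e for e in errs) / n)
        # Gradient of the mean squared error with respect to each weight and the bias.
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errs, xs)) for j in range(d)]
        grad_b = 2.0 / n * sum(errs)
        # Steepest descent: move every parameter against its gradient.
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, history
```

For example, `steepest_descent_fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])` recovers weights close to w = 2 and b = 1. In the embodiment the inputs would be image patches and the targets the learning samples (F-measure, Cx, Cy, Csize), with a neural network in place of the linear model.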

The model (or discriminator) referred to above performs a predetermined internal calculation when data such as a learning sample is input and outputs the desired estimation result (estimated value). Examples include a neural network, a GMM (Gaussian mixture model), an HMM (hidden Markov model), and an SVM (support vector machine).

A specific example of the learning process shown in FIG. 2 is described below with reference to FIGS. 3 to 7.

The PC 1 accepts the operator's operation of the operation unit 16 and inputs a plurality of image data (step S1). Examples of input image data are shown in FIG. 4. As shown in FIGS. 4(a) to 4(c), each piece of image data contains at least the upper body of a person. The image data may instead be input from an external device via the communication unit 14, or from a recording medium via the recording medium reading unit 17.

The PC 1 displays each piece of input image data on a display or the like (an example of the output unit 15) and prompts the operator to designate the correct answer data (correct answer region) for each image. For each displayed image, the operator designates the correct answer data by, for example, dragging with a mouse (an example of the operation unit 16). The PC 1 thereby receives the correct answer data for each piece of image data (step S2). The correct answer data indicates the region around the head of a person in the image data; as described above, this region includes not only the head and face but at least the neck and shoulders. FIGS. 5(a) to 5(c) show correct answer data designated on the image data of FIGS. 4(a) to 4(c). As shown there, the region around each person's head (the part enclosed by a rectangle) is designated as correct answer data (correct answer region).

Using the learning data creation program, the PC 1 generates a large number of learning samples (learning data) for each piece of image data based on the designated correct answer data (step S3). As described above, a learning sample consists of a value indicating how far a given region of the image data (estimation region) is displaced from the correct answer region and a value indicating how much the estimation region overlaps the correct answer region.

The generation (calculation) of learning samples is described here using the image data of FIG. 5(a) as an example. As shown in FIG. 6, the PC 1 designates, in the image data of FIG. 5(a), a plurality of estimation regions (a) to (d) whose positions and sizes differ from the correct answer region, and calculates F-measure, Cx, Cy, and Csize as a learning sample for each estimation region. Cx, Cy, and Csize (amounts of change) indicate how far the estimation region is displaced from the correct answer region: Cx is the amount of movement in the x direction and Cy the amount of movement in the y direction, both in pixels, and Csize is the change in size, expressed as a magnification. F-measure indicates how much the estimation region overlaps the correct answer region.

The calculation of F-measure is described with reference to FIG. 7, which shows an example calculation for the estimation region of FIG. 6(a). The estimation region drawn with a dotted line in FIG. 7 is the same as the region in FIG. 6(a), and the area where the estimation region and the correct answer region overlap is shown as the overlap region. Before F-measure is calculated, the recall and the precision are obtained. The recall is the area of the overlap region divided by the area of the correct answer region; call the result [1]. The precision is the area of the overlap region divided by the area of the estimation region; call the result [2]. F-measure is then obtained by substituting [1] and [2] into the following equation:

F-measure = (2 × [1] × [2]) / ([1] + [2])
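The F-measure calculation can be sketched as follows. The sketch assumes axis-aligned rectangles given as (left, top, width, height) in pixels; the names `Rect`, `overlap_area`, and `f_measure` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical axis-aligned rectangle in pixel coordinates."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def area(self) -> float:
        return self.w * self.h


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def f_measure(estimate: Rect, correct: Rect) -> float:
    """Harmonic mean of recall [1] and precision [2] for an estimation region."""
    inter = overlap_area(estimate, correct)
    if inter == 0:
        return 0.0
    recall = inter / correct.area      # [1]: overlap / correct answer region
    precision = inter / estimate.area  # [2]: overlap / estimation region
    return 2 * recall * precision / (recall + precision)
```

For a 10 × 10 correct answer region and an estimation region of the same size shifted 5 pixels to the right, recall and precision are both 0.5, so F-measure is 0.5. Algebraically, the formula reduces to 2 × overlap / (correct area + estimation area), which is why a disjoint region yields 0 and an exact match yields 1.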

In this way, the PC 1 generates a learning sample (F-measure, Cx, Cy, Csize) for each of a plurality of estimation regions in one piece of image data. In the example of FIG. 6, the learning sample for estimation region (a) is F-measure = 0.22, Cx = 23 (pixels), Cy = 4 (pixels), and Csize = 0.82 (times). Learning samples are calculated in the same way for estimation regions (b) and (c). Estimation region (d) has F-measure = 0.00, that is, it does not overlap the correct answer region, so Cx, Cy, and Csize are not calculated for it. In other words, the PC 1 first calculates F-measure and determines whether it is equal to or greater than a preset threshold, and calculates Cx, Cy, and Csize only when it is. When F-measure is below the threshold, Cx, Cy, and Csize are not calculated.
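The threshold-gated sample generation can be sketched as below. The document does not state the sign convention or the exact size measure for Cx, Cy, and Csize, so this sketch assumes they are the shift of the rectangle centre and the width ratio that carry the estimation region onto the correct answer region; the threshold value 0.1 and the names `LearningSample` and `make_sample` are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Hypothetical rectangle: left, top, width, height in pixels."""
    x: float
    y: float
    w: float
    h: float


class LearningSample(NamedTuple):
    f_measure: float
    cx: Optional[float]     # x shift in pixels, None when gated out
    cy: Optional[float]     # y shift in pixels, None when gated out
    csize: Optional[float]  # magnification, None when gated out


def f_measure(est: Rect, cor: Rect) -> float:
    """F-measure of two rectangles, written as 2 * overlap / (area_est + area_cor)."""
    dx = min(est.x + est.w, cor.x + cor.w) - max(est.x, cor.x)
    dy = min(est.y + est.h, cor.y + cor.h) - max(est.y, cor.y)
    inter = max(dx, 0.0) * max(dy, 0.0)
    return 2 * inter / (est.w * est.h + cor.w * cor.h)


def make_sample(est: Rect, cor: Rect, threshold: float = 0.1) -> LearningSample:
    """Compute F-measure first; compute Cx, Cy, Csize only if it passes the threshold."""
    f = f_measure(est, cor)
    if f < threshold:
        return LearningSample(f, None, None, None)
    # Assumed convention: centre shift and width ratio from the estimate to the correct region.
    cx = (cor.x + cor.w / 2) - (est.x + est.w / 2)
    cy = (cor.y + cor.h / 2) - (est.y + est.h / 2)
    csize = cor.w / est.w
    return LearningSample(f, cx, cy, csize)
```

A region like (d) in FIG. 6, which does not touch the correct answer region, comes back with F-measure 0 and no displacement values, mirroring the gating described in the text.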

Although FIG. 6 shows only four estimation regions, (a) to (d), a large number are used in practice, and a learning sample is calculated (generated) for each of them. Because this embodiment can obtain many learning samples from one piece of image data, a test (recognition/detection) using a model trained on these samples achieves high detection accuracy.
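The document does not say how the many estimation regions are chosen. One plausible approach, shown here purely as an assumption, is to jitter the centre and scale of the correct answer region at random. The names `jitter_regions`, `max_shift`, and `scale_range`, and the default scale range, are hypothetical.

```python
import random
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical rectangle: left, top, width, height in pixels."""
    x: float
    y: float
    w: float
    h: float


def jitter_regions(
    correct: Rect,
    n: int,
    max_shift: float,
    scale_range: Tuple[float, float] = (0.7, 1.4),
    seed: int = 0,
) -> List[Rect]:
    """Generate n estimation regions by randomly shifting and rescaling the correct region."""
    rng = random.Random(seed)  # seeded so the same samples can be regenerated
    cx = correct.x + correct.w / 2
    cy = correct.y + correct.h / 2
    regions = []
    for _ in range(n):
        s = rng.uniform(*scale_range)
        w, h = correct.w * s, correct.h * s
        # Move the centre by up to max_shift pixels in each direction.
        ncx = cx + rng.uniform(-max_shift, max_shift)
        ncy = cy + rng.uniform(-max_shift, max_shift)
        regions.append(Rect(ncx - w / 2, ncy - h / 2, w, h))
    return regions
```

Each generated region would then be turned into a learning sample as described above. A wider `max_shift` produces more low-overlap regions, which teach the model to output low F-measure values for patches that barely touch a person.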

Using the learning program, the PC 1 trains on the large number of learning samples and creates a learning model (step S4). As described above, training is performed by inputting the learning samples to the initial model so that the error of the estimation results (F-measure, Cx, Cy, Csize) output by the model decreases. The result of this training is the learning model, which is held in, for example, the storage unit 13 or an external device, and is used in the test process described later.

Next, the test (recognition/detection) process is described. FIG. 8 schematically shows an overview of the test process. Through the operator's operations, the image data to be tested and various parameters are input to the PC 1. The image data need not contain a person. If the learning model created in the learning process is stored somewhere other than the PC 1, it is also input to the PC 1 by the operator. The parameters include the movement amount of the image patch, the reduction ratios of the test image data, the threshold used for the F-measure determination (first threshold), and the threshold used for the score determination (second threshold).

Using a test program, the PC 1 crops a predetermined region of the test image data as an image patch according to the input parameters and inputs it to the learning model. The PC 1 then calculates F-measure (the first value) with the learning model and determines whether it is equal to or greater than the first threshold (F-measure calculation and determination function).

When F-measure is equal to or greater than the first threshold, the PC 1, using the test program, applies the learning model again to calculate the amounts of change Cx, Cy, and Csize (the second value), changes the position (x, y) and size (magnification) of the image patch according to them, and records the resulting rectangular region (rectangular region recording function). When F-measure is below the first threshold, no rectangular region is recorded.
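Changing the patch by the predicted amounts can be sketched as below. This assumes, as an illustration only, that Cx and Cy shift the centre of the patch and that Csize scales its width and height about the new centre; the name `apply_correction` is hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle: left, top, width, height in pixels."""
    x: float
    y: float
    w: float
    h: float


def apply_correction(patch: Rect, cx: float, cy: float, csize: float) -> Rect:
    """Shift the patch centre by (cx, cy) and scale its size by csize."""
    new_cx = patch.x + patch.w / 2 + cx
    new_cy = patch.y + patch.h / 2 + cy
    w, h = patch.w * csize, patch.h * csize
    # Rebuild the rectangle around the moved centre.
    return Rect(new_cx - w / 2, new_cy - h / 2, w, h)
```

For instance, a 10 × 10 patch at (5, 0) with Cx = -5, Cy = 0, and Csize = 1.0 becomes the rectangle at (0, 0); with Csize = 2.0 a patch at (0, 0) grows to a 20 × 20 rectangle centred on the same point. Because each patch is pulled toward the nearest head, a coarse scan can still produce rectangles that cluster on the person.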

Following the input parameters, the PC 1 moves the image patch by the specified movement amount and changes the reduction ratio of the test image data, repeating the F-measure calculation and determination function and the rectangular region recording function. When processing with all parameters is complete, the PC 1 proceeds to the score integration described below.
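The repetition over patch positions and reduction ratios amounts to a multi-scale raster scan. A minimal sketch follows, assuming a square patch, a single stride for both axes, and that rectangles found on a reduced image are mapped back by dividing by the reduction ratio; the names `raster_scan` and `to_original` and the example values are hypothetical.

```python
from typing import Iterator, List, Tuple


def raster_scan(
    img_w: int, img_h: int, patch: int, stride: int, scales: List[float]
) -> Iterator[Tuple[float, int, int]]:
    """Yield (scale, x, y) for every patch position on every reduced copy of the image."""
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                yield s, x, y


def to_original(scale: float, x: int, y: int, patch: int) -> Tuple[float, float, float]:
    """Map a patch on a reduced image back to original-image coordinates (x, y, size)."""
    return x / scale, y / scale, patch / scale
```

On a 100 × 100 image with a 40-pixel patch, a stride of 20, and reduction ratios of 1.0 and 0.5, this visits 16 + 1 = 17 positions. The number of patches falls roughly with the square of the stride and linearly with the number of scales, which is where the speed-up claimed for fewer resizing stages and larger shifts comes from.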

Using a score integration program, the PC 1 assigns a predetermined score (for example, 1) to every pixel inside each recorded rectangular region, so that scores add up where rectangles overlap; for example, a pixel covered by three rectangular regions has a score of 1 + 1 + 1 = 3. The PC 1 calculates the mean score for each vertical column and each horizontal row and determines the columns and rows whose mean is equal to or greater than the second threshold as border lines (score calculation and determination function). The PC 1 then outputs the region enclosed by the determined border lines as the final person detection region. The output can include, for example, the number of people in the image data, the coordinates of each head position, and the size of each head region.
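The score integration can be sketched as follows. The sketch makes simplifying assumptions that go beyond the text: rectangles are integer half-open pixel ranges (x0, y0, x1, y1), the column and row means are taken over the full image height and width, and the border lines are read as the outermost qualifying columns and rows, which yields a single detection region. Separating several people (needed to report a head count) would require splitting the qualifying columns and rows into separate runs, which is omitted here. The name `integrate_scores` is hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges


def integrate_scores(
    img_w: int, img_h: int, rects: List[Box], threshold: float, score: int = 1
) -> Optional[Box]:
    """Accumulate per-pixel scores and return the region bounded by qualifying columns/rows."""
    grid = [[0] * img_w for _ in range(img_h)]
    for x0, y0, x1, y1 in rects:
        # Every pixel of a recorded rectangle gets the same score; overlaps add up.
        for yy in range(max(y0, 0), min(y1, img_h)):
            for xx in range(max(x0, 0), min(x1, img_w)):
                grid[yy][xx] += score
    # Mean score of each vertical column and each horizontal row.
    col_means = [sum(grid[yy][xx] for yy in range(img_h)) / img_h for xx in range(img_w)]
    row_means = [sum(row) / img_w for row in grid]
    cols = [i for i, m in enumerate(col_means) if m >= threshold]
    rows = [i for i, m in enumerate(row_means) if m >= threshold]
    if not cols or not rows:
        return None  # no column or row reaches the second threshold
    # Outermost qualifying columns and rows act as the border lines.
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1
```

With three overlapping rectangles (2, 2, 6, 6), (3, 3, 7, 7), and (2, 3, 6, 7) on a 10 × 10 image and a second threshold of 0.5, the fringe column 6 and row 2 are covered by only one rectangle and fall below the threshold, so the result is (2, 3, 6, 7). Voting in this way suppresses isolated stray rectangles without an explicit non-maximum suppression step.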

A specific example of the test (recognition/detection) process shown in FIG. 8 is described below with reference to FIGS. 9 to 14.

The PC 1 accepts the operator's operation of the operation unit 16 and inputs the image data to be tested and the various parameters (step S11). An example of test image data is shown in FIG. 10. The test image data may be input from an external device via the communication unit 14, or from a recording medium via the recording medium reading unit 17. Examples of the parameters include the movement amount of the image patch, the reduction ratios of the test image data, the threshold used for the F-measure determination (first threshold), and the threshold used for the score determination (second threshold).

If the learning model created by the learning process is stored somewhere other than the PC 1, the operator inputs it into the PC 1 at step S11.

Using the test program, the PC 1 cuts out a predetermined area of the test image data as an image patch and inputs it to the learning model (step S12). FIG. 11(a) shows an example, in which reference numeral 20 denotes the area cut out as an image patch. Here the patch cutting starts at the upper left of the test image data, as shown in FIG. 11(a), although the starting point is not limited to this. The test image data in FIG. 11(a) is at full size (reduction ratio 100%).

The PC 1 inputs the image patch into the learning model, uses the learning model to calculate the F-measure (first value), and determines whether the F-measure is equal to or greater than the first threshold (step S13).
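The F-measure itself is defined in the learning part of the document (FIG. 7), which lies outside this excerpt. As a hedged illustration only, assuming it is the usual harmonic mean of precision (overlap area divided by patch area) and recall (overlap area divided by person-region area), the quantity the model is trained to estimate could be computed as below. The function names and the (x, y, width, height) rectangle convention are assumptions.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height); assumed convention


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy


def f_measure(patch: Rect, person: Rect) -> float:
    """Assumed definition: harmonic mean of precision and recall of patch vs. person region."""
    inter = overlap_area(patch, person)
    if inter == 0.0:
        return 0.0
    precision = inter / (patch[2] * patch[3])   # share of the patch covered by the person
    recall = inter / (person[2] * person[3])    # share of the person covered by the patch
    return 2 * precision * recall / (precision + recall)
```

A patch aligned with the person region scores 1.0, while a patch shifted halfway off it scores 0.5, so thresholding this value in step S13 discards patches that barely touch a person.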

If the F-measure is equal to or greater than the first threshold (step S13: YES), the PC 1 uses the learning model again to calculate the change amounts Cx, Cy and Csize (the second value), changes the position (x, y) and size (magnification) of the cut-out image patch 20 according to these change amounts, and records the resulting rectangular area (step S14). If the F-measure is less than the first threshold (step S13: NO), the PC 1 proceeds to step S15 without calculating the change amounts or recording a rectangular area.
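This excerpt does not spell out how Cx, Cy and Csize are applied to the patch. A minimal sketch, assuming Cx and Cy shift the patch centre by fractions of the patch side, Csize is a multiplicative size factor, and patches are square. It also assumes the rectangle is mapped back to full-size (100%) coordinates by dividing by the current reduction scale, so that rectangles found at different magnifications can later be combined. The name `adjust_patch` is hypothetical.

```python
from typing import Tuple


def adjust_patch(x: float, y: float, size: float,
                 cx: float, cy: float, csize: float,
                 level_scale: float = 1.0) -> Tuple[float, float, float, float]:
    """Apply assumed change amounts (Cx, Cy, Csize) to a square patch.

    (x, y) is the patch's upper-left corner and `size` its side length, both in
    the coordinates of the reduced image at `level_scale` (1.0 = 100%).
    Returns (x, y, width, height) of the recorded rectangle in 100% coordinates.
    """
    new_size = size * csize                  # Csize assumed to be a scale factor
    centre_x = x + size / 2 + cx * size      # Cx assumed to be a fraction of the side
    centre_y = y + size / 2 + cy * size      # Cy likewise
    new_x = centre_x - new_size / 2
    new_y = centre_y - new_size / 2
    # Map back to full-size coordinates so rectangles from all scales share one grid.
    return (new_x / level_scale, new_y / level_scale,
            new_size / level_scale, new_size / level_scale)
```

For example, `adjust_patch(0, 0, 40, 0.25, 0, 1.0)` moves a 40-pixel patch 10 pixels to the right and returns `(10.0, 0.0, 40.0, 40.0)`.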

The PC 1 then determines whether the image patch has been moved over the entire full-size test image data, that is, whether cutting has reached the patch cutting end point (for example, the lower right of the image data in FIG. 11(a)) (step S15).

If step S15 determines that the image patch has not yet been moved over the whole image, the PC 1 shifts the cutting position by the movement amount entered as a parameter, cuts out a new image patch, and inputs it to the learning model (step S15: NO, then step S12). The movement amount (width) of the image patch can be set larger than in conventional methods: because the recorded rectangular area is corrected by the change amounts, there is no need to scan with finely shifted patches. This larger shift width reduces processing time compared with the related art. Reference numeral 21 in FIG. 11(a) denotes the newly cut-out image patch, located at a position shifted from image patch 20 by the movement amount entered as a parameter. Steps S13 and S14 are applied to image patch 21 in the same manner.

In this way, at each magnification of the test image data, the image patch is cut out while its position is shifted step by step, and whenever the F-measure of a cut-out patch is equal to or greater than the first threshold, the rectangular area obtained by changing that patch by the change amounts is recorded. Patch cutting at that magnification ends once it reaches the cutting end point (for example, the lower right of the image data in FIG. 11(a)).
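The raster scan of steps S12 to S15 at one magnification amounts to enumerating upper-left patch corners on a grid whose spacing is the movement amount. The sketch below is illustrative; the function name, the 640 × 480 image and the 48-pixel patch are hypothetical values chosen only to show how a larger shift width reduces the number of patches the model must evaluate.

```python
from typing import List, Tuple


def patch_positions(width: int, height: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Upper-left corners of every patch in a raster scan, left to right, then top to bottom.

    Scanning starts at the upper left (0, 0) and stops at the last position where
    the whole patch still fits inside the image.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, stride)
            for x in range(0, width - patch + 1, stride)]


if __name__ == "__main__":
    for stride in (4, 16):
        n = len(patch_positions(640, 480, 48, stride))
        print(f"stride {stride:2d}: {n} patches")
```

With these hypothetical numbers, moving the patch 16 pixels at a time instead of 4 cuts the full-size patch count from 16,241 to 1,064. When the stride does not divide the free space evenly, the last positions stop slightly short of the right and bottom edges.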

If step S15 determines that the image patch has been moved over the whole image, the PC 1 determines whether reduction of the test image data is finished (step S16). Here it is assumed that a fixed value of, for example, 0.7 has been entered (set) in advance as the reduction ratio parameter; this value determines the reduced sizes of the test image data. The test image data is first tested at full size (100%) and then at successively smaller sizes computed from this value: after 100% comes 100 × 0.7 = 70%, then 70 × 0.7 = 49%, and so on. Because this embodiment can use a reduction ratio as small as 0.7, compared with about 0.9 in conventional methods, fewer image-size steps are needed. Reduction of the test image data ends when the reduced size becomes equal to or smaller than the image patch size.
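The reduction schedule can be worked through in a few lines. This sketch assumes the shorter image side is compared with a square patch and that a reduced image is scanned only while it is still larger than the patch; the function name, the 640 × 480 image and the 48-pixel patch are hypothetical.

```python
from typing import List


def pyramid_scales(width: int, height: int, patch: int, ratio: float) -> List[float]:
    """Scales 1.0, ratio, ratio**2, ... scanned before the image shrinks to the patch size."""
    scales: List[float] = []
    scale = 1.0
    while min(width, height) * scale > patch:   # stop once the reduced size <= patch size
        scales.append(scale)
        scale *= ratio
    return scales


if __name__ == "__main__":
    for ratio in (0.9, 0.7):
        levels = pyramid_scales(640, 480, 48, ratio)
        print(f"ratio {ratio}: {len(levels)} levels, first three {[round(s, 2) for s in levels[:3]]}")
```

Under these assumptions a ratio of 0.7 needs 7 image sizes (100%, 70%, 49%, ...) where 0.9 needs 22, which illustrates the reduction in size-change steps described in the paragraph above.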

If step S16 determines that the reduced size is not yet equal to or smaller than the patch size, the PC 1 changes the reduction ratio to the next value, cuts out a predetermined area of the test image data at the new magnification as an image patch, and inputs it to the learning model (step S16: NO, then step S12). For example, when patch cutting in the 100% image data of FIG. 11(a) is complete, the PC 1 changes the size of the test image data to the next size, 70%. In the 70% image data shown in FIG. 11(b), the PC 1 again starts cutting at the upper left, cuts out the area indicated by the dotted line as an image patch, and inputs it to the learning model; steps S13 and S14 are applied to this patch as before. When patch cutting at 70% is finished, the size is changed to the next size, 49%, and the same processing is performed. FIG. 11(c) shows the test image data at 49%.
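Putting steps S12 to S16 together, the scan over all magnifications could look like the sketch below. It rests on several assumptions not stated in the text: the learning model is represented by a hypothetical callable returning (F-measure, Cx, Cy, Csize) for a square patch; the image is a plain 2D list reduced by nearest-neighbour sampling rather than a real imaging library; and the change amounts are read as a centre shift in patch-side units plus a size factor, with recorded rectangles mapped back to 100% coordinates.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]                      # grey-level image as rows of pixels
Rect = Tuple[float, float, float, float]       # (x, y, width, height) at 100% scale
# Hypothetical model interface: square patch -> (F-measure, Cx, Cy, Csize)
Model = Callable[[Image], Tuple[float, float, float, float]]


def resize_nearest(img: Image, scale: float) -> Image:
    """Nearest-neighbour reduction; a stand-in for a real image-processing library."""
    h, w = len(img), len(img[0])
    nh, nw = int(h * scale), int(w * scale)
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(nw)]
            for r in range(nh)]


def scan_pyramid(img: Image, model: Model, patch: int, stride: int,
                 ratio: float, first_threshold: float) -> List[Rect]:
    """Steps S12-S16: raster-scan every reduced image and record adjusted rectangles."""
    recorded: List[Rect] = []
    scale = 1.0
    # S16: keep reducing until the reduced size is no larger than the patch
    while min(len(img), len(img[0])) * scale > patch:
        level = resize_nearest(img, scale)
        h, w = len(level), len(level[0])
        for y in range(0, h - patch + 1, stride):        # S12/S15: shift by `stride`
            for x in range(0, w - patch + 1, stride):
                crop = [row[x:x + patch] for row in level[y:y + patch]]
                f, cx, cy, csize = model(crop)
                if f < first_threshold:                   # S13: NO, skip this patch
                    continue
                # S14 (assumed semantics): centre shift in patch-side units, size factor
                size = patch * csize
                nx = x + patch / 2 + cx * patch - size / 2
                ny = y + patch / 2 + cy * patch - size / 2
                recorded.append((nx / scale, ny / scale, size / scale, size / scale))
        scale *= ratio
    return recorded
```

Replacing the stand-in callable with a trained model would yield the list of rectangles that step S17 then integrates.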

If step S16 determines that the reduced size has become equal to or smaller than the patch size, the PC 1, using the score integration program, assigns scores within each recorded rectangular area, calculates the average score for each column and each row, determines the boundary lines from the results, and outputs the area enclosed by those boundary lines as the person detection area (step S16: YES, then step S17).

The score integration in step S17 is now described in detail, taking the rectangular areas 31 and 32 shown in FIG. 12(a) as examples of the rectangular areas recorded in step S14.

First, as shown in FIG. 12(b), the PC 1 assigns a predetermined score (for example, 1) to each pixel in every recorded rectangular area. Because rectangular areas 31 and 32 overlap, as shown in FIG. 12(a), the scores in the overlapping part are summed: as shown in FIG. 12(c), each pixel where areas 31 and 32 overlap has a score of 1 + 1 = 2. Two rectangular areas are used here for illustration; in practice a large number of rectangular areas are recorded, and they partially overlap one another.
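A minimal sketch of the score map described above, assuming rectangles have already been rounded to integer pixel coordinates (x, y, width, height) and that each recorded rectangle contributes a score of 1. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels; assumed convention


def add_rectangle(grid: List[List[int]], rect: Rect, score: int = 1) -> None:
    """Add `score` to every pixel inside `rect`, clipped to the grid."""
    x, y, w, h = rect
    height, width = len(grid), len(grid[0])
    for r in range(max(0, y), min(height, y + h)):
        row = grid[r]
        for c in range(max(0, x), min(width, x + w)):
            row[c] += score


def accumulate_scores(rects: Sequence[Rect], width: int, height: int) -> List[List[int]]:
    """Score map in which overlapping rectangles add up, e.g. 1 + 1 = 2 where two overlap."""
    grid = [[0] * width for _ in range(height)]
    for rect in rects:
        add_rectangle(grid, rect)
    return grid
```

Because `add_rectangle` updates the map one rectangle at a time, the same helper also suits the variant mentioned later in the text, in which scores are added as each rectangle is recorded in step S14.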

Next, the PC 1 calculates the average score for each vertical column and each horizontal row of the recorded area. This is explained with reference to FIG. 13, which shows an example of the summed scores when several recorded rectangular areas are overlaid. In FIG. 13, a horizontal line of pixels is called a "row" and a vertical line a "column". The PC 1 calculates the average score of each column and takes any column whose average is equal to or greater than the second threshold as a boundary line. In the example of FIG. 13, the average of column B is equal to or greater than the second threshold while that of column A is below it, so column B is determined as a boundary line (thick solid line). Likewise, the PC 1 calculates the average score of each row and takes any row whose average is equal to or greater than the second threshold as a boundary line. In FIG. 13, the average of row B is equal to or greater than the second threshold while that of row A is below it, so row B is determined as a boundary line (thick solid line).
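A hedged sketch of the column/row averaging. It assumes the score map passed in already covers just the recorded region around one person (the text does not say how several people are separated), averages each column over all rows and each row over all columns, and returns the outermost qualifying column and row indices as the boundary lines. The function name and return format are hypothetical.

```python
from typing import List, Optional, Tuple


def boundary_lines(grid: List[List[float]],
                   second_threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) column/row indices, inclusive, or None.

    A column (row) qualifies when its mean score is >= second_threshold; the
    outermost qualifying columns and rows are taken as the boundary lines.
    """
    height, width = len(grid), len(grid[0])
    col_means = [sum(grid[r][c] for r in range(height)) / height for c in range(width)]
    row_means = [sum(row) / width for row in grid]
    cols = [c for c, m in enumerate(col_means) if m >= second_threshold]
    rows = [r for r, m in enumerate(row_means) if m >= second_threshold]
    if not cols or not rows:
        return None   # nothing reaches the second threshold
    return (min(cols), min(rows), max(cols), max(rows))
```

In the terms of FIG. 13, column B would be the outermost column whose mean reaches the second threshold and column A the first column outside it; the same holds for rows.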

Then, as shown in FIG. 14, the PC 1 outputs the area enclosed by the determined vertical and horizontal boundary lines as the final person detection area. For example, the image data of the person detection area may be displayed on the screen of the PC 1 or transmitted to an external device. Outputs other than image data include data indicating the number of people in the image data, the coordinates of the head positions, and the sizes of the head regions.

In the above description, scores are assigned to the rectangular areas, and the overlapping scores summed, after all rectangular areas have been recorded. Alternatively, a score may be assigned to each rectangular area as soon as it is recorded in step S14, with scores added wherever it overlaps previously recorded rectangular areas.

In the above description, both the learning process and the test process are executed by a single apparatus (the PC 1), but each process may be executed by a separate apparatus. In that case, the learning model created in the learning process must be input to the apparatus that executes the test process.

As described above, compared with conventional methods, this embodiment reduces the number of image-size steps and allows the image patch to be shifted by a larger width, which reduces processing time. A person in an image can therefore be detected at high speed.

In addition, in the learning process of this embodiment, a large number of learning samples are generated from a single piece of image data, which improves the accuracy of person detection.

The person detection methods disclosed in Patent Document 1 and Non-Patent Document 1, which detect part of the face, the head, the skin color, and so on, have problems such as the following. Methods that detect part of the face can detect only near-frontal faces. Methods that detect skin color cannot detect a person facing away from the camera. Methods that detect the head are prone to false detections because the head carries few features (little information). In contrast, this embodiment uses information from the neck and shoulders as well as the face, so it can detect a person regardless of face orientation, even from behind. In other words, this embodiment can detect a person even when the amount of information (features) is small.

Although an embodiment of the present invention has been described above, the invention is not limited to this embodiment, and various modifications are possible without departing from its gist.

For example, the operations in the embodiment described above (the operations shown in each flowchart) can be executed by hardware, by software, or by a combination of both.

When processing is executed by software, a program recording the processing sequence may be installed in the memory of a computer built into dedicated hardware and executed there, or it may be installed and executed on a general-purpose computer capable of executing various kinds of processing.

For example, the program can be recorded in advance on the hard disk or ROM shown in FIG. 1. Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium such as a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-Optical) disc, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such removable recording media can be provided as so-called packaged software.

Besides being installed on the computer from a removable recording medium as described above, the program may be transferred wirelessly to the computer from a download site, or transferred by wire over a network such as a LAN (Local Area Network) or the Internet. The computer can receive the transferred program and install it on a recording medium such as a built-in hard disk.

The processing operations described in the embodiment need not be executed only in the time-series order described; they may also be executed in parallel or individually, according to the processing capability of the executing apparatus or as needed.

FIG. 1 is a block diagram showing the configuration of a PC according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram outlining the learning process performed by the PC according to the embodiment.
FIG. 3 is a flowchart showing the flow of the learning process performed by the PC.
FIG. 4 shows examples of image data input to the PC as learning targets.
FIG. 5 shows examples of correct-answer data designated in the image data input to the PC.
FIG. 6 illustrates an example of learning sample generation performed by the PC.
FIG. 7 illustrates an example of F-measure calculation performed by the PC.
FIG. 8 is a conceptual diagram outlining the test process performed by the PC.
FIG. 9 is a flowchart showing the flow of the test process performed by the PC.
FIG. 10 shows an example of image data input to the PC as a test target.
FIG. 11 shows the image data input to the PC as a test target at each reduction ratio.
FIG. 12 illustrates an example of score integration performed by the PC.
FIG. 13 illustrates an example of score averaging performed by the PC.
FIG. 14 shows an example of a person detection area output from the PC.

Explanation of symbols

1 PC
11 CPU
12 Memory
13 Storage means
14 Communication means
15 Output means
16 Operation means
17 Recording medium reading means
20 Image patch to be cut out
21 Image patch to be cut out
31 Recorded rectangular area
32 Recorded rectangular area

Claims

A person detection device that detects a person region in image data using a learned model, comprising:
first value calculation and determination means for cutting out a predetermined area of image data to be detected as an image patch and inputting it into a learning model created in advance, thereby calculating a first value indicating how much the image patch overlaps a person region, and for determining whether the first value is equal to or greater than a preset first threshold;
area recording means for, when the first value is equal to or greater than the first threshold, calculating with the learning model a second value indicating how far the image patch deviates from the person region, changing the image patch based on the second value, and recording the changed area; and
score calculation and determination means for assigning the same score to the pixels of each recorded area, summing the scores where recorded areas overlap, calculating the average score for each vertical column and each horizontal row, taking each column and row whose average is equal to or greater than a preset second threshold as a boundary line, and taking the area enclosed by the boundary lines as a person detection area.

The person detection device according to claim 1, further comprising:
learning sample generation means for, when image data showing at least one person is input and a person region in the image data is designated, generating a learning sample for each of a plurality of areas in the image data based on the designated person region; and
model creation means for performing the learning based on the learning samples so as to reduce the error of the values output as calculation results, thereby creating the learning model.

The person detection device according to claim 2, wherein the learning sample consists of a value indicating how far a predetermined area in the image data deviates from the designated person region and a value indicating how much the predetermined area in the image data overlaps the designated person region.

The person detection device according to claim 3, wherein the designated person region is a region extending from the head or face of a person in the image data to at least the neck or shoulders.

The person detection device according to any one of claims 1 to 4, wherein the first value calculation and determination means and the area recording means
repeat their processing while varying the movement amount of the image patch and the reduction ratio of the image data to be detected, both set in advance as parameters, and
when processing with all parameters is complete, processing moves on to the score calculation and determination means.

A person detection device comprising:
learning sample generation means for, when image data showing at least one person is input and a person region in the image data is designated, generating a learning sample for each of a plurality of areas in the image data based on the designated person region; and
model creation means for performing the learning based on the learning samples so as to reduce the error of the values output as calculation results, thereby creating the learning model.

The person detection device according to claim 6, wherein the learning sample consists of a value indicating how far a predetermined area in the image data deviates from the designated person region and a value indicating how much the predetermined area in the image data overlaps the designated person region.

The person detection device according to claim 7, wherein the designated person region is a region extending from the head or face of a person in the image data to at least the neck or shoulders.

A person detection method for detecting a person region in image data using a learned model, comprising:
a first value calculation and determination step of cutting out a predetermined area of image data to be detected as an image patch and inputting it into a learning model created in advance, thereby calculating a first value indicating how much the image patch overlaps a person region, and determining whether the first value is equal to or greater than a preset first threshold;
an area recording step of, when the first value is equal to or greater than the first threshold, calculating with the learning model a second value indicating how far the image patch deviates from the person region, changing the image patch based on the second value, and recording the changed area; and
a score calculation and determination step of assigning the same score to the pixels of each recorded area, summing the scores where recorded areas overlap, calculating the average score for each vertical column and each horizontal row, taking each column and row whose average is equal to or greater than a preset second threshold as a boundary line, and taking the area enclosed by the boundary lines as a person detection area.

A person detection method comprising:
a learning sample generation step of, when image data showing at least one person is input and a person region in the image data is designated, generating a learning sample for each of a plurality of areas in the image data based on the designated person region; and
a model creation step of performing the learning based on the learning samples so as to reduce the error of the values output as calculation results, thereby creating the learning model.

A program for detecting a person region in image data using a learned model, the program causing a computer to execute:
first value calculation and determination processing for cutting out a predetermined area of image data to be detected as an image patch and inputting it into a learning model created in advance, thereby calculating a first value indicating how much the image patch overlaps a person region, and determining whether the first value is equal to or greater than a preset first threshold;
area recording processing for, when the first value is equal to or greater than the first threshold, calculating with the learning model a second value indicating how far the image patch deviates from the person region, changing the image patch based on the second value, and recording the changed area; and
score calculation and determination processing for assigning the same score to the pixels of each recorded area, summing the scores where recorded areas overlap, calculating the average score for each vertical column and each horizontal row, taking each column and row whose average is equal to or greater than a preset second threshold as a boundary line, and taking the area enclosed by the boundary lines as a person detection area.

A program causing a computer to execute:
learning sample generation processing for, when image data showing at least one person is input and a person region in the image data is designated, generating a learning sample for each of a plurality of areas in the image data based on the designated person region; and
model creation processing for performing the learning based on the learning samples so as to reduce the error of the values output as calculation results, thereby creating the learning model.