JP2020187531A

JP2020187531A - Information processing device, information processing method and program

Info

Publication number: JP2020187531A
Application number: JP2019091384A
Authority: JP
Inventors: Hiroshi Sato (佐藤 博); Takahisa Yamamoto (山本 貴久); Atsuo Nomoto (野本 敦夫)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2020-11-19
Anticipated expiration: 2039-05-14
Also published as: JP7337541B2

Abstract

PROBLEM TO BE SOLVED: To determine a condition under which a specific person can be identified even when cameras are installed in different environments. SOLUTION: An information processing device determines a threshold for judging that the person represented by a face feature is the same as a predetermined person, based on the output of an identifier (classifier) that outputs, from a face image, the face feature corresponding to a person. The device includes: extracting means for extracting, from a plurality of images, a face image showing a person's face and a human body image showing the human body corresponding to that face; identifying means for identifying, from the plurality of images, a first human body image group similar to the human body image; and deciding means for deciding the threshold, based on a first similarity obtained by comparing with one another the first output results produced by inputting the face image group corresponding to the first human body image group into the identifier, such that the rate at which those face images are judged not to be the same person is smaller than a predetermined rate. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for identifying a specific person from video.

In techniques that identify a person from video captured by a plurality of cameras, the identification condition used to identify the person must be set according to the installation conditions of each camera. Take face authentication (identifying an individual) as an example. Here, the features of a face image in video obtained by a given camera are compared with the features of a face image registered in advance, and a criterion for deciding that the two images show the same person must be set as the identification condition, using the features extracted from the two face images. Depending on the installation environment of the camera, however, the same features as those of the face images learned in advance cannot always be extracted. An identification condition therefore has to be set for each environment in which a camera is installed, and setting it requires acquiring sufficient learning data about the persons to be identified in that environment.

Patent Document 1 describes a technique for collecting learning data about a person from video captured by a camera in its actual installation environment. A specific person appearing in the video is tracked and identified frame by frame. The frames in which the person was identified correctly are separated from the frames in which the person was not, and new learning data is generated by assigning the label of the identified person to the latter frames.

Patent Document 1: Japanese Patent Application Laid-Open No. 2018-181157

With Patent Document 1, learning data about a person can be acquired from video captured by a camera installed on site. However, because the person is tracked within video captured by a single camera, application to a plurality of cameras is not considered. In addition, because the tracked person stays within the same angle of view, the resulting learning data may be biased. For these reasons, the method of Patent Document 1 may fail to collect enough of the necessary learning data, and the identification condition for identifying a person may not be set appropriately. The present invention has been made in view of the above problems, and its object is to determine a condition under which a specific person can be identified even when cameras are installed in different environments.

An information processing device according to the present invention that solves the above problems determines a threshold for judging that the person indicated by a face feature is the same as a predetermined person, based on the output of a classifier that outputs, from a face image, the face feature corresponding to a person. The device comprises: extraction means for extracting, from a plurality of images, a face image showing a person's face and a human body image showing the human body corresponding to that face; specifying means for specifying, from the plurality of images, a first human body image group similar to the human body image; and determination means for determining the threshold, based on a first similarity obtained by comparing with one another the first output results produced by inputting the face image group corresponding to the specified first human body image group into the classifier, such that the rate at which those face images are judged not to be the same person is smaller than a predetermined rate.
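The threshold rule in the preceding paragraph can be illustrated with a minimal sketch. It assumes that the first similarities are plain floating-point scores, that a pair is rejected when its score is strictly below the threshold, and that candidate thresholds are taken from the observed scores; the function name `threshold_for_max_frr` is hypothetical and none of these choices are specified by the source.

```python
def threshold_for_max_frr(genuine_scores, max_frr):
    """Return the largest observed score usable as a threshold while the
    share of same-person pairs rejected stays below max_frr.

    Assumption: a pair is rejected when its similarity is strictly
    below the threshold.
    """
    if not genuine_scores:
        raise ValueError("need at least one same-person similarity")
    if not 0.0 < max_frr <= 1.0:
        raise ValueError("max_frr must be in (0, 1]")
    scores = sorted(genuine_scores)
    n = len(scores)
    best = scores[0]  # the lowest score as threshold rejects nothing
    for candidate in scores:
        rejected = sum(1 for s in scores if s < candidate)
        if rejected / n < max_frr:
            best = candidate
    return best


if __name__ == "__main__":
    # Similarities between face features of one body-matched group (made-up values).
    first_similarities = [0.9, 0.8, 0.7, 0.6, 0.2]
    print(threshold_for_max_frr(first_similarities, max_frr=0.25))
```

Choosing the largest admissible threshold is one possible design: it keeps the rejection of same-person pairs under the stated rate while accepting as few low-similarity pairs as possible.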

According to the present invention, a condition under which a specific person can be identified can be determined even when cameras are installed in different environments.

FIG. 1 is a diagram explaining the concept of the information processing system.
FIG. 2 is a block diagram showing an example of the functional configuration of the information processing system.
FIG. 3 is a diagram showing an example of the hardware configuration of the information processing device.
FIG. 4 is a flowchart explaining the processing executed by the information processing system.
FIG. 5 is a flowchart explaining the processing executed by the information processing device.
FIG. 6 is a flowchart explaining the processing executed by the information processing device.
FIG. 7 is a flowchart explaining the processing executed by the information processing device.
FIG. 8 is a diagram showing an example of surveillance camera images and recognition results.
FIG. 9 is a diagram showing an example of the distribution of identification results.
FIG. 10 is a flowchart explaining the processing executed by the information processing device.

<Embodiment 1>
Embodiments of the present invention are described in detail below with reference to the drawings. First, the concept of the present embodiment is described with reference to FIG. 1. The information processing system 1 performs common video analysis (for example, person identification or blacklist detection) across a plurality of cameras installed in a given environment. The imaging system 2 consists of a plurality of surveillance cameras installed in the area to be monitored.

As an example, surveillance cameras are installed in four areas: outdoors (10A), at a high indoor position (10B), at a low indoor position (10C), and on a rooftop (10D). The cameras do not necessarily share the same position and orientation, environmental conditions, or internal camera parameters. When searching for a person by matching against an image of a specific person, an identification condition (threshold) corresponding to each camera is set for the video captured by that camera. Here, the identification condition is a threshold on the similarity between image features of persons detected from images. The similarity is obtained by inputting images that contain a specific body part carrying person-specific characteristics (for example, the face) into a classifier and comparing the output features with each other. The surveillance system 3 accomplishes its task by analyzing video acquired from one or more surveillance cameras.

The imaging device 10A is, for example, a surveillance camera that captures its surroundings. The imaging devices 10A to 10D or the information processing device 100 use video acquired from the plurality of cameras to determine, for each surveillance camera, the threshold used to identify persons in its video. The display device 107 displays video from the cameras being monitored, video captured by the information processing device 100 or each surveillance camera, and analysis results derived from that video. The user (observer) can thus easily check the camera video and the determination results of the identification processing.

The following embodiment describes an example in which each surveillance camera has a classifier and the threshold to be set for each imaging device is determined using the analysis results of video from the other cameras. An embodiment in which the information processing device 100 determines the threshold for each imaging device is also possible; in that case, the functional configuration of the imaging device described below is simply replaced by the information processing device 100.

In Embodiment 1, the threshold for a given camera is determined using the results of matching the features of persons captured by other cameras. Two different features (here, the face and the human body) are extracted from an image showing a person, and each feature is matched against video obtained from the other cameras. The features used for matching should be ones that every surveillance camera can readily extract, so that the features of a person captured by multiple cameras can be matched. For example, the luminance (feature) obtained from the color of a person's clothes is considered less affected by camera location than other features, so it can be used to find the same person across images captured by multiple cameras.

For convenience, the camera whose threshold is to be determined is called the imaging device of interest. This embodiment describes a method in which features that are similar between an image captured by the imaging device of interest and images captured by other cameras are specified, and the identification condition needed to identify a person is determined from a first similarity, obtained by comparing features of the same person with each other, and a second similarity, obtained by comparing features of different persons with each other.
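How the first and second similarities could be collected is sketched below. The sketch assumes that body matching has already grouped the face feature vectors by person, that each vector is a plain tuple of floats, and that cosine similarity is the comparison measure; `pair_similarities` and the sample vectors are hypothetical, and the source does not fix any of these details.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_similarities(groups):
    """Split pairwise face similarities into same-person and other-person lists.

    groups: list of lists of face feature vectors. Each inner list holds the
    faces whose corresponding body images were matched to one person.
    """
    same_person = []
    for group in groups:
        same_person.extend(cosine(a, b) for a, b in combinations(group, 2))
    other_person = []
    for g1, g2 in combinations(groups, 2):
        other_person.extend(cosine(a, b) for a in g1 for b in g2)
    return same_person, other_person


if __name__ == "__main__":
    groups = [
        [(1.0, 0.0), (0.9, 0.1)],  # faces of a person matched by body appearance
        [(0.0, 1.0)],              # faces of a different person
    ]
    first, second = pair_similarities(groups)
    print(first, second)
```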

As described above, detecting the same person in video captured by cameras installed in multiple environments requires an identification criterion for each camera (environment). This in turn requires data from which the classifier can sufficiently learn the features of a wide variety of persons. However, collecting such data about persons, and labeled ground-truth data in particular, at the site being monitored is not realistic.

A further problem is that the images used for learning are not necessarily similar to the images obtained at the monitored site. The problem is more pronounced when the learning images are snapshots taken with a digital camera or smartphone, because such images usually show the face from the front. A surveillance camera, by contrast, is typically mounted at a high position such as an indoor ceiling, so it captures faces looking down from above. Compared with a frontal-to-frontal comparison, comparing a frontal face image with one taken from above almost always yields a lower similarity even for the same person, so it is difficult to identify the person using the same threshold.

In short, detecting a common person across cameras installed in multiple environments involves the following difficulties. With multiple surveillance cameras installed on site, extracting face images from the captured video and labeling whether each shows the same person is laborious, and labeling the persons captured by several dozen cameras or more takes considerable effort. On top of that, determining for each imaging device an appropriate identification condition from the analysis results on the labeled video is also laborious, because it is difficult to collect enough of the data needed to determine an appropriate identification condition.

This embodiment addresses these problems by combining multiple results of detecting persons in images captured in multiple environments and determining, for each imaging device, the identification condition used to identify a person. The description is divided into a determination phase, in which the identification condition of the classifier is determined, and an identification phase, in which the classifier is used with the determined identification condition. The determination phase is described first. The following description covers the processing performed by one of the imaging devices; executing the same processing on the other imaging devices yields the thresholds for all imaging devices. Alternatively, the information processing device 100 may execute the processing for all devices together. Conventionally, obtaining a histogram of identification results for on-site video required manually assigning correct labels and measuring; the present invention automatically obtains data that can be regarded as correct, so the identification condition can be obtained without that manual work.

Although the description uses a system with multiple surveillance cameras, the present invention is not limited to surveillance cameras. This embodiment is equally applicable to camera systems composed of cameras with different uses, such as web cameras and digital cameras. The cameras also need not all be the same model; different cameras may be used in the multiple environments. Details follow.

FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device 10A, which is specifically an imaging device. The information processing device 10A includes an imaging unit 200, an image acquisition unit 201, a detection unit 202, an identification unit 203, a determination unit 204, a recording unit 205, and an output unit 206. These are connected by a bus, over which the necessary data, commands, and other information are exchanged. The output unit 206 may be provided outside the device, and the imaging unit 200 may likewise be provided in an external device. The imaging device 10A is described here; the imaging devices 10B, 10C, and 10D are assumed to be information processing devices with the same configuration.

The imaging devices 10A, 10B, 10C, and 10D are cameras installed in different environments within the monitored area. Each surveillance camera has an imaging optical system made up of optical lenses with a zoom mechanism, and may also have a drive mechanism for the pan and tilt axes.

The imaging unit 200 measures the outside world with a sensor. Because the information processing device 10A is a surveillance camera, it captures images (video) with an image sensor, typically a CCD or CMOS image sensor. A predetermined video signal is output as image data in response to a readout control signal from a sensor drive circuit (not shown); for example, a signal obtained by subsampling or block readout is output as image data. The installation location of each camera and examples of the captured images are described later.

The image acquisition unit 201 acquires video (time-series images) captured by the plurality of surveillance cameras. In the flowcharts described later, processing is performed frame by frame.

The detection unit 202 detects persons in the time-series images acquired from the image acquisition unit 201 or in time-series image data acquired from the recording unit 205 described later. Here, a person typically means a person's face or body: the unit detects the positions of faces and bodies in the image. Objects other than people, such as cars and animals, may also be detected.

This detection can readily be implemented with known techniques such as deep learning (hereinafter DL). A DL face detector is a neural network trained to find faces in images. Specifically, it is a network that outputs a value for an input image and is trained so that the value is high for face images and low for other (non-face) images. By including more detailed information about the face images in the training data, attribute information such as face orientation, age, and gender may also be detected. This, too, can be implemented with DL.

In the identification phase, the identification unit 203 outputs the result (face feature) obtained by inputting the face image of a person detected by the detection unit 202 into the classifier. In other words, it performs processing that distinguishes individuals, for example determining that person A is not person B. The identification unit 203 is described in detail later.

The determination unit 204 determines the threshold that serves as the identification condition for identifying a specific person. In the determination phase, the threshold set for each camera is determined based on information about persons who appear in common among those detected in the images captured by the cameras. The threshold is chosen to keep errors low, based on the probability that a feature of the genuine person is wrongly judged not to be that person (the false rejection rate) and the probability that a feature of another person is wrongly judged to be that person (the false acceptance rate). The processing of the determination unit 204 is described in detail later.
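One way to trade the two error rates against each other is sketched below. The source only says that the threshold is chosen so that errors become small; minimizing the sum of the false rejection rate and the false acceptance rate over the observed scores is an assumption made for this illustration, and the names `error_rates` and `pick_threshold` are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    A pair is accepted when its similarity is >= threshold.
    """
    frr = sum(1 for s in genuine if s < threshold) / len(genuine)
    far = sum(1 for s in impostor if s >= threshold) / len(impostor)
    return frr, far


def pick_threshold(genuine, impostor):
    """Pick the observed score that minimizes FRR + FAR (assumed criterion)."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=lambda t: sum(error_rates(genuine, impostor, t)))


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7]   # same-person similarities (made-up values)
    impostor = [0.4, 0.3, 0.6]  # different-person similarities (made-up values)
    t = pick_threshold(genuine, impostor)
    print(t, error_rates(genuine, impostor, t))
```

Other criteria fit the same structure, for example fixing a maximum false rejection rate and then minimizing the false acceptance rate; only the `key` function would change.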

The recording unit 205 receives the video captured via the image acquisition unit 201, performs recording-related processing such as compression as well as video analysis processing, and records the result on a recording device such as non-volatile internal memory or media such as an HDD or SD card.

The output unit 206 compares, against the threshold, the similarity between the output obtained by inputting a target image into the classifier and the features of a previously registered person's image. If the similarity satisfies the threshold, it outputs a determination result indicating that the person in the target image is the registered person; if not, it outputs a determination result indicating that the person in the target image is not the registered person. It also outputs the video processed by the recording unit 205 and the accompanying information to a monitor or the like. The video and analysis information may instead be transferred to an external destination, typically a PC server connected over a network. Alternatively, the output unit 206 may be a display unit that shows the video being captured and the identification results.
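The accept-or-reject decision described above can be sketched as follows. The sketch assumes features are tuples of floats, that the similarity is an inner product, and that the registered persons are held in a dictionary; the names `identify`, `gallery`, `alice`, and `bob` are hypothetical and serve only as illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def identify(probe, gallery, threshold, similarity=dot):
    """Compare a probe feature with registered features.

    Returns (name, score) of the best match if its score meets the
    threshold, otherwise (None, score) to signal 'not a registered person'.
    """
    best_name, best_score = None, float("-inf")
    for name, feature in gallery.items():
        score = similarity(probe, feature)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


if __name__ == "__main__":
    gallery = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}  # registered features
    print(identify((0.9, 0.1), gallery, threshold=0.5))
    print(identify((0.9, 0.1), gallery, threshold=0.95))
```

Because the threshold is a per-camera value in this document, a caller would pass the threshold determined for the camera that produced the probe image.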

An example hardware configuration of the information processing device 10A is described with reference to FIG. 3. The central processing unit (CPU) 301 uses the RAM 303 as work memory to read and execute the OS and other programs stored in the ROM 302 and the storage device 304, controls the components connected to the system bus 309, and performs the computation and logical decisions for the various processes. The processing executed by the CPU 301 includes the information processing of the embodiment. The storage device 304 is a hard disk drive, an external storage device, or the like, and stores the programs and data used by the image recognition processing of the embodiment. The input unit 305 comprises an imaging device such as a camera and input devices such as buttons, a keyboard, and a touch panel for entering user instructions. The storage device 304 is connected to the system bus 309 via an interface such as SATA, and the input unit 305 via a serial bus such as USB; the details are omitted. The communication I/F 306 communicates wirelessly with external devices. The display unit 307 is a display, which may be built into the information processing device or connected externally. The sensor 308 is an image sensor.

The identification unit 203 includes a human body image extraction unit 501, a partial image extraction unit 502, a human body image matching unit 503, a partial image matching unit 504, and an identification unit 505. Beyond those shown in the figure, there may be further feature extractors, for example for a person's color features, contour features, or motion features, each with a corresponding image matching unit.

The human body image extraction unit 501 extracts a human body image showing the body, based on the position and size of the person detected by the detection unit 202. The partial image extraction unit 502 likewise extracts a face image showing the face, based on the position and size of the person detected by the detection unit 202. Features for identifying the individual are obtained from these partial images using known techniques. For example, LBP (Local Binary Pattern) features can be used, as can HOG (Histogram of Oriented Gradients) features, SIFT (Scale-Invariant Feature Transform) features, or a combination of these. The extracted features may be reduced in dimension with a method such as PCA (Principal Component Analysis). DL techniques can also be applied here, as described above.

The human body feature and the partial feature could differ only in the region they are extracted from, but for person identification the whole and the part usually carry different kinds of information. Specifically, the face feature captures positional relationships and the body feature captures color. When identifying an individual by body, information that includes color, such as clothing, is considered useful, and such features are often used. Even with DL techniques, color information is usually included implicitly unless the network is explicitly designed to extract only contours. This is because clothing texture, that is, information containing color components, remains useful even when the subject is small in the image or is facing away from the camera. For the partial feature, which for a person is the face feature, color information is not only insufficient but can also cause false authentication when used alone, so color is not used very actively; the amount of edges in the face region or the direction of luminance gradients is often used instead.
DL techniques have advanced remarkably in recent years, and DL features are now used as effective features for face identification as well. Even when the same DL technique is used for the body and the face, the training data differ and the network structures are usually designed separately, so the resulting features are entirely different. The human body feature may also be one that lets an individual be identified from the image, such as a bib number worn by the person or a visible barcode assigned to each individual.

人体画像照合部５０３は、類似する人体画像の集合を取得するため、抽出された人体画像をそれぞれ照合する。また、部分画像照合部５０４は、類似する顔画像の集合を取得するため、抽出された顔画像をそれぞれ照合する。それぞれ人体画像または部分画像同士の照合処理を行う。典型的には、特徴を数値列（特徴ベクトル）として扱い、２つの特徴ベクトルの距離を計測することによって識別する。特徴ベクトルを、機械学習の技術、例えばサポートベクターマシンに投入することにより、同一か否か識別させることも可能である。距離ではなく、内積を取得して、２つの特徴がどれほど類似しているかを数値（以下類似度と呼ぶ）としてあらわすこともできる。簡単には、距離の逆数をとれば、同じく類似度に変換することもできる。ＤＬ特徴を用いる場合では、学習時に用いた類似度取得方法と同じにするのが良い。 The human body image collation unit 503 collates the extracted human body images in order to acquire a set of similar human body images. Further, the partial image collation unit 504 collates the extracted face images in order to acquire a set of similar face images. Collation processing is performed between human body images or partial images, respectively. Typically, a feature is treated as a numerical sequence (feature vector) and identified by measuring the distance between the two feature vectors. It is also possible to discriminate whether or not the feature vector is the same by inputting it into a machine learning technique, for example, a support vector machine. It is also possible to obtain the inner product instead of the distance and express how similar the two features are as a numerical value (hereinafter referred to as the degree of similarity). Simply, the reciprocal of the distance can be converted to the same degree of similarity. When using the DL feature, it is preferable to use the same method for acquiring the similarity used at the time of learning.
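
As a minimal sketch of the similarity measures described above, the following Python functions compute the distance between two feature vectors, a normalized inner product, and a reciprocal-of-distance similarity. The function names, the normalization of the inner product, and the small constant `eps` are assumptions made for illustration and do not appear in the original description.

```python
import math


def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_inner_product(a, b):
    """Inner product divided by both norms, so the value lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def distance_to_similarity(distance, eps=1e-6):
    """Reciprocal of the distance; eps (an assumption) avoids division by zero."""
    return 1.0 / (distance + eps)
```

With either measure, a larger value means the two features are more alike. When DL features are used, the measure chosen here should match the one used during training, as noted above.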

以下、同一人物の照合の方法について図８を用いて説明する。図８の画像Ａはカメラ１０Ａ、画像Ｂはカメラ１０Ｂ、画像Ｃはカメラ１０Ｃ、画像Ｄはカメラ１０Ｄによって撮影された異なる時刻における画像であるとする。第１の人物は、人体１００００と顔１０００１で示される人物であるとする。この第１の人物は、画像Ｂにおける人体１０００２、顔１０００３、画像Ｃにおける人体１０００４、顔１０００５、画像Ｄにおける人体１０００６、顔１０００７にそれぞれ対応する。識別部５０５が行う具体的な処理の一例としては、画像Ａにおける第１の人物の顔画像１０００１（または顔画像の特徴）を、他の画像Ｂ，Ｃ，Ｄにおいて人物照合用の特徴量を抽出して、照合を行う。その結果、理想的には、顔画像１０００３、１０００５、１０００７が得られる。これらの顔画像は同一人物であると見なせる。なお、広域を監視するカメラにおいては、顔画像よりも人体画像の方がマッチングしやすい場合がある。特に、珍しい服装を着用している人物等は人体画像のマッチングが精度よい。例えば、各画像から所定の人物を抽出するために、画像Ａの人体画像１００００（の特徴）をテンプレートとする。このテンプレートを、画像Ｂ，Ｃ，Ｄにおいてスライディングウィンドウとして走査する。結果、人体画像１０００２、１０００４、１０００６を得る。抽出された部分画像は所定の人物であると見なせる。なお、所定の人物は少なくとも２つ以上のカメラで撮影された画像において検出された任意の人物である。 Hereinafter, a method of collating the same person will be described with reference to FIG. 8. In FIG. 8, image A is captured by camera 10A, image B by camera 10B, image C by camera 10C, and image D by camera 10D, each at a different time. The first person is the person represented by human body 10000 and face 10001. This first person corresponds to human body 10002 and face 10003 in image B, human body 10004 and face 10005 in image C, and human body 10006 and face 10007 in image D. As an example of the specific processing performed by the identification unit 505, the face image 10001 of the first person in image A (or the feature of that face image) is collated against person-matching feature amounts extracted from the other images B, C, and D. As a result, ideally, face images 10003, 10005, and 10007 are obtained. These face images can be regarded as showing the same person. In a camera that monitors a wide area, a human body image may be easier to match than a face image. In particular, human body images match accurately for a person wearing unusual clothing. For example, in order to extract a predetermined person from each image, the human body image 10000 of image A (or its feature) is used as a template. 
This template is scanned as a sliding window in images B, C, D. As a result, human body images 10002, 10004, 10006 are obtained. The extracted partial image can be regarded as a predetermined person. The predetermined person is an arbitrary person detected in the images taken by at least two or more cameras.

識別部５０５は、人体画像照合部５０３による照合結果をもとに類似した人体画像群に対応する顔画像群を、識別器に入力し、第１の出力結果を取得する。ここでは、人体画像は各監視カメラに共通して取得しやすいという前提として、複数の画像から類似した人体画像群を抽出し、さらにその人体画像群からそれぞれ対応した顔画像群を取得する。 The identification unit 505 inputs a face image group corresponding to a similar human body image group based on the collation result by the human body image collation unit 503 into the classifier, and acquires the first output result. Here, on the premise that the human body image is easy to acquire in common to each surveillance camera, a similar human body image group is extracted from a plurality of images, and a corresponding face image group is acquired from the human body image group.

つまり、典型的には以下のように行う。すなわち、あるカメラ（注目撮像装置：識別器を決定する対象）の画像に映った所定の人物の人体を示す人体画像と、他の監視カメラの画像から検出された人物の人体画像とを照合する。その結果、最も高い類似度を示した（画像）特徴が、さらに所定の条件に当てはまっていれば、注目撮像装置の画像に映った所定の人物と、照合された特徴は同一人物であると識別する。最も高い類似度が閾値を下回った場合、検出された所定の人物に該当する画像がないと識別する。以上が１つの識別器についての識別で、人物の場合は、例えば顔だけで識別することに該当する。同じことを人体特徴についても行い、２つの結果を統合して、最終的な識別を行う。 That is, the processing is typically performed as follows. A human body image showing the body of a predetermined person in the image of a certain camera (the imaging device of interest, that is, the target for which the classifier is determined) is collated with the human body images of persons detected from the images of the other surveillance cameras. If the (image) feature showing the highest similarity also satisfies a predetermined condition, the predetermined person in the image of the imaging device of interest and the collated feature are identified as the same person. If the highest similarity falls below the threshold, it is determined that no image corresponds to the detected predetermined person. The above is the identification for one classifier; in the case of a person, it corresponds, for example, to identification by the face alone. The same is done for the human body feature, and the two results are integrated for the final identification.
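
The single-classifier identification described above (take the most similar candidate, then reject it if its similarity is below the threshold) can be sketched as follows. This is only an illustration: the function names, the dictionary-based gallery, and the plain inner product used as the similarity are assumptions, not part of the original description.

```python
def inner_product(a, b):
    """Plain inner product of two feature vectors (assumed similarity measure)."""
    return sum(x * y for x, y in zip(a, b))


def identify(query, gallery, threshold, similarity=inner_product):
    """Return (key, score) of the best match, or (None, score) if it is rejected.

    query     : feature vector of the predetermined person
    gallery   : dict mapping a candidate key to its feature vector
    threshold : minimum similarity required to accept the best match
    """
    best_key, best_score = None, float("-inf")
    for key, feature in gallery.items():
        score = similarity(query, feature)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return None, best_score  # no corresponding image for this person
    return best_key, best_score
```

For example, `identify([0.9, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, 0.5)` selects candidate `"a"`, while raising the threshold to 0.95 rejects every candidate.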

統合の方法は、簡単には多数決で行う。複数の識別器の結果が相反する場合、以下のようにするとよい。すなわち、事前に識別器に信頼度を設定しておき、もっとも信頼度の高い識別器の結果を採用する。信頼度は、例えば、事前に決めた画像セット（正解情報がある）で、各識別器の正解率を求めて、その正解率を信頼度として設定する。また、以下のようにしてもよい。個々の識別器の類似度に信頼度をかけて、全識別器の結果を足すことで、統合された類似度が取得され、その値をもって閾値と比較して、該当する画像を識別すればよい。以上が、識別部２０３の処理の内容である。個々の識別器（以下、顔識別器と人体識別器と呼ぶ）について特徴取得した後、それらの出力結果を統合して人物の識別を行う。人物の識別の場合、まず顔による識別を行い、次に人体の識別を行って、２つの出力結果を統合する。ほかにも識別器がある場合は、順次識別を行い、最後に統合を行って、出力結果とする。 The simplest integration method is a majority vote. When the results of multiple classifiers conflict, the following approach may be used: a reliability is set for each classifier in advance, and the result of the classifier with the highest reliability is adopted. The reliability is obtained, for example, by measuring the accuracy of each classifier on a predetermined image set with ground-truth labels and using that accuracy as the reliability. Alternatively, the similarity from each classifier may be multiplied by its reliability and the products summed over all classifiers to obtain an integrated similarity, which is then compared with the threshold to identify the corresponding image. The above is the processing of the identification unit 203. After features are acquired for each classifier (hereinafter referred to as the face classifier and the human body classifier), their output results are integrated to identify the person. When identifying a person, identification by face is performed first, then identification by human body, and the two output results are integrated. If there are other classifiers, identification is performed with each in turn, and the results are finally integrated to produce the output.
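
The three integration strategies mentioned above (majority vote, adopting the most reliable classifier, and a reliability-weighted sum of similarities) can be sketched as below. The function names and the dictionary keys such as `"face"` and `"body"` are hypothetical, and the reliabilities are assumed to be accuracies measured beforehand on a labeled image set.

```python
def fuse_majority(decisions):
    """True if more than half of the classifiers say 'same person'."""
    votes = sum(1 for same in decisions.values() if same)
    return votes * 2 > len(decisions)


def fuse_most_reliable(decisions, reliabilities):
    """Adopt the decision of the classifier with the highest reliability."""
    best = max(decisions, key=lambda name: reliabilities[name])
    return decisions[best]


def fuse_weighted(similarities, reliabilities):
    """Sum of similarity x reliability over all classifiers."""
    return sum(similarities[name] * reliabilities[name] for name in similarities)
```

The value returned by `fuse_weighted` is then compared with a threshold in the same way as a single-classifier similarity. With only two classifiers a majority vote cannot resolve a disagreement, which is one reason the reliability-based variants are useful.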

次に、決定部２０４について説明する。決定部２０４は、画像情報取得部６０１、対応付け部６０２、誤り率取得部６０３、決定部６０４を含む。画像情報取得部６０１は、記録部２０５と検出部２０２から、撮像装置毎に撮像された映像（時系列画像）と、その映像（時系列画像）から検出された各人物の位置情報とを含む画像情報を取得する。 Next, the determination unit 204 will be described. The determination unit 204 includes an image information acquisition unit 601, an association unit 602, an error rate acquisition unit 603, and a determination unit 604. The image information acquisition unit 601 includes an image (time-series image) captured by each imaging device from the recording unit 205 and the detection unit 202, and position information of each person detected from the image (time-series image). Get image information.

対応付け部６０２は、画像情報取得部６０１から取得した画像情報に基づいて、複数の撮像装置において検出された人物を対応付ける。対応付け部６０２で行われる処理については、後で詳しく説明する。誤り率取得部６０３は、対応付け部６０２の対応付け結果に基づいて、識別部２０３のある識別器に関して誤り率を取得する。ここで、誤り率とは、他人受入率（Ｆａｌｓｅ−Ｐｏｓｉｔｉｖｅ；誤検知）と本人拒否率（Ｆａｌｓｅ−Ｎｅｇａｔｉｖｅ；検知漏れ）とを含む。例えば人物Ａを特定したい場合に、本人拒否率は、検出された人物Ａが同一人物でないと識別した結果が誤りである確率（割合）である。他人受入率は、検出された人物Ｂが同一人物であると識別した結果が誤りである確率（割合）である。 The association unit 602 associates the persons detected by the plurality of imaging devices with each other based on the image information acquired from the image information acquisition unit 601. The processing performed by the association unit 602 will be described in detail later. The error rate acquisition unit 603 acquires the error rate for a given classifier of the identification unit 203 based on the association result of the association unit 602. Here, the error rate includes the false acceptance rate (False-Positive; false detection) and the false rejection rate (False-Negative; missed detection). For example, when person A is to be identified, the false rejection rate is the proportion of cases in which a detected person A is wrongly identified as not being the same person. The false acceptance rate is the proportion of cases in which a detected person B is wrongly identified as being the same person.

誤り率取得部６０３の処理の内容については後述する。決定部６０４は、誤り率取得部６０３で取得された誤り率に基づいて、撮像装置毎の閾値を決定する。閾値決定部６０４の処理の内容についても、後で詳しく説明する。なお、閾値は、初期設定として予め決定された閾値をセットしておく。これによって、元の認識精度を確かめることができる。 The details of the processing of the error rate acquisition unit 603 will be described later. The determination unit 604 determines the threshold value for each imaging device based on the error rate acquired by the error rate acquisition unit 603. The details of the processing of the threshold value determination unit 604 will also be described in detail later. As the threshold value, a predetermined threshold value is set as an initial setting. This makes it possible to confirm the original recognition accuracy.

図４は、情報処理装置が実行する処理を説明するフローチャートである。図４を用いて本実施形態の処理の概要を説明する。以下の説明では、各工程（ステップ）について先頭にＳを付けて表記することで、工程（ステップ）の表記を省略する。ただし、図４のフローチャートに示した処理は、コンピュータである図３のＣＰＵ１０１により記憶装置１０４に格納されているコンピュータプログラムに従って実行される。情報処理装置１００は必ずしもこのフローチャートで説明するすべてのステップを行わなくても良い。なお、ここではＳ４０３における識別条件の更新は行わないものとする（第２の実施形態で説明する。） FIG. 4 is a flowchart illustrating the process executed by the information processing apparatus. An outline of the process of this embodiment will be described with reference to FIG. 4. In the following description, each process (step) is denoted with a leading S, and the word "process (step)" is omitted. The process shown in the flowchart of FIG. 4 is executed by the CPU 101 of FIG. 3, which is a computer, according to the computer program stored in the storage device 104. The information processing apparatus 100 does not necessarily have to perform all the steps described in this flowchart. Here, it is assumed that the identification condition is not updated in S403 (this is described in the second embodiment).

Ｓ４００では、決定部２０４が識別器の閾値を決定するか否かを判断する。本実施形態においては、時間に応じて判断する。例えば、監視カメラが一定時間（例えば、２４時間）稼働したら閾値を決め直すようにする。また、例えば初回に本情報処理装置を起動する際も、識別条件を新たに決定するようにしてもよい。Ｓ４００でＹｅｓと判断した場合、Ｓ４０２に進む。Ｓ４００でＮｏと判断した場合、Ｓ４０４に進む。 In S400, the determination unit 204 determines whether or not to determine the threshold of the classifier. In the present embodiment, the determination is made according to time. For example, the threshold is determined again once the surveillance camera has operated for a certain period (for example, 24 hours). The identification condition may also be newly determined, for example, when the information processing apparatus is started for the first time. If Yes is determined in S400, the process proceeds to S402. If No is determined in S400, the process proceeds to S404.

Ｓ４０１とＳ４０２は閾値決定フェーズである。Ｓ４０１では、決定部２０４が、各カメラの映像について検出された人物の対応付けを行う。詳しい処理は図５を用いて後述する。Ｓ４０２では、決定部２０４が、監視カメラ毎に設定された識別器の閾値を決定する。詳しい説明は後述する。 S401 and S402 are threshold determination phases. In S401, the determination unit 204 associates the detected person with respect to the image of each camera. Detailed processing will be described later with reference to FIG. In S402, the determination unit 204 determines the threshold value of the discriminator set for each surveillance camera. A detailed explanation will be described later.

Ｓ４０４は、監視フェーズである。Ｓ４０４では、識別部２０３が、識別器と閾値とを用いて撮像された画像にターゲット人物が含まれていないか識別する。つまり、ターゲット人物を映像から検出する。Ｓ４０４では、識別部２０３が監視を続行するか否かを判断する。本実施形態では、ユーザ指示によって、監視の続行もしくは中断を判断する。監視を続行する場合（Ｙｅｓ）は、Ｓ４００に戻る。監視を中断する場合（Ｎｏ）は、処理を終了する。処理を開始してから一定時間経過後に終了するようにしてもよい。また、所望の人物を識別できた場合に終了するようにしてもよい。 S404 is a monitoring phase. In S404, the identification unit 203 identifies whether or not the target person is included in the image captured by using the classifier and the threshold value. That is, the target person is detected from the video. In S404, the identification unit 203 determines whether or not to continue monitoring. In the present embodiment, the continuation or interruption of monitoring is determined by a user instruction. When continuing the monitoring (Yes), the process returns to S400. When the monitoring is interrupted (No), the process ends. The process may be terminated after a lapse of a certain period of time from the start. Moreover, you may end when a desired person can be identified.

図５は、決定部２０４が実行する処理の一例を説明するフローチャートである。最初に前提条件として、処理量が膨大になることを避けるため、以下の処理を行う時間範囲がユーザまたは事前の設定により指定されるか、設置後に予め定めた期間内で行うようにする。 FIG. 5 is a flowchart illustrating an example of processing executed by the determination unit 204. First, as a prerequisite, in order to avoid an enormous amount of processing, the time range for performing the following processing should be specified by the user or preset settings, or should be performed within a predetermined period after installation.

以下の図５に沿って、決定部２０４が実行する処理を説明する。まず、Ｓ７０１では、画像取得部２０１が、各撮像装置によって撮像された時系列画像を取得する。画像情報には、時系列画像とカメラの識別子とが含まれる。次に、Ｓ７０２では、対応付け部６０２が、Ｓ７０１で取得した画像情報に基づいて、検出部２０２から各撮像装置の画像から検出された人物の検出結果に対して人物の位置を取得する。 The process executed by the determination unit 204 will be described with reference to FIG. 5 below. First, in S701, the image acquisition unit 201 acquires a time-series image captured by each imaging device. The image information includes a time series image and a camera identifier. Next, in S702, the association unit 602 acquires the position of the person with respect to the detection result of the person detected from the image of each imaging device from the detection unit 202 based on the image information acquired in S701.

次に、Ｓ７０３では、人体画像抽出部５０１と、部分画像抽出部５０２が、各撮像装置によって撮像された各時系列画像に含まれる人物を示す部分画像（顔画像）と人体画像とを抽出する。ここでは、各時系列画像から検出された人物すべてにこの処理を行う。ここまでで取得した各映像に対して共通する方法（同じ識別器）で、すべての人物の画像を抽出する。照合するときは同じ識別器から取得した特徴同士で比較するためである。なお、検出された人物のうち、検出の信頼度（検出された物体が人物である確からしさ）が所定の閾値より大きいといった条件を満たす一部の人物のみを取得してもよい。例えば、画角の中央付近に映った人物は特徴がうまく抽出できる可能性が高いため、積極的に閾値決定に用いる。この際、画像を入力する識別部２０３にある識別器のうち、もっとも信頼できるものにするとよい。人物識別器の信頼性はあらかじめ定めたテストデータで事前に性能を測ることで取得することができる。 Next, in S703, the human body image extraction unit 501 and the partial image extraction unit 502 extract a partial image (face image) and a human body image showing each person included in each time-series image captured by each imaging device. Here, this process is performed for every person detected from each time-series image. The images of all persons are extracted from every video acquired so far by a common method (the same classifier), because collation compares features acquired from the same classifier. Alternatively, only those detected persons that satisfy a condition, such as the detection reliability (the likelihood that the detected object is a person) exceeding a predetermined threshold, may be acquired. For example, a person appearing near the center of the angle of view is likely to yield well-extracted features and is therefore actively used for threshold determination. In this case, it is preferable to use the most reliable of the classifiers in the identification unit 203 to which the images are input. The reliability of a person classifier can be obtained by measuring its performance in advance on predetermined test data.

次に、Ｓ７０４では、人体画像照合部５０３が、Ｓ７０３において取得された人体画像のうち、所定の人物を示す注目人体画像に基づいて、第１の撮像装置とは別の第２の撮像装置によって撮像された画像から所定の人物を示す人体画像を照合する。ここで、照合した少なくとも１つ以上の人体画像を第１の人体画像群と呼ぶ。つまり、取得されたすべての画像におけるすべての人体画像のうち、所定の人物を示す注目人体画像と類似した人体画像を所定の人物と見なす。これによって、複数の撮像装置によって撮像された所定の人物の人体画像を対応付ける。この処理は以下のようにして行う。ある撮像装置によって撮像された画像から検出された人物の人体画像と、ほか撮像装置によって撮像された画像から検出された人物の人体画像とを照合する。具体的には、第１撮像装置によって撮像された画像から検出された人物の人体画像Ｘと、第１撮像装置とは異なる撮像装置によって撮像された画像から検出された人物画像Ｙとを比較し、類似度が所定の値より大きい場合、類似した画像であると照合する。処理の高速化のために、時刻情報を用いてもよい。視野の重複がないように設置された監視カメラでは、同時に同じ人物が映ることはない。また、カメラの位置関係によって、一方のカメラに現れた時刻から、他方のカメラまでの移動時間が予測されるので、同一人物が現れやすい時間帯が推定可能である。また、ディープラーニングによって、照合を行ってもよい。 Next, in S704, the human body image collating unit 503 uses a second imaging device different from the first imaging device based on the attention human body image showing a predetermined person among the human body images acquired in S703. A human body image showing a predetermined person is collated from the captured image. Here, at least one or more collated human body images are referred to as a first human body image group. That is, among all the human body images in all the acquired images, the human body image similar to the attention human body image showing the predetermined person is regarded as the predetermined person. As a result, human body images of a predetermined person captured by a plurality of imaging devices are associated with each other. This process is performed as follows. The human body image of a person detected from an image captured by a certain imaging device is collated with the human body image of a person detected from an image captured by another imaging device. Specifically, the human body image X of the person detected from the image captured by the first imaging device is compared with the human body image Y detected from the image captured by the imaging device different from the first imaging device. If the similarity is greater than a predetermined value, the images are matched with each other. 
Time information may be used to speed up the processing. Surveillance cameras installed so that the fields of view do not overlap do not show the same person at the same time. Further, since the moving time from the time when the person appears in one camera to the other camera is predicted by the positional relationship of the cameras, it is possible to estimate the time zone in which the same person is likely to appear. In addition, collation may be performed by deep learning.
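
One way to use time information as described above is to discard candidates whose time gap from the query detection is implausible given the expected travel time between the two cameras. The sketch below assumes a hypothetical lookup table `travel_time` keyed by a sorted pair of camera identifiers and a tolerance in seconds; neither the table nor these names come from the original description.

```python
def plausible_candidates(query, detections, travel_time, tolerance):
    """Keep detections from other cameras whose timing fits the expected travel time.

    query       : (camera_id, timestamp) of the person of interest
    detections  : list of (camera_id, timestamp, payload)
    travel_time : dict mapping a sorted (cam_a, cam_b) tuple to seconds (assumed)
    tolerance   : allowed deviation from the expected travel time, in seconds
    """
    q_cam, q_time = query
    kept = []
    for cam, timestamp, payload in detections:
        if cam == q_cam:
            continue  # only other cameras are compared with the query
        expected = travel_time.get(tuple(sorted((q_cam, cam))))
        if expected is None:
            kept.append(payload)  # unknown camera pair: cannot prune
            continue
        gap = abs(timestamp - q_time)
        if abs(gap - expected) <= tolerance:
            kept.append(payload)
    return kept
```

Only the candidates that survive this filter need to be passed to the more expensive feature comparison.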

続いて、Ｓ７０５では、対応付け部６０２が全ての撮像装置によって撮像されたすべての画像から検出されたすべての人物について、上記処理を行ったか判断する。未処理の人物がある場合（ＹＥＳ）、次の人物を対象に処理を行うためＳ７０１に戻る。全ての人物を処理した場合（Ｓ７０５でＮＯの場合）、Ｓ７０６に進む。 Subsequently, in S705, it is determined whether the association unit 602 has performed the above processing on all the persons detected from all the images captured by all the imaging devices. If there is an unprocessed person (YES), the process returns to S701 to process the next person. When all the persons are processed (NO in S705), the process proceeds to S706.

Ｓ７０６では、対応付け部６０２が、Ｓ７０４で注目特徴と照合した第１の人体画像群と対応する第１の顔画像群を対応付ける。画像から検出された人物に各撮像装置に共通のユニークな識別子（ＩＤ）を付与する。この識別子は後段の誤り率取得部６０３で用いられる。以上が、対応付け部６０２で行われる処理の説明である。この処理によって、ある撮像装置によって撮像されたある人物が検出された画像を、システムに含まれる複数の撮像装置によって撮像された画像と対応付けることができる。その結果、複数の撮像装置で撮像された画像から共通人物を特定することができる。 In S706, the association unit 602 associates the first human body image group collated with the feature of interest in S704 with the corresponding first face image group. A unique identifier (ID) common to each imaging device is given to the person detected from the image. This identifier is used in the error rate acquisition unit 603 in the subsequent stage. The above is the description of the processing performed by the association unit 602. By this process, an image in which a certain person captured by a certain imaging device is detected can be associated with an image captured by a plurality of imaging devices included in the system. As a result, a common person can be identified from the images captured by a plurality of imaging devices.
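
The assignment of a common identifier in S706 can be illustrated with the sketch below. It assumes that the body-image matching has already produced groups of face images, one group per person, and it also shows how same-person and different-person pairs could then be enumerated for the later error-rate step. The function names and the string keys are hypothetical.

```python
from itertools import combinations


def assign_ids(groups):
    """Give every face in the same matched group the same person ID.

    groups: list of lists of face keys, one list per matched body-image group.
    """
    return {face: pid for pid, group in enumerate(groups) for face in group}


def build_pairs(ids):
    """Split all face pairs into same-ID (genuine) and different-ID (impostor)."""
    genuine, impostor = [], []
    for a, b in combinations(sorted(ids), 2):
        (genuine if ids[a] == ids[b] else impostor).append((a, b))
    return genuine, impostor
```

Because the IDs come from automatic body-image matching rather than manual labeling, the resulting pairs are only assumed to be correct, which matches the premise of the embodiment.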

次に、決定部２０４が実行する処理について図８を用いて詳細に説明する。この処理では、ある人物を識別する識別器について、カメラ毎に適切な閾値を設定する。図９を用いて識別条件（閾値）の決定方法について説明する。 Next, the process executed by the determination unit 204 will be described in detail with reference to FIG. In this process, an appropriate threshold value is set for each camera for the classifier that identifies a certain person. A method of determining the identification condition (threshold value) will be described with reference to FIG.

図９のグラフ９０は、縦軸ｙ（ｘ＝０）は頻度を、横軸ｘ（ｙ＝０）は同じ識別器によって出力された特徴同士のペアの類似度を示すヒストグラムである。ここでは、類似度は特徴ベクトルの内積で示されるものとする（−１＜類似度Ｓ＜１）。つまり、類似度が大きいほどペアが同一人物である可能性が高く、類似度は小さいほどペアは他人同士である可能性が高い。まず、従来技術において、識別条件を決定する際には図９（Ａ）に示すヒストグラムが得られる。このとき、顔画像のペア（顔特徴のペア）は本人同士であるか、他人同士であるか分からない（正解のペアが既知でない）。特に、本人同士の顔画像のペアを取得するのが難しい。そのため、２種類の誤り率（本人拒否率と他人受入率）を特定することができない。従って、誤り率が小さくなるような識別条件を自動的に決定することができなかった。 In the graph 90 of FIG. 9, the vertical axis y (x = 0) is a histogram showing the frequency, and the horizontal axis x (y = 0) is a histogram showing the similarity between the pairs of features output by the same classifier. Here, the similarity is indicated by the inner product of the feature vectors (-1 <similarity S <1). That is, the higher the similarity, the more likely the pair is the same person, and the lower the similarity, the more likely the pair is another person. First, in the prior art, when determining the identification conditions, the histogram shown in FIG. 9A is obtained. At this time, it is unknown whether the pair of face images (pair of facial features) is the person or another person (the correct pair is not known). In particular, it is difficult to obtain a pair of facial images of each other. Therefore, it is not possible to specify two types of error rates (false rejection rate and false acceptance rate). Therefore, it was not possible to automatically determine the identification conditions that reduce the error rate.

一方で、本実施形態では、照合結果を用いることで、本人同士と他人同士の顔画像（顔特徴）の２種類のペアが特定できる。本人同士のペアである顔画像群から取得された類似度をグラフ９１に示す。これは本人同士の顔画像の組み合わせが取りうる類似度の頻度を示す。また、他人同士のペアからはグラフ９２が得られる。これは他人同士の顔画像の組み合わせが取りうる類似度の頻度を示す。グラフ９１の左側（０に近い類似度を取る範囲）は、本人の顔画像同士のペアであるのに低い類似度を取るため、識別結果を誤る可能性が高い。この場合の識別ミスを本人拒否率（検知漏れ）と呼ぶ。この本人拒否率は、図９（Ｂ）に示す閾値９００と、グラフ９１とｘ軸（ｙ＝０）が成す面積９０１で示される。この本人拒否率を下げたい場合は、この面積が所定の割合より小さくなるように閾値（識別条件）を小さくすると良い。もう一方の誤り率である他人受入率は、図９（Ｃ）の面積９０２に示される。グラフ９２は、他人同士の顔画像（顔特徴）の類似度をプロットした結果である。 On the other hand, in the present embodiment, using the collation result makes it possible to specify two types of face image (face feature) pairs: pairs of the same person and pairs of different persons. Graph 91 shows the similarities obtained from the face image group consisting of same-person pairs, that is, the frequency of the similarities that pairs of face images of the same person can take. Graph 92 is obtained from pairs of different persons and shows the frequency of the similarities that pairs of face images of different persons can take. On the left side of graph 91 (the range where the similarity is close to 0), the pairs are face images of the same person but have low similarity, so the identification result is likely to be wrong. This kind of identification error is counted by the false rejection rate (missed detection). The false rejection rate is represented by the area 901 bounded by the threshold 900 shown in FIG. 9(B), graph 91, and the x-axis (y = 0). To lower the false rejection rate, the threshold (identification condition) should be decreased so that this area becomes smaller than a predetermined ratio. The other error rate, the false acceptance rate, is represented by the area 902 in FIG. 9(C). Graph 92 is the result of plotting the similarities between face images (face features) of different persons.

面積９０２は、閾値９００‘とグラフ９２とｙ＝０を積分した値である。他人を本人であると間違えてしまう他人受入率（誤検知）を減らしたい場合は、面積９０２が所定の割合より小さくなるように閾値を大きくすると良い。なお、面積９０１と面積９０２はトレードオフの関係であって、どちらかを小さくすると一方が大きくなる。したがって、ユースケースに応じて、２つの和を最小にする、一方の確率が所定の割合より小さくなるようにするといった条件を満たすように閾値を決定すると良い。この条件はユーザが予め設定してもよい。このようにして、２種類のヒストグラムを用いて識別条件を設定することができる。現地の映像でこのようなヒストグラムを得るには、従来は人手で正解を付けて、測定する必要があったが、本発明により、自動的に正解とみなせるデータを得ることにより、人手を省いて、識別条件を得ることが可能になった。 The area 902 is the area bounded by the threshold 900', graph 92, and the x-axis (y = 0). To reduce the false acceptance rate (false detection), in which another person is mistaken for the person in question, the threshold should be increased so that the area 902 becomes smaller than a predetermined ratio. The area 901 and the area 902 are in a trade-off relationship: making one smaller makes the other larger. Therefore, depending on the use case, the threshold should be determined so as to satisfy a condition such as minimizing the sum of the two or keeping one of the rates below a predetermined ratio. This condition may be set in advance by the user. In this way, the identification condition can be set using the two histograms. Conventionally, obtaining such histograms from on-site video required manually labeling the correct answers and measuring; the present invention automatically obtains data that can be regarded as correct, which makes it possible to obtain the identification condition without manual work.

Ｓ４０２の処理について図６のフローチャートで説明する。Ｓ８０１では、誤り率取得部６０３が、閾値を更新する対象となる撮像装置に対応する識別器を取得する。この識別器は、様々な人物の顔画像とその個人に固有な特徴を学習したニューラルネットワークである。すなわち、複数の人物の顔画像を複数セット用意し、同一人物には類似した値を出力するよう学習させる。なお、人体画像でも同様の識別器を用いる。人物の部分画像（例えば顔画像や人体画像）を入力すると、人物毎に固有の特徴を出力する。 The processing of S402 will be described with reference to the flowchart of FIG. 6. In S801, the error rate acquisition unit 603 acquires the classifier corresponding to the imaging device whose threshold is to be updated. This classifier is a neural network that has learned face images of various persons and the features unique to each individual. That is, multiple sets of face images of multiple persons are prepared, and the network is trained to output similar values for the same person. A similar classifier is used for human body images. When a partial image of a person (for example, a face image or a human body image) is input, the classifier outputs a feature unique to that person.

例えば、人物Ｎの顔画像を入力したときに、ベクトルＳｎという出力をしたとする。次に、人物Ｎが映った画像で他のアングルや画角で撮った画像を入力すると、共通した特徴があればベクトルＳｎに近いベクトルＳｎ’と出力される。人物Ｎと異なる人物Ｍの画像がその識別器に入力された場合は、ベクトルＳｎと異なる、人物Ｍに固有なベクトルＳｍが出力される。つまり、２つの画像を入力された識別器の識別結果ベクトルＳｎとＳｍとの距離や内積が所定の値以下あれば、２つの画像に映った人物は同一人物と見なせる。ＳｎとＳｍが所定の値より大きい場合は、２つの画像に映った人物は異なる人物同士である。 For example, suppose that when a face image of a person N is input, an output called a vector Sn is output. Next, when an image of the person N is input and taken at another angle or angle of view, if there is a common feature, a vector Sn'close to the vector Sn is output. When an image of a person M different from the person N is input to the classifier, a vector Sm unique to the person M, which is different from the vector Sn, is output. That is, if the distance and inner product between the identification result vectors Sn and Sm of the input classifiers of the two images are equal to or less than a predetermined value, the persons reflected in the two images can be regarded as the same person. When Sn and Sm are larger than a predetermined value, the persons shown in the two images are different persons.

次に、Ｓ８０２では、誤り率取得部６０３が、対応付け部６０２からある期間の全ての撮影装置の画像から検出された全ての人物について付与した識別子を含む対応付け情報を取得する。抽出された各特徴には、画像から検出された人物に各撮像装置に共通のユニークな識別子（ＩＤ）を付与されている。 Next, in S802, the error rate acquisition unit 603 acquires the association information including the identifiers assigned to all the persons detected from the images of all the photographing devices in a certain period from the association unit 602. Each of the extracted features is given a unique identifier (ID) common to each imaging device to the person detected from the image.

次に、Ｓ８０３では、識別部５０５が、所定の人物を示す注目特徴と、別の撮像装置の画像から抽出された特徴のうち注目特徴と照合した特徴を示す照合結果を取得する。つまり、識別部５０５は、所定の人物を示す人体画像と類似した第１の人体画像群と、所定の人物の人体画像と類似しない第２の人体画像群を特定する。例えば、所定の人物の服装（人体特徴）をテンプレートに決定し、他の撮像装置の画像に対してテンプレートマッチングを行った照合結果は、他の撮像装置によって撮像された所定の人物を示している可能性がある。すなわち、第１の人体画像群は同一人物（本人）である可能性が高い人体画像の集合で、第２の人体画像群は所定の人物とは異なる人物である（他人）である可能性が高い人体画像の集合である。また、その人体画像に対応する顔画像についても同様のことがいえる。 Next, in S803, the identification unit 505 acquires a collation result indicating a feature of interest indicating a predetermined person and a feature extracted from an image of another imaging device that collates with the feature of interest. That is, the identification unit 505 identifies a first human body image group similar to the human body image showing a predetermined person and a second human body image group not similar to the human body image of the predetermined person. For example, the clothes (human body characteristics) of a predetermined person are determined as a template, and the collation result of performing template matching with the images of other imaging devices indicates the predetermined person imaged by the other imaging device. there is a possibility. That is, the first human body image group may be a set of human body images that are likely to be the same person (person), and the second human body image group may be a person (other person) different from a predetermined person. A collection of tall human body images. The same can be said for the face image corresponding to the human body image.

次に、Ｓ８０４では、誤り率取得部６０３が、第１の人体画像群と対応する顔画像群を、画像から人物と対応する（顔）特徴を出力する第１の識別器に入力した第１の出力結果同士を比較し、第１の類似度を取得する。また。誤り率取得部６０３が、第２の人体画像群と対応する顔画像群を、画像から人物と対応する（顔）特徴を出力する第２の識別器に入力した第２の出力結果同士を比較し、第２の類似度を取得する。図９（Ｂ）における、グラフ９１は第１の類似度の頻度を示す。また、図９（Ｂ）における、グラフ９２は第２の類似度の頻度を示す。のちの処理において、この２つのヒストグラムを用いて閾値を決定する。 Next, in S804, the error rate acquisition unit 603 inputs the face image group corresponding to the first human body image group to the first classifier that outputs the (face) feature corresponding to the person from the image. The output results of are compared with each other, and the first degree of similarity is obtained. Also. The error rate acquisition unit 603 compares the second output results of the face image group corresponding to the second human body image group input to the second classifier that outputs the (face) feature corresponding to the person from the image. And get the second similarity. Graph 91 in FIG. 9B shows the frequency of the first similarity. Further, in FIG. 9B, graph 92 shows the frequency of the second degree of similarity. In the later processing, the threshold value is determined using these two histograms.
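
The two frequency distributions used here (one for the first similarities and one for the second similarities) can be built with a simple binning routine such as the sketch below. The number of bins and the range of -1 to 1, which follows from the inner-product similarity assumed earlier in the description, are illustrative choices, and the function name is hypothetical.

```python
def similarity_histogram(similarities, bins=20, lo=-1.0, hi=1.0):
    """Count how many similarities fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in similarities:
        index = int((s - lo) / width)
        index = max(0, min(index, bins - 1))  # clamp values on or beyond the edges
        counts[index] += 1
    return counts
```

Calling this once on the same-person similarities and once on the different-person similarities gives the two histograms from which the threshold is later chosen.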

Ｓ８０５では、誤り率取得部６０３が、第１の類似度と閾値とを比較し、本人の画像を本人でないと誤る可能性を示す本人拒否率を取得する。同様に、誤り率取得部６０３が、第２の類似度と閾値とを比較し、他人の画像を本人であると誤る可能性を示す他人受入率を取得する。なお、誤り率とは、本人拒否率と他人受入率との和で示される。照合結果によって示される本人同士の人体画像（とそれに対応する顔画像）のペアが正しいとして、誤り率を取得する。人物の同一性を識別する際の誤りは、２つの場合が考えられる。すなわち、本来同一であるはずの２つの人物を、異なる人物としてしまう誤り（ＦａｌｓｅＮｅｇａｔｉｖｅ：本人拒否率）と、異なる人物２つを同一であると識別してしまう誤り（ＦａｌｓｅＰｏｓｉｔｉｖｅ：他人受入率）である。この２つの誤りについて、それぞれ取得する。なお、この２つの確率はトレードオフの関係にあるため、どちらかを小さくするともう一方が大きくなってしまう。そのため、ユースケースに応じて、どちらの確率をコントロールするか設定すると良い。または、両方の確率の和が最小になるような閾値を決定すると良い。 In S805, the error rate acquisition unit 603 compares the first similarity with the threshold value and acquires the false rejection rate indicating the possibility that the image of the person is not the person. Similarly, the error rate acquisition unit 603 compares the second similarity with the threshold value and acquires the false acceptance rate indicating the possibility that the image of another person is mistaken for the person. The error rate is indicated by the sum of the false rejection rate and the false acceptance rate. The error rate is acquired assuming that the pair of human body images (and the corresponding face images) of the individuals indicated by the collation result is correct. There are two possible mistakes in identifying a person's identity. That is, an error (False Negative: false rejection rate) that makes two people who should be the same originally different, and an error (False Positive: false acceptance rate) that identifies two different people as the same. Is. Get each of these two errors. Since these two probabilities are in a trade-off relationship, if one is made smaller, the other becomes larger. Therefore, it is advisable to set which probability to control according to the use case. Alternatively, it is advisable to determine a threshold value that minimizes the sum of both probabilities.
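
As an illustration of S805, the two error rates can be computed directly from the lists of first (same-person) and second (different-person) similarities. The sketch assumes that a pair is accepted as the same person when its similarity is greater than or equal to the threshold; the function names and the handling of empty lists are assumptions.

```python
def false_rejection_rate(genuine_similarities, threshold):
    """Fraction of same-person pairs whose similarity is below the threshold."""
    if not genuine_similarities:
        return 0.0
    rejected = sum(1 for s in genuine_similarities if s < threshold)
    return rejected / len(genuine_similarities)


def false_acceptance_rate(impostor_similarities, threshold):
    """Fraction of different-person pairs whose similarity reaches the threshold."""
    if not impostor_similarities:
        return 0.0
    accepted = sum(1 for s in impostor_similarities if s >= threshold)
    return accepted / len(impostor_similarities)
```

With same-person similarities `[0.9, 0.8, 0.4, 0.7]` and different-person similarities `[0.1, 0.3, 0.6, 0.2]`, lowering the threshold from 0.5 to 0.2 reduces the false rejection rate from 0.25 to 0 but raises the false acceptance rate from 0.25 to 0.75, which is the trade-off described above.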

Next, in S806, the determination unit 604 determines the threshold so that the false rejection rate or the false acceptance rate is smaller than a predetermined rate. That is, the determination unit 604 determines the threshold so that the frequency of the first similarity smaller than the threshold is less than a predetermined value. Alternatively, the determination unit 604 determines the threshold so that the frequency of the second similarity greater than the threshold is less than a predetermined value. Alternatively, the determination unit 604 determines the threshold so that the sum of the frequency of the first similarity smaller than the threshold and the frequency of the second similarity greater than the threshold is smaller than a predetermined value.
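The first alternative in S806 might be sketched as follows: among candidate thresholds, keep the highest one whose false rejection rate stays below the predetermined rate. Picking the highest feasible candidate is a design assumption made here, because it keeps false acceptances as low as the constraint allows. The function name and the use of the observed similarities as candidates are likewise hypothetical.

```python
def choose_threshold_for_frr(first_similarities, max_frr, candidates=None):
    """Return the highest candidate threshold with false rejection rate < max_frr.

    Returns None when no candidate satisfies the constraint.
    """
    if candidates is None:
        # Assumption: use the observed same-person similarities as candidates.
        candidates = set(first_similarities)
    best = None
    for t in sorted(candidates):
        frr = sum(s < t for s in first_similarities) / len(first_similarities)
        if frr < max_frr:
            best = t  # candidates are ascending, so this keeps the highest
    return best
```

The constraint on the false acceptance rate can be handled symmetrically by scanning for the lowest candidate whose false acceptance rate is below its own limit.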

The processing performed by the determination unit 604 will now be described, in particular for the case where identification is made according to whether the similarity exceeds a threshold. The determination unit 604 acquires the table of classifier parameters (typically thresholds) and error rates obtained by the error rate acquisition unit 603, and selects the parameter that comes closest to the desired error rate. As described above, there are two types of error, identifying the same person as different (False Negative) and identifying different persons as the same (False Positive), and the two are generally in a trade-off relationship. That is, if same-person identification is allowed even at low similarity in order to reduce False Negatives (equivalent to lowering the threshold), False Positives increase. Conversely, if persons are judged not to be the same even at high similarity (equivalent to raising the threshold), False Positives decrease but False Negatives increase. Normally the threshold is set so that the sum of the two error rates is minimized, but depending on the application a purpose-specific setting is possible, such as giving priority to avoiding False Positives (that is, avoiding false authentication). Such a target error rate can be set in advance, for example by user designation. This makes it possible to update the classifier parameter, typically the similarity threshold, so that the error rate on the actual surveillance video comes close to the predetermined target. The parameter selected in this way is set as the new parameter of the classifier of the identification unit 203. The above is the description of the processing performed by the determination unit 604.
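The table lookup performed by the determination unit 604 could look like the sketch below. It assumes the table is a list of (threshold, false rejection rate, false acceptance rate) rows built over a user-supplied grid of thresholds. The function names, the grid, and the convention that a similarity equal to the threshold counts as accepted are assumptions for illustration.

```python
def build_error_table(first_similarities, second_similarities, thresholds):
    """Build rows of (threshold, false rejection rate, false acceptance rate)."""
    table = []
    for t in thresholds:
        frr = sum(s < t for s in first_similarities) / len(first_similarities)
        far = sum(s >= t for s in second_similarities) / len(second_similarities)
        table.append((t, frr, far))
    return table


def select_min_total_error(table):
    """Default policy: threshold minimizing the sum of the two error rates."""
    return min(table, key=lambda row: row[1] + row[2])[0]


def select_closest_to_target_far(table, target_far):
    """Alternative policy: threshold whose false acceptance rate is nearest a target."""
    return min(table, key=lambda row: abs(row[2] - target_far))[0]
```

When several rows tie, `min` returns the first one in table order, so the order of the threshold grid decides which tied threshold is selected.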

For all classifiers, or until a predetermined number of iterations is reached, the process returns from S806 to S803 and the error rate acquisition unit 603 recalculates the error rate. This is repeated a predetermined number of times, and the repetition improves identification accuracy. The above is the description of the processing performed in the first determination. Either one of the first determination and the second determination may be performed, or both. The roles of the facial features and the human body features may also be swapped depending on the case. For example, when many people wear similar clothing, as at a school event, it is difficult to distinguish individuals by human body features, so it is advisable to extract persons across the entire video using facial features. Furthermore, a person's features may be extracted not only from facial and human body features but also from belongings or items carried for individual identification.

As the predetermined person, a person captured by a larger number of cameras may be selected, because training and determination of the classifier tend to go well when images captured from various angles are available. The determination may also be weighted according to the number of times a person appears (or the length of time the person is captured), because the more images of the predetermined person there are (the longer the person is captured), the more likely it is that the person has been captured from various angles. This allows the classifier to be determined efficiently. In addition, for the classifier corresponding to the imaging device being determined, the determination may be weighted according to the installation position of the imaging device.

Next, the identification phase will be described. As a specific example of using the classifier whose parameters have been determined, an example of detecting a pre-registered person (hereinafter, the target person) from the video of one or more surveillance cameras will be described. Because the target person moves freely around the facility, it is desirable that the person can be detected by multiple cameras. When the target person is detected, notifying the user to that effect allows the user to respond appropriately. The task of the information processing system in this embodiment is to detect the target person 1000 in video showing an unspecified number of people. The target person 1000 and the target person 1000' are the same person. Face authentication is performed, using facial features obtained from the image, to identify whether a person captured by a surveillance camera corresponds to a pre-registered target person.

識別フェーズでは、図２における画像取得部２０１、検出部２０２、識別部２０３と出力部２０６によって処理が行われる。画像取得部２０１は、各監視カメラからリアルタイムで撮像した時系列画像（映像）を取得する。検出部２０２では、決定フェーズと同様に、画像取得部２０１によって取得された時系列画像から人物を検出する。識別部２０３は、検出部２０２によって検出された人物から、２つの異なる部分特徴を抽出し、特徴から人物を識別する。ここで、決定フェーズで閾値を決定した識別器を用いる。出力部２０６は、識別部２０３によって識別された結果を図示しない表示部等に出力する。 In the identification phase, processing is performed by the image acquisition unit 201, the detection unit 202, the identification unit 203, and the output unit 206 in FIG. The image acquisition unit 201 acquires a time-series image (video) captured in real time from each surveillance camera. Similar to the determination phase, the detection unit 202 detects a person from the time-series image acquired by the image acquisition unit 201. The identification unit 203 extracts two different partial features from the person detected by the detection unit 202, and identifies the person from the features. Here, a discriminator whose threshold is determined in the determination phase is used. The output unit 206 outputs the result identified by the identification unit 203 to a display unit or the like (not shown).

The identification phase will be described with reference to the flowchart of FIG. 4. If the determination unit 204 determines in S400 that the identification condition of the classifier is not to be updated (NO), the process proceeds to S404 and the identification phase is executed. In S404, the identification unit 203 identifies a specific person from the images acquired by each surveillance camera. The surveillance cameras (information processing devices) hold a common watch list (blacklist) in which face images (human body images) of specific persons are registered as registered images, and compare whether a target person in a captured image resembles a person in a registered image. When a registered image has the highest similarity among the registered images and that similarity is greater than the identification condition, the target person is identified as the person shown in that most similar registered image.
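The watch-list comparison in S404 can be sketched as follows, assuming the watch list is a mapping from a registered person's identifier to a registered feature vector and that cosine similarity is used for comparison. The mapping layout, the similarity measure, and the function names are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(target_feature, watch_list, threshold):
    """Return (person_id, score) for the best match, or (None, score) if rejected.

    The target is identified only when the most similar registered feature
    has a similarity greater than the threshold (the identification condition).
    """
    best_id, best_score = None, float("-inf")
    for person_id, registered_feature in watch_list.items():
        score = cosine_similarity(target_feature, registered_feature)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score > threshold:
        return best_id, best_score
    return None, best_score
```

Returning the best score even on rejection makes it possible to log near misses, which may help when the threshold is reviewed later.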

図１０を用いてＳ４０４の処理を更に説明する。Ｓ１２０１では、識別部２０３が、対象人物の画像（または特徴）を取得する。ここでは、顔画像による識別器によって個人の識別をする場合を考える。そのため、対象人物の顔を示す対象顔画像を取得する。ユーザが過去の映像データから指定しても良い。または、リアルタイム映像から指定された人物の画像を取得する。 The process of S404 will be further described with reference to FIG. In S1201, the identification unit 203 acquires an image (or feature) of the target person. Here, a case where an individual is identified by a face image classifier is considered. Therefore, a target face image showing the face of the target person is acquired. The user may specify from the past video data. Alternatively, the image of the specified person is acquired from the real-time video.

Ｓ１２０２では、識別部２０３が、画像取得部２０１で取得した時系列画像から特徴を取得する。なお、ここで取得する特徴は、決定した閾値に対応した特徴がよい。つまり、決定フェーズにおいて、顔特徴による識別器の閾値を決定した場合は、顔特徴を取得する。 In S1202, the identification unit 203 acquires features from the time-series images acquired by the image acquisition unit 201. The feature acquired here is preferably a feature corresponding to the determined threshold value. That is, in the determination phase, when the threshold value of the discriminator based on the face feature is determined, the face feature is acquired.

Ｓ１２０４では、人物識別部５０５は、人体画像照合部５０３、および、部分画像照合部５０４による、照合結果をもとに、検出された人物を識別する。識別部２０３が、決定された識別器と、Ｓ１２０３で取得された特徴とに基づいて、時系列画像に含まれる人物を識別する。 In S1204, the person identification unit 505 identifies the detected person based on the collation results of the human body image collation unit 503 and the partial image collation unit 504. The identification unit 203 identifies a person included in the time-series image based on the determined classifier and the features acquired in S1203.

Ｓ１２０５では、識別部２０３が、識別器によって出力された結果と対象人物とを照合する。Ｓ１２０５では、識別部２０３が、Ｓ１２０４の照合結果に基づいて、対象人物が画像に映っているか判断する。Ｓ１２０４で、対象人物と識別結果が一致した場合は、Ｓ１２０６に進む。Ｓ１２０４で、対象人物と識別結果が一致しなかった場合は、Ｓ１２０１に戻る。 In S1205, the identification unit 203 collates the result output by the classifier with the target person. In S1205, the identification unit 203 determines whether the target person is reflected in the image based on the collation result of S1204. If the identification result matches the target person in S1204, the process proceeds to S1206. If the identification result does not match the target person in S1204, the process returns to S1201.

In S1206, the output unit 206 outputs a determination result based on the threshold. When the similarity obtained by comparing the output result of inputting the target face image into the classifier with the feature of a pre-registered person's image satisfies the threshold, the result indicates that the person shown in the target face image is the registered person. When the similarity does not satisfy the threshold, the result indicates that the person shown in the target face image is not the registered person. Specifically, the user (observer) is informed by a monitor display or an alert sound that the target person has been detected.

The processing described above is expected to have the following effects. Conventionally, to achieve a desired error rate, the user had to check what error rate was actually being obtained on video from the installation environment, which is difficult in practice. As a second-best measure, one could prepare a correspondence table between installation environments and parameters that approximates the desired error rate, but because it is difficult to prepare for all conditions in advance, only an insufficient table can be prepared in reality.

In contrast, in the present embodiment, images of a common person are acquired from the video of different imaging devices in the actual installation environment and are used to change the thresholds of the various imaging devices. This makes it possible to set a threshold that achieves the desired error rate without manual effort. In addition, because appropriate parameters can be set from the video obtained in this way, a threshold can also be set individually for the video of each of a plurality of cameras.

カメラ台数が数十台から百台以上になる大規模な情報処理システムでは、個々のカメラ映像に対する閾値の設定問題が必ず発生する。本実施形態では、そのような大規模なシステムの課題を、逆に複数のカメラの映像が得られるメリットとして利用し、より信頼性のある閾値を取得可能である。以上が実施形態１の内容の説明である。上記のような処理を実行することによってカメラの設置環境に応じて特定の人物を識別する条件を決定できる。 In a large-scale information processing system in which the number of cameras is several tens to one hundred or more, a problem of setting a threshold value for each camera image always occurs. In the present embodiment, it is possible to obtain a more reliable threshold value by utilizing the problem of such a large-scale system as a merit of obtaining images from a plurality of cameras. The above is the description of the contents of the first embodiment. By executing the above processing, the conditions for identifying a specific person can be determined according to the installation environment of the camera.

<Embodiment 2>
In Embodiment 1, an example was shown in which the most reliable classifier is selected from among a plurality of person classifiers, the error rate is acquired based on the cross-camera person correspondences obtained by that classifier, and appropriate parameters are set. The present embodiment differs in that the parameters are updated by cross-referencing the cross-camera person correspondences obtained by a plurality of classifiers.

以下、具体的に説明する。なお、重複を避けるため、以下の説明において、実施形態１と同じ部分は、省略する。本実施形態にかかわるシステムの構成は、実施形態１と同じであるので、説明を省略する。説明は実施形態１を参照されたい。実施形態１と異なる点は、決定部の処理である。以下、実施形態１と異なる点を中心に説明を行う。 Hereinafter, a specific description will be given. In order to avoid duplication, the same parts as those in the first embodiment will be omitted in the following description. Since the configuration of the system related to the present embodiment is the same as that of the first embodiment, the description thereof will be omitted. See Embodiment 1 for an explanation. The difference from the first embodiment is the processing of the determination unit. Hereinafter, the points different from those of the first embodiment will be mainly described.

The configuration of the determination unit is the same as in Embodiment 1. FIG. 7 is a flowchart showing an example of the processing of S403 performed by the determination unit. First, in S901, the determination unit acquires an arbitrary classifier from the classifiers of the person identification unit. As in Embodiment 1, a suitable choice is the classifier that should be trusted most. This reliability may be obtained by evaluating performance on predetermined data and using the resulting figures. Next, in S902, the association unit uses the acquired classifier to assign a reference identifier (ID) to at least one object (for example, a person) detected in the time-series images captured by each imaging device during a predetermined period. The process of assigning identifiers is the same as that performed by the association unit 602 of Embodiment 1 and is omitted here. Next, in S903, the determination unit acquires a target classifier whose identification condition is to be determined. No particular criterion is needed; the classifiers may simply be selected in order. Next, in S904, the error rate acquisition unit acquires the error rate based on the identifiers using the target classifier. Also in S904, the determination unit assigns IDs to all persons in all cameras. The error rate is obtained in the same way as in the processing of the error rate acquisition unit 603 of Embodiment 1, and IDs may be assigned to all persons in the same way as in S702. In S905, the determination unit determines, based on XX, whether acquisition of the error rate has been completed for all classifiers. This is repeated until all classifiers have been processed (No in S905).
Once the error rate and ID assignment are complete for all classifiers (Yes in S905), in S906 the determination unit corrects the identifiers based on the error rates. Most simply, the correction may be a majority vote over the identification results of one or more classifiers: for each person, the identification result of each classifier is cast as a vote, and the result with the most votes becomes that person's identifier. The vote may also be weighted using the error rates acquired at the same time in S705, so that classifiers with lower error rates receive larger weights. For example, each vote may be multiplied by the reciprocal of the error rate. There are two error rates, False Negative and False Positive, and their sum or average can be used.
The processing from S903 to S906 is repeated. It may be repeated a predetermined number of times, or stopped according to another criterion, for example when the number of identifier changes falls below a predetermined number, or when the change in the error rates of all classifiers becomes smaller than a predetermined value.
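The identifier correction of S906 can be illustrated by the following sketch of a plain and an error-rate-weighted vote. It assumes that each classifier proposes one identifier for a given person and that a single error rate per classifier (for example, the sum of its two rates) is available. The classifier names in the usage note and the small floor on the error rate are hypothetical additions.

```python
from collections import defaultdict


def corrected_identifier(votes, error_rates=None):
    """Pick the identifier with the largest (optionally weighted) vote total.

    votes: {classifier_name: proposed_identifier}
    error_rates: optional {classifier_name: error_rate}; when given, each vote
    is weighted by the reciprocal of that classifier's error rate.
    """
    tally = defaultdict(float)
    for name, proposed in votes.items():
        weight = 1.0
        if error_rates is not None:
            # Floor avoids division by zero for a classifier with no observed errors.
            weight = 1.0 / max(error_rates[name], 1e-6)
        tally[proposed] += weight
    return max(tally, key=tally.get)
```

For example, with votes `{"face": "A", "body": "B", "gait": "B"}` the plain majority is `"B"`, but with error rates of 0.05, 0.2 and 0.25 respectively the weighted vote favors `"A"`, because the single low-error classifier outweighs the other two.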

Based on the identifiers obtained as described above, the determination unit acquires the error rate and changes the identification condition. This processing is the same as in Embodiment 1. By correcting the identifier assignment using the results of a plurality of classifiers rather than a single classifier, and thereby making it more reliable, more appropriate identifiers can be obtained and more appropriate identification conditions determined. For example, a person can be identified by matching human body features or by matching facial features, and facial features are generally considered more reliable when the resolution is sufficient. Human body features, even DL features, are thought to be close to color features, so errors are likely when several people are dressed similarly. Facial features can therefore be called relatively more reliable, and it seems reasonable to obtain identifiers from them, but facial features are not a cure-all. For example, when the resolution is low, when the face is turned far from the front, or when part of the face is hidden, sufficient accuracy cannot be obtained, and human body features may be more accurate. Thus, rather than updating the parameters with reference to the result of a single classifier, updating the identifiers by cross-referencing the identification results of a plurality of classifiers leads to a more appropriate identification condition and can contribute to achieving the error rate intended by the user. The above is the description of Embodiment 2.

本発明は、以下の処理を実行することによっても実現される。即ち、上述した実施形態の機能を実現するソフトウェア（プログラム）を、データ通信用のネットワーク又は各種記憶媒体を介してシステム或いは装置に供給する。そして、そのシステム或いは装置のコンピュータ（またはＣＰＵやＭＰＵ等）がプログラムを読み出して実行する処理である。また、そのプログラムをコンピュータが読み取り可能な記録媒体に記録して提供してもよい。 The present invention is also realized by executing the following processing. That is, software (program) that realizes the functions of the above-described embodiment is supplied to the system or device via a network for data communication or various storage media. Then, the computer (or CPU, MPU, etc.) of the system or device reads and executes the program. Further, the program may be recorded and provided on a computer-readable recording medium.

1 Information processing system
2 Imaging system
3 Surveillance system
10A to 10D Surveillance cameras
100 Information processing device
107 Display device
1000, 1000' Person

Claims

An information processing apparatus that determines, based on an output result of a classifier that outputs from a face image a face feature corresponding to a person, a threshold for determining that the person indicated by the face feature is the same person as a predetermined person, the apparatus comprising:
extraction means for extracting, from a plurality of images, a face image showing a person's face and a human body image showing a human body corresponding to the face;
specifying means for specifying, from the plurality of images, a first human body image group similar to the human body image; and
determination means for determining the threshold, based on a first similarity obtained by comparing with each other first output results obtained by inputting into the classifier the face image group corresponding to the first human body image group specified by the specifying means, so that the rate at which persons are determined not to be the same person is smaller than a predetermined rate.

The information processing apparatus according to claim 1, wherein the determination means updates the threshold so that the rate at which persons are determined not to be the same person based on the first similarity is smaller than the rate at which persons are determined not to be the same person obtained based on a previously determined threshold.

The information processing apparatus according to claim 1, further comprising output means for outputting, based on the threshold determined by the determination means, a determination result indicating that the person corresponding to a target face image is a registered person when the similarity obtained by comparing the output result of inputting the target face image into the classifier with a feature of a pre-registered face image of the registered person satisfies the threshold, and indicating that the person corresponding to the target face image is not the registered person when the similarity does not satisfy the threshold.

The information processing apparatus according to claim 2 or 3, wherein the determination means determines the threshold so that a false rejection rate, which indicates the rate at which persons are determined not to be the same person based on the first similarity, is smaller than a predetermined value.

The information processing apparatus according to any one of claims 1 to 4, wherein the specifying means further specifies, from the plurality of images, the first human body image group similar to the human body image showing a predetermined person and a second human body image group showing persons not similar to the predetermined person, and
the determination means further determines the threshold, based on a second similarity obtained by comparing with each other second output results obtained by inputting into the classifier the face image group corresponding to the second human body image group, so that the rate at which persons are determined to be the same person is smaller than a predetermined rate.

The information processing apparatus according to claim 5, wherein the determination means determines the threshold so that a false acceptance rate, which indicates the rate at which persons are determined to be the same person based on the second similarity, is smaller than a predetermined value.

The information processing apparatus according to claim 5 or 6, wherein the determination means determines the threshold so that the sum of the rate at which persons are determined not to be the same person based on the first similarity and the rate at which persons are determined to be the same person based on the second similarity is smaller than a predetermined value.

The information processing apparatus according to any one of claims 1 to 7, further comprising detection means for detecting a person from the time-series images,
wherein the extraction means extracts the face image and the human body image based on the person detected by the detection means.

The information processing apparatus according to any one of claims 1 to 8, wherein the classifier is a classifier that takes as input an image including a person's face and has been trained to produce an output unique to each person.

前記決定手段は、複数の撮像装置毎に用意されたそれぞれの前記閾値を決定することを特徴とする請求項１乃至９のいずれか１項に記載の情報処理装置。 The information processing device according to any one of claims 1 to 9, wherein the determination means determines each of the threshold values prepared for each of the plurality of imaging devices.

前記決定手段は、すでに決定した前記閾値に基づいて、異なる環境に対応する閾値を決定することを特徴とする請求項１乃至１０のいずれか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 10, wherein the determination means determines a threshold value corresponding to a different environment based on the already determined threshold value.

コンピュータを、請求項１乃至１１のいずれか１項に記載の情報処理装置が有する各手段として機能させるためのプログラム。 A program for causing a computer to function as each means included in the information processing apparatus according to any one of claims 1 to 11.

An information processing method for determining, based on an output result of a classifier that outputs from a face image a face feature corresponding to a person, a threshold for determining that the person indicated by the face feature is the same person as a predetermined person, the method comprising:
an extraction step of extracting, from a plurality of images, a face image showing a person's face and a human body image showing a human body corresponding to the face;
a specifying step of specifying, from the plurality of images, a first human body image group similar to the human body image; and
a determination step of determining the threshold, based on a first similarity obtained by comparing with each other first output results obtained by inputting into the classifier the face image group corresponding to the first human body image group specified in the specifying step, so that the rate at which persons are determined not to be the same person is smaller than a predetermined rate.