JP7337541B2

JP7337541B2 - Information processing device, information processing method and program

Info

Publication number: JP7337541B2
Application number: JP2019091384A
Authority: JP
Inventors: 博佐藤; 貴久山本; 敦夫野本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2023-09-04
Anticipated expiration: 2039-05-14
Also published as: JP2020187531A

Description

The present invention relates to a technique for identifying a specific person from video.

In techniques for identifying a person from video captured by a plurality of cameras, identification conditions for identifying the person must be set according to the installation conditions of each camera. Take face authentication (identifying an individual) as an example. In this case, features of a face image in the video obtained by a given camera are compared with features of a face image registered in advance. A criterion for judging, from the features extracted from these two face images, that they show the same person must then be set as the identification condition. However, depending on the environment in which the camera is installed, the same features as those of the face images learned in advance cannot always be extracted. Identification conditions therefore need to be set individually according to the environment in which each camera is installed. Setting these conditions requires acquiring sufficient learning data about the persons to be identified in each installation environment.

Patent Literature 1 describes a technique for collecting learning data about a person using video captured by a camera in its actual installation environment: a specific person appearing in the video is tracked, and that person is identified in each frame. Frames in which the person was identified as that person are separated from frames in which the person was not, and new learning data is generated by assigning the label of the identified person to the latter frames.

Patent Literature 1: Japanese Patent Application Publication No. 2018-181157

With Patent Literature 1, learning data about a person can be acquired from video captured by a camera installed on site. However, because the person is tracked within video from a single camera, application to multiple cameras is not considered. In addition, because tracking targets a person within the same angle of view, the learning data may be biased. For these reasons, the method of Patent Literature 1 may fail to collect enough of the necessary learning data, so identification conditions for identifying a person may not be set appropriately. The present invention has been made in view of the above problems, and its object is to determine conditions under which a specific person can be identified even when cameras are installed in different environments.

An information processing apparatus according to the present invention that solves the above problems determines a threshold used to judge, based on the output of a classifier that outputs facial features corresponding to a person from a face image, that the person indicated by the facial features is the same person as a predetermined person. The apparatus comprises: extraction means for extracting, from a plurality of images, a face image showing a person's face and a human body image showing the human body corresponding to that face; specifying means for specifying, from the plurality of images, a first human body image group similar to the human body image; and determination means for determining the threshold, based on first similarities obtained by comparing with one another the first output results produced by inputting to the classifier the face image group corresponding to the first human body image group specified by the specifying means, such that the proportion judged not to be the same person is smaller than a predetermined proportion.
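As a rough illustration of the threshold rule stated in this summary, the following Python sketch picks the largest threshold for which the fraction of same-person face similarities falling below it stays under a predetermined rate. The function name, the use of a plain list of similarity scores, and the convention that a pair is rejected when its similarity is strictly below the threshold are assumptions made for illustration; the patent text does not specify them.

```python
def threshold_for_rejection_rate(genuine_similarities, max_rejection_rate):
    """Return the largest candidate threshold whose rejection rate on
    same-person similarities is strictly below max_rejection_rate.

    Assumption: a pair is judged "not the same person" when its
    similarity is strictly less than the threshold.
    """
    if not genuine_similarities:
        raise ValueError("at least one similarity score is required")
    scores = sorted(genuine_similarities)
    n = len(scores)
    best = scores[0]  # the lowest score as threshold rejects nothing
    for candidate in scores:
        rejected = sum(1 for s in scores if s < candidate)
        if rejected / n < max_rejection_rate:
            best = candidate
    return best


# Hypothetical similarities between face features of the same person
# gathered from several cameras.
same_person_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.95, 0.85, 0.75]
threshold = threshold_for_rejection_rate(same_person_scores, 0.2)
```

Only observed scores are tried as candidate thresholds here, which keeps the sketch short; a finer search over intermediate values would work the same way.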

According to the present invention, conditions under which a specific person can be identified can be determined even when cameras are installed in different environments.

FIG. 1: Diagram explaining the concept of the information processing system
FIG. 2: Block diagram showing a functional configuration example of the information processing system
FIG. 3: Diagram showing a hardware configuration example of the information processing device
FIG. 4: Flowchart explaining processing executed by the information processing system
FIG. 5: Flowchart explaining processing executed by the information processing device
FIG. 6: Flowchart explaining processing executed by the information processing device
FIG. 7: Flowchart explaining processing executed by the information processing device
FIG. 8: Diagram showing an example of surveillance camera video and recognition results
FIG. 9: Diagram showing an example of the distribution of identification results
FIG. 10: Flowchart explaining processing executed by the information processing device

<Embodiment 1>
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. First, the concept of this embodiment is described with reference to FIG. 1. The information processing system 1 performs common video analysis (for example, person identification or blacklist detection) across a plurality of cameras installed in a given environment. The imaging system 2 consists of a plurality of surveillance cameras installed in the area to be monitored.

As an example, consider surveillance cameras installed in four areas. The cameras are installed at four locations: outdoors (10A), a high position indoors (10B), a low position indoors (10C), and a rooftop (10D). The cameras do not necessarily share the same installed position and orientation, environmental conditions, or internal camera parameters. When searching for a specific person by matching against an image of that person, an identification condition (threshold) corresponding to each camera is set for the video captured by that camera. Here, the identification condition is a threshold on the similarity between image features of persons detected from images. The similarity is obtained by inputting images containing a specific body part that carries features unique to a person (for example, the face) to a classifier and comparing the output features with each other. The monitoring system 3 accomplishes its task by analyzing video acquired from one or more surveillance cameras.

The imaging device 10A is, for example, a surveillance camera and captures images of its surroundings. The imaging devices 10A to 10D or the information processing device 100 use video acquired from the plurality of cameras to determine, for each surveillance camera, the threshold used to identify persons appearing in the video. The display device 107 displays video from the cameras being monitored, video captured by the information processing device 100 or the surveillance cameras, and analysis results obtained from that video. This allows the user (observer) to easily view the camera video and the determination results of the identification process.

The following embodiments describe an example in which each surveillance camera has a classifier, and the threshold to be set for each imaging device is determined using analysis results of video from the other cameras. An embodiment in which the information processing device 100 determines the threshold for each imaging device is also possible. In that case, the functional configuration of the imaging device described below may be read as that of the information processing device 100.

In Embodiment 1, the threshold for a given camera is determined using the results of matching features of persons captured by other cameras. Two different features (here, the face and the human body) are extracted from an image showing a person, and each feature is matched against video obtained from the other cameras. The features used for matching should be ones that every surveillance camera tends to extract in common. In other words, features of a person captured by multiple cameras are matched. For example, the luminance (a feature) obtained from the color of a person's clothing is considered less affected by camera placement than other features, so it can be used to identify the same person across images captured by multiple cameras.

In the following description, for convenience, the camera whose threshold is to be determined is called the imaging device of interest. This embodiment describes a method that specifies similar features between images captured by the imaging device of interest and images captured by other cameras, and determines the identification condition needed to identify a person based on a first similarity, obtained by comparing features of the same person, and a second similarity, obtained by comparing features of different persons.

As noted above, detecting the same person in video captured by cameras installed in multiple environments requires an identification criterion for each camera (environment). This in turn requires enough data for the classifier to learn the features of a wide variety of persons. However, collecting such data about persons, ground-truth data in particular, at the site being monitored is not realistic.

Another problem is that the images used for learning are not necessarily similar to the images obtained at the monitored site. The problem is especially pronounced when the learning images are snapshots taken with a digital camera or smartphone, because such images usually show the face from the front. A surveillance camera, by contrast, is typically mounted high up, for example on an indoor ceiling, so it yields face images looking down from above. When a frontal face image is compared with a face image taken from above, the similarity is in most cases lower than when two frontal faces are compared, even for the same person, so it is difficult to identify the person using the same threshold.

In short, detecting a common person from cameras installed in multiple environments involves the following difficulties. With multiple surveillance cameras installed on site, for example, extracting face images of persons from the resulting video and assigning labels indicating whether each shows the same person is laborious. Labeling the persons captured by more than a few dozen cameras in particular takes considerable effort. On top of that, determining for each imaging device an appropriate identification condition for identifying a person from the analysis results of the labeled video is also laborious, because it is difficult to collect enough of the data needed to determine an appropriate identification condition.

This embodiment addresses these problems by combining multiple results of detecting persons in images captured in multiple environments, and thereby determining, for each imaging device, the identification condition for identifying a person. The description is divided into a determination phase, in which the identification condition of the classifier is determined, and an identification phase, which uses a classifier whose identification condition has been determined. The determination phase is described first. The following description covers the processing performed by one of the plurality of imaging devices; performing the same processing on the other imaging devices yields thresholds for all of them. Alternatively, the information processing device 100 may execute the processing for all devices collectively. Obtaining such a histogram from on-site video has conventionally required attaching ground-truth labels by hand and then measuring. The present invention automatically obtains data that can be regarded as ground truth, so the identification condition can be obtained without manual work.

For purposes of explanation, a system with multiple surveillance cameras is described, but the present invention is not limited to surveillance cameras. Needless to say, this embodiment is also applicable to camera systems composed of cameras with different uses, such as web cameras and digital cameras. The cameras also need not all be the same model; that is, different cameras may be used in multiple environments. Details follow.

FIG. 2 is a block diagram showing a functional configuration example of the information processing device 10A. Specifically, the information processing device 10A is an imaging device. It includes an imaging unit 200, an image acquisition unit 201, a detection unit 202, an identification unit 203, a determination unit 204, a recording unit 205, and an output unit 206. These are connected by a bus, over which necessary information such as data and instructions is transmitted. The output unit 206 may be provided outside the device, and the imaging unit 200 may likewise be provided in an external device. The imaging device 10A is described here, but the imaging devices 10B, 10C, and 10D are assumed to be information processing devices with the same configuration.

The imaging devices 10A, 10B, 10C, and 10D are cameras installed in different environments within the monitored area. Each surveillance camera comprises an imaging optical system and an optical lens with a zoom mechanism. It may also have a drive mechanism for the pan and tilt axes.

The imaging unit 200 measures the outside world with a sensor. Because the information processing device 10A is a surveillance camera here, it captures images (video) with an image sensor. A CCD or CMOS image sensor is typically used, and a predetermined video signal is output as image data in response to a readout control signal from a sensor drive circuit (not shown). For example, a signal obtained by subsampling or block readout is output as image data. Examples of each camera's installation location and captured images are given later.

The image acquisition unit 201 acquires video (time-series images) captured by the plurality of surveillance cameras. In the flowcharts described later, processing is performed frame by frame.

The detection unit 202 detects persons in the time-series images acquired from the image acquisition unit 201 or in the time-series image data acquired from the recording unit 205, described later. Here, a person typically means a person's face or body. The unit detects the position of the face and the position of the body in the image. Objects other than people, such as cars and animals, may also be detected.

These can be readily implemented with known techniques such as deep learning (hereinafter DL). A face detection DL model is a neural network trained to find faces in images. Specifically, a neural network that outputs a value for an input image is trained so that the value is high for face images and low for other images (non-face images). By including more detailed information about the face images in the training data, attribute information such as face orientation, age, and gender may also be detected. This too can be implemented with DL.

In the identification phase, the identification unit 203 outputs the result (facial features) obtained by inputting the face image of a person detected by the detection unit 202 to the classifier. That is, it performs processing that discriminates between individuals, for example distinguishing person A from person B. Details of the identification unit 203 are described later.

The determination unit 204 determines the threshold that serves as the identification condition for identifying a specific person. That is, in the determination phase, it determines the threshold set for each camera based on information about persons who are common among the persons detected in the images captured by the cameras. The threshold is determined so as to reduce errors, based on the probability that features of the person in question are wrongly judged not to be that person (false rejection rate) and the probability that features of a different person are wrongly judged to be that person (false acceptance rate). The processing of the determination unit 204 is described in detail later.
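The trade-off between the two error rates can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a pair is accepted when its similarity is greater than or equal to the threshold, and the criterion "minimize the sum of the false rejection rate and the false acceptance rate" is one possible way to reduce errors, not necessarily the one used by the determination unit 204.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    genuine:  similarities between features of the same person
    impostor: similarities between features of different persons
    Assumption: a pair is accepted when similarity >= threshold.
    """
    frr = sum(1 for s in genuine if s < threshold) / len(genuine)
    far = sum(1 for s in impostor if s >= threshold) / len(impostor)
    return frr, far


def choose_threshold(genuine, impostor):
    """Pick the observed score that minimizes FRR + FAR (assumed criterion)."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=lambda t: sum(error_rates(genuine, impostor, t)))


# Hypothetical similarity scores for one camera.
genuine_scores = [0.7, 0.8, 0.9, 0.75]
impostor_scores = [0.2, 0.3, 0.5, 0.6]
best_threshold = choose_threshold(genuine_scores, impostor_scores)
```

In practice the two error types may carry different costs (for example, in blacklist detection), in which case a weighted sum or a fixed bound on one of the rates could replace the plain sum used here.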

The recording unit 205 receives the video captured by the image acquisition unit 201, performs processing for recording such as compression as well as video analysis processing, and records the result on a recording device such as non-volatile internal memory or media such as an HDD or SD card.

The output unit 206 compares, against the threshold, the similarity between the output obtained by inputting a target image to the classifier and the features of a pre-registered image of a registered person. If the similarity satisfies the threshold, it outputs a determination result indicating that the person in the target image is the registered person. If the similarity does not satisfy the threshold, it outputs a determination result indicating that the person in the target image is not the registered person. It also outputs the video processed by the recording unit 205 and accompanying information to a monitor or the like. The video and analysis information may instead be transferred to an external device such as a PC server, typically over a network. Alternatively, the output unit 206 may be a display unit that displays the video being captured and the identification results.
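The determination made in the identification phase amounts to a per-camera threshold test, which might look like the following Python sketch. The threshold values, the dictionary keyed by camera label, and the reading of "satisfies the threshold" as "greater than or equal to" are all assumptions for illustration.

```python
# Hypothetical per-camera thresholds produced by the determination phase.
CAMERA_THRESHOLDS = {"10A": 0.62, "10B": 0.48, "10C": 0.55, "10D": 0.60}


def is_registered_person(similarity, camera_id, thresholds=CAMERA_THRESHOLDS):
    """Return True when the similarity between the target image's features
    and the registered person's features meets that camera's threshold."""
    return similarity >= thresholds[camera_id]


# The same similarity can lead to different results on different cameras.
result_high_indoor = is_registered_person(0.5, "10B")
result_outdoor = is_registered_person(0.5, "10A")
```

Keeping one threshold per camera is what lets a camera with a steep viewing angle accept lower similarities than a camera that sees faces nearly head-on.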

A hardware configuration example of the information processing device 10A is described with reference to FIG. 3. The central processing unit (CPU) 301 uses the RAM 303 as work memory, reads and executes the OS and other programs stored in the ROM 302 and the storage device 304, controls each component connected to the system bus 309, and performs the calculations and logical decisions for various kinds of processing. The processing executed by the CPU 301 includes the information processing of the embodiment. The storage device 304 is a hard disk drive, an external storage device, or the like, and stores the programs and various data for the image recognition processing of the embodiment. The input unit 305 comprises an imaging device such as a camera and input devices for user instructions such as buttons, a keyboard, and a touch panel. The storage device 304 is connected to the system bus 309 via an interface such as SATA, and the input unit 305 via a serial bus such as USB; details are omitted. The communication I/F 306 communicates with external devices wirelessly. The display unit 307 is a display, which may be built into the information processing device or connected externally. The sensor 308 is an image sensor.

The identification unit 203 includes a human body image extraction unit 501, a partial image extraction unit 502, a human body image matching unit 503, a partial image matching unit 504, and an identification unit 505. Besides those shown in the figure, there may be multiple additional feature extractors, such as ones that extract a person's color features, contour features, or motion features, together with corresponding image matching units.

The human body image extraction unit 501 extracts a human body image showing the human body based on the information about the person's position and size detected by the detection unit 202. Likewise, the partial image extraction unit 502 extracts a face image showing the face based on the information about the person's position and size detected by the detection unit 202. Features for identifying individuals are obtained from these partial images. Known techniques may be used for this processing. For example, LBP (Local Binary Pattern) features can be used. HOG (Histogram of Oriented Gradients) features, SIFT (Scale-Invariant Feature Transform) features, or a mixture of these may also be used. The extracted features may be reduced in dimension using a technique such as PCA (Principal Component Analysis). DL techniques can also be applied here, as described above.

The human body features and partial features may differ only in the region from which they are extracted, but for person identification the whole and the part often yield features that carry different kinds of information. Specifically, face features are obtained from positional-relationship information, and human body features from color information. When identifying an individual by the body, information that includes color, such as clothing, is considered useful, and such features are often used. Even when DL techniques are applied, color information is said to be included implicitly in most cases, unless the network is explicitly designed to extract only contours. This is because the texture of clothing, that is, information including color components, is considered useful even when the subject appears small in the image or is facing away. In contrast, for partial features, which for a person means facial features, color information is not only insufficient but can also cause false authentication when used alone, so color information is not used very actively. The amount of edges in the face region, the direction of the luminance gradient, and the like are often used instead. DL technology has advanced remarkably in recent years, and DL features are used as useful features in face identification as well. Even when the same DL technique is used for the body and the face, the training data differ and the network structures are often designed separately, so the resulting features are entirely different. The human body feature may also be one that allows an individual to be identified from the image, such as a bib number worn by the person or a visible barcode assigned to each individual.

The human body image matching unit 503 matches the extracted human body images against one another to obtain a set of similar human body images. Likewise, the partial image matching unit 504 matches the extracted face images against one another to obtain a set of similar face images. Each performs matching between human body images or between partial images. Typically, a feature is treated as a sequence of numbers (a feature vector), and identification is performed by measuring the distance between two feature vectors. Feature vectors can also be fed to a machine learning technique, such as a support vector machine, to decide whether they belong to the same person. Instead of a distance, the inner product can be computed to express how similar two features are as a number (hereinafter called the similarity). More simply, taking the reciprocal of the distance also converts it to a similarity. When DL features are used, the similarity should be computed in the same way as during training.
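The ways of comparing feature vectors mentioned here can be written out as a minimal Python sketch. The function names are hypothetical, the inner product is shown in its normalized (cosine) form, and the distance-to-similarity conversion uses 1 / (1 + d) rather than a bare reciprocal so that a distance of zero does not divide by zero; both choices are assumptions, not details from the patent text.

```python
import math


def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product of the two vectors after length normalization."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare; treated as dissimilar
    return dot / (norm_a * norm_b)


def distance_to_similarity(distance):
    """Map a non-negative distance to a similarity in (0, 1].

    Assumed variant of "reciprocal of the distance" that stays finite
    when the two vectors are identical.
    """
    return 1.0 / (1.0 + distance)


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
s = distance_to_similarity(d)
```

Whichever measure is chosen, the same one has to be used when collecting similarities for threshold determination and when applying the threshold, since the numeric ranges differ.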

A method of matching the same person is described below with reference to FIG. 8. Assume that image A in FIG. 8 is captured by camera 10A, image B by camera 10B, image C by camera 10C, and image D by camera 10D, each at a different time. The first person is the person indicated by human body 10000 and face 10001. This first person corresponds to human body 10002 and face 10003 in image B, human body 10004 and face 10005 in image C, and human body 10006 and face 10007 in image D. As one example of the specific processing performed by the identification unit 505, the face image 10001 of the first person in image A (or the features of that face image) is collated against person-matching features extracted from the other images B, C, and D. Ideally, face images 10003, 10005, and 10007 are obtained as a result. These face images can be regarded as showing the same person. Note that for cameras monitoring a wide area, human body images may be easier to match than face images. In particular, human body images match accurately for a person wearing unusual clothing. For example, to extract a predetermined person from each image, the human body image 10000 of image A (or its features) is used as a template. This template is scanned over images B, C, and D as a sliding window. As a result, human body images 10002, 10004, and 10006 are obtained. The extracted partial images can be regarded as showing the predetermined person. Note that the predetermined person is any person detected in images captured by at least two cameras.

識別部５０５は、人体画像照合部５０３による照合結果をもとに類似した人体画像群に対応する顔画像群を、識別器に入力し、第１の出力結果を取得する。ここでは、人体画像は各監視カメラに共通して取得しやすいという前提として、複数の画像から類似した人体画像群を抽出し、さらにその人体画像群からそれぞれ対応した顔画像群を取得する。 The identification unit 505 inputs a group of face images corresponding to a group of similar human body images based on the matching result by the human body image matching unit 503 to the classifier, and acquires a first output result. Here, on the premise that human body images are common and easy to acquire for each monitoring camera, a similar human body image group is extracted from a plurality of images, and a corresponding face image group is acquired from each human body image group.

In other words, the processing is typically performed as follows. The human body image showing a predetermined person in the image of a certain camera (the imaging device of interest, that is, the device for which the classifier is to be determined) is collated with the human body images of persons detected in the images of the other surveillance cameras. If the (image) feature showing the highest similarity also satisfies a predetermined condition, the predetermined person in the image of the imaging device of interest and the collated feature are identified as the same person. If the highest similarity falls below the threshold, it is determined that no image corresponds to the detected predetermined person. The above is identification by a single classifier; for a person, this corresponds to, for example, identification by the face alone. The same is done for the human body feature, and the two results are integrated to make the final identification.
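The single-classifier decision described above (take the candidate with the highest similarity, and report no match when that similarity is below the threshold) can be sketched as follows. The function and parameter names are hypothetical, and the plain inner product stands in for whatever similarity the classifier actually uses.

```python
def dot(a, b):
    """Plain inner product, used here as a stand-in similarity."""
    return sum(x * y for x, y in zip(a, b))


def find_best_match(query, candidates, threshold, similarity=dot):
    """Return (candidate_id, score) for the most similar candidate,
    or (None, score) when the best score is below the threshold.

    candidates: dict mapping a detection id to its feature vector.
    """
    best_id, best_score = None, float("-inf")
    for cand_id, feature in candidates.items():
        score = similarity(query, feature)
        if score > best_score:
            best_id, best_score = cand_id, score
    if best_id is None or best_score < threshold:
        return None, best_score  # no image corresponds to the person
    return best_id, best_score
```

For example, `find_best_match([1.0, 0.0], {"b1": [0.9, 0.1], "b2": [0.1, 0.9]}, threshold=0.5)` selects `"b1"`, while raising the threshold above the best score makes the function report no match.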

統合の方法は、簡単には多数決で行う。複数の識別器の結果が相反する場合、以下のようにするとよい。すなわち、事前に識別器に信頼度を設定しておき、もっとも信頼度の高い識別器の結果を採用する。信頼度は、例えば、事前に決めた画像セット（正解情報がある）で、各識別器の正解率を求めて、その正解率を信頼度として設定する。また、以下のようにしてもよい。個々の識別器の類似度に信頼度をかけて、全識別器の結果を足すことで、統合された類似度が取得され、その値をもって閾値と比較して、該当する画像を識別すればよい。以上が、識別部２０３の処理の内容である。個々の識別器（以下、顔識別器と人体識別器と呼ぶ）について特徴取得した後、それらの出力結果を統合して人物の識別を行う。人物の識別の場合、まず顔による識別を行い、次に人体の識別を行って、２つの出力結果を統合する。ほかにも識別器がある場合は、順次識別を行い、最後に統合を行って、出力結果とする。 The method of integration is simply by majority vote. If the results of multiple discriminators are contradictory, the following should be done. That is, the reliability is set in the discriminator in advance, and the result of the discriminator with the highest reliability is adopted. For the reliability, for example, the accuracy rate of each discriminator is obtained using a predetermined image set (with correct information), and the accuracy rate is set as the reliability. Alternatively, the following may be done. By multiplying the similarity of each classifier by the reliability and adding the results of all the classifiers, the integrated similarity is obtained, and the value is compared with the threshold to identify the corresponding image. . The above is the content of the processing of the identification unit 203 . After acquiring the features of individual classifiers (hereafter referred to as a face classifier and a human body classifier), their output results are integrated to identify a person. In the case of human identification, face identification is first performed, then human body identification is performed, and the two output results are integrated. If there are other discriminators, the discriminators are sequentially discriminated and finally integrated to obtain the output result.
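The reliability-weighted integration described above can be sketched as follows: each classifier's reliability is taken as its accuracy on a labelled image set, and the integrated similarity is the sum of each similarity multiplied by its reliability. The names and the dictionary layout are assumptions made for illustration.

```python
def reliability_from_labeled_set(predictions, labels):
    """Accuracy of one classifier on an image set with known answers."""
    if not labels:
        raise ValueError("labelled set must not be empty")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def fuse_similarities(similarities, reliabilities):
    """Sum of (similarity x reliability) over all classifiers.

    Both arguments are dicts keyed by classifier name, e.g. "face", "body".
    """
    return sum(similarities[name] * reliabilities[name] for name in similarities)


def is_same_person(similarities, reliabilities, threshold):
    """Compare the integrated similarity with the threshold."""
    return fuse_similarities(similarities, reliabilities) >= threshold
```

Note that the integrated value is no longer bounded by 1 when several classifiers contribute, so the threshold has to be chosen on the same integrated scale.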

Next, the determination unit 204 is described. The determination unit 204 includes an image information acquisition unit 601, an associating unit 602, an error rate acquisition unit 603, and a determination unit 604. The image information acquisition unit 601 acquires, from the recording unit 205 and the detection unit 202, image information that includes the video (time-series images) captured by each imaging device and the position information of each person detected in that video.

The associating unit 602 associates persons detected by the plurality of imaging devices with one another, based on the image information acquired from the image information acquisition unit 601. The processing performed by the associating unit 602 is described in detail later. The error rate acquisition unit 603 acquires an error rate for a given classifier of the identification unit 203, based on the association result of the associating unit 602. Here, the error rate includes the false acceptance rate (false positive; false detection) and the false rejection rate (false negative; missed detection). For example, when person A is to be identified, the false rejection rate is the probability (proportion) that a detected person A is wrongly identified as not being the same person. The false acceptance rate is the probability (proportion) that a detected person B is wrongly identified as being the same person.

誤り率取得部６０３の処理の内容については後述する。決定部６０４は、誤り率取得部６０３で取得された誤り率に基づいて、撮像装置毎の閾値を決定する。閾値決定部６０４の処理の内容についても、後で詳しく説明する。なお、閾値は、初期設定として予め決定された閾値をセットしておく。これによって、元の認識精度を確かめることができる。 Details of the processing of the error rate acquisition unit 603 will be described later. A determination unit 604 determines a threshold for each imaging device based on the error rate acquired by the error rate acquisition unit 603 . The details of the processing performed by the threshold determination unit 604 will also be described in detail later. A predetermined threshold is set as an initial setting. This makes it possible to confirm the original recognition accuracy.

図４は、情報処理装置が実行する処理を説明するフローチャートである。図４を用いて本実施形態の処理の概要を説明する。以下の説明では、各工程（ステップ）について先頭にＳを付けて表記することで、工程（ステップ）の表記を省略する。ただし、図４のフローチャートに示した処理は、コンピュータである図３のＣＰＵ１０１により記憶装置１０４に格納されているコンピュータプログラムに従って実行される。情報処理装置１００は必ずしもこのフローチャートで説明するすべてのステップを行わなくても良い。なお、ここではＳ４０３における識別条件の更新は行わないものとする（第２の実施形態で説明する。）
Ｓ４００では、決定部２０４が識別器の閾値を決定するか否かを判断する。本実施形態においては、時間に応じて判断する。例えば、監視カメラが一定時間（例えば、２４時間）稼働したら閾値を決め直すようにする。また、例えば初回に本情報処理装置を起動する際も、識別条件を新たに決定するようにしてもよい。Ｓ４００でＹｅｓと判断した場合、Ｓ４０２に進む。Ｓ４００でＮｏと判断した場合、Ｓ４０４に進む。 FIG. 4 is a flowchart illustrating processing executed by the information processing device. An overview of the processing of this embodiment will be described with reference to FIG. In the following description, notation of each process (step) is omitted by adding S to the beginning of each process (step). However, the processing shown in the flowchart of FIG. 4 is executed according to a computer program stored in the storage device 104 by the CPU 101 of FIG. 3, which is a computer. The information processing apparatus 100 does not necessarily have to perform all the steps described in this flowchart. Note that the identification conditions are not updated in S403 here (described in the second embodiment).
In S400, the determining unit 204 determines whether or not to determine the threshold value of the discriminator. In this embodiment, determination is made according to time. For example, the threshold is re-determined after the monitoring camera has been in operation for a certain period of time (for example, 24 hours). Also, for example, when the information processing apparatus is activated for the first time, the identification condition may be newly determined. If it is determined Yes in S400, the process proceeds to S402. If it is determined No in S400, the process proceeds to S404.

S401 and S402 constitute the threshold determination phase. In S401, the determination unit 204 associates the persons detected in the video of each camera with one another. The detailed processing is described later with reference to FIG. 5. In S402, the determination unit 204 determines the threshold of the classifier set for each surveillance camera. A detailed description is given later.

Ｓ４０４は、監視フェーズである。Ｓ４０４では、識別部２０３が、識別器と閾値とを用いて撮像された画像にターゲット人物が含まれていないか識別する。つまり、ターゲット人物を映像から検出する。Ｓ４０４では、識別部２０３が監視を続行するか否かを判断する。本実施形態では、ユーザ指示によって、監視の続行もしくは中断を判断する。監視を続行する場合（Ｙｅｓ）は、Ｓ４００に戻る。監視を中断する場合（Ｎｏ）は、処理を終了する。処理を開始してから一定時間経過後に終了するようにしてもよい。また、所望の人物を識別できた場合に終了するようにしてもよい。 S404 is a monitoring phase. In S404, the identification unit 203 identifies whether the target person is included in the captured image using the classifier and the threshold. That is, the target person is detected from the video. In S404, the identification unit 203 determines whether or not to continue monitoring. In this embodiment, it is determined whether to continue or interrupt monitoring according to a user instruction. If monitoring is to be continued (Yes), the process returns to S400. If monitoring is to be interrupted (No), the process ends. The process may be terminated after a certain period of time has elapsed since the start of the process. Alternatively, the processing may end when the desired person is identified.

図５は、決定部２０４が実行する処理の一例を説明するフローチャートである。最初に前提条件として、処理量が膨大になることを避けるため、以下の処理を行う時間範囲がユーザまたは事前の設定により指定されるか、設置後に予め定めた期間内で行うようにする。 FIG. 5 is a flowchart illustrating an example of processing executed by the determining unit 204. As illustrated in FIG. First, as a precondition, in order to avoid an enormous amount of processing, the time range for performing the following processing is specified by the user or by setting in advance, or is performed within a predetermined period after installation.

以下の図５に沿って、決定部２０４が実行する処理を説明する。まず、Ｓ７０１では、画像取得部２０１が、各撮像装置によって撮像された時系列画像を取得する。画像情報には、時系列画像とカメラの識別子とが含まれる。次に、Ｓ７０２では、対応付け部６０２が、Ｓ７０１で取得した画像情報に基づいて、検出部２０２から各撮像装置の画像から検出された人物の検出結果に対して人物の位置を取得する。 Processing executed by the determining unit 204 will be described with reference to FIG. 5 below. First, in S701, the image acquisition unit 201 acquires time-series images captured by each imaging device. The image information includes time-series images and camera identifiers. Next, in S702, the associating unit 602 acquires the position of the person from the detection result of the person detected from the image of each imaging device from the detection unit 202 based on the image information acquired in S701.

次に、Ｓ７０３では、人体画像抽出部５０１と、部分画像抽出部５０２が、各撮像装置によって撮像された各時系列画像に含まれる人物を示す部分画像（顔画像）と人体画像とを抽出する。ここでは、各時系列画像から検出された人物すべてにこの処理を行う。ここまでで取得した各映像に対して共通する方法（同じ識別器）で、すべての人物の画像を抽出する。照合するときは同じ識別器から取得した特徴同士で比較するためである。なお、検出された人物のうち、検出の信頼度（検出された物体が人物である確からしさ）が所定の閾値より大きいといった条件を満たす一部の人物のみを取得してもよい。例えば、画角の中央付近に映った人物は特徴がうまく抽出できる可能性が高いため、積極的に閾値決定に用いる。この際、画像を入力する識別部２０３にある識別器のうち、もっとも信頼できるものにするとよい。人物識別器の信頼性はあらかじめ定めたテストデータで事前に性能を測ることで取得することができる。 Next, in S703, the human body image extraction unit 501 and the partial image extraction unit 502 extract a partial image (face image) representing a person and a human body image included in each time-series image captured by each imaging device. . Here, this process is performed on all persons detected from each time-series image. Images of all persons are extracted by a common method (same discriminator) for each image acquired so far. This is because the features acquired from the same discriminator are compared when matching is performed. Of the detected persons, only some persons who satisfy the condition that the reliability of detection (likelihood that the detected object is a person) is greater than a predetermined threshold may be acquired. For example, since there is a high possibility that features of a person captured near the center of the angle of view can be successfully extracted, the person is positively used for threshold determination. At this time, it is preferable to select the most reliable classifier among the classifiers in the classifier 203 that inputs the image. The reliability of the person classifier can be obtained by measuring its performance in advance with predetermined test data.

Next, in S704, the human body image collating unit 503 takes, from among the human body images acquired in S703, a human body image of interest showing a predetermined person, and on that basis collates human body images showing the predetermined person in images captured by a second imaging device different from the first imaging device. The one or more human body images collated in this way are referred to as the first human body image group. That is, among all human body images in all acquired images, those similar to the human body image of interest are regarded as showing the predetermined person. In this way, the human body images of the predetermined person captured by the plurality of imaging devices are associated with one another. This processing is performed as follows. The human body image of a person detected in an image captured by one imaging device is collated with the human body images of persons detected in images captured by the other imaging devices. Specifically, human body image X of a person detected in an image captured by the first imaging device is compared with human body image Y of a person detected in an image captured by an imaging device different from the first imaging device, and if the similarity is greater than a predetermined value, the two are collated as similar images. Time information may be used to speed up the processing. Surveillance cameras installed so that their fields of view do not overlap never show the same person at the same time. In addition, from the positional relationship of the cameras, the travel time from the moment a person appears in one camera until the person reaches the other camera can be predicted, so the time window in which the same person is likely to appear can be estimated. The collation may also be performed by deep learning.
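The use of time information to narrow down candidates can be sketched as follows: only detections from the other camera whose timestamps fall inside the expected travel-time window are passed on to the (more expensive) image collation. The travel-time bounds and the tuple layout are hypothetical values that would come from the camera layout, not from this description.

```python
def plausible_candidates(query_time_s, detections, min_travel_s, max_travel_s):
    """Keep detections from another camera that appear within the
    expected travel-time window after the query detection.

    detections: list of (detection_id, timestamp_s) tuples.
    min_travel_s / max_travel_s: assumed bounds on the walking time
    between the two cameras.
    """
    kept = []
    for detection_id, timestamp_s in detections:
        elapsed = timestamp_s - query_time_s
        if min_travel_s <= elapsed <= max_travel_s:
            kept.append(detection_id)
    return kept
```

With non-overlapping fields of view, setting `min_travel_s` above zero also encodes the observation that the same person cannot appear in both cameras at the same instant.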

続いて、Ｓ７０５では、対応付け部６０２が全ての撮像装置によって撮像されたすべての画像から検出されたすべての人物について、上記処理を行ったか判断する。未処理の人物がある場合（ＹＥＳ）、次の人物を対象に処理を行うためＳ７０１に戻る。全ての人物を処理した場合（Ｓ７０５でＮＯの場合）、Ｓ７０６に進む。 Subsequently, in S705, the associating unit 602 determines whether the above processing has been performed for all persons detected from all images captured by all imaging devices. If there is an unprocessed person (YES), the process returns to S701 to process the next person. If all persons have been processed (NO in S705), the process proceeds to S706.

Ｓ７０６では、対応付け部６０２が、Ｓ７０４で注目特徴と照合した第１の人体画像群と対応する第１の顔画像群を対応付ける。画像から検出された人物に各撮像装置に共通のユニークな識別子（ＩＤ）を付与する。この識別子は後段の誤り率取得部６０３で用いられる。以上が、対応付け部６０２で行われる処理の説明である。この処理によって、ある撮像装置によって撮像されたある人物が検出された画像を、システムに含まれる複数の撮像装置によって撮像された画像と対応付けることができる。その結果、複数の撮像装置で撮像された画像から共通人物を特定することができる。 In S706, the associating unit 602 associates the first facial image group corresponding to the first human body image group collated with the feature of interest in S704. A unique identifier (ID) common to each imaging device is assigned to a person detected from an image. This identifier is used in error rate acquisition section 603 in the subsequent stage. The above is the description of the processing performed by the associating unit 602 . By this processing, an image in which a certain person is detected captured by a certain imaging device can be associated with images captured by a plurality of imaging devices included in the system. As a result, a common person can be identified from images captured by a plurality of imaging devices.

次に、決定部２０４が実行する処理について図８を用いて詳細に説明する。この処理では、ある人物を識別する識別器について、カメラ毎に適切な閾値を設定する。図９を用いて識別条件（閾値）の決定方法について説明する。 Next, processing executed by the determination unit 204 will be described in detail using FIG. In this processing, an appropriate threshold value is set for each camera for a classifier that identifies a certain person. A method of determining the discrimination condition (threshold value) will be described with reference to FIG.

図９のグラフ９０は、縦軸ｙ（ｘ＝０）は頻度を、横軸ｘ（ｙ＝０）は同じ識別器によって出力された特徴同士のペアの類似度を示すヒストグラムである。ここでは、類似度は特徴ベクトルの内積で示されるものとする（－１＜類似度Ｓ＜１）。つまり、類似度が大きいほどペアが同一人物である可能性が高く、類似度は小さいほどペアは他人同士である可能性が高い。まず、従来技術において、識別条件を決定する際には図９（Ａ）に示すヒストグラムが得られる。このとき、顔画像のペア（顔特徴のペア）は本人同士であるか、他人同士であるか分からない（正解のペアが既知でない）。特に、本人同士の顔画像のペアを取得するのが難しい。そのため、２種類の誤り率（本人拒否率と他人受入率）を特定することができない。従って、誤り率が小さくなるような識別条件を自動的に決定することができなかった。 A graph 90 in FIG. 9 is a histogram showing the frequency on the vertical axis y (x=0) and the similarity between pairs of features output by the same classifier on the horizontal axis x (y=0). Here, the similarity is indicated by the inner product of feature vectors (-1<similarity S<1). That is, the higher the degree of similarity, the higher the possibility that the pair is the same person, and the lower the degree of similarity, the higher the possibility that the pair are different people. First, in the prior art, a histogram shown in FIG. 9A is obtained when determining the discrimination condition. At this time, it is not known whether the pair of face images (pair of facial features) is the person himself or another person (the correct pair is not known). In particular, it is difficult to acquire a pair of face images of the person himself/herself. Therefore, two types of error rates (authentic rejection rate and false acceptance rate) cannot be specified. Therefore, it was not possible to automatically determine a discrimination condition that would reduce the error rate.

In the present embodiment, by contrast, using the collation result makes it possible to identify two kinds of pairs of face images (face features): same-person pairs and different-person pairs. Graph 91 shows the similarities obtained from the group of face images forming same-person pairs; it represents the frequency of the similarities that combinations of face images of the same person can take. Likewise, graph 92 is obtained from different-person pairs; it represents the frequency of the similarities that combinations of face images of different persons can take. On the left side of graph 91 (the range of similarities close to 0), the pairs are face images of the same person yet have low similarity, so the identification result is likely to be wrong. The rate of this kind of identification error is called the false rejection rate (missed detection). The false rejection rate is represented by area 901, which is bounded by threshold 900 shown in FIG. 9(B), graph 91, and the x-axis (y = 0). Because area 901 lies below the threshold, lowering the false rejection rate requires lowering the threshold (identification condition) until this area is smaller than a predetermined ratio. The other error rate, the false acceptance rate, is represented by area 902 in FIG. 9(C). Graph 92 is the result of plotting the similarities between face images (face features) of different persons.

Area 902 is the area bounded by threshold 900', graph 92, and y = 0, that is, the integral of graph 92 over that range. To reduce the false acceptance rate (false detection), in which a different person is mistaken for the person in question, the threshold should be raised until area 902 is smaller than a predetermined ratio, since area 902 lies above the threshold. Note that area 901 and area 902 are in a trade-off relationship: making one smaller makes the other larger. Therefore, depending on the use case, the threshold should be determined so as to satisfy a condition such as minimizing the sum of the two, or keeping one of the probabilities below a predetermined ratio. This condition may be set in advance by the user. In this way, the identification condition can be set using the two kinds of histograms. Conventionally, obtaining such histograms from on-site video required attaching correct labels by hand and measuring; with the present invention, data that can be regarded as correct is obtained automatically, so the identification condition can be obtained without manual labor.

The processing of S402 is described with reference to the flowchart of FIG. 6. In S801, the error rate acquisition unit 603 acquires the classifier corresponding to the imaging device whose threshold is to be updated. This classifier is a neural network that has learned face images of various persons and the features unique to each individual. That is, multiple sets of face images of multiple persons are prepared, and the network is trained to output similar values for the same person. A similar classifier is used for human body images. When a partial image of a person (for example, a face image or a human body image) is input, the classifier outputs a feature unique to each person.

For example, suppose that when a face image of person N is input, the classifier outputs vector Sn. Next, when another image of person N taken from a different angle or angle of view is input, a vector Sn' close to Sn is output if the images share common features. When an image of person M, who is different from person N, is input to the classifier, a vector Sm unique to person M and different from Sn is output. That is, if the distance between the output vectors Sn and Sm of the classifier for two input images is equal to or less than a predetermined value (or, when an inner product is used as the similarity, if it is sufficiently large), the persons in the two images can be regarded as the same person. If the distance between Sn and Sm is greater than the predetermined value, the persons in the two images are different persons.

次に、Ｓ８０２では、誤り率取得部６０３が、対応付け部６０２からある期間の全ての撮影装置の画像から検出された全ての人物について付与した識別子を含む対応付け情報を取得する。抽出された各特徴には、画像から検出された人物に各撮像装置に共通のユニークな識別子（ＩＤ）を付与されている。 Next, in step S802, the error rate acquisition unit 603 acquires from the association unit 602 association information including identifiers assigned to all persons detected from images captured by all imaging devices during a certain period. Each extracted feature is given a unique identifier (ID) that is common to each imaging device for the person detected from the image.

Next, in S803, the identification unit 505 acquires a collation result indicating the feature of interest showing a predetermined person and, among the features extracted from the images of other imaging devices, the features that were collated with the feature of interest. That is, the identification unit 505 identifies a first human body image group that is similar to the human body image showing the predetermined person, and a second human body image group that is not similar to it. For example, when the clothing (human body feature) of the predetermined person is chosen as a template and template matching is performed on the images of other imaging devices, the collation result may show the predetermined person as captured by those devices. That is, the first human body image group is a set of human body images that are highly likely to show the same person (the person in question), and the second human body image group is a set of human body images that are highly likely to show persons different from the predetermined person (others). The same can be said of the face images corresponding to those human body images.

Next, in S804, the error rate acquisition unit 603 inputs the face image group corresponding to the first human body image group into a first classifier, which outputs a (face) feature corresponding to a person from an image, compares the resulting first output results with one another, and acquires first similarities. The error rate acquisition unit 603 also inputs the face image group corresponding to the second human body image group into a second classifier, which outputs a (face) feature corresponding to a person from an image, compares the resulting second output results with one another, and acquires second similarities. Graph 91 in FIG. 9(B) shows the frequency of the first similarities, and graph 92 in FIG. 9(B) shows the frequency of the second similarities. In later processing, the threshold is determined using these two histograms.
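One way to obtain the two sets of similarities is sketched below. It is a simplification that assumes each face feature already carries the cross-camera identifier assigned in S706: pairs with the same identifier are treated as same-person pairs, all other pairs as different-person pairs. The function names and the plain inner product are assumptions, not the classifier interface of this description.

```python
from itertools import combinations


def dot(a, b):
    """Plain inner product used as the similarity (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_similarities(faces):
    """Split all face-feature pairs into same-person and different-person
    similarity lists, using the person id attached by body matching.

    faces: list of (person_id, feature_vector) tuples.
    Returns (first_similarities, second_similarities).
    """
    first, second = [], []
    for (id_a, feat_a), (id_b, feat_b) in combinations(faces, 2):
        score = dot(feat_a, feat_b)
        if id_a == id_b:
            first.append(score)   # same-person pair (graph 91)
        else:
            second.append(score)  # different-person pair (graph 92)
    return first, second
```

The two returned lists are the raw data behind the two histograms; binning them by similarity gives the frequency curves.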

In S805, the error rate acquisition unit 603 compares the first similarities with the threshold and acquires the false rejection rate, which indicates the likelihood of mistaking an image of the person in question for someone else. Similarly, the error rate acquisition unit 603 compares the second similarities with the threshold and acquires the false acceptance rate, which indicates the likelihood of mistaking an image of another person for the person in question. The error rate is given by the sum of the false rejection rate and the false acceptance rate. The error rate is acquired on the assumption that the same-person pairs of human body images (and the corresponding face images) indicated by the collation result are correct. Two kinds of error can occur when identifying whether persons are the same: the error of treating two detections that are actually the same person as different persons (false negative: false rejection rate), and the error of identifying two different persons as the same (false positive: false acceptance rate). Each of the two is acquired. Because the two probabilities are in a trade-off relationship, reducing one increases the other. It is therefore advisable to set, according to the use case, which probability to control. Alternatively, a threshold that minimizes the sum of the two probabilities may be determined.
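Given the two lists of similarities, the two error rates at a given threshold can be computed as simple proportions, as in the sketch below. The convention that a pair is accepted when its similarity is at or above the threshold is an assumption made here, as are the function names.

```python
def false_rejection_rate(first_similarities, threshold):
    """Proportion of same-person pairs whose similarity is below the threshold."""
    if not first_similarities:
        return 0.0
    rejected = sum(1 for s in first_similarities if s < threshold)
    return rejected / len(first_similarities)


def false_acceptance_rate(second_similarities, threshold):
    """Proportion of different-person pairs whose similarity reaches the threshold."""
    if not second_similarities:
        return 0.0
    accepted = sum(1 for s in second_similarities if s >= threshold)
    return accepted / len(second_similarities)


def total_error_rate(first_similarities, second_similarities, threshold):
    """Sum of the two rates, as used for the combined error rate."""
    return (false_rejection_rate(first_similarities, threshold)
            + false_acceptance_rate(second_similarities, threshold))
```

These proportions correspond to the areas under the two histograms on either side of the threshold, normalised by the number of pairs.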

Next, in S806, the determination unit 604 determines the threshold so that the false rejection rate or the false acceptance rate becomes smaller than a predetermined ratio. That is, the determination unit 604 determines the threshold so that the frequency of first similarities smaller than the threshold is less than a predetermined value. Alternatively, the determination unit 604 determines the threshold so that the frequency of second similarities greater than the threshold is less than a predetermined value. Alternatively, the determination unit 604 determines the threshold so that the sum of the frequency of first similarities smaller than the threshold and the frequency of second similarities greater than the threshold is less than a predetermined value.

The processing performed by the determination unit 604 is described below, in particular the case in which identification is made according to whether the similarity exceeds a threshold. The determination unit 604 obtains the table of classifier parameters (typically thresholds) and error rates acquired by the error rate acquisition unit 603, and selects the parameter that brings the error rate closest to the desired one. As described above, there are two kinds of error: identifying the same person as different (False Negative) and identifying different persons as the same (False Positive). The two are generally in a trade-off. Specifically, if, in order to reduce False Negatives, pairs are identified as the same even when their similarity is low (which corresponds to lowering the threshold), False Positives increase. Conversely, if pairs are judged not to be the same even when their similarity is high (which corresponds to raising the threshold), False Positives decrease but False Negatives increase. Normally the threshold is set so that the sum of the two error rates is minimized, but depending on the application a purpose-specific setting is possible, for example when False Positives (erroneous authentication) are to be avoided. Such a target error rate can be fixed in advance, for example by user designation. This makes it possible to update the classifier parameter, typically the similarity threshold, so that the error rate on actual surveillance video approaches the predetermined value. The classifier parameter selected in this way is set as the new parameter of the classifier of the identification unit 203. This concludes the description of the processing performed by the determination unit 604.
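The table-based selection described for the determination unit can be sketched as below. The `Row` type, the function `select_threshold`, and the table values are hypothetical; the sketch assumes the table holds one row per candidate threshold and that a user-specified target, when given, is a target False Positive rate.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    threshold: float
    false_negative: float  # false rejection rate at this threshold
    false_positive: float  # false acceptance rate at this threshold


def select_threshold(table: List[Row],
                     target_fp: Optional[float] = None) -> float:
    """Pick a threshold from a parameter/error-rate table.

    With no target, minimize the sum of the two error rates (the usual
    setting). With a target, pick the row whose False Positive rate is
    closest to it.
    """
    if target_fp is None:
        best = min(table, key=lambda r: r.false_negative + r.false_positive)
    else:
        best = min(table, key=lambda r: abs(r.false_positive - target_fp))
    return best.threshold


# Hypothetical table, for illustration only.
table = [
    Row(0.35, 0.00, 0.50),
    Row(0.55, 0.25, 0.25),
    Row(0.65, 0.25, 0.00),
    Row(0.75, 0.50, 0.00),
]
default_choice = select_threshold(table)
fp_choice = select_threshold(table, target_fp=0.25)
```

Raising the threshold down the table lowers the False Positive rate and raises the False Negative rate, which is the trade-off the paragraph describes.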

Until all classifiers have been processed, or until a predetermined number of iterations is reached, the process returns from S806 to S803 and the error rate acquisition unit 603 recalculates the error rate. This is repeated a predetermined number of times; the repetition improves identification accuracy. This concludes the description of the first determination. Only one of the first determination and the second determination may be performed, or both. The roles of facial features and human body features may also be swapped depending on the case. For example, when many people wear similar clothing, as at a school event, it is difficult to distinguish individuals by human body features, so it is preferable to use facial features to extract persons across the whole video. Besides facial and human body features, a person's features may also be extracted from belongings or from items used for individual identification.

The predetermined person may be chosen to be a person who has been captured by a larger number of cameras, because learning and determination for the classifier tend to succeed when images captured from various angles are available. Weighting may also be applied in the determination according to the number of times a person appears (or the length of time the person is captured), because the more images of the predetermined person there are (the longer the capture time), the more likely it is that the person has been captured from various angles. This allows the classifier to be determined efficiently. In addition, for the classifier corresponding to the imaging device being determined, the determination may be weighted according to the installation position of that imaging device.

Next, the identification phase is described. As a concrete example of using a classifier whose parameter has been determined, the detection of a pre-registered person (hereinafter, the target person) from the video of one or more surveillance cameras is described. Because the target person moves freely around the facility, it is desirable that the person can be detected by multiple cameras. When the target person is detected, the user is notified so that the user can respond appropriately. The task of the information processing system in this embodiment is to detect the target person 1000 in video showing an unspecified number of people. Target person 1000 and target person 1000' are the same person. Face authentication is performed: facial features obtained from the image are used to identify whether a person captured by a surveillance camera is a pre-registered target person.

In the identification phase, processing is performed by the image acquisition unit 201, the detection unit 202, the identification unit 203, and the output unit 206 shown in FIG. 2. The image acquisition unit 201 acquires time-series images (video) captured in real time from each surveillance camera. As in the determination phase, the detection unit 202 detects persons in the time-series images acquired by the image acquisition unit 201. The identification unit 203 extracts two different partial features from each person detected by the detection unit 202 and identifies the person from those features, using the classifier whose threshold was determined in the determination phase. The output unit 206 outputs the result identified by the identification unit 203 to a display unit (not shown) or the like.

The identification phase is described with reference to the flowchart of FIG. 4. If, in S400, the determination unit 204 determines that the identification condition of the classifier is not to be updated (NO), the process proceeds to S404 and the identification phase is executed. In S404, the identification unit 203 identifies a specific person from the images acquired by each surveillance camera. The surveillance cameras (information processing apparatus) hold a common watch list (blacklist) in which face images (human body images) of specific persons are registered as registered images, and the target person contained in a captured image is compared with the persons in the registered images for similarity. When the highest of the similarities to the registered images is greater than the identification condition, the target person is identified as the person shown in the most similar registered image.
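The watch-list decision in S404 (take the most similar registered image, then accept it only if that similarity exceeds the identification condition) can be sketched as follows. Cosine similarity between feature vectors, the function names, and the two-dimensional sample features are assumptions for illustration; this passage does not prescribe a particular similarity measure. Feature vectors are assumed to be non-zero.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query: Sequence[float],
             watch_list: Dict[str, Sequence[float]],
             threshold: float) -> Optional[str]:
    """Return the ID of the most similar registered person, or None.

    The best match is accepted only when its similarity is greater than
    the threshold (the identification condition).
    """
    best_id, best_sim = None, float("-inf")
    for person_id, feature in watch_list.items():
        sim = cosine(query, feature)
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id if best_sim > threshold else None


# Hypothetical registered features, for illustration only.
watch_list = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
hit = identify([0.9, 0.1], watch_list, threshold=0.8)
miss = identify([0.7, 0.7], watch_list, threshold=0.8)
```

In this example the first query is close to registered person A and is accepted, while the second is equally far from both registered persons and falls below the threshold, so no one is identified.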

The processing of S404 is described further with reference to FIG. 10. In S1201, the identification unit 203 acquires an image (or features) of the target person. Here, the case in which individuals are identified by a classifier based on face images is considered, so a target face image showing the target person's face is acquired. The user may designate it from past video data, or an image of a designated person may be acquired from real-time video.

In S1202, the identification unit 203 acquires features from the time-series images acquired by the image acquisition unit 201. The features acquired here should correspond to the threshold that was determined. That is, if the threshold of a classifier based on facial features was determined in the determination phase, facial features are acquired.

In S1204, the person identification unit 505 identifies the detected person based on the matching results of the human body image matching unit 503 and the partial image matching unit 504. The identification unit 203 identifies the person included in the time-series images based on the determined classifier and the features acquired in S1203.

In S1205, the identification unit 203 matches the result output by the classifier against the target person and determines, based on the matching result of S1204, whether the target person appears in the image. If the target person and the identification result match in S1204, the process proceeds to S1206. If they do not match, the process returns to S1201.

In S1206, the output unit 206 outputs a determination result based on the threshold. When the similarity obtained by comparing the output of the classifier for the target face image with the features of the image of a pre-registered person satisfies the threshold, the determination result indicates that the person shown in the target face image is the registered person. When the similarity does not satisfy the threshold, the determination result indicates that the person shown in the target face image is not the registered person. Specifically, the user (observer) is informed by a monitor display or an alert sound that the target person has been detected.

The processing described above is expected to provide the following effects. Conventionally, to achieve a desired error rate, the user had to check what error rate was actually obtained on video from the installation environment, which is difficult in practice. As a second-best measure, one could prepare a table mapping installation environments to parameters so as to approximate the desired error rate, but because it is difficult to prepare all conditions in advance, only an insufficient table can be prepared in reality.

In contrast, in this embodiment, images of a common person are acquired from the video of different imaging devices in the actual installation environment, and these images are used to change the thresholds of the various imaging devices. This makes it possible to set a threshold that achieves the desired error rate without manual effort. Moreover, because appropriate parameters can be set from the video obtained in this way, thresholds can also be set individually for the video of each of a plurality of cameras.

In a large-scale information processing system with tens to more than a hundred cameras, the problem of setting a threshold for each camera's video inevitably arises. This embodiment turns that problem into an advantage, using the availability of video from multiple cameras to obtain more reliable thresholds. This concludes the description of Embodiment 1. By executing the processing described above, the condition for identifying a specific person can be determined according to the installation environment of the camera.

<Embodiment 2>
In Embodiment 1, the most reliable of a plurality of person classifiers is selected, the error rate is acquired based on the inter-camera person correspondences obtained by that classifier, and an appropriate parameter is set. This embodiment differs in that the parameters are updated by cross-referencing the inter-camera person correspondences obtained by the plurality of classifiers.

A specific description follows. To avoid repetition, parts identical to Embodiment 1 are omitted. The system configuration of this embodiment is the same as in Embodiment 1; see Embodiment 1 for its description. The difference from Embodiment 1 lies in the processing of the determination unit, and the description below focuses on that difference.

The configuration of the determination unit is the same as in Embodiment 1. FIG. 7 is a flowchart showing an example of the processing of S403 performed by the determination unit. First, in S901, the determination unit acquires a classifier from among the classifiers of the person identification unit. As in Embodiment 1, the classifier that should be most trusted is selected as the appropriate one. Its reliability can be obtained from a performance evaluation on predetermined data. Next, in S902, the associating unit uses the acquired classifier to assign a reference identifier (ID) to each of one or more objects (for example, persons) detected in the time-series images captured by each imaging device during a predetermined period. The process of assigning identifiers is the same as that performed by the associating unit 602 of Embodiment 1 and is not repeated here. Next, in S903, the determination unit acquires a target classifier whose identification condition is to be determined. No particular criterion is needed; the classifiers may simply be selected in order. Next, in S904, the error rate acquisition unit uses the target classifier to acquire the error rate based on the identifiers. In S904, the determination unit also assigns IDs to all persons in all cameras. The error rate is obtained in the same way as by the error rate acquisition unit 603 of Embodiment 1, and IDs may be assigned to all persons in the same way as in S702. In S905, the determination unit determines, based on ◯◯, whether error rates have been acquired for all classifiers. This is repeated until all classifiers have been processed (No in S905). Once error rates and ID assignments have been obtained for all classifiers (Yes in S905), in S906 the determination unit corrects the identifiers based on the error rates. In the simplest case, the correction is a majority vote over the identification results of one or more classifiers: for each person, the identification results of the classifiers are cast as votes, and the result with the most votes becomes that person's identifier. The vote may also be weighted using the error rates acquired at the same time in S705, so that classifiers with lower error rates carry more weight; for example, each vote may be multiplied by the reciprocal of the error rate. There are two error rates, False Negative and False Positive, and their sum or average can be used. The processing from S903 to S906 is repeated a predetermined number of times. The repetition may run for a fixed number of iterations or may be stopped on another criterion, for example when the number of identifier changes falls below a predetermined number, or when the change in the error rates of all classifiers falls below a predetermined amount.
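The identifier correction in S906 can be sketched as a weighted vote in which each classifier's vote is multiplied by the reciprocal of its error rate. The function name, the classifier names, the sample error rates, and the small floor `eps` that guards against division by zero are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict


def corrected_identifier(votes: Dict[str, str],
                         error_rates: Dict[str, float],
                         eps: float = 1e-6) -> str:
    """Weighted vote over the IDs proposed by several classifiers.

    votes       : classifier name -> ID it assigned to this person
    error_rates : classifier name -> error rate (e.g. FN + FP)
    Each vote counts 1 / error_rate, so more reliable classifiers
    carry more weight. eps is an assumed floor for a zero error rate.
    """
    tally = defaultdict(float)
    for name, person_id in votes.items():
        tally[person_id] += 1.0 / max(error_rates[name], eps)
    return max(tally, key=tally.get)


# Hypothetical classifiers and rates, for illustration only.
votes = {"face": "ID7", "body": "ID3", "belongings": "ID3"}
rates = {"face": 0.05, "body": 0.25, "belongings": 0.5}
weighted = corrected_identifier(votes, rates)

# Equal error rates reduce the scheme to a plain majority vote.
equal = {name: 0.5 for name in votes}
majority = corrected_identifier(votes, equal)
```

Here the face classifier's single vote (weight 20) outweighs the two votes for ID3 (weights 4 and 2), whereas a plain majority vote would pick ID3.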

Based on the identifiers obtained as described above, the determination unit acquires the error rate and changes the identification condition. This processing is the same as in Embodiment 1. By correcting the identifier assignment with the results of multiple classifiers rather than a single one, and thereby making it more reliable, more appropriate identifiers can be obtained and a more appropriate determination made. For example, a person may be identified either by matching human body features or by matching facial features, and facial features are generally considered more reliable provided the resolution is sufficient. Human body features, even when they are DL features, are thought to be close to color features, so errors are likely when several people are dressed alike. Facial features are therefore relatively more reliable, and it is reasonable to obtain identifiers from them, but they are not a universal solution. When the resolution is low, when the face is turned far from frontal, or when part of the face is hidden, sufficient accuracy cannot be obtained, and human body features may be more accurate. Thus, rather than updating the parameter with reference to the result of a single classifier alone, updating the identifiers by cross-referencing the results of multiple classifiers leads to a more appropriate identification condition and helps achieve the error rate the user intends. This concludes the description of Embodiment 2.

The present invention can also be realized by the following processing. Software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a data communication network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program. The program may also be provided recorded on a computer-readable recording medium.

1 Information processing system
2 Imaging system
3 Monitoring system
10A to 10D Surveillance cameras
100 Information processing apparatus
107 Display device
1000, 1000' Person

Claims

An information processing apparatus that determines a threshold for determining, based on an output result of a classifier that outputs a facial feature corresponding to a person from a face image, that the person indicated by the facial feature is the same person as a predetermined person, the apparatus comprising:
extraction means for extracting, from a plurality of images, a face image showing a person's face and a human body image showing a human body corresponding to the face;
specifying means for specifying, from the plurality of images, a first human body image group similar to the human body image; and
determination means for determining the threshold so that a rate at which persons are determined not to be the same person, based on a first similarity obtained by comparing with one another first output results obtained by inputting to the classifier a face image group corresponding to the first human body image group specified by the specifying means, is smaller than a predetermined rate.

The information processing apparatus according to claim 1, wherein the determination means updates the threshold so that the rate at which persons are determined not to be the same person based on the first similarity is smaller than the rate at which persons are determined not to be the same person obtained based on the previously determined threshold.

The information processing apparatus according to claim 1, further comprising output means for outputting, based on the threshold determined by the determination means, a determination result indicating that the person corresponding to a target face image is a registered person when a similarity obtained by comparing an output result of inputting the target face image to the classifier with a feature of a face image of the registered person registered in advance satisfies the threshold, and indicating that the person corresponding to the target face image is not the registered person when the similarity does not satisfy the threshold.

The information processing apparatus according to claim 2 or 3, wherein the determination means determines the threshold so that a false rejection rate, which indicates the rate at which persons are determined not to be the same person based on the first similarity, is smaller than a predetermined value.

The information processing apparatus according to any one of claims 1 to 4, wherein the specifying means further specifies, from the plurality of images, the first human body image group similar to the human body image showing a predetermined person and a second human body image group showing persons not similar to the predetermined person, and
the determination means further determines the threshold so that a rate at which persons are determined to be the same person, based on a second similarity obtained by comparing with one another second output results obtained by inputting to the classifier a face image group corresponding to the second human body image group, is smaller than a predetermined rate.

The information processing apparatus according to claim 5, wherein the determination means determines the threshold so that a false acceptance rate, which indicates the rate at which persons are determined to be the same person based on the second similarity, is smaller than a predetermined value.

The information processing apparatus according to claim 5 or 6, wherein the determination means determines the threshold so that the sum of the rate at which persons are determined not to be the same person based on the first similarity and the rate at which persons are determined to be the same person based on the second similarity is smaller than a predetermined value.

The information processing apparatus according to any one of claims 1 to 7, further comprising detection means for detecting a person from the plurality of images,
wherein the extraction means extracts the face image and the human body image based on the person detected by the detection means.

The information processing apparatus according to any one of claims 1 to 8, wherein the classifier is a classifier trained to take an image containing a person's face as input and to produce an output unique to each person.

The information processing apparatus according to any one of claims 1 to 9, wherein the determination means determines each of the thresholds prepared for each of a plurality of imaging devices.

The information processing apparatus according to any one of claims 1 to 10, wherein the determination means determines a threshold corresponding to a different environment based on the threshold already determined.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 11.

An information processing method performed by an information processing apparatus that determines a threshold for determining, based on an output result of a classifier that outputs a facial feature corresponding to a person from a face image, that the person indicated by the facial feature is the same person as a predetermined person, the method comprising:
an extraction step of extracting, from a plurality of images, a face image showing a person's face and a human body image showing a human body corresponding to the face;
a specifying step of specifying, from the plurality of images, a first human body image group similar to the human body image; and
a determination step of determining the threshold so that a rate at which persons are determined not to be the same person, based on a first similarity obtained by comparing with one another first output results obtained by inputting to the classifier a face image group corresponding to the first human body image group specified in the specifying step, is smaller than a predetermined rate.