JP6762754B2 - Information processing device, information processing method, and program

Info

Publication number: JP6762754B2
Application number: JP2016083882A
Authority: JP
Inventors: 椎山　弘隆; 弘隆椎山; 昌弘松下
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-04-19
Filing date: 2016-04-19
Publication date: 2020-09-30
Anticipated expiration: 2036-04-19
Also published as: JP2017194798A

Description

The present invention relates to an information processing device, an information processing method, and a program.

Conventionally, techniques are known for detecting lost children, passengers who are late for a train, and the like by searching for a person's face in each frame of surveillance video. Patent Document 1 discloses a technique that detects a person's face in each frame of surveillance video, computes image features from the face, stores them in a database (DB) in association with the video frame, and then searches the DB for video of, for example, a lost child using the child's face as the query. Patent Document 2 discloses a technique in which the face of a person to be detected is registered in advance and the person is detected in real time from input video.

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-153304
Patent Document 2: Japanese Unexamined Patent Publication No. 2014-215747

However, the technique of Patent Document 1 searches only the video stored in the DB at the time the query is issued and cannot search for the target person in video obtained afterward. It therefore cannot report the current whereabouts of a lost child or late passenger so that the person can be secured. The technique of Patent Document 2, on the other hand, becomes slow when the person search is applied retroactively to video from before the time the query was issued. The technique of Patent Document 2 is therefore insufficient for the purpose of securing lost children and late passengers.

The present invention has been made in view of these problems, and its object is to search for a person continuously, from past video onward, without reducing processing speed.

Accordingly, the present invention is an information processing device comprising: reception means for receiving a search instruction for a person; first search means for performing a first search process that searches for the target person, based on features extracted from the target person of the search instruction, using as a first search range the features of persons extracted from video that was input from imaging means before the reception time at which the search instruction was received and that is stored in storage means; and second search means for performing a second search process that searches for the target person, based on the features of the target person extracted from the query image of the search instruction, using as a second search range the features of persons extracted from video input from the imaging means after the reception time. The first search means further searches for the target person among the features extracted from video that is input during the period from the reception time until the second search means completes preparation for the second search process and starts the second search process.

According to the present invention, a person can be searched for continuously, from past video onward, without reducing processing speed.

FIG. 1 is a diagram showing the monitoring system according to the first embodiment.
FIG. 2 is a diagram showing the software configuration of the information processing device.
FIG. 3 is a flowchart showing the video recording process.
FIG. 4 is an explanatory diagram of the first search process and the second search process.
FIG. 5 is a flowchart showing the search process.
FIG. 6 is a flowchart showing the search process according to the second embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(First Embodiment)
FIG. 1 is a diagram showing a monitoring system according to the first embodiment. The monitoring system has an information processing device 100 and cameras 112 serving as imaging units, and searches for a person designated as a query. Although FIG. 1 illustrates a case in which the system has three cameras 112, the number of cameras 112 is not limited to that of the embodiment. The information processing device 100 and the cameras 112 are connected via a network 111.

The CPU 101 is a central processing unit that controls the entire information processing device 100. The ROM 102 is a read-only memory that stores programs and parameters that do not need to be changed. The RAM 103 is a random access memory that temporarily stores programs and data supplied from an external device or the like. The external storage device 104 is a storage device such as a hard disk or a memory card fixedly installed in the information processing device 100. The external storage device 104 may also include media that are removable from the information processing device 100, such as a flexible disk (FD), an optical disc such as a Compact Disc (CD), a magnetic or optical card, an IC card, or a memory card. The functions and processes of the information processing device 100 described later are realized by the CPU 101 reading a program stored in the ROM 102 or the external storage device 104 and executing it.

The input I/F 105 is an interface with an input unit 109, such as a pointing device or a keyboard, that receives user operations and inputs data. The output device I/F 106 is an interface with a monitor 110 for displaying data held by or supplied to the information processing device 100. The communication I/F 107 connects to a network 111 such as the Internet. The camera 112 is a video imaging device such as a surveillance camera and is connected to the information processing device 100 via the network 111. The system bus 108 is a transmission path that communicably connects the units 101 to 107.

FIG. 2 is a diagram showing the software configuration of the information processing device 100. The information processing device 100 uses face image features obtained from faces as its image features and performs image feature search with them. The video reception unit 201 receives video captured by the camera 112 and input from the camera 112 to the information processing device 100. The video storage unit 202 stores the video received by the video reception unit 201.

The tracking unit 203 tracks persons in the video received by the video reception unit 201. Specifically, the tracking unit 203 detects an object from motion vectors, estimates the search position in the next frame, and tracks the person by template matching. For the person tracking process, the following document can be referred to.

Japanese Unexamined Patent Publication No. 2002-373332

The tracking unit 203 issues the same tracking track ID to tracking tracks that follow the same person and issues different tracking track IDs to the tracking tracks of different persons. This guarantees uniqueness and makes it possible to identify the same person from the tracking track ID. Note that the tracking unit 203 issues a different tracking track ID when tracking is interrupted, even if the person is the same.
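The ID rule described above can be sketched as follows. This is only an illustration under assumptions: the class `TrackIdIssuer`, its methods, and the notion of a tracker-internal "handle" are hypothetical names that do not appear in the patent, and the actual tracking (motion vectors, template matching) is left out.

```python
from itertools import count


class TrackIdIssuer:
    """Hypothetical helper: one tracking track ID per uninterrupted track."""

    def __init__(self):
        self._next_id = count(1)
        self._active = {}  # tracker-internal handle -> tracking track ID

    def id_for(self, handle):
        # The same uninterrupted track keeps the same ID on every frame.
        if handle not in self._active:
            self._active[handle] = next(self._next_id)
        return self._active[handle]

    def interrupt(self, handle):
        # Tracking was lost: a later reappearance receives a new ID,
        # even if it is in fact the same person.
        self._active.pop(handle, None)
```

Because an interrupted track always yields a fresh ID, one person may own several track IDs over time, which is why the stored features also carry a person ID, shooting time, and camera ID.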

The face detection unit 204 detects faces in frame images and other images from the video. For example, the face detection unit 204 detects a face in each frame image of a person tracked by the tracking unit 203. It also detects faces in video input from the video reception unit 201 and in the query image described later. For the image being processed, the face detection unit 204 detects single-eye candidate regions, pairs up the candidate regions, and determines the face region from the positions of the paired eyes. For a method of detecting a person's face in an image, the following document can be referred to.

Japanese Unexamined Patent Publication No. 2010-165156

The representative determination unit 205 selects a representative face image from the group of frame images of a tracked person. Hereinafter, a representative face image is referred to as a representative image. For example, the representative determination unit 205 selects, from the face images detected by the face detection unit 204, an image in which the face is large. Face size is used because of its effect on the accuracy of the image features: the larger the face image, the more accurate the image features that can be obtained. When image features are calculated from a face image, face size normalization must be performed to scale the face image to a fixed size. If the face image is larger than that fixed size, it is reduced and relatively little information is lost; if it is smaller, pixel interpolation such as super-resolution processing is required and the information degrades severely.
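A minimal sketch of the size-based selection, assuming that "face size" means bounding-box area. The `FaceDetection` record and `pick_representative` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FaceDetection:
    frame_id: int
    width: int   # face bounding-box width in pixels (assumed measure)
    height: int  # face bounding-box height in pixels (assumed measure)


def pick_representative(faces):
    """Return the detection with the largest face area, or None if empty."""
    return max(faces, key=lambda f: f.width * f.height, default=None)
```

For example, among faces of 40x40, 80x90, and 60x60 pixels, the 80x90 face would be chosen, since it needs the least upscaling during size normalization.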

As another example, the representative determination unit 205 may select a plurality of images from the frame image group as representative images. For example, it may select images with several different face orientations, because even for the same person the image features obtained from an image differ when the face orientation differs.

The representative determination unit 205 extracts a histogram of oriented gradients (HOG) as a feature and estimates the face orientation by SVR. A histogram of oriented gradients turns the luminance gradient information of an image into a histogram for each local region of the image and is known as a feature that is robust to local noise and to image brightness. Selecting a feature that is robust to variations unrelated to face orientation, such as noise and illumination changes, gives stable face orientation estimation even in real environments. For a method of detecting a person's face orientation from an image, the following document can be referred to.

Erik Murphy-Chutorian, "Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation," in Proc. IEEE Conf. Intelligent Transportation Systems, 2007, pp. 709-714.

The representative determination unit 205 may also detect the orientation of the person's body instead of the orientation of the person's face. The body orientation can be detected by, for example, the method described in the following document.

Japanese Unexamined Patent Publication No. 2011-186576

As yet another example, the representative determination unit 205 may select an image with little blur as the representative image. As with a still camera, the shutter speed of a video camera may change according to the brightness of the location. A face image may therefore be blurred in a dark place or when the subject moves quickly, and this directly degrades the image features and attribute information. To estimate blur, the representative determination unit 205 obtains the frequency components of the face image region, obtains the ratio of low-frequency to high-frequency components, and judges that blur has occurred when the proportion of low-frequency components exceeds a predetermined value. In addition, the representative determination unit 205 may avoid selecting face images with closed eyes, an open mouth, or the like as representative images, because closed eyes or an open mouth may alter the image features of the facial organs.
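The frequency-based blur test can be sketched as below. The patent does not say how the ratio is computed, so everything specific here is an assumption: the naive 2D DFT (usable only for small square patches), the exclusion of the DC term, the `cutoff` that separates low from high frequencies, the `threshold` value, and the function names.

```python
import cmath


def low_frequency_ratio(gray, cutoff=1):
    """Share of non-DC spectral energy at frequencies <= cutoff (assumed measure).

    gray: square list of rows of pixel intensities. The DFT below is the
    naive O(n^4) form, so keep the patch small (for example 8x8).
    """
    n = len(gray)
    low = high = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            coeff = sum(
                gray[y][x] * cmath.exp(-2j * cmath.pi * (u * y + v * x) / n)
                for y in range(n)
                for x in range(n)
            )
            energy = abs(coeff) ** 2
            # Fold indices so that n - 1 counts as frequency 1.
            fu, fv = min(u, n - u), min(v, n - v)
            if max(fu, fv) <= cutoff:
                low += energy
            else:
                high += energy
    total = low + high
    return low / total if total else 1.0


def is_blurred(gray, threshold=0.9, cutoff=1):
    """Judge blur when the low-frequency share exceeds the (assumed) threshold."""
    return low_frequency_ratio(gray, cutoff) > threshold
```

A patch that varies only slowly across its width is judged blurred, while a pixel-level checkerboard, whose energy sits at the highest frequency, is not. A practical implementation would use an FFT and tune both parameters on real footage.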

The feature calculation unit 206 calculates the image features of the face image being processed. Specifically, the feature calculation unit 206 finds organ points such as the eyes and mouth in the person's face image and calculates the SIFT (Scale Invariant Feature Transform) feature of each organ point. This feature is only an example, and the type of feature calculated is not limited to that of the embodiment. The first feature storage unit 207 stores the image features that the feature calculation unit 206 calculated for the video received by the video reception unit 201. As metadata of the image features, the first feature storage unit 207 also stores, in association with them, the person ID, the tracking track ID assigned when the person was tracked, the shooting time, and the camera ID of the camera 112 that captured the image. The first feature storage unit 207 is, for example, the external storage device 104.

The query reception unit 208 accepts the designation of a query image. Here, the query image is a face image of the person who serves as the search key. The person who serves as the search key is referred to as the target person. For example, the CPU 101 displays face images stored in the external storage device 104 on the monitor 110. While looking at the monitor 110, the user then selects, via the input unit 109, the face image of the person to be searched for as the query image. In response, the query reception unit 208 accepts the query image selected by the user. There may be one query image or two or more.

The second feature storage unit 209 stores the image features of the query image (face image) accepted by the query reception unit 208. Here, the image features of the query image are an example of reference features predetermined for the target person. The image features of the query image are calculated by the feature calculation unit 206. The second feature storage unit 209 is, for example, the external storage device 104. The CPU 101 records the image features of the query image in the second feature storage unit 209. Recording the image features of the query image in the second feature storage unit 209 is a preparation process for starting the search of input images executed by the second search unit 212 described later. When the CPU 101 receives the target exclusion instruction described later, it deletes the corresponding image features from the second feature storage unit 209.

The first search unit 211 executes the first search process. Here, the first search process uses the image features calculated from the query image as the query and searches for the target person in the video of the search range set for the first search process. The first search unit 211 according to this embodiment uses the image features stored in the first feature storage unit 207 as the search range and identifies, in the first feature storage unit 207, the image features whose similarity is higher than a predetermined threshold. To calculate the similarity, the sum of the distances between the SIFT features of the organ points is obtained, and this sum is normalized to give the similarity.
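The similarity calculation and threshold test can be sketched as follows. The patent states only that the distances are summed and normalized, so the Euclidean distance, the linear normalization by `max_total_distance`, the dictionary layout, and the function names are all assumptions made for illustration.

```python
import math


def similarity(query, candidate, max_total_distance):
    """Similarity in [0, 1] from per-organ-point feature distances.

    query / candidate: dict mapping an organ-point name to a feature vector.
    Assumed normalization: total distance 0 maps to 1.0, and any total at or
    above max_total_distance maps to 0.0.
    """
    total = sum(math.dist(query[name], candidate[name]) for name in query)
    return max(0.0, 1.0 - total / max_total_distance)


def search(query, stored, threshold, max_total_distance):
    """Return the IDs of stored features whose similarity exceeds the threshold."""
    return [
        feature_id
        for feature_id, feature in stored.items()
        if similarity(query, feature, max_total_distance) > threshold
    ]
```

With real SIFT descriptors each vector would have 128 dimensions; two-dimensional vectors are enough to check the arithmetic by hand.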

The second search unit 212 executes the second search process. Here, the second search process uses the image features (reference features) stored in the second feature storage unit 209 as the query and searches for the target person in the video of the search range set for the second search process. The second search unit 212 according to this embodiment uses as its search range the image features calculated from video that the video reception unit 201 received at or after the time the search instruction specifying the query image was received. Here, the search range includes not only the video received by the video reception unit 201 but also the video stored in the temporary storage unit 210. The second search unit 212 identifies, within the search range, the image features whose similarity is higher than a predetermined threshold. The similarity is calculated in the same way as described for the first search unit 211.

The display processing unit 213 integrates the search results of the first search unit 211 and the identification results of the second search unit 212 and displays the integrated result on the monitor 110. For example, the display processing unit 213 classifies the search results by camera 112 and displays them on the monitor 110 in order of shooting time. As another example, the display processing unit 213 displays the installation positions of the cameras 112 on a map and displays the movement history on the monitor 110 based on the shooting times. The display content is not limited to that of the embodiment.
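The first display example (classify by camera, then order by shooting time) can be sketched as follows. The hit dictionaries with `camera_id` and `time` keys and the function name are hypothetical, chosen only to make the grouping concrete.

```python
from collections import defaultdict


def group_hits_by_camera(first_hits, second_hits):
    """Merge both result lists, group by camera ID, and sort by shooting time."""
    grouped = defaultdict(list)
    for hit in list(first_hits) + list(second_hits):
        grouped[hit["camera_id"]].append(hit)
    return {
        camera_id: sorted(hits, key=lambda h: h["time"])
        for camera_id, hits in sorted(grouped.items())
    }
```

Sorting the merged hits by time alone, without grouping, would give the sequence needed for the map-based movement history mentioned as the other example.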

The exclusion reception unit 214 receives information on a person who no longer needs to be searched for, for example because the lost child or late passenger has been secured, and deletes that person's image features from the second feature storage unit 209. The user can indicate on the user interface of the display processing unit 213 that a lost child or late passenger has been secured. When searching for passengers who are late for a flight, it is also possible to detect that the person has passed through the aircraft boarding gate and use that as a trigger to delete the person's image features from the second feature storage unit 209.
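Registration and exclusion of reference features could look like the sketch below. `ReferenceFeatureStore` is a hypothetical in-memory stand-in for the second feature storage unit 209; the patent does not specify its data layout or interface.

```python
class ReferenceFeatureStore:
    """Hypothetical stand-in for the second feature storage unit 209."""

    def __init__(self):
        self._features = {}  # person ID -> reference feature

    def register(self, person_id, feature):
        # Preparation step: the second search can only match registered persons.
        self._features[person_id] = feature

    def exclude(self, person_id):
        # Called when the person has been secured or, for example, has
        # passed the boarding gate. Returns True if an entry was removed.
        if person_id in self._features:
            del self._features[person_id]
            return True
        return False

    def targets(self):
        return list(self._features)
```

Keeping this set small matters because, as the description notes later, the real-time behavior of the second search depends on limiting the number of registered persons.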

FIG. 3 is a flowchart showing the video recording process performed by the information processing device 100. The video recording process accumulates the video received by the video reception unit 201 so that it can be searched. In S301, the video reception unit 201 receives video. Next, in S302, the CPU 101 records the video received in S301 in the video storage unit 202. At this time, the CPU 101 records, as metadata of the video and in association with it, the shooting time and the camera ID of the camera 112 that captured it. This process is an example of a recording process that records the video input from the camera 112 in a storage unit.

Next, in S303, the tracking unit 203 detects persons in each frame image and tracks them. Here, each detected person is assigned a separate person ID for each frame image, and the ID is temporarily stored together with the person's coordinates in the frame image. A person who is being tracked is assigned the same tracking track ID, which is temporarily stored together with the IDs of the frame images in which the person is tracked. Next, in S304, if there is a person whose tracking has been interrupted (YES in S304), the tracking unit 203 advances the process to S305. If there is no such person (NO in S304), the tracking unit 203 returns the process to S301.

In S305, the face detection unit 204 detects a face in each frame image of the person tracked by the tracking unit 203. Next, in S306, the representative determination unit 205 selects one or more representative face images from the group of frame images of the tracked person. Next, in S307, the feature calculation unit 206 calculates face image features from the one or more face images selected in S306.

Next, in S308, the CPU 101 records the image features calculated in S307 in the first feature storage unit 207. As metadata of the image features, the CPU 101 also records, in association with them, the person ID, the tracking track ID assigned when the person was tracked, the shooting time, and the camera that captured the image. Next, in S309, the CPU 101 checks whether an instruction to end the accumulation process has been received. If the end instruction has been received (YES in S309), the CPU 101 ends the process. If it has not been received (NO in S309), the CPU 101 returns the process to S301. Through the above processing, the image features of the face images of persons appearing in the video input from the cameras 112 are accumulated in the first feature storage unit 207 and become searchable.
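The flow of S301 to S308 can be summarized in the following sketch. It is a simplification under assumptions: the five callables are hypothetical stand-ins for units 201 to 206, `track` is assumed to return the tracks that were interrupted at the current frame, and running out of frames plays the role of the end instruction checked in S309.

```python
def record_video(frames, track, detect_faces, pick_representatives, compute_feature):
    """Sketch of the recording loop for a finite iterable of frames.

    track(frame) -> list of interrupted tracks, each a dict with "track_id".
    detect_faces(track) -> list of face images for that track.
    pick_representatives(faces) -> list of one or more representative faces.
    compute_feature(face) -> image feature of a face.
    """
    video_store, feature_store = [], []
    for frame in frames:                      # S301: receive video
        video_store.append(frame)             # S302: record the video
        ended_tracks = track(frame)           # S303: detect and track persons
        for ended in ended_tracks:            # S304: interrupted tracks only
            faces = detect_faces(ended)       # S305: face detection
            for face in pick_representatives(faces):      # S306
                feature_store.append({                     # S307 and S308
                    "track_id": ended["track_id"],
                    "feature": compute_feature(face),
                })
    return video_store, feature_store
```

Note that features are written only after a track is interrupted, which is one reason newly captured video is not immediately searchable in the first feature storage.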

Next, the person search process performed by the information processing device 100 will be described. Before describing the person search process, the first search process by the first search unit 211 and the second search process by the second search unit 212 will be described with reference to FIG. 4. FIG. 4(a) is an explanatory diagram of the first search process and the second search process. In the first search process, matching is performed using the extracted image features as the query, so a search can be made with any person as the query. In the second search process, by contrast, the image features of the query person must be registered in advance.

Regarding accuracy, the second search process has an advantage owing to machine learning or the like. Regarding speed, the first search process is fast when searching video whose registration is complete, but because it takes time for video to be reflected in the first feature storage unit 207, it is somewhat inferior in terms of throughput (registration plus search). In the second search process, by contrast, authentication can be performed in real time as long as the number of pre-registered persons is kept reasonably small, so its throughput is considered high.

Accordingly, the first search process is best suited to searching stored video for an arbitrary person among a large volume of past images, and the second search process is best suited to identifying a limited number of specific persons in real time.

For this reason, the information processing device 100 according to this embodiment receives an instruction designating the query person, performs the first search process on video from before the time the person search was started (the past), and performs the second search process on video from after that time (the future). This makes it possible, for example, to search for a lost child appropriately. Past video is also searched because the person may currently be somewhere that none of the cameras 112 covers, in which case information on past whereabouts is important, and because that information also helps in estimating where the person is heading.

However, simply combining the first search process and the second search process causes a problem. FIG. 4(b) is a diagram showing the search ranges (time ranges) covered by the first search process and the second search process. The first search process uses the video recorded in the first feature storage unit 207 as its search range. A short preparation period (delay) 401 therefore occurs before the video from the reception time at which the query designation was received becomes searchable. Here, the preparation period is the period from the reception time at which the video reception unit 201 received the search instruction to the preparation completion time at which the preparation for the search process by the second search unit 212, described later, is completed.

In the second search process as well, initialization such as setting up the image features to be searched for in the video of each camera 112 and training takes time, so a preparation period (delay) 402 occurs before detection becomes possible. As a result, an unsearchable time period 403 arises, which is the sum of the preparation period 401 and the preparation period 402. In security applications such as a monitoring system, it is desirable to make the unsearchable time period 403 as short as possible.

FIG. 5 is a flowchart showing the search process performed by the information processing device 100. In S501, the query reception unit 208 receives a search instruction that specifies a query image of the person to be used as the search query. After S501, the CPU 101 executes processing in three threads. The first thread performs the first search process and includes S502 to S505. The second thread performs the second search process and includes S511 to S513. The third thread integrates and displays the search results and includes S521 and S522.

In S502 of the first thread, the first search unit 211 searches for the target person by performing the first search process. The search range in this case is the content of the first feature storage unit 207, that is, the image features of video that was input to the information processing apparatus 100 before the reception time of the search instruction and stored in the video storage unit 202. The search query is the image feature of the face image detected from the query image designated in the search instruction. The first search unit 211 sends the images obtained by the search, together with information such as their shooting locations and times, to the display processing unit 213.

Next, in S503, the first search unit 211 checks whether the second search process by the second search unit 212 has started. If the second search process has started (Yes in S503), the first search unit 211 advances the process to S505. If it has not started (No in S503), the first search unit 211 advances the process to S504.

In S504, the first search unit 211 performs the first search process again. The search range of the first search process in S504 consists of those image features stored in the first feature storage unit 207 that were not included in the search range of any first search process already executed. The first search unit 211 then advances the process to S503. In this way, until the second search process starts, the first search unit 211 repeatedly executes the first search process with the features of newly input video as the search range. The first search unit 211 also sends the images obtained by the search, together with information such as their shooting locations and times, to the display processing unit 213. Then, in S505, the first search unit 211 stops the first search process. This completes the processing of the first thread. Video input during the period before the second search process starts is thus placed in the search range of the first search process and searched by it. Therefore, the non-searchable time period 403 can be shortened.
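The loop over S502 to S505 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: `FeatureStore`, `first_search_pass`, `run_first_search`, the cosine similarity measure, and the 0.9 threshold are all hypothetical names and choices introduced here for illustration. The sketch only shows how a cursor can keep each pass from re-searching features that an earlier pass already covered.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Hypothetical stand-in for the first feature storage unit 207."""
    records: list = field(default_factory=list)  # (time, camera, feature)

    def append(self, time, camera, feature):
        self.records.append((time, camera, feature))


def similarity(a, b):
    """Cosine similarity (an assumed measure; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_search_pass(store, query, cursor, threshold):
    """Search only the records added since the previous pass."""
    end = len(store.records)
    hits = [(time, camera)
            for time, camera, feature in store.records[cursor:end]
            if similarity(query, feature) >= threshold]
    return hits, end


def run_first_search(store, query, second_search_started, threshold=0.9):
    """Repeat the first search until the second search has started."""
    hits, cursor = first_search_pass(store, query, 0, threshold)   # S502
    results = list(hits)
    while not second_search_started():                             # S503
        hits, cursor = first_search_pass(store, query, cursor, threshold)  # S504
        results.extend(hits)
    return results                                                 # S505
```

In a threaded setup, `second_search_started` could be the `is_set` method of a `threading.Event` that the second thread sets when S512 begins. A real loop would also wait briefly between passes rather than poll continuously.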

In S511 of the second thread, the second search unit 212 performs the preparation process for starting the second search process. The preparation process registers (records) the image feature of the query image of the target person in the second feature storage unit 209 as a reference feature. In S511, the display processing unit 213 may display the query image on the monitor 110. Next, in S512, the second search unit 212 starts the second search process, which searches for the target person, based on the reference feature, in video input to the information processing apparatus 100 after the reception time. The second search unit 212 then sends the images obtained by the search, together with information such as their shooting locations and times, to the display processing unit 213. Once the second search process starts in S512, the first search unit 211 determines in S503 described above that the second search process has started, and the first search process stops.
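As a rough illustration of S511 and S512, the following Python sketch registers a reference feature and then matches faces in incoming frames against it. The class `SecondSearch`, its method names, the cosine similarity measure, and the 0.9 threshold are assumptions made for this example; the source specifies only that a reference feature is registered and that later video is searched against it.

```python
import math


def cosine(a, b):
    """Cosine similarity (an assumed measure; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SecondSearch:
    """Hypothetical sketch of the second search unit 212."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.reference_features = []  # stand-in for second feature storage unit 209
        self.started = False

    def prepare(self, query_feature):
        """S511: register the query feature as a reference feature."""
        self.reference_features.append(list(query_feature))

    def start(self):
        """S512: begin searching; the first thread observes this in S503."""
        if not self.reference_features:
            raise RuntimeError("prepare() must be called before start()")
        self.started = True

    def process_frame(self, time, camera, face_features):
        """Return one hit per face in the frame that matches a reference."""
        if not self.started:
            return []
        hits = []
        for feature in face_features:
            score = max(cosine(feature, ref) for ref in self.reference_features)
            if score >= self.threshold:
                hits.append({"time": time, "camera": camera, "score": score})
        return hits
```

Frames that arrive before `start()` return no hits here, which corresponds to the period that the first search process has to cover.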

Next, in S513, the second search unit 212 checks whether a stop instruction has been received. If no stop instruction has been received (No in S513), the CPU 101 continues the second search process. If a stop instruction has been received (Yes in S513), the CPU 101 ends the process. For example, the user inputs a stop instruction when the search for the person designated as the query is no longer needed, such as when the lost child or late passenger has been found.

In S521 of the third thread, the display processing unit 213 integrates the detection results obtained in S502, S504, and S512, and displays the integrated detection results on the monitor 110. Next, in S522, the CPU 101 checks whether a stop instruction has been received. If no stop instruction has been received (No in S522), the CPU 101 returns the process to S521 and continues the display process. If a stop instruction has been received (Yes in S522), the CPU 101 ends the process.
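The integration in S521 can be pictured with the following Python sketch. The source states only that the results are integrated and displayed; the dictionary layout of a hit, the removal of duplicates by time and camera, and the ordering by time are assumptions made here for illustration.

```python
def integrate_results(*result_lists):
    """Merge hits from several searches into one time-ordered list.

    Each hit is assumed to be a dict with "time" and "camera" keys.
    Duplicates (same time and camera) are kept only once.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for hit in results:
            key = (hit["time"], hit["camera"])
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    merged.sort(key=lambda hit: hit["time"])
    return merged
```

For example, `integrate_results(hits_s502, hits_s504, hits_s512)` would yield a single list that the display processing unit could render in chronological order.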

As described above, the information processing apparatus 100 searches for the target person in the non-searchable time period 403 by the first search process, and then stops the first search process. Thereafter, until a stop instruction is received, the information processing apparatus 100 searches for the target person by the second search process. The non-searchable time period is the combination of two periods: the period whose feature values have not yet been indexed for the face image feature search when the query person is designated and the search is executed, and the period until face image feature identification starts. Repeating the first search process shortens this period. Thus, the information processing apparatus 100 of the present embodiment can search for a person continuously from the past onward without lowering the processing speed.

(Second Embodiment)
Next, a monitoring system according to a second embodiment will be described. The information processing apparatus 100 of the monitoring system according to the second embodiment performs the first search process before the second search process, and uses the images detected in the first search process in the second search process. This improves the search accuracy.

The following describes how the monitoring system according to the second embodiment differs from the monitoring system according to the first embodiment. FIG. 6 is a flowchart showing the search process performed by the information processing apparatus 100 according to the second embodiment. In S601, the query reception unit 208 receives a search instruction that designates a query image of the person to be searched for. After S601, the CPU 101 executes processing in two threads. The first thread performs the search and includes S602 to S617. The second thread integrates and displays the search results and includes S621 and S622.

In S602 of the first thread, the first search unit 211 searches for the target person by performing the first search process. The search range in this case is the content of the first feature storage unit 207, that is, the image features of video that was input to the information processing apparatus 100 before the reception time of the search instruction and stored in the video storage unit 202. The search query is the image feature of the face image detected from the query image designated in the search instruction. The first search unit 211 sends the images obtained by the search, together with information such as their shooting locations and times, to the display processing unit 213.

After S602, the CPU 101 further splits the processing into two threads. Thread 1-1 continues the first search process and includes S603 to S605. Thread 1-2 performs the second search process and includes S611 to S617. S603 to S605 of thread 1-1 are the same as S503 to S505 of the search process of the first embodiment described with reference to FIG. 5.

In S611 of thread 1-2, the second search unit 212 checks whether the target person was detected in the first search process in S602. If the target person was detected (Yes in S611), the second search unit 212 advances the process to S612. If the target person was not detected (No in S611), the second search unit 212 advances the process to S615. In S612, the second search unit 212 prepares to perform the second search process using not only the query image but also the detected images in which the target person was detected in the first search process in S602.

Next, in S613, the second search unit 212 starts the second search process using the query image and the detected images. Next, in S614, the second search unit 212 checks whether a stop instruction has been received. If a stop instruction has been received (Yes in S614), the second search unit 212 ends the processing of thread 1-2. If no stop instruction has been received (No in S614), the second search unit 212 continues the second search process. In S615, the second search unit 212 prepares to perform the second search process using only the query image. S615 to S617 are the same as S511 to S513 of the search process of the first embodiment described with reference to FIG. 5.
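The branch at S611 and the use of detected images as additional references can be sketched in Python as follows. The function names, the cosine similarity measure, and the rule of taking the best score over all references are assumptions for illustration; the source says only that the detected images are used together with the query image.

```python
import math


def cosine(a, b):
    """Cosine similarity (an assumed measure; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_reference_features(query_feature, detected_features):
    """Choose the reference set for the second search process."""
    if detected_features:                                 # S611: Yes
        return [query_feature] + list(detected_features)  # S612
    return [query_feature]                                # S611: No, S615


def match_score(candidate, references):
    """Score a candidate by its best match among the references (assumed rule)."""
    return max(cosine(candidate, ref) for ref in references)
```

With a query feature `[1.0, 0.0]` alone, a candidate `[0.5, 0.85]` scores about 0.51. If the first search process had found a detection with feature `[0.6, 0.8]`, the same candidate scores about 0.99 against the enlarged reference set, which shows how additional views of the person can raise the match score.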

In S621 of the second thread, the display processing unit 213 integrates the detection results obtained in S602, S604, S613, and S616, and displays the integrated detection results on the monitor 110. Next, in S622, the CPU 101 checks whether a stop instruction has been received. If no stop instruction has been received (No in S622), the CPU 101 returns the process to S621 and continues the display process. If a stop instruction has been received (Yes in S622), the CPU 101 ends the process. The other configurations and processes of the monitoring system according to the second embodiment are the same as those of the monitoring system according to the first embodiment.

As described above, the information processing apparatus 100 of the second embodiment executes, before the second search process, the first search process whose search range is the features of video that was input to the information processing apparatus 100 before the reception time of the search instruction and stored in the video storage unit 202. After this first search process, the information processing apparatus 100 performs the second search process using the detected images of the target person obtained in the first search process. This improves the search accuracy. Furthermore, as in the first embodiment, the information processing apparatus 100 continues the first search process over a different search range until the second search process starts. Therefore, the information processing apparatus 100 can search for a person continuously from the past onward without lowering the processing speed.

Modifications of the first and second embodiments are described below. As a first modification, the image features are not limited to those of a face image. As another example, the image features may be human body image features obtained from the whole body. Examples of human body image features include numerical values such as the ratio of height to body width, the head-to-body ratio, and the ratio of leg length to height. As yet another example, the information processing apparatus 100 may use, instead of image features, attribute information of the person obtained from the whole body including the face. Examples of attribute information include race, gender, age, presence or absence of glasses, beard, and clothing color.

As yet another example, the information processing apparatus 100 may use both the image features and the attribute information as the image features. For example, the information processing apparatus 100 separately calculates the scores of the first and second search processes for the image features and the scores of the first and second search processes for the attribute information. The information processing apparatus 100 may then calculate the weighted sum of the two scores as a total score.
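The weighted sum can be written out as a short Python sketch. The function names, the way the attribute score is computed (fraction of matching attributes), and the constraint that the two weights sum to 1 are assumptions for illustration; the source specifies only that the two scores are combined by a weighted sum.

```python
def attribute_score(query_attrs, candidate_attrs):
    """Fraction of the query's attributes that the candidate matches (assumed)."""
    if not query_attrs:
        return 0.0
    matched = sum(1 for key, value in query_attrs.items()
                  if candidate_attrs.get(key) == value)
    return matched / len(query_attrs)


def total_score(feature_score, attr_score, feature_weight=0.7):
    """Weighted sum of the image feature score and the attribute score."""
    if not 0.0 <= feature_weight <= 1.0:
        raise ValueError("feature_weight must be between 0 and 1")
    return feature_weight * feature_score + (1.0 - feature_weight) * attr_score
```

The default weight of 0.7 is arbitrary; in practice it would be tuned to how reliable each kind of score is for the cameras in use.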

As a second modification, the information processing apparatus 100 may display the query person and the images obtained as search results so that they can be distinguished from each other. This allows the user to check whether a search result is a false match.

As a third modification, at least one of the first search unit 211 and the second search unit 212 may, when a plurality of images of the target person are detected, preferentially detect images in which the face orientation differs.
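One possible reading of this modification is sketched below in Python: among several hits, keep the best-scoring image for each range of face orientation. The source does not say how orientation is measured or how priority is applied, so the `yaw` field (in degrees), the 30-degree bins, and the function name are all assumptions.

```python
def pick_diverse_orientations(hits, bin_width=30.0):
    """Keep the highest-scoring hit for each yaw bin.

    Each hit is assumed to be a dict with "yaw" (degrees) and "score" keys.
    """
    best = {}
    for hit in hits:
        yaw_bin = int(hit["yaw"] // bin_width)
        if yaw_bin not in best or hit["score"] > best[yaw_bin]["score"]:
            best[yaw_bin] = hit
    return [best[yaw_bin] for yaw_bin in sorted(best)]
```

Selecting one image per orientation range gives the user, or the second search process of the second embodiment, several distinct views of the person rather than many near-identical frames.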

As a fourth modification, at least one of the first search unit 211 and the second search unit 212 may preferentially detect images captured at times closer to the time of processing. This is useful when the purpose is to locate a person such as a lost child.

As a fifth modification, the first search unit 211 of the information processing apparatus 100 need only perform the first search process on video input to the information processing apparatus 100 before the second search process starts, and the specific processing for doing so is not limited to that of the embodiments. As another example, the first search unit 211 may, at the time the second search process starts, perform the first search process once, with the search range being the features of video that was input to the information processing apparatus 100 before the second search process started and stored in the video storage unit 202.

According to each of the embodiments described above, a person can be searched for continuously from the past onward without lowering the processing speed.

Although the present invention has been described in detail based on its preferred embodiments, the present invention is not limited to these specific embodiments, and various forms that do not depart from the gist of the invention are also included in the present invention. Parts of the above-described embodiments may be combined as appropriate.

In the above description, the first search process obtains detected images of the target person by identifying image features whose similarity exceeds a predetermined threshold. However, the detected images of the target person may instead be obtained when, for example, a user (observer) of the information processing apparatus 100 identifies the target person and inputs that information through an input means of the information processing apparatus 100.

(Other Embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Information processing apparatus
201 Video reception unit
202 Video storage unit
211 First search unit
212 Second search unit

Claims

An information processing apparatus comprising:
a reception means for receiving a search instruction for a person;
a first search means for performing, based on features extracted from a target person related to the search instruction, a first search process of searching for the target person, with a first search range consisting of features of persons extracted from video that was input from an imaging means before a reception time at which the search instruction was received and that is stored in a storage means; and
a second search means for performing, based on features of the target person extracted from a query image related to the search instruction, a second search process of searching for the target person, with a second search range consisting of features of persons extracted from video input from the imaging means after the reception time,
wherein the first search means further performs a process of searching for the target person from features extracted from video input during the period from the reception time until preparation for the second search process by the second search means is completed and the second search process starts.

The information processing apparatus according to claim 1, further comprising:
an integration means for integrating the search results of the first search means and the search results of the second search means; and
a display means for displaying the search results integrated by the integration means.

The information processing apparatus according to claim 1 or 2, wherein, after completing the first search process whose search range is the features of persons extracted from video that was input from the imaging means before the reception time and stored in the storage means, the first search means searches for the target person with features extracted from video newly input by the imaging means as the search range if the second search process has not started.

The information processing apparatus according to any one of claims 1 to 3, wherein, when the second search process has started, the first search means ends the process of searching for the target person from features extracted from video input during the period from the reception time until preparation for the second search process by the second search means is completed and the second search process starts.

The information processing apparatus according to any one of claims 1 to 4, wherein, when the target person is detected by the first search means, the second search means searches for the target person based on the image in which the target person was detected and the query image.

The information processing apparatus according to claim 5, wherein the second search means starts the second search process after the first search means has completed the first search process on the features of video that was input from the imaging means before the reception time and stored in the storage means.

The information processing apparatus according to any one of claims 1 to 6, wherein the second search means ends the second search process when a stop instruction is received.

The information processing apparatus according to claim 2, wherein the display means displays the query image and the search results so that they can be distinguished from each other.

The information processing apparatus according to any one of claims 1 to 8, wherein at least one of the first search means and the second search means preferentially detects images in which the orientation of the face of the target person or the orientation of the body of the target person differs.

An information processing method executed by an information processing apparatus, the method comprising:
a reception step of receiving a search instruction for a person;
a first search step of performing, based on features extracted from a target person related to the search instruction, a first search process of searching for the target person, with a first search range consisting of features of persons extracted from video that was input from an imaging means before a reception time at which the search instruction was received and that is stored in a storage means; and
a second search step of performing, based on features of the target person extracted from a query image related to the search instruction, a second search process of searching for the target person, with a second search range consisting of features of persons extracted from video input from the imaging means after the reception time,
wherein the first search step further includes a process of searching for the target person from features extracted from video input during the period from the reception time until preparation for the second search process in the second search step is completed and the second search process starts.

A program for causing a computer to function as:
a reception means for receiving a search instruction for a person;
a first search means for performing, based on features extracted from a target person related to the search instruction, a first search process of searching for the target person, with a first search range consisting of features of persons extracted from video that was input from an imaging means before a reception time at which the search instruction was received and that is stored in a storage means; and
a second search means for performing, based on features of the target person extracted from a query image related to the search instruction, a second search process of searching for the target person, with a second search range consisting of features of persons extracted from video input from the imaging means after the reception time,
wherein the first search means further performs a process of searching for the target person from features extracted from video input during the period from the reception time until preparation for the second search process by the second search means is completed and the second search process starts.