JP2009123095A

JP2009123095A - Image analysis device and image analysis method

Info

Publication number: JP2009123095A
Application number: JP2007298377A
Authority: JP
Inventors: Takahiro Mae; 孝宏前; Michiyo Matsui; 美智代松井; Yoshinori Okuma; 好憲大熊
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2007-11-16
Filing date: 2007-11-16
Publication date: 2009-06-04

Abstract

PROBLEM TO BE SOLVED: To provide an image analysis device capable of detecting coherent groups of video in consideration of the semantic content of the video. SOLUTION: The image analysis device 100 analyzes a video consisting of a plurality of frame images arranged in time series. It is equipped with a segment creation part 130 that evaluates the continuity of two consecutive frame images by judging whether the same object exists in both, using feature amounts based on local shapes extracted from the frame images, and then divides the video into a plurality of segments based on the continuity evaluation result; a face group creation part 140 that detects a face image of a person from the frame images and groups the frame images based on the degree of similarity of the detected face images; and an analysis information creation part 150 that creates analysis information for the segments based on the segments created by the segment creation part 130 and the frame image classification result of the face group creation part 140. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a video analysis apparatus and a video analysis method.

In general, when a user wants to easily call up a scene to view in a moving image such as a TV broadcast, for playback or editing, chapter points (delimiters) are added at arbitrary positions in the moving image. Once chapter points are added, the video at each chapter point can be called up with a simple operation, and the video can be viewed in units delimited by the chapter points (called chapters).

Some video recording apparatuses, such as DVD recorders, can automatically add chapter points to video as they record it. One example of a method for automatically adding chapter points is to add them to the moving image at fixed time intervals, such as every 5 or 10 minutes, to create chapters. Because this simply creates chapters of fixed duration, a chapter is not necessarily created for each scene the user wants to see.

On the other hand, techniques have also been devised that automatically detect semantically coherent groups of moving images within a video and create a chapter for each detected group. For example, Patent Document 1 discloses a method for detecting a group of moving images that includes the surroundings of an arbitrary image frame selected by the user. This method detects semantically coherent groups by comparing color histograms, edge features, and the like extracted from an image frame with those of the preceding and following image frames.

Patent Document 2 discloses a method for detecting groups of moving images by focusing on the people who appear in them. This method detects, in the video, image frames that match the face image of a character designated in advance by the user, and then takes a fixed-length portion around each detected image frame as a group.

Patent Document 1: JP-T-2004-526372
Patent Document 2: JP 2001-285787 A

However, because the technique described in Patent Document 1 uses color histograms and edge features, these feature amounts change greatly when a subject such as a person or object in the video moves, when the brightness in the video changes significantly, or when subjects occlude one another. As a result, the similarity between adjacent image frames may go undetected, and the frames may be judged to belong to discontinuous scenes. A semantically continuous group of moving images therefore cannot be detected correctly, and finely fragmented chapters may be created.

Meanwhile, the technique described in Patent Document 2 detects, as a chapter, a fixed number of frames before and after an image frame containing a detected face image. It therefore creates chapters without considering the meaning of the video content, for example by splitting a semantically continuous group of moving images partway through.

The present invention has been made in view of the above problems, and its object is to provide a new and improved video analysis apparatus and video analysis method capable of detecting groups of video in consideration of the semantic content of the video.

To solve the above problems, according to one aspect of the present invention, there is provided a video analysis apparatus that analyzes a video composed of a plurality of frame images arranged in time series, the apparatus comprising: a segment creation unit that evaluates the continuity of two consecutive frame images by determining, using feature amounts based on local shapes extracted from the frame images, whether the same object exists in both frame images, and divides the video into a plurality of segments based on the continuity evaluation result; a face group creation unit that detects a person's face image from the frame images and groups the frame images based on the similarity of the detected face images; and an analysis information creation unit that creates analysis information for the segments based on the segments created by the segment creation unit and the frame image classification result of the face group creation unit.

With this configuration, a semantically continuous group of video is not finely fragmented but is divided into a series of segments that reflect the semantic content of the video. Segments can thus be created in a way that is effective and convenient for the user, who can select the portion to view in units of video with continuous semantic content.

The segment creation unit may include: a feature amount extraction unit that extracts feature amounts based on the local shapes of each frame image; an object detection unit that detects whether the same object is contained in two consecutive frame images by comparing their feature amounts; and a segment generation unit that calculates a continuity evaluation value for the two consecutive frame images based on the object detection result of the object detection unit and, when the evaluation value is lower than a predetermined threshold, divides the video between the two frame images to create segments.

Because the continuity of the video is evaluated by recognizing objects contained in the frame images, using feature amounts based on the local shapes of the preceding and following frame images, the influence of small image variations between frames can be reduced compared with conventional methods that use only image features. This reduces the likelihood that semantically continuous video is split into separate segments.

The face group creation unit may include: a face image detection unit that detects face images from the frame images; a group creation unit that compares the face images detected by the face image detection unit and forms one face group from a plurality of consecutive frame images in which similar face images were detected; and a representative image selection unit that selects one or more representative images from the frame images of each face group created by the group creation unit.

This allows frames to be grouped by each person appearing in the video. Furthermore, because a representative image is associated with each group, the representative image can be used as a thumbnail image of the segment to which the group belongs.

The analysis information creation unit may create the analysis information of a segment using the representative image of a face group contained in the segment as the representative image of the segment. The representative image of the face group can then be presented to the user as a thumbnail image of the segment, and the user can select the segment to view by looking at the people who appear in each segment.

The apparatus may further include an analysis information output unit that outputs the analysis information of the segments to a display device, and a video output unit that plays back the video of the segment selected by the user and outputs it to the display device. Information on the created segments can thus be presented to the user, who can view the video in units of the created segments.

The analysis information output unit may output the analysis information in the chronological order of the segments. Alternatively, the analysis information output unit may classify the segments by the person associated with the face images of the face groups contained in each segment, and output the analysis information according to the classification result.

The apparatus may further include a person region creation unit that calculates the region of a frame image in which a person is present, based on the position within the frame image of the face image detected by the face group creation unit, and the segment creation unit may create segments by comparing the feature amounts of the region calculated by the person region creation unit. By using the object detection result for the person region in this way, the presence of a person in the image can be recognized even when the face is occluded by another object or is not facing forward. This makes it possible to detect frames in which a person appears more accurately when creating groups.

The apparatus may further include: a search image input unit to which an arbitrary search image selected by the user is input; a search image evaluation unit that evaluates the similarity between the search image and each frame image; and a face similarity evaluation unit that detects a person's face image from the search image and evaluates the similarity between that face image and the face images detected from each frame image. The analysis information creation unit may then extract, from the analysis information, the analysis information of segments related to the search image, based on the evaluation results of the search image evaluation unit and the face similarity evaluation unit. This makes it possible to extract only the analysis information related to an image selected in advance by the user, and to present only information on segments of interest to the user.

The analysis information creation unit may extract the analysis information of segments that contain a frame image evaluated by the search image evaluation unit as highly similar to the search image.

Alternatively, the analysis information creation unit may extract the analysis information of segments that contain a frame image containing a face image evaluated by the face similarity evaluation unit as highly similar to the face image detected from the search image.

To solve the above problems, according to another aspect of the present invention, there is provided a video analysis method for analyzing a video composed of a plurality of frame images arranged in time series, the method comprising: a segment creation step of evaluating the continuity of two consecutive frame images by determining, using feature amounts based on local shapes extracted from the frame images, whether the same object exists in both frame images, and dividing the video into a plurality of segments based on the continuity evaluation result; a face group creation step of detecting a person's face image from the frame images and grouping the frame images based on the similarity of the detected face images; and an analysis information creation step of creating analysis information for the segments based on the segments created in the segment creation step and the frame image classification result of the face group creation step.

As described above, according to the present invention, groups of video can be detected in consideration of the semantic content of the video.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

The present invention relates to a video analysis apparatus and a video analysis method that detect arbitrary groups of video within a moving image such as a TV broadcast and select an image representative of each group, so that the user can play back and display the scene to be viewed. The video analysis apparatus and video analysis method according to the present invention are characterized in that, by detecting objects in frame images using image recognition technology, they detect a coherent video scene as a series of segments that reflect the semantic content of the video, without fragmenting it finely.

Specifically, focusing on the partial similarity between consecutive frame images, feature amounts based on local shapes are extracted from the frame images, and the feature amounts of the preceding and following frame images are compared to determine whether the same object exists in both images. Using the extracted local feature amounts makes it possible to track objects in the image through various changes, so the continuity of the video can be evaluated more robustly than with methods that use color distributions or edge features of image frames.

(First embodiment)
First, a video analysis apparatus according to a first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a block diagram showing the schematic configuration of a video analysis apparatus 100 according to the first embodiment. As shown in FIG. 1, the video analysis apparatus 100 includes a video acquisition unit 110, a video data storage unit 120, a segment creation unit 130, a face group creation unit 140, an analysis information creation unit 150, an analysis information storage unit 160, a reproduction unit 170, and a display unit 180.

The video data storage unit 120 is a storage medium on which one or more items of video data are recorded. The video acquisition unit 110 acquires video data from the video data storage unit 120 and inputs it to the segment creation unit 130 and the face group creation unit 140. The segment creation unit 130 creates segments by dividing the video input from the video acquisition unit 110 into semantically continuous units. The face group creation unit 140 recognizes the face images of people appearing in the video and groups the frame images of the video based on the similarity of the face images (each group is called a face group). The analysis information creation unit 150 creates analysis information for the video using the segments created by the segment creation unit 130 and the face groups created by the face group creation unit 140. The analysis information storage unit 160 is a storage medium in which the analysis information created by the analysis information creation unit 150 is stored. The reproduction unit 170 reads the analysis information created by the analysis information creation unit 150, displays it on the display unit 180, and plays back the video of the segment selected by the user. The display unit 180 is a display device, such as a monitor, that displays the analysis information and the moving images of the video data.

(Segment creation unit 130)
The detailed configuration of the segment creation unit 130 is described next with reference to FIG. 2. FIG. 2 is a block diagram showing the schematic configuration of the segment creation unit 130 of the video analysis apparatus 100 according to the present embodiment. As shown in FIG. 2, the segment creation unit 130 includes a feature amount extraction unit 131, a feature amount storage unit 132, an object detection unit 133, and a segment generation unit 134.

The feature amount extraction unit 131 extracts feature amounts from each frame image constituting the moving image. Feature amounts can be extracted from an image by the methods described in Non-Patent Documents 1 to 3 below.

Non-Patent Document 1: C. Harris and M. Stephens, "A combined corner and edge detector", Proc. Alvey Vision Conf., pp. 147-151
Non-Patent Document 2: D. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
Non-Patent Document 3: N. Sebe and M. S. Lew, "Comparing Salient Point Detectors", Proc. IEEE Int. Conf. on Multimedia and Expo, pp. 65-68

Here, feature amounts can be extracted from a frame image by first extracting feature points unique to objects with conventionally known techniques: for example, applying to the frame image a filter that detects local salient points of object shapes using the Harris function or the Difference of Gaussian function (Non-Patent Documents 1 and 2), or a filter that detects such points using wavelets (Non-Patent Document 3). The luminance information of the pixels surrounding each extracted feature point is then extracted as the feature amount of that feature point. The extracted feature amount data is stored in the feature amount storage unit 132.
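The second step above, taking the luminance of the pixels around a feature point as its feature amount, might be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `patch_descriptor`, the square window, and the `radius` parameter are assumptions, and the feature points are taken as already detected by one of the cited methods.

```python
def patch_descriptor(gray, x, y, radius=1):
    """Return the luminance values in the square window centred on (x, y).

    gray   : 2-D list of luminance values, indexed as gray[row][col]
    radius : half-width of the window (hypothetical parameter)

    The values are flattened row by row. None is returned when the window
    would cross the image border.
    """
    height, width = len(gray), len(gray[0])
    inside = radius <= x < width - radius and radius <= y < height - radius
    if not inside:
        return None
    return [
        gray[row][col]
        for row in range(y - radius, y + radius + 1)
        for col in range(x - radius, x + radius + 1)
    ]


# A 3x3 toy image with one feature point at its centre.
image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
descriptor = patch_descriptor(image, 1, 1)
```

A flat list of luminance values keeps the later comparison simple, since two descriptors of equal length can be compared directly with a distance measure.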

The object detection unit 133 detects whether the same object exists in successive frame images, based on the feature amount data extracted by the feature amount extraction unit 131. By comparing the feature amounts obtained from the preceding and following frame images, for example using the Euclidean distance, the object detection unit 133 can detect whether an object contained in the preceding frame image is also present in the following frame image.

FIG. 3 shows an example of object detection by the object detection unit 133. In FIG. 3, frame images 135a to 135c are consecutive frame images in the moving image, and the circles in the images represent the local-shape feature points extracted by the feature amount extraction unit 131. The object detection unit 133 compares the feature amounts of the feature points in frame images 135a and 135b, and in frame images 135b and 135c, to associate the feature points of objects contained in the images with one another and determine whether the same object is present.

The segment generation unit 134 evaluates the continuity between frames based on the object detection result of the object detection unit 133, and generates segments so that frames with high continuity evaluations form a single segment. For example, if object detection by the object detection unit 133 finds the same object in the preceding and following frames, the segment generation unit 134 calculates a high continuity evaluation value for those frames; if the same object is not present, it calculates a low value. The continuity evaluation value may be calculated based on, for example, the Euclidean distance used by the object detection unit 133 to compare feature amounts.
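One way to turn the Euclidean-distance comparison into a continuity evaluation value is sketched below. The text only says the value may be based on the Euclidean distance, so the concrete formula (percentage of feature points in the preceding frame that find a close match in the following frame), the function names, and the `max_dist` cut-off are all assumptions made for illustration.

```python
import math


def match_count(desc_prev, desc_next, max_dist=0.5):
    """Count descriptors of the preceding frame whose nearest descriptor
    in the following frame lies within max_dist (Euclidean distance)."""
    if not desc_next:
        return 0
    count = 0
    for d in desc_prev:
        nearest = min(math.dist(d, e) for e in desc_next)
        if nearest <= max_dist:
            count += 1
    return count


def continuity_score(desc_prev, desc_next, max_dist=0.5):
    """Return a value from 0 to 100: the percentage of feature points in
    the preceding frame that are matched in the following frame."""
    if not desc_prev:
        return 0.0
    matched = match_count(desc_prev, desc_next, max_dist)
    return 100.0 * matched / len(desc_prev)


# Toy 2-D descriptors: three of the four points reappear with small shifts.
frame_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
frame_b = [(0.1, 0.0), (1.0, 1.2), (5.3, 5.0)]
score = continuity_score(frame_a, frame_b)
```

With this scoring, frames that share most of their local features score high even if part of the scene changes, which is the robustness property the description attributes to local features.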

Based on the continuity evaluation values calculated in this way, the segment generation unit 134 divides the moving image at positions where the change between frames is judged to be large (the continuity evaluation value is low) and creates segments. FIG. 4 shows an example of segment generation by the segment generation unit 134. It shows the result of evaluating the continuity between each pair of consecutive frames in a moving image composed of 10 frame images. In the example of FIG. 4, the continuity evaluation values for frames 1 and 2, frames 2 and 3, and frames 3 and 4 are high, at 70, 80, and 70 respectively, whereas the continuity evaluation value for frames 4 and 5 is low, at 10. In this case, frames 1 to 4 can therefore be judged to form one group with high continuity.

As the criterion for judging whether the continuity between frames is high or low when creating segments, a threshold may be set in advance for the continuity evaluation value, and the judgment made by comparing the evaluation value with the threshold. In the example of FIG. 4, the threshold is set to 30, so the segment generation unit 134 judges that the change is large (continuity is low) between frames 4 and 5 and between frames 8 and 9, where the evaluation value is below 30, and divides the moving image at those two positions. In this case, three segments, A, B, and C, are created.
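The threshold rule can be sketched as follows. The scores 70, 80, 70, and 10 for the first four frame pairs, the threshold of 30, and the two cut positions come from the description of FIG. 4; the remaining scores are hypothetical values chosen only to be consistent with those cuts, and the function name is an assumption.

```python
def split_into_segments(scores, threshold):
    """Split a video into segments wherever continuity falls below threshold.

    scores[i] is the continuity evaluation value between frame i+1 and
    frame i+2 (frames are numbered from 1). Returns a list of
    (first_frame, last_frame) tuples.
    """
    n_frames = len(scores) + 1
    segments = []
    start = 1
    for i, score in enumerate(scores):
        if score < threshold:
            segments.append((start, i + 1))  # segment ends at frame i+1
            start = i + 2                    # next segment starts at frame i+2
    segments.append((start, n_frames))
    return segments


# First four values and the two low positions follow FIG. 4 as described;
# the other values are placeholders.
scores = [70, 80, 70, 10, 60, 75, 65, 20, 80]
print(split_into_segments(scores, 30))  # [(1, 4), (5, 8), (9, 10)]
```

The three tuples correspond to segments A, B, and C in the example.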

(Face group creation unit 140)
The detailed configuration of the face group creation unit 140 is described next with reference to FIG. 5. FIG. 5 is a block diagram showing the schematic configuration of the face group creation unit 140 of the video analysis apparatus 100 according to the present embodiment. As shown in FIG. 5, the face group creation unit 140 includes a face image detection unit 141, a face image storage unit 142, a group creation unit 143, and a representative image selection unit 144.

The face image detection unit 141 detects people's faces in each frame image. As the face detection method, a known face detection technique such as the one described in Patent Document 2 (JP 2001-285787 A) mentioned above may be applied. The detected face images are recorded in the face image storage unit 142.

The group creation unit 143 evaluates the similarity between each detected face and the faces recorded in the face image storage unit 142, and places frame images in which similar faces are detected in the same group (face group). For the face similarity evaluation, a known technique such as the one described in Patent Document 2 (JP 2001-285787 A) mentioned above may be applied. The group creation unit 143 may also create face groups on the condition that, for example, similar faces are detected in consecutive frames.

FIG. 6 shows an example in which face groups are created on the condition that similar faces are detected in consecutive frames. In the example of FIG. 6, six face images are detected from 10 frame images. The three faces detected in frames 2, 3, and 4 and the two faces detected in the following frames 5 and 6 have low similarity to each other, so they are separated into different groups, group a and group b. The face detected in the last frame, frame 10, belongs to person X, the same person as in the face images of group a, but because it was not detected consecutively with them, frame 10 forms a separate group c. As shown in FIG. 6, the grouping result records, for each group, the person information (the ID of the person contained in the images) and the first and last frame numbers of the group.

When multiple faces are detected in a single frame image, the group creation process is repeated once for each detected face. Faces of the same person are classified into the same face group, and faces of different persons into different groups. For example, if two faces of person X are detected in one frame image, they are classified into one group; if two persons, X and Y, are detected, they are classified into two groups.

The representative image selection unit 144 selects one of the frame images in each face group as that group's representative image, and records the representative image together with its frame number. The first frame image in the face group may be used as the representative image, or alternatively the frame image containing the face image with the largest area. FIG. 7 illustrates an example of the selection process performed by the representative image selection unit 144. In this example the first frame image is selected: of the frame images belonging to group A, the first one, frame 2, is chosen as the representative image.
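The two selection policies mentioned above (first frame, or largest face area) might be expressed as in the following sketch. The names `FaceFrame` and `select_representative` and the face areas used in the example are hypothetical; only the two policies themselves come from the description.

```python
from dataclasses import dataclass


@dataclass
class FaceFrame:
    frame_no: int   # frame number within the video
    face_area: int  # area of the detected face in pixels (assumed unit)


def select_representative(frames: list[FaceFrame],
                          policy: str = "first") -> FaceFrame:
    """Pick the representative frame of one face group."""
    if not frames:
        raise ValueError("a face group must contain at least one frame")
    if policy == "first":
        return min(frames, key=lambda f: f.frame_no)
    if policy == "largest_face":
        return max(frames, key=lambda f: f.face_area)
    raise ValueError(f"unknown policy: {policy}")


# Group A of FIG. 7 spans frames 2 to 4; the areas are made-up values.
group_a = [FaceFrame(2, 900), FaceFrame(3, 1600), FaceFrame(4, 1200)]
representative = select_representative(group_a, policy="first")
```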

(Analysis information creation unit 150)
Next, the analysis information creation unit 150 is described in more detail. The analysis information creation unit 150 creates analysis information based on the segments created by the segment creation unit 130 and the grouping result produced by the face group creation unit 140. FIG. 8 shows an example of analysis information created by the analysis information creation unit 150. The analysis information records the start and end frame numbers of each segment, a representative image of each person, and person information (such as a person ID).
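One way to picture how segments and face groups are combined is the sketch below. It assumes that a face group is attached to every segment whose frame range it overlaps; the description does not state this rule, and the function name, dictionary layout, and segment boundaries are all hypothetical.

```python
def build_analysis_info(segments: dict[str, tuple[int, int]],
                        face_groups: list[dict]) -> list[dict]:
    """Combine segments and face groups into per-segment analysis records.

    `segments` maps a segment name to (start_frame, end_frame).
    Each face group is a dict with the keys
    "person_id", "first", "last" and "rep_frame".
    """
    info = []
    for name, (start, end) in segments.items():
        persons: dict[str, int] = {}
        for group in face_groups:
            overlaps = group["first"] <= end and group["last"] >= start
            if overlaps:
                # Keep the first representative frame found for each person.
                persons.setdefault(group["person_id"], group["rep_frame"])
        info.append({"segment": name, "start": start, "end": end,
                     "persons": persons})
    return info


# Assumed segment boundaries; the face groups follow the FIG. 6 example.
segments = {"A": (1, 4), "B": (5, 9), "C": (10, 10)}
face_groups = [
    {"person_id": "X", "first": 2, "last": 4, "rep_frame": 2},
    {"person_id": "Y", "first": 5, "last": 6, "rep_frame": 5},
    {"person_id": "X", "first": 10, "last": 10, "rep_frame": 10},
]
analysis_info = build_analysis_info(segments, face_groups)
```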

(Playback unit 170)
Next, the playback unit 170 is described in more detail with reference to FIG. 9. FIG. 9 is a block diagram showing the schematic configuration of the playback unit 170 of the video analysis device 100 according to this embodiment. As shown in FIG. 9, the playback unit 170 comprises a display format selection unit 171, an analysis information totaling unit 172, an analysis information output unit 173, a segment selection unit 174, and a video output unit 175.

The display format selection unit 171 is a functional unit that allows the user to select the display format of the analysis information. Examples of display formats include display in chronological order and display according to attributes of the persons appearing in the video (gender, age, and so on). These attributes can be obtained from the face images stored in the face image storage unit 142 using, for example, the method described in Non-Patent Document 4 below.
Non-Patent Document 4: Erina Takikawa (瀧川えりな) and Satoshi Hosoi (細井聖), "Automatic gender and age estimation using face images", OMRON TECHNICS, Vol. 43, No. 1, pp. 37-41

The analysis information totaling unit 172 aggregates the analysis information so that it can be displayed in the display format selected through the display format selection unit 171. FIG. 10 shows an example of the aggregation performed when a per-person display format is selected. In this example, the segments in which each person appears and the representative images contained in those segments are aggregated as analysis information for that person. For example, person X appears in segments A and C, so the information on segments A and C (start frame and end frame) and the representative image of each segment are collected as the segments related to person X.
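The per-person aggregation of FIG. 10 amounts to inverting per-segment records into per-person records. The sketch below shows one possible form; the record layout and the name `aggregate_by_person` are assumptions made for illustration, and the frame numbers are invented.

```python
from collections import defaultdict


def aggregate_by_person(analysis_info: list[dict]) -> dict[str, list[dict]]:
    """Collect, for each person, the segments in which that person appears."""
    by_person: dict[str, list[dict]] = defaultdict(list)
    for seg in analysis_info:
        for person_id, rep_frame in seg["persons"].items():
            by_person[person_id].append({
                "segment": seg["segment"],
                "start": seg["start"],
                "end": seg["end"],
                "rep_frame": rep_frame,  # representative image of the segment
            })
    return dict(by_person)


# Per-segment records in an assumed layout (frame numbers are made up).
info = [
    {"segment": "A", "start": 1, "end": 4, "persons": {"X": 2}},
    {"segment": "B", "start": 5, "end": 9, "persons": {"Y": 5}},
    {"segment": "C", "start": 10, "end": 10, "persons": {"X": 10}},
]
per_person = aggregate_by_person(info)
```

Here `per_person["X"]` lists segments A and C, mirroring the example in the text.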

The analysis information output unit 173 outputs the information aggregated by the analysis information totaling unit 172 to the display unit 180. For example, when the analysis information totaling unit 172 has aggregated the information per person as in FIG. 10, the output may be a list of the representative images of the segments aggregated for each person. Alternatively, a list of persons may be displayed, and the representative images of the segments in which the person selected by the user appears may then be shown.

The segment selection unit 174 is a functional unit with which the user, after viewing the analysis information output to the display unit 180 by the analysis information output unit 173, selects the segment to watch. The video output unit 175 obtains the video of the segment selected through the segment selection unit 174 from the video data storage unit 120 and outputs it to the display unit 180. Specifically, the video output unit 175 reads the start frame and end frame of the selected segment from the analysis information, retrieves the video of the corresponding frames from the video data storage unit 120, and outputs it to the display unit 180.
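The lookup performed by the video output unit 175 can be sketched as below. The in-memory dictionary standing in for the video data storage unit 120, the record layout, and the name `frames_for_segment` are hypothetical.

```python
def frames_for_segment(analysis_info: list[dict], segment_name: str,
                       video_frames: dict[int, str]) -> list[str]:
    """Return the frames of the selected segment in playback order."""
    for seg in analysis_info:
        if seg["segment"] == segment_name:
            # Start and end frame numbers are inclusive.
            return [video_frames[n]
                    for n in range(seg["start"], seg["end"] + 1)]
    raise KeyError(f"unknown segment: {segment_name}")


# Stand-in for stored video: frame number -> frame data.
stored_video = {n: f"frame-{n}" for n in range(1, 11)}
info = [{"segment": "A", "start": 1, "end": 4},
        {"segment": "B", "start": 5, "end": 9}]
clip = frames_for_segment(info, "B", stored_video)
```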

This concludes the description of the configuration of the video analysis device 100. The functional units of the video analysis device 100, such as the video acquisition unit 110, the segment creation unit 130, the face group creation unit 140, the analysis information creation unit 150, and the playback unit 170, can be realized by installing software programs that perform the functions of these units on the server device 120, or by dedicated hardware. The software programs may be executed by reading them from a computer-readable storage medium, or they may be provided to the video analysis device 100 over a network or the like. The storage units, such as the video data storage unit 120 and the analysis information storage unit 160, may be built from various storage media, for example semiconductor memory, optical disks, or magnetic disks.

Next, the video analysis process executed by the video analysis device 100 according to this embodiment is described with reference to FIG. 11, which is a flowchart showing the flow of that process.

First, in step S200, the video acquisition unit 110 acquires predetermined video data from the video data storage unit 120. This video data is, for example, a television broadcast recorded by a digital recorder or the like, or video shot with a video camera or the like.

Next, in steps S202 to S206, the segment creation unit 130 divides the acquired video data into a plurality of segments. In steps S208 to S212, the face group creation unit 140 groups the face images contained in the video and selects representative images. The processing of steps S202 to S206 and the processing of steps S208 to S212 may be executed in parallel, or one may be executed after the other has finished.
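Because the two branches do not depend on each other, they can run either concurrently or one after the other. The following sketch illustrates both options using Python's standard `concurrent.futures` module; the function `run_analysis` and the stub callables are hypothetical placeholders for the segment creation and face group creation processing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_analysis(frames: list[Any],
                 create_segments: Callable[[list[Any]], Any],
                 create_face_groups: Callable[[list[Any]], Any],
                 parallel: bool = True) -> tuple[Any, Any]:
    """Run both branches and return (segments, face_groups)."""
    if parallel:
        with ThreadPoolExecutor(max_workers=2) as pool:
            seg_future = pool.submit(create_segments, frames)
            grp_future = pool.submit(create_face_groups, frames)
            # Both results are needed before analysis information is built.
            return seg_future.result(), grp_future.result()
    return create_segments(frames), create_face_groups(frames)


# Stub branches used only to show the call shape.
result = run_analysis(
    list(range(5)),
    create_segments=lambda f: ("segments", len(f)),
    create_face_groups=lambda f: ("groups", len(f)),
)
```

Either mode produces the same pair of results; the later analysis step only needs both to be complete.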

The processing of steps S202 to S206 is as follows. First, in step S202, the feature amount extraction unit 131 of the segment creation unit 130 extracts feature amounts from the image of each frame acquired from the video acquisition unit 110. As described above, the feature amounts can be extracted using conventionally known methods such as those described in Non-Patent Documents 1 to 3.

Next, in step S204, the object detection unit 133 determines, based on the extracted feature amounts, whether the same object is present in two successive frame images. Then, in step S206, the segment creation unit 130 evaluates the continuity between frames based on the object detection result of step S204, and creates segments so that frames with little change between them fall into the same segment.
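The relationship between the per-pair object detection result and the resulting segments can be sketched as follows. For simplicity the continuity evaluation is reduced to a boolean per pair of successive frames; the name `create_segments` and the input values are hypothetical, and the actual evaluation in the embodiment may use a graded score.

```python
def create_segments(same_object: list[bool]) -> list[tuple[int, int]]:
    """Split frames 1..N into segments of continuous frames.

    `same_object[i]` is True when frames i+1 and i+2 (1-based) were judged
    to contain the same object, so N == len(same_object) + 1.
    Returns a list of (start_frame, end_frame) pairs, both inclusive.
    """
    num_frames = len(same_object) + 1
    segments = []
    start = 1
    for i, continuous in enumerate(same_object):
        if not continuous:
            segments.append((start, i + 1))  # segment ends at frame i+1
            start = i + 2                    # next segment starts at frame i+2
    segments.append((start, num_frames))
    return segments


# Five frames; continuity breaks between frames 3 and 4.
segments = create_segments([True, True, False, True])
```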

Next, the processing of steps S208 to S212 is described. First, in step S208, the face image detection unit 141 detects human faces in each frame image. A known face detection technique, such as the one described in Patent Document 2 (Japanese Patent Laid-Open No. 2001-285787) cited above, can be used for this purpose.

Next, in step S210, the group creation unit 143 evaluates the similarity between each detected face and the faces recorded in the face image storage unit 142, and places frame images in which similar faces are detected consecutively into the same face group. As the grouping result, it records the person information (such as an ID) for the face images of each face group together with the first and last frame numbers of that face group.

Next, in step S212, the representative image selection unit 144 selects one of the frame images in each face group as that group's representative image. The first frame image in the face group may be used as the representative image, or the frame image containing the face image with the largest area.

When the segment creation process by the segment creation unit 130 (steps S202 to S206) and the face group creation process by the face group creation unit 140 (steps S208 to S212) have finished, the analysis information creation unit 150 creates analysis information in step S214. The analysis information creation unit 150 creates the analysis information based on the segments created by the segment creation unit 130 and the grouping result produced by the face group creation unit 140. In step S216, the analysis information creation unit 150 stores the created analysis information in the analysis information storage unit 160.

Next, in step S218, the playback unit 170 displays the analysis information stored in the analysis information storage unit 160 on the display unit 180 and plays back the video data. FIG. 12 is a flowchart showing in more detail the video playback process performed by the playback unit 170 in step S218. Referring to FIG. 12, in step S220 the user first selects the display format of the analysis information. In step S222, the analysis information totaling unit 172 aggregates the analysis information according to the selected display format (chronological display, per-person display, and so on).

Next, in step S224, the analysis information output unit 173 outputs the information aggregated by the analysis information totaling unit 172 to the display unit 180. In step S226, the user views the analysis information displayed on the display unit 180 in step S224 and selects the segment to watch. In step S228, the video output unit 175 obtains the video of the selected segment from the video data storage unit 120, plays it back, and outputs it to the display unit 180.

As described above, with the video analysis device and video analysis method according to the first embodiment, the continuity of the video is evaluated by recognizing objects contained in successive frame images. Compared with conventional methods that use only image features, this reduces the influence of small image variations between frames. It is therefore possible to lower the likelihood that semantically continuous video is split into separate segments, and to create segments that are effective and convenient for the user.

In addition, because the user does not need to specify places or persons of interest before the video is analyzed, segments that follow the meaning of the video can be created without requiring any extra operation from the user.

(Second Embodiment)
Next, a video analysis device according to a second embodiment of the present invention is described. In the first embodiment described above, feature amounts are extracted from the entire frame image to detect objects. The second embodiment described below is characterized in that the area of a person present in the frame image is detected from the position at which the face group creation unit detected a face, and feature amounts are extracted from the detected person area.

First, the configuration of the video analysis device according to the second embodiment of the present invention is described with reference to FIG. 13. FIG. 13 is a block diagram showing the schematic configuration of the video analysis device 300 according to the second embodiment. The video analysis device 300 differs from the video analysis device 100 of the first embodiment shown in FIG. 1 in that it additionally includes a person area creation unit 390. The following description covers only the points in which the video analysis device 300 differs from the video analysis device 100; the rest is omitted to avoid repetition.

(Person area creation unit 390)
FIG. 14 is a block diagram showing the schematic configuration of the person area creation unit 390, the segment creation unit 330, and the face group creation unit 340. The segment creation unit 330 and the face group creation unit 340 are configured in the same way as the segment creation unit 130 and the face group creation unit 140 of the video analysis device 100 according to the first embodiment, so only the differences are described. As shown in FIG. 14, the person area creation unit 390 comprises a person area detection unit 391 and a person area storage unit 392.

The person area detection unit 391 receives, from the face image detection unit 341 of the face group creation unit 340, the result of detecting face images in each frame image. From the position at which a face image was detected, it calculates the area in which the person is present (the person area) and records it in the person area storage unit 392. The person area storage unit 392 is a storage medium for recording the person areas in each frame image detected by the person area detection unit 391. The person area may be calculated, for example, by estimating that a fixed area below the position at which the face was detected is the person area.
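The idea of taking a fixed area below the detected face as the person area can be sketched as follows. The scale factors (three face widths wide, six face heights tall), the clipping to the frame, and the names `Box` and `estimate_person_area` are assumptions chosen for illustration; the description does not specify the size of the area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def estimate_person_area(face: Box, frame_w: int, frame_h: int,
                         width_scale: float = 3.0,
                         height_scale: float = 6.0) -> Box:
    """Estimate the person area as a box directly below the face box."""
    center_x = face.x + face.w / 2
    area_w = face.w * width_scale
    left = max(0, round(center_x - area_w / 2))
    right = min(frame_w, round(center_x + area_w / 2))
    top = min(frame_h, face.y + face.h)  # start just under the face
    bottom = min(frame_h, round(top + face.h * height_scale))
    return Box(left, top, right - left, bottom - top)


# A 40x40 face at (100, 50) in a 640x480 frame.
person_area = estimate_person_area(Box(100, 50, 40, 40), 640, 480)
```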

The feature amount extraction unit 331 of the segment creation unit 330 obtains the person area information from the person area storage unit 392, extracts the feature amounts of the person area, and stores them in the feature amount storage unit 332. The object detection unit 333 uses the person area feature amounts recorded in the feature amount storage unit 332 to perform object detection in the same way as the object detection unit 133 of the first embodiment. The result of the object detection based on the person area feature amounts is then passed to the group creation unit 343 of the face group creation unit 340.

The group creation unit 343 creates face groups using both the face image detection result from the face image detection unit 341 and the person area object detection result from the object detection unit 333. The face groups are created by the same method as in the group creation unit 143 of the first embodiment.

FIG. 15 shows the face image detection result from the face image detection unit 341 and the person area object detection result from the object detection unit 333. In the first and second frames from the left in FIG. 15, the person's face is turned sufficiently toward the front that it can be recognized. The person area detection unit 391 therefore creates the person areas, shown as thick frames, from the face positions detected in the first and second frames from the left.

In the two frames on the right, by contrast, the person's face is turned sideways or backward by more than a predetermined angle, so the face image detection unit 341 cannot recognize it as a face. In this case, the object detection unit 333 detects where the person area of the second frame is located in the third frame (the thick frame in the third frame from the left) and stores the detection result in the person area storage unit 392.

Thus, in this embodiment, the person area detection unit 391 creates a person area when a face is detected, and the object detection unit 333 then detects that person area in other frames, which makes it possible to detect the appearance of a person whose face is not visible. By using the person area object detection result in addition to face detection, the group creation unit 343 can detect the frames in which a person appears more accurately.
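The behavior described for FIG. 15 can be sketched as a simple propagation loop: a person area is created whenever a face is detected, and otherwise the previous area is searched for in the current frame. The tracking step is passed in as a callable because the description delegates it to the object detection method of the first embodiment; the function `person_presence`, the `Area` tuple layout, and the toy tracker are all hypothetical.

```python
from typing import Callable, Optional

Area = tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def person_presence(num_frames: int,
                    face_areas: dict[int, Area],
                    track: Callable[[Area, int], Optional[Area]]
                    ) -> dict[int, Area]:
    """Return the person area for every frame in which the person is found.

    `face_areas` holds person areas derived from detected faces.
    `track(previous_area, frame)` returns the area located in `frame`
    by object detection, or None when the person is no longer found.
    """
    areas: dict[int, Area] = {}
    previous: Optional[Area] = None
    for frame in range(1, num_frames + 1):
        if frame in face_areas:
            previous = face_areas[frame]     # face visible: use its area
        elif previous is not None:
            previous = track(previous, frame)  # face hidden: follow the area
        if previous is not None:
            areas[frame] = previous
    return areas


# Faces are visible in frames 1 and 2 only; the toy tracker follows the
# person up to frame 4 by shifting the area two pixels per frame.
toy_tracker = lambda area, frame: (
    (area[0] + 2, area[1], area[2], area[3]) if frame <= 4 else None)
areas = person_presence(
    5, {1: (10, 20, 50, 100), 2: (12, 20, 50, 100)}, toy_tracker)
```

In this toy run the person is reported in frames 1 to 4 even though a face is detected only in frames 1 and 2.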

Next, the video analysis process executed by the video analysis device 300 according to this embodiment is described with reference to FIG. 16, which is a flowchart showing the flow of that process.

First, in step S400, the video acquisition unit 310 acquires predetermined video data from the video data storage unit 320 and passes the acquired data to the segment creation unit 330 and the face group creation unit 340. Next, in step S402, the face image detection unit 341 of the face group creation unit 340 detects human faces in each frame image of the video. As in the first embodiment, a known face detection technique such as the one described in Patent Document 2 (Japanese Patent Laid-Open No. 2001-285787) can be used.

Next, in step S404, the person area detection unit 391 calculates the area of each frame image in which a person is present (the person area) based on the face image detection result from the face image detection unit 341. The person area may be calculated, for example, by estimating that a fixed area below the position at which the face was detected is the person area.

Next, in step S406, the feature amount extraction unit 331 of the segment creation unit 330 extracts feature amounts from the image of each frame acquired from the video acquisition unit 310. Here, the feature amount extraction unit 331 obtains the information on the person areas detected by the person area detection unit 391 from the person area storage unit 392 and extracts the feature amounts of those person areas. Then, in step S408, the object detection unit 333 determines, based on the extracted feature amounts, whether the same object is present in two successive frame images. When it detects a person area, it records the location at which the area was detected within the image in the person area storage unit 392.

Next, in step S410, the segment generation unit 334 evaluates the continuity between frames based on the object detection result of step S408, and creates segments so that frames with little change between them fall into the same segment. In step S412, the group creation unit 343 of the face group creation unit 340 creates face groups using the face image detection result of step S402 and the person area object detection result of step S408. Then, in step S414, the representative image selection unit 344 selects one of the frame images in each face group created in step S412 as that group's representative image.

As shown in FIG. 16, step S410 and steps S412 and S414 may be executed in parallel, or one may be executed after the other has finished.

After steps S410 and S414 are completed, the analysis information creation unit 350 creates analysis information in step S416. The processing from step S416 onward is the same as steps S214 to S218 of the first embodiment described above, so its description is omitted here.

As described above, with the video analysis device and video analysis method according to the second embodiment, face groups are created using the person area object detection result in addition to the face image detection result. The presence of a person in an image can therefore be recognized even when the face is hidden by another object or is not facing the front. This makes it possible to detect the frames in which a person appears more accurately when creating groups.

(Third Embodiment)
Next, a video analysis device according to a third embodiment of the present invention is described. In the first and second embodiments described above, the user selects the video segment to watch from the results of the analysis performed by the video analysis device. The third embodiment described below is characterized in that the user specifies an arbitrary image in advance, and the device extracts segments containing frame images similar to the specified image, or segments in which the same person as the one appearing in the specified image appears.

First, the configuration of the video analysis device according to the third embodiment of the present invention is described with reference to FIG. 17. FIG. 17 is a block diagram showing the schematic configuration of the video analysis device 500 according to the third embodiment. The video analysis device 500 differs from the video analysis device 300 of the second embodiment shown in FIG. 13 in that it additionally includes an image data storage unit 610, a search image input unit 620, a search image evaluation unit 630, and a face similarity evaluation unit 640. The following description covers only the points in which the video analysis device 500 differs from the video analysis devices 100 and 300 of the first and second embodiments; the rest is omitted to avoid repetition.

The image data storage unit 610 is a storage medium that holds the images (search images) used as search keys when the user specifies what to look for in the video. The search image input unit 620 receives the image specified by the user (the designated image) from among the search images stored in the image data storage unit 610, and passes the designated image to the search image evaluation unit 630. The search image evaluation unit 630 evaluates the similarity between the search image and each frame image of the video. The face similarity evaluation unit 640 evaluates the similarity between faces detected in the search image and faces detected in the frame images.

(Search image evaluation unit 630)
Next, the search image evaluation unit 630 is described in more detail with reference to FIG. 18. FIG. 18 is a block diagram showing the schematic configuration of the search image evaluation unit 630 and the segment creation unit 530. The segment creation unit 530 is configured in the same way as the segment creation units of the video analysis devices according to the first and second embodiments, so only the points related to the search image evaluation unit 630 are described below. As shown in FIG. 18, the search image evaluation unit 630 comprises a feature amount extraction unit 631 and a similarity evaluation unit 632.

The feature amount extraction unit 631 extracts the feature amounts of the search image. This can be done by the same method as in the feature amount extraction unit 131 of the segment creation unit 130 described in the first embodiment. The similarity evaluation unit 632 compares the feature amounts of each frame image, extracted by the feature amount extraction unit 531 of the segment creation unit 530, with the feature amounts of the search image, extracted by the feature amount extraction unit 631 of the search image evaluation unit 630, and evaluates the similarity between the two. For this evaluation, whether the same object is present in the two images is detected by the same method as in the object detection unit 133 of the segment creation unit 130 described in the first embodiment, and a similarity evaluation value is calculated in the same way as the continuity evaluation value.
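The following sketch illustrates one way a similarity evaluation value between a search image and the frames of each segment could be computed. It is not the method of the embodiment, whose details depend on the object detection of the first embodiment: here local features are plain numeric tuples, the score is the fraction of search-image features that have a nearby feature in the frame, and the thresholds and function names are assumptions.

```python
import math

Feature = tuple[float, ...]


def similarity_score(query: list[Feature], frame: list[Feature],
                     max_dist: float = 0.5) -> float:
    """Fraction of query features with a frame feature within max_dist."""
    if not query:
        return 0.0
    matched = sum(
        1 for q in query
        if any(math.dist(q, f) <= max_dist for f in frame))
    return matched / len(query)


def find_similar_segments(query: list[Feature],
                          frame_features: dict[int, list[Feature]],
                          segments: dict[str, tuple[int, int]],
                          threshold: float = 0.5) -> list[str]:
    """Return the segments containing at least one sufficiently similar frame."""
    hits = []
    for name, (start, end) in segments.items():
        best = max(
            (similarity_score(query, frame_features.get(n, []))
             for n in range(start, end + 1)),
            default=0.0)
        if best >= threshold:
            hits.append(name)
    return hits


# Toy two-dimensional features; real descriptors would be much longer.
query_features = [(0.0, 0.0), (1.0, 1.0)]
features_by_frame = {1: [(0.1, 0.0), (1.0, 0.9)], 2: [(5.0, 5.0)]}
matches = find_similar_segments(
    query_features, features_by_frame, {"A": (1, 1), "B": (2, 2)})
```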

（顔類似性評価部６４０） (Face similarity evaluation unit 640)
次に、図１９を参照して、顔類似性評価部６４０の詳細についてさらに説明する。図１９は、顔類似性評価部６４０及び顔グループ作成部５４０の概略構成を示すブロック図である。顔グループ作成部５４０の構成については、第１及び第２の実施形態にかかる映像解析装置の顔グループ作成部の構成と同様であるので、以下では顔類似性評価部６４０と関連する点についてのみ説明する。顔類似性評価部６４０は、図１９に示すように、検索顔画像検出部６４１と、検索顔画像類似性評価部６４２とにより構成される。 Next, the face similarity evaluation unit 640 is described in more detail with reference to FIG. 19. FIG. 19 is a block diagram showing a schematic configuration of the face similarity evaluation unit 640 and the face group creation unit 540. The configuration of the face group creation unit 540 is the same as that of the face group creation units of the video analysis apparatuses according to the first and second embodiments, so only the points related to the face similarity evaluation unit 640 are described below. As shown in FIG. 19, the face similarity evaluation unit 640 includes a search face image detection unit 641 and a search face image similarity evaluation unit 642.

検索顔画像検出部６４１は、検索画像から顔画像を検出する。顔画像の検出は、第１の実施形態において説明した顔グループ作成部１４０の顔画像検出部１４１と同様の方法によって実現することができる。検索顔画像類似性評価部６４２は、検索顔画像検出部６４１によって検出された検索画像中の顔画像と、フレーム画像中の顔画像との類似性を評価する。顔類似性の評価は、第１の実施形態において説明した顔グループ作成部１４０のグループ作成部１４３と同様の方法によって実現することができる。検索顔画像類似性評価部６４２は、類似性の評価の結果、検索画像中の顔画像と類似する顔が含まれるグループを判定する。 The search face image detection unit 641 detects a face image from the search image. The detection of the face image can be realized by a method similar to that of the face image detection unit 141 of the face group creation unit 140 described in the first embodiment. The search face image similarity evaluation unit 642 evaluates the similarity between the face image in the search image detected by the search face image detection unit 641 and the face image in the frame image. The evaluation of face similarity can be realized by the same method as the group creation unit 143 of the face group creation unit 140 described in the first embodiment. The search face image similarity evaluation unit 642 determines a group including faces similar to the face image in the search image as a result of the similarity evaluation.
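
The group determination made by the search face image similarity evaluation unit 642 might look like the sketch below. It assumes, purely for illustration, that each detected face has already been reduced to a numeric feature vector and that cosine similarity with a fixed threshold decides whether two faces are similar; the document does not specify either, and the names `cosine_similarity` and `matching_face_groups` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_face_groups(query_face, face_groups, threshold=0.9):
    """Return the names of face groups containing a face similar to the query.

    `face_groups` maps a group name to the list of face vectors in that group.
    A group is reported when at least one of its faces reaches the
    (assumed) similarity threshold.
    """
    hits = []
    for name, faces in face_groups.items():
        if any(cosine_similarity(query_face, f) >= threshold for f in faces):
            hits.append(name)
    return hits
```

Comparing the query face against every face in a group, rather than only against a representative image, is one possible design choice; comparing against the representative images alone would be cheaper but could miss a group whose representative happens to differ in pose or lighting.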

図２０は、検索画像評価部６３０及び顔類似性評価部６４０による評価結果を示した図である。検索画像評価部６３０の類似性評価部６３２は、各フレーム画像と検索画像との類似性評価を実行し、図２０に示すフレーム類似性評価結果００１を得る。図２０の例では、フレーム番号１〜１０までのフレーム画像のうち、フレーム番号１〜６までが検索画像と類似するフレームであると判断されている。一方、顔類似性評価部６４０の検索顔画像類似性評価部６４２は、フレーム画像から検出された顔画像と検索画像から検出された顔画像の類似性評価を実行し、図２０に示す類似グループ検索結果００２を得る。図２０の例では、フレーム番号３〜６に登場する人物の顔と、フレーム番号８〜１０に登場する人物の顔とが、いずれも検索画像中に登場する人物の顔と類似していると判断されている。 FIG. 20 shows the evaluation results from the search image evaluation unit 630 and the face similarity evaluation unit 640. The similarity evaluation unit 632 of the search image evaluation unit 630 evaluates the similarity between each frame image and the search image and obtains the frame similarity evaluation result 001 shown in FIG. 20. In the example of FIG. 20, among the frame images with frame numbers 1 to 10, those with frame numbers 1 to 6 are judged to be similar to the search image. Meanwhile, the search face image similarity evaluation unit 642 of the face similarity evaluation unit 640 evaluates the similarity between the face images detected from the frame images and the face image detected from the search image and obtains the similar group search result 002 shown in FIG. 20. In the example of FIG. 20, the face of the person appearing in frame numbers 3 to 6 and the face of the person appearing in frame numbers 8 to 10 are both judged to be similar to the face of the person appearing in the search image.

次に、図２１を参照して、本実施形態にかかる映像解析装置５００によって実行される映像解析処理について説明する。図２１は、本実施形態にかかる映像解析装置５００によって実行される映像解析処理の流れを示すフローチャートである。 Next, with reference to FIG. 21, a video analysis process executed by the video analysis device 500 according to the present embodiment will be described. FIG. 21 is a flowchart showing the flow of video analysis processing executed by the video analysis device 500 according to the present embodiment.

まず、ステップＳ７００で、利用者は、画像データ記憶部６１０に格納された画像データの中から検索キーとなる検索画像を選択し、検索画像入力部６２０に入力する。次いで、ステップＳ７０２で、検索画像評価部６３０の特徴量抽出部６３１は、検索画像の特徴量を抽出する。次いで、ステップＳ７０４で、顔類似性評価部６４０の検索顔画像検出部６４１は、検索画像から顔画像を検出する。 First, in step S700, the user selects a search image to serve as the search key from the image data stored in the image data storage unit 610 and inputs it to the search image input unit 620. Next, in step S702, the feature amount extraction unit 631 of the search image evaluation unit 630 extracts the feature amounts of the search image. Next, in step S704, the search face image detection unit 641 of the face similarity evaluation unit 640 detects a face image from the search image.

続くステップＳ７０６〜Ｓ７２０の処理は、上述した第２の実施形態におけるステップＳ４００〜Ｓ４１４の処理と同様であるので、重複説明を避けるためここでは説明を省略する。 The subsequent processing in steps S706 to S720 is the same as the processing in steps S400 to S414 in the second embodiment described above, and thus description thereof is omitted here to avoid duplication.

ステップＳ７１６、Ｓ７２０の後、ステップＳ７２２で、検索画像評価部６３０の類似性評価部６３２は、検索画像と各フレーム画像との類似性評価を行う。次いで、ステップＳ７２４で、顔類似性評価部６４０の検索顔画像類似性評価部６４２は、検索画像中の顔画像と各フレーム画像中の顔画像との類似性評価を行い、検索画像中の顔画像に類似する顔画像が含まれる顔グループを判定する。 After steps S716 and S720, in step S722, the similarity evaluation unit 632 of the search image evaluation unit 630 evaluates the similarity between the search image and each frame image. Next, in step S724, the search face image similarity evaluation unit 642 of the face similarity evaluation unit 640 evaluates the similarity between the face image in the search image and the face images in each frame image, and determines which face groups contain a face image similar to the face image in the search image.

次いで、ステップＳ７２６で、解析情報作成部５５０が解析情報を作成する。上述した第１及び第２の実施形態にかかる解析情報作成部は、セグメント作成部によるセグメント作成結果と、顔グループ作成部による顔認識結果とに基づいて解析情報を作成するように構成されたが、本実施形態にかかる解析情報作成部５５０は、上記の２つに加えて、検索画像評価部６３０及び顔類似性評価部６４０による評価結果を用いて解析情報を作成する。 Next, in step S726, the analysis information creation unit 550 creates analysis information. The analysis information creation unit according to the first and second embodiments described above is configured to create analysis information based on the segment creation result by the segment creation unit and the face recognition result by the face group creation unit. In addition to the above two, the analysis information creation unit 550 according to the present embodiment creates analysis information using the evaluation results from the search image evaluation unit 630 and the face similarity evaluation unit 640.

図２２は、本実施形態にかかる解析情報作成部５５０によって作成される解析情報の一例を示したものである。図２２に示すように、本実施形態にかかる解析情報作成部５５０は、第１、第２の実施形態で作成されるのと同様の解析情報に加えて、検索画像評価部６３０によるフレーム類似性評価結果や、顔類似性評価部６４０による類似グループ検索結果を用いて解析情報を作成する。図２２の例では、フレーム番号２０１〜２３０のフレームが利用者が選択した画像と類似すると判断されている。また、利用者が選択した画像に登場する人物を含むグループとして、フレーム番号２８０〜３００のフレームを含むグループｃと、フレーム番号１００〜１２５のフレームを含むグループｂとが検索されている。 FIG. 22 shows an example of the analysis information created by the analysis information creation unit 550 according to the present embodiment. As shown in FIG. 22, in addition to the same analysis information as that created in the first and second embodiments, the analysis information creation unit 550 according to the present embodiment creates analysis information using the frame similarity evaluation result from the search image evaluation unit 630 and the similar group search result from the face similarity evaluation unit 640. In the example of FIG. 22, the frames with frame numbers 201 to 230 are judged to be similar to the image selected by the user. In addition, group c, which includes the frames with frame numbers 280 to 300, and group b, which includes the frames with frame numbers 100 to 125, are retrieved as groups containing the person who appears in the image selected by the user.

これらの結果に基づいて、解析情報作成部５５０は、利用者が選択した画像と類似するフレームを含むセグメントや、利用者が選択した画像に登場する人物を含むセグメントの情報を解析情報として抽出する。図２２の例の場合、利用者が選択した画像と類似するフレームを含むセグメントとしてセグメントＢが抽出され、利用者が選択した画像に登場する人物を含むセグメントとして、セグメントＡ及びＢが抽出される。 Based on these results, the analysis information creation unit 550 extracts, as analysis information, information on the segments that include frames similar to the image selected by the user and on the segments that include the person appearing in the image selected by the user. In the example of FIG. 22, segment B is extracted as a segment including frames similar to the image selected by the user, and segments A and B are extracted as segments including the person who appears in the image selected by the user.
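
The extraction step can be illustrated with a small frame-range overlap check. The frame ranges for the similar frames and for groups b and c are the ones given for FIG. 22; the segment boundaries (A: 100 to 200, B: 201 to 300) are hypothetical values chosen only so that the sketch reproduces the result described above, and the function names are likewise assumptions.

```python
def overlaps(a, b):
    """True if two inclusive frame ranges (start, end) share a frame."""
    return a[0] <= b[1] and b[0] <= a[1]


def segments_overlapping(segments, frame_ranges):
    """Return names of segments that overlap any of the given frame ranges."""
    return [name for name, span in segments.items()
            if any(overlaps(span, r) for r in frame_ranges)]


# Hypothetical segment boundaries (not given in the document).
segments = {"A": (100, 200), "B": (201, 300)}

# Frame ranges taken from the FIG. 22 example.
similar_frame_ranges = [(201, 230)]                  # frames similar to the search image
face_group_ranges = {"b": (100, 125), "c": (280, 300)}  # groups with the same person

by_image = segments_overlapping(segments, similar_frame_ranges)
by_person = segments_overlapping(segments, list(face_group_ranges.values()))
```

With these assumed boundaries, `by_image` contains only segment B and `by_person` contains segments A and B, matching the example.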

ステップＳ７２８で、解析情報は解析情報記憶部１６０に格納され、ステップＳ７３０で、再生部５７０によって表示部１８０に表示される。利用者は、表示部１８０に表示された解析情報から閲覧したいセグメントを選択し、再生部５７０は、選択されたセグメントの映像を映像データ記憶部５２０より取得して再生し、表示部５８０に出力する。 In step S728, the analysis information is stored in the analysis information storage unit 160, and in step S730 it is displayed on the display unit 180 by the playback unit 570. The user selects the segment to be viewed from the analysis information displayed on the display unit 180, and the playback unit 570 acquires the video of the selected segment from the video data storage unit 520, plays it back, and outputs it to the display unit 580.

以上のように、上述した第３の実施形態にかかる映像解析装置及び映像解析方法によれば、第１及び第２の実施形態において作成された解析情報の中から、予め利用者によって選択された画像に関連する解析情報のみを抽出することができる。これにより、選択された画像と類似した映像を含むセグメントや、選択された画像に含まれる人物が登場するセグメントのみを利用者に提示することができ、利用者が関心を持つセグメントをより簡単に選択することができるようになる。 As described above, with the video analysis apparatus and video analysis method according to the third embodiment, only the analysis information related to an image selected in advance by the user can be extracted from the analysis information created in the first and second embodiments. As a result, only the segments that contain video similar to the selected image, or in which the person included in the selected image appears, can be presented to the user, so that the user can more easily select the segments of interest.

以上、添付図面を参照しながら本発明の好適な実施形態について説明したが、本発明は係る例に限定されないことは言うまでもない。当業者であれば、特許請求の範囲に記載された範疇内において、各種の変更例または修正例に想到し得ることは明らかであり、それらについても当然に本発明の技術的範囲に属するものと了解される。 Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is of course not limited to these examples. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope set out in the claims, and these are naturally understood to fall within the technical scope of the present invention.

例えば、上記実施形態では、映像の解析と解析結果の表示とが同一の装置において行われるものとしたが、本発明はかかる例に限定されない。例えば、映像の解析を行う装置と、解析情報の表示及び映像の再生を行う装置とを物理的に分離して構成してもよい。例えば、映像の解析を行う解析装置がネットワークを介して利用者に解析情報を提供し、利用者は、解析情報及び映像を表示させる表示装置を用いて視聴したいセグメントを選択し、映像を視聴することができるようにしてもよい。 For example, in the above embodiments the video analysis and the display of the analysis results are performed in the same apparatus, but the present invention is not limited to such an example. The apparatus that analyzes the video and the apparatus that displays the analysis information and plays back the video may be physically separate. For instance, an analysis apparatus that analyzes the video may provide the analysis information to the user via a network, and the user may select the segment to be viewed and watch the video using a display apparatus that displays the analysis information and the video.

また、上記第３の実施形態では、検索キーとなる画像は映像中のあるフレームの画像であるとして説明したが、本発明はかかる例に限定されない。例えば、任意の人物の画像や、任意の物体の画像が検索画像として入力されるようにしてもよい。これにより、利用者が関心を持っている人物や商品などが登場するセグメントを抽出することが可能となる。 In the third embodiment, the image serving as the search key has been described as an image of a certain frame in the video, but the present invention is not limited to such an example. For example, an image of an arbitrary person or an image of an arbitrary object may be input as a search image. This makes it possible to extract a segment in which a person or product that the user is interested in appears.

本発明の第１の実施形態にかかる映像解析装置の概略構成を示すブロック図である。 FIG. 1 is a block diagram showing a schematic configuration of the video analysis apparatus according to the first embodiment of the present invention.
同実施形態にかかる映像解析装置のセグメント作成部の概略構成を示すブロック図である。 FIG. 2 is a block diagram showing a schematic configuration of the segment creation unit of the video analysis apparatus according to the same embodiment.
セグメント作成部による物体検出処理の例を示す図である。 FIG. 3 is a diagram showing an example of object detection processing by the segment creation unit.
セグメント作成部によるセグメント作成処理の例を示す図である。 FIG. 4 is a diagram showing an example of segment creation processing by the segment creation unit.
同実施形態にかかる映像解析装置の顔グループ作成部の概略構成を示すブロック図である。 FIG. 5 is a block diagram showing a schematic configuration of the face group creation unit of the video analysis apparatus according to the same embodiment.
顔グループ作成部による顔画像に基づく顔グループ作成処理の例を示す図である。 FIG. 6 is a diagram showing an example of face group creation processing based on face images by the face group creation unit.
顔グループ作成部による代表画像の選択処理の例を示す図である。 FIG. 7 is a diagram showing an example of representative image selection processing by the face group creation unit.
解析情報作成部による解析情報の作成例を示す図である。 FIG. 8 is a diagram showing an example of analysis information created by the analysis information creation unit.
同実施形態にかかる映像解析装置の再生部の概略構成を示すブロック図である。 FIG. 9 is a block diagram showing a schematic configuration of the playback unit of the video analysis apparatus according to the same embodiment.
再生部による解析情報の集計処理の例を示す図である。 FIG. 10 is a diagram showing an example of analysis information aggregation processing by the playback unit.
同実施形態にかかる映像解析装置によって実行される映像解析処理の流れを示すフローチャートである。 FIG. 11 is a flowchart showing the flow of video analysis processing executed by the video analysis apparatus according to the same embodiment.
再生部によって実行される映像再生処理の流れを示すフローチャートである。 FIG. 12 is a flowchart showing the flow of video playback processing executed by the playback unit.
本発明の第２の実施形態にかかる映像解析装置の概略構成を示すブロック図である。 FIG. 13 is a block diagram showing a schematic configuration of the video analysis apparatus according to the second embodiment of the present invention.
同実施形態にかかる映像解析装置のセグメント作成部、人物領域作成部及び顔画像認識部の概略構成を示すブロック図である。 FIG. 14 is a block diagram showing a schematic configuration of the segment creation unit, the person region creation unit, and the face image recognition unit of the video analysis apparatus according to the same embodiment.
人物領域作成部による人物領域の検出結果の例を示す図である。 FIG. 15 is a diagram showing an example of person region detection results from the person region creation unit.
同実施形態にかかる映像解析装置によって実行される映像解析処理の流れを示すフローチャートである。 FIG. 16 is a flowchart showing the flow of video analysis processing executed by the video analysis apparatus according to the same embodiment.
本発明の第３の実施形態にかかる映像解析装置の概略構成を示すブロック図である。 FIG. 17 is a block diagram showing a schematic configuration of the video analysis apparatus according to the third embodiment of the present invention.
同実施形態にかかる映像解析装置の検索画像評価部及びセグメント作成部の概略構成を示すブロック図である。 FIG. 18 is a block diagram showing a schematic configuration of the search image evaluation unit and the segment creation unit of the video analysis apparatus according to the same embodiment.
同実施形態にかかる映像解析装置の顔グループ作成部及び顔類似性評価部の概略構成を示すブロック図である。 FIG. 19 is a block diagram showing a schematic configuration of the face group creation unit and the face similarity evaluation unit of the video analysis apparatus according to the same embodiment.
同実施形態にかかる映像解析装置の検索画像評価部及び顔類似性評価部による評価結果の一例を示した図である。 FIG. 20 is a diagram showing an example of the evaluation results from the search image evaluation unit and the face similarity evaluation unit of the video analysis apparatus according to the same embodiment.
同実施形態にかかる映像解析装置によって実行される映像解析処理の流れを示すフローチャートである。 FIG. 21 is a flowchart showing the flow of video analysis processing executed by the video analysis apparatus according to the same embodiment.
同実施形態にかかる映像解析装置の解析情報作成部によって作成される解析情報の一例を示した図である。 FIG. 22 is a diagram showing an example of the analysis information created by the analysis information creation unit of the video analysis apparatus according to the same embodiment.

符号の説明 Explanation of symbols

１００、３００、５００映像解析装置 100, 300, 500 Video analysis apparatus
１１０、３１０、５１０映像取得部 110, 310, 510 Video acquisition unit
１２０、３２０、５２０映像データ記憶部 120, 320, 520 Video data storage unit
１３０、３３０、５３０セグメント作成部 130, 330, 530 Segment creation unit
１３１、３３１、５３１特徴量抽出部 131, 331, 531 Feature amount extraction unit
１３２、３３２、５３２特徴量記憶部 132, 332, 532 Feature amount storage unit
１３３、３３３、５３３物体検出部 133, 333, 533 Object detection unit
１３４、３３４、５３４セグメント生成部 134, 334, 534 Segment generation unit
１４０、３４０、５４０顔グループ作成部 140, 340, 540 Face group creation unit
１４１、３４１、５４１顔画像検出部 141, 341, 541 Face image detection unit
１４２、３４２、５４２顔画像記憶部 142, 342, 542 Face image storage unit
１４３、３４３、５４３グループ作成部 143, 343, 543 Group creation unit
１４４、３４４、５４４代表画像選択部 144, 344, 544 Representative image selection unit
１５０、３５０、５５０解析情報作成部 150, 350, 550 Analysis information creation unit
１６０、３６０、５６０解析情報記憶部 160, 360, 560 Analysis information storage unit
１７０、３７０、５７０再生部 170, 370, 570 Playback unit
１７１表示形式選択部 171 Display format selection unit
１７２解析情報集計部 172 Analysis information aggregation unit
１７３解析情報出力部 173 Analysis information output unit
１７４セグメント選択部 174 Segment selection unit
１７５映像出力部 175 Video output unit
１８０、３８０、５８０表示部 180, 380, 580 Display unit
３９０、５９０人物領域作成部 390, 590 Person region creation unit
３９１人物領域検出部 391 Person region detection unit
３９２人物領域記憶部 392 Person region storage unit
６１０画像データ記憶部 610 Image data storage unit
６２０検索画像入力部 620 Search image input unit
６３０検索画像評価部 630 Search image evaluation unit
６３１特徴量抽出部 631 Feature amount extraction unit
６３２類似性評価部 632 Similarity evaluation unit
６４０顔類似性評価部 640 Face similarity evaluation unit
６４１検索顔画像検出部 641 Search face image detection unit
６４２検索顔画像類似性評価部 642 Search face image similarity evaluation unit

Claims

時系列に配列された複数のフレーム画像からなる映像を解析する映像解析装置であって、
前記フレーム画像より抽出される局所形状に基づく特徴量を用いて、連続する２つのフレーム画像に同じ物体が存在するか否かを判断することによって、前記２つのフレーム画像の連続性の評価を行い、前記連続性の評価結果に基づいて前記映像を複数のセグメントに分割するセグメント作成部と、
前記フレーム画像から人物の顔画像を検出し、前記検出された顔画像の類似度に基づいて前記フレーム画像をグループ化する顔グループ作成部と、
前記セグメント作成部により作成された前記セグメントと前記顔グループ作成部による前記フレーム画像の分類結果に基づいて、前記セグメントの解析情報を作成する解析情報作成部と、
を備えることを特徴とする、映像解析装置。
A video analysis apparatus for analyzing a video composed of a plurality of frame images arranged in time series, comprising:
a segment creation unit that evaluates the continuity of two consecutive frame images by determining, using feature amounts based on local shapes extracted from the frame images, whether the same object exists in the two frame images, and that divides the video into a plurality of segments based on the result of the continuity evaluation;
a face group creation unit that detects face images of persons from the frame images and groups the frame images based on the similarity of the detected face images; and
an analysis information creation unit that creates analysis information of the segments based on the segments created by the segment creation unit and the result of classifying the frame images by the face group creation unit.

前記セグメント作成部は、
前記各フレーム画像の局所形状に基づく特徴量を抽出する特徴量抽出部と、
連続する前記２つのフレーム画像の前記特徴量を比較することによって、前記２つのフレーム画像中に同じ物体が含まれるか否かを検出する物体検出部と、
前記物体検出部による物体検出結果に基づいて、連続する前記２つのフレーム画像の連続性の評価値を計算し、前記評価値が所定の閾値より低い場合に前記２つのフレーム画像の間で映像を分割してセグメントを作成するセグメント生成部と、
を含むことを特徴とする、請求項１に記載の映像解析装置。
The video analysis apparatus according to claim 1, wherein the segment creation unit includes:
a feature amount extraction unit that extracts a feature amount based on the local shape of each of the frame images;
an object detection unit that detects whether the same object is included in the two consecutive frame images by comparing the feature amounts of the two frame images; and
a segment generation unit that calculates an evaluation value of the continuity of the two consecutive frame images based on the object detection result from the object detection unit and, when the evaluation value is lower than a predetermined threshold, creates segments by dividing the video between the two frame images.

前記顔グループ作成部は、
前記フレーム画像から前記顔画像を検出する顔画像検出部と、
前記顔画像検出部によって検出された顔画像を比較し、類似する顔画像が抽出された連続する複数のフレーム画像を１つの顔グループとするグループ作成部と、
前記グループ作成部によって作成された各顔グループのフレーム画像から１以上の代表画像を選択する代表画像選択部と、
を含むことを特徴とする、請求項１または２に記載の映像解析装置。
The video analysis apparatus according to claim 1 or 2, wherein the face group creation unit includes:
a face image detection unit that detects the face images from the frame images;
a group creation unit that compares the face images detected by the face image detection unit and forms one face group from a plurality of consecutive frame images from which similar face images were extracted; and
a representative image selection unit that selects one or more representative images from the frame images of each face group created by the group creation unit.

前記解析情報作成部は、前記セグメントに含まれる前記顔グループの代表画像を前記セグメントの代表画像として、前記セグメントの解析情報を作成することを特徴とする、請求項１〜３に記載の映像解析装置。 The video analysis apparatus according to any one of claims 1 to 3, wherein the analysis information creation unit creates the analysis information of the segment using the representative image of the face group included in the segment as the representative image of the segment.

前記セグメントの解析情報を表示装置に出力する解析情報出力部と、
利用者によって選択された前記セグメントの映像を再生し、前記表示装置に出力する映像出力部をさらに含むことを特徴とする、請求項１〜４に記載の映像解析装置。
The video analysis apparatus according to any one of claims 1 to 4, further comprising:
an analysis information output unit that outputs the analysis information of the segments to a display device; and
a video output unit that plays back the video of the segment selected by a user and outputs it to the display device.

前記解析情報出力部は、前記セグメントの時系列に従って前記解析情報を出力することを特徴とする、請求項５に記載の映像解析装置。 The video analysis apparatus according to claim 5, wherein the analysis information output unit outputs the analysis information according to a time series of the segments.

前記解析情報出力部は、前記セグメントに含まれる前記顔グループに対応する顔画像に対応付けられる人物により前記セグメントを分類し、分類された結果に従って前記解析情報を出力することを特徴とする、請求項５または６に記載の映像解析装置。 The video analysis apparatus according to claim 5 or 6, wherein the analysis information output unit classifies the segments by the person associated with the face image corresponding to the face group included in each segment, and outputs the analysis information according to the classification result.

前記顔グループ作成部によって検出された顔画像の前記フレーム画像中の位置に基づいて、前記フレーム画像中の人物が存在する領域を計算する人物領域作成部をさらに備え、
前記セグメント作成部は、前記人物領域作成部によって計算された領域の特徴量を比較することによってセグメントを作成することを特徴とする、請求項１〜７に記載の映像解析装置。
The video analysis apparatus according to any one of claims 1 to 7, further comprising a person region creation unit that calculates the region of the frame image in which a person exists, based on the position in the frame image of the face image detected by the face group creation unit,
wherein the segment creation unit creates segments by comparing the feature amounts of the regions calculated by the person region creation unit.

利用者によって選択された任意の検索画像が入力される検索画像入力部と、
前記検索画像と前記各フレーム画像との類似度を評価する検索画像評価部と、
前記検索画像から人物の顔画像を検出し、当該顔画像と前記各フレーム画像から検出された顔画像との類似度を評価する顔類似性評価部と、
をさらに備え、
前記解析情報作成部は、前記検索画像評価部による評価結果と前記顔類似性評価部による評価結果とに基づいて、前記解析情報の中から、前記検索画像に関連するセグメントの解析情報を抽出することを特徴とする、請求項１〜８に記載の映像解析装置。
The video analysis apparatus according to any one of claims 1 to 8, further comprising:
a search image input unit to which an arbitrary search image selected by a user is input;
a search image evaluation unit that evaluates the similarity between the search image and each of the frame images; and
a face similarity evaluation unit that detects a face image of a person from the search image and evaluates the similarity between that face image and the face images detected from each of the frame images,
wherein the analysis information creation unit extracts, from the analysis information, the analysis information of the segments related to the search image, based on the evaluation result from the search image evaluation unit and the evaluation result from the face similarity evaluation unit.

前記解析情報作成部は、前記検索画像評価部によって前記検索画像との類似度が高いと評価された前記フレーム画像が含まれる前記セグメントの解析情報を抽出することを特徴とする、請求項９に記載の映像解析装置。 The video analysis apparatus according to claim 9, wherein the analysis information creation unit extracts the analysis information of the segments that include the frame images evaluated by the search image evaluation unit as having a high similarity to the search image.

前記解析情報作成部は、前記顔類似性評価部によって前記検索画像から検出された顔画像との類似度が高いと評価された顔画像を含む前記フレーム画像が含まれる前記セグメントの解析情報を抽出することを特徴とする、請求項９または１０に記載の映像解析装置。 The video analysis apparatus according to claim 9 or 10, wherein the analysis information creation unit extracts the analysis information of the segments that include the frame images containing face images evaluated by the face similarity evaluation unit as having a high similarity to the face image detected from the search image.

時系列に配列された複数のフレーム画像からなる映像を解析する映像解析方法であって、
前記フレーム画像より抽出される局所形状に基づく特徴量を用いて、連続する２つのフレーム画像に同じ物体が存在するか否かを判断することによって、前記２つのフレーム画像の連続性の評価を行い、前記連続性の評価結果に基づいて前記映像を複数のセグメントに分割するセグメント作成ステップと、
前記フレーム画像から人物の顔画像を検出し、前記検出された顔画像の類似度に基づいて前記フレーム画像をグループ化する顔グループ作成ステップと、
前記セグメント作成ステップにおいて作成された前記セグメントと、前記顔グループ作成ステップにおける前記フレーム画像の分類結果に基づいて、前記セグメントの解析情報を作成する解析情報作成ステップと、
を含むことを特徴とする、映像解析方法。
A video analysis method for analyzing a video composed of a plurality of frame images arranged in time series, comprising:
a segment creation step of evaluating the continuity of two consecutive frame images by determining, using feature amounts based on local shapes extracted from the frame images, whether the same object exists in the two frame images, and dividing the video into a plurality of segments based on the result of the continuity evaluation;
a face group creation step of detecting face images of persons from the frame images and grouping the frame images based on the similarity of the detected face images; and
an analysis information creation step of creating analysis information of the segments based on the segments created in the segment creation step and the result of classifying the frame images in the face group creation step.