JP2006285878A

JP2006285878A - Video analyzing device and video analyzing program

Info

Publication number: JP2006285878A
Application number: JP2005108094A
Authority: JP
Inventors: Toshihiko Misu; 俊彦三須; Masaki Takahashi; 正樹高橋; Makoto Tadenuma; 眞蓼沼; Hideki Sumiyoshi; 英樹住吉; Masaki Sano; 雅規佐野
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2005-04-04
Filing date: 2005-04-04
Publication date: 2006-10-19
Anticipated expiration: 2025-04-04
Also published as: JP4644022B2

Abstract

PROBLEM TO BE SOLVED: To provide a video analyzing device and a video analyzing program that analyze a video and detect an event in the video, in particular an event (a specific play or the like) in a sports video.

SOLUTION: A video analyzing device 1 for analyzing an input video is provided with a silhouette video generation means 2 for generating a silhouette video, a figure tracking means 3 for tracking a figure region, a color identifying means 4 for outputting a color classification number, a feature vector extracting means 5 for calculating feature vectors, an event detection means 6 for detecting the event, and a post filter means 7 for specifying the event.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to video analysis, and in particular to a video analysis device and a video analysis program that analyze scenes in a sports video by evaluating the coordinates, moving velocity, color, and the like of the persons appearing in that sports video.

As conventional video analysis devices (image analysis systems), devices that track an object region in an input video and output the coordinates of that object region have been disclosed (for example, Patent Documents 1 to 3).

The "moving object measuring device, ball game analysis system, and data service system" disclosed in Patent Document 1 is a technique that targets ball games such as soccer and, using video captured by multiple sets of TV cameras, measures the ball position in three dimensions, including not only the lateral and depth directions but also the height direction.

The "image analysis system, image analysis method, and image analysis program recording medium" disclosed in Patent Document 2 is a tracking technique that analyzes an input video and allows for the separation and merging of multiple objects.

The "image feature extraction device, image feature matching device, and image search device" disclosed in Patent Document 3 is a technique that searches a video for a desired image by applying fuzzy rules to various feature quantities obtained from the images constituting the input video.

Patent Document 1: JP 2001-273500 A
Patent Document 2: JP 2002-63577 A
Patent Document 3: JP H05-6437 A

However, conventional video analysis devices (image analysis systems) have mainly done one of two things: tracked objects in the images constituting a video and output the coordinates of the tracked objects, or classified the images according to feature quantities (such as color) obtained directly from those images. None has been able to detect events (occurrences) in a video, in particular events (specific plays or the like) in a sports video, by analyzing the object tracking results together with the feature quantities obtained from the images.

Accordingly, an object of the present invention is to provide a video analysis device and a video analysis program capable of analyzing a video and detecting events (occurrences) in the video, in particular events (specific plays or the like) in a sports video.

To solve the above problem, the video analysis device according to claim 1 is a video analysis device that analyzes an input video and comprises a silhouette video generation means, a region tracking means, a color identification means, a feature vector calculation means, an event detection means, and a post filter means.

With this configuration, the video analysis device generates a silhouette video from the input video by the silhouette video generation means. A silhouette video is a video in which the silhouettes of subjects such as persons are extracted on the basis of background color information, and it is generated using, for example, a background subtraction method. Here, a silhouette refers to a region of arbitrary area whose color differs from the color set as the background color; it may correspond to a person, an object, scenery, or the like. Next, by the region tracking means, the video analysis device treats each silhouette contained in the silhouette images constituting the silhouette video as a tracking region, tracks it on the basis of differences between silhouette images, and outputs the estimated coordinates and estimated velocity of the tracking region and the area of the tracking region in association with an identification number that identifies the tracking region.
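As an illustration of the background subtraction step mentioned above, the following is a minimal sketch and not the method of the patent. It assumes frames are nested lists of (R, G, B) tuples; the function name `silhouette_mask` and the threshold value are hypothetical.

```python
def silhouette_mask(frame, background, threshold=40):
    """Return a binary mask with 1 where a pixel differs from the background.

    frame, background: rows of (r, g, b) tuples with identical dimensions.
    threshold: assumed limit on the summed absolute channel difference.
    """
    mask = []
    for frame_row, background_row in zip(frame, background):
        mask.append([
            1 if sum(abs(a - b) for a, b in zip(pixel, reference)) > threshold else 0
            for pixel, reference in zip(frame_row, background_row)
        ])
    return mask


# Hypothetical 2x3 scene: a uniform background with one differently colored pixel.
background = [[(0, 128, 0)] * 3 for _ in range(2)]
frame = [row[:] for row in background]
frame[0][1] = (200, 0, 0)
mask = silhouette_mask(frame, background)
```

Pixels whose mask value is 1 form the silhouettes that the later stages label and track.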

Then, by the color identification means, the video analysis device identifies the color of the tracking region on the basis of the estimated coordinates, the silhouette video, and the video and, according to the identification result, outputs a color classification number, set in advance for classifying that color, together with the identification number. By the feature vector calculation means, it calculates the feature quantities to be included in a feature vector on the basis of at least one of the estimated coordinates and estimated velocity, the area of the tracking region, and the color classification number, each associated with the identification number.
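The color identification step could be sketched as a nearest-reference-color lookup. This is an assumption made for illustration: the text only states that a preset color classification number is output, not how the color is matched. The function name, the reference table, and the use of the mean color are all hypothetical.

```python
def classify_color(pixels, references):
    """Return the class number whose reference color is nearest to the mean color.

    pixels: non-empty list of (r, g, b) tuples sampled inside one silhouette.
    references: {color_class_number: (r, g, b)}, a hypothetical preset table.
    """
    count = len(pixels)
    mean = tuple(sum(pixel[i] for pixel in pixels) / count for i in range(3))

    def squared_distance(class_number):
        return sum((m - c) ** 2 for m, c in zip(mean, references[class_number]))

    return min(references, key=squared_distance)


# Hypothetical table: class 0 is reddish, class 1 is bluish.
references = {0: (255, 0, 0), 1: (0, 0, 255)}
color_class = classify_color([(250, 10, 10), (230, 30, 20)], references)
```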

Then, by the event detection means, the video analysis device detects, as an event indicating an occurrence in a scene of the video, the case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputs a flag signal indicating the detection result. By the post filter means, it applies at least one of filtering in the time direction and a logical operation between flag signals to the flag signals output by the event detection means, and outputs an event output signal that specifies the event.

Here, the preset condition refers to, for example, the case where a threshold is set in advance for each feature quantity and any one of the feature quantities exceeds its threshold, or the case where, for a combination of feature quantities, each feature quantity exceeds its threshold. Filtering in the time direction refers to processing that narrows down the flag signals by predicting which events can occur over time, and a logical operation between flag signals refers to processing that removes events that cannot occur at the same time by performing logical operations among the flag signals.
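The threshold conditions and the two post filter operations can be sketched as follows. This is an illustrative sketch under stated assumptions: the event names, feature names, and thresholds are hypothetical, and the temporal filter shown is a simple run-length (persistence) filter standing in for the prediction-based narrowing described above.

```python
def detect_flags(features, conditions):
    """Raise a flag when every feature listed for an event exceeds its threshold."""
    return {
        event: all(features[name] > limit for name, limit in limits.items())
        for event, limits in conditions.items()
    }


def persistence_filter(flags, min_run=3):
    """Temporal filter: keep a raised flag only inside runs of at least min_run frames."""
    filtered = [False] * len(flags)
    start = None
    for index, raised in enumerate(list(flags) + [False]):
        if raised and start is None:
            start = index
        elif not raised and start is not None:
            if index - start >= min_run:
                for position in range(start, index):
                    filtered[position] = True
            start = None
    return filtered


def exclusive(primary, secondary):
    """Logical operation: drop `secondary` wherever `primary` is raised in the same frame."""
    return [s and not p for p, s in zip(primary, secondary)]


# Hypothetical events and thresholds for a single frame.
conditions = {"event_a": {"speed": 4.0}, "event_b": {"spread": 3.0, "speed": 1.0}}
frame_flags = detect_flags({"speed": 5.0, "spread": 2.0}, conditions)
```

A single-frame spike is discarded by `persistence_filter`, while `exclusive` encodes the idea that two particular events cannot be reported for the same instant.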

The video analysis device according to claim 2 is the video analysis device according to claim 1, wherein the region tracking means comprises a labeling means, an area determination means, a back projection conversion means, a detection tracking means, a prediction estimation means, and a delay means.

With this configuration, by the labeling means, the video analysis device treats each silhouette contained in a silhouette image as a simply connected region, assigns a label number to each simply connected region, and generates region information on the shape of each labeled region. Here, taking a connected region to be a region in which any two points can be joined by a curve lying inside the region, a simply connected region is a connected region in which the interior of every simple closed curve on the region always lies inside the region. The region information includes at least coordinates and lengths (distances between coordinates) from which the shape of the simply connected region can be estimated.
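The labeling step can be illustrated with ordinary connected-component labeling. This sketch assumes 4-connectivity and a binary mask of nested lists; it does not check that a region is simply connected (free of holes), and the choice of a bounding box plus pixel count as the region information is an assumption.

```python
from collections import deque


def label_regions(mask):
    """Assign label numbers to 4-connected foreground regions of a binary mask.

    Returns {label: {"bbox": (x_min, y_min, x_max, y_max), "pixels": n}}.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = {}
    next_label = 1
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects every pixel of this region.
            queue = deque([(x, y)])
            seen[y][x] = True
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions[next_label] = {
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "pixels": len(xs),
            }
            next_label += 1
    return regions


regions = label_regions([
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
])
```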

Next, by the area determination means, the video analysis device obtains the area of each simply connected region from the region information and, for each simply connected region whose area falls within a fixed range, outputs the label number and the area of the region. Then, by the back projection conversion means, it outputs, together with the label number, real coordinates indicating where each simply connected region lies in three-dimensional space (the actual space), on the basis of the region information generated by the labeling means and the projection center of the camera that captured the video. Then, by the detection tracking means, it receives the delayed predicted coordinates and delayed predicted velocity, which are the predicted coordinates and predicted velocity of the tracking region output after a delay of a preset unit time, together with the label number, the area of the simply connected region output from the area determination means, and the real coordinates. On this basis it outputs an identification number associated with the label number and observed coordinates obtained by associating the real coordinates with the delayed predicted coordinates, and it outputs the area of the simply connected region output from the area determination means as the area of the tracking region.
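The area gate and the back projection can be sketched as below. The geometry is an assumption made for illustration: a viewing ray from the camera's projection center is intersected with a ground plane z = 0, which presumes the subject stands on that plane. The text itself only states that the region information and the projection center are used. Function names and numeric values are hypothetical.

```python
def within_area(area, min_area, max_area):
    """Area determination: keep only regions whose area lies in the given range."""
    return min_area <= area <= max_area


def back_project_to_ground(center, direction):
    """Intersect the ray center + t * direction (t > 0) with the plane z = 0.

    center: (x, y, z) of the camera projection center in world coordinates.
    direction: (dx, dy, dz) of the viewing ray through a point of the region.
    Returns the world coordinates of the intersection, or None if there is none.
    """
    cx, cy, cz = center
    dx, dy, dz = direction
    if dz == 0:
        return None  # ray is parallel to the ground plane
    t = -cz / dz
    if t <= 0:
        return None  # intersection lies behind the camera
    return (cx + t * dx, cy + t * dy, 0.0)


# Hypothetical camera 10 units above the ground, looking forward and down.
ground_point = back_project_to_ground((0.0, 0.0, 10.0), (1.0, 0.0, -1.0))
```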

Then, by the prediction estimation means, the video analysis device filters and predicts, in the time direction, the observed coordinates output by the detection tracking means, and outputs the estimated coordinates and estimated velocity and the predicted coordinates and predicted velocity, each together with the identification signal. By the delay means, it delays the predicted coordinates and predicted velocity output from the prediction estimation means by the predetermined unit time and outputs them, together with the identification number, to the detection tracking means.
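The feedback loop formed by the detection tracking, prediction estimation, and delay means can be sketched with an alpha-beta filter and nearest-neighbor association. Both are stand-ins chosen for illustration; the text does not name the filter or the association rule, and the gains, the gate, and the function names are hypothetical.

```python
import math


def associate(delayed_predictions, observations, gate):
    """Pair each tracked id with the nearest observed real coordinate within `gate`.

    delayed_predictions: {identification_number: (x, y)} from the previous step.
    observations: {label_number: (x, y)} for the current image.
    Simplification: two tracks may claim the same observation.
    """
    matches = {}
    for track_id, (px, py) in delayed_predictions.items():
        best_label, best_distance = None, gate
        for label, (ox, oy) in observations.items():
            distance = math.hypot(ox - px, oy - py)
            if distance <= best_distance:
                best_label, best_distance = label, distance
        if best_label is not None:
            matches[track_id] = observations[best_label]
    return matches


def alpha_beta_step(pred_pos, pred_vel, observed, dt=1.0, alpha=0.5, beta=0.5):
    """One filter-and-predict step for a single coordinate axis.

    Returns (est_pos, est_vel, next_pred_pos, next_pred_vel); the last two
    values are what a delay element would feed back on the next frame.
    """
    residual = observed - pred_pos
    est_pos = pred_pos + alpha * residual
    est_vel = pred_vel + (beta / dt) * residual
    return est_pos, est_vel, est_pos + dt * est_vel, est_vel


matches = associate({7: (0.0, 0.0)}, {1: (0.5, 0.0), 2: (5.0, 5.0)}, gate=1.0)
step = alpha_beta_step(pred_pos=0.0, pred_vel=0.0, observed=2.0)
```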

The video analysis device according to claim 3 is the video analysis device according to claim 1 or claim 2, wherein the feature vector is composed of at least one feature quantity among a silhouette count estimate, a silhouette group distribution quantification value, an inter-silhouette distance quantification value, a silhouette group centroid quantification value, a silhouette speed quantification value, a silhouette velocity quantification value, and a judgment value.

With this configuration, among the feature quantities (values) constituting the feature vector calculated by the feature vector calculation means, the silhouette count estimate is an estimate of the number of silhouettes; the silhouette group distribution quantification value quantifies the degree of scatter in the distribution of a silhouette group made up of multiple silhouettes; and the inter-silhouette distance quantification value quantifies the distance between silhouettes. The silhouette group centroid quantification value quantifies coordinates representative of the distribution of the silhouette group, and the silhouette speed quantification value quantifies the speed of the silhouettes. The silhouette velocity quantification value quantifies the velocity of the silhouettes, and the judgment value indicates whether a specific silhouette, that is, a silhouette specified in advance, is present at a specific location.
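One possible reading of these feature quantities is sketched below. The concrete formulas (root-mean-square scatter, minimum pairwise distance, mean speed, mean velocity, and a color-plus-zone test for the judgment value) are assumptions; the text defines the quantities only in words. All names are hypothetical.

```python
import math


def feature_vector(tracks, zone, target_color):
    """Compute illustrative feature quantities from tracked silhouettes.

    tracks: list of dicts with 'pos' (x, y), 'vel' (vx, vy), and 'color' (class number).
    zone: (x_min, y_min, x_max, y_max) of the specific location.
    target_color: color class number identifying the specific silhouette.
    """
    count = len(tracks)
    if count == 0:
        return {"count": 0}
    cx = sum(t["pos"][0] for t in tracks) / count
    cy = sum(t["pos"][1] for t in tracks) / count
    spread = math.sqrt(
        sum((t["pos"][0] - cx) ** 2 + (t["pos"][1] - cy) ** 2 for t in tracks) / count
    )
    pair_distances = [
        math.dist(a["pos"], b["pos"])
        for i, a in enumerate(tracks)
        for b in tracks[i + 1:]
    ]
    x_min, y_min, x_max, y_max = zone
    return {
        "count": count,
        "centroid": (cx, cy),
        "spread": spread,
        "min_distance": min(pair_distances) if pair_distances else None,
        "mean_speed": sum(math.hypot(*t["vel"]) for t in tracks) / count,
        "mean_velocity": (
            sum(t["vel"][0] for t in tracks) / count,
            sum(t["vel"][1] for t in tracks) / count,
        ),
        "judgment": any(
            t["color"] == target_color
            and x_min <= t["pos"][0] <= x_max
            and y_min <= t["pos"][1] <= y_max
            for t in tracks
        ),
    }


tracks = [
    {"pos": (0.0, 0.0), "vel": (3.0, 4.0), "color": 1},
    {"pos": (4.0, 0.0), "vel": (-3.0, 4.0), "color": 2},
]
features = feature_vector(tracks, zone=(3.0, -1.0, 5.0, 1.0), target_color=2)
```

Speed is a scalar magnitude and velocity keeps direction, which is why the two silhouettes above have a mean speed of 5 but a mean velocity with no lateral component.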

The video analysis device according to claim 4 is a video analysis device that analyzes an input sports video and comprises a silhouette video generation means, a region tracking means, a color identification means, a feature vector calculation means, an event detection means, and a post filter means.

With this configuration, the video analysis device generates a silhouette video from the sports video by the silhouette video generation means. By the region tracking means, it treats the silhouette of a person, that is, a region within a predetermined area range contained in the silhouette images constituting the silhouette video, as a tracking region, tracks the tracking region on the basis of differences between silhouette images, and outputs the estimated coordinates and estimated velocity of the tracking region and the area of the tracking region in association with an identification number that identifies the tracking region.

Next, by the color identification means, the video analysis device identifies the color of the tracking region on the basis of the estimated coordinates, the silhouette video, and the sports video and, according to the identification result, outputs a color classification number, set in advance for classifying that color, in association with the identification number. By the feature vector calculation means, it calculates the feature quantities to be included in a feature vector on the basis of at least one of the estimated coordinates and estimated velocity, the area of the tracking region, and the color classification number, each associated with the identification number.

Then, by the event detection means, the video analysis device detects, as an event indicating a specific play that occurred in a scene of the sports video, the case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputs a flag signal indicating the detection result. By the post filter means, it applies at least one of filtering in the time direction and a logical operation between flag signals to the flag signals output by the event detection means, and outputs an event output signal that specifies the event.

The video analysis device according to claim 5 is the video analysis device according to claim 4, wherein the region tracking means comprises a labeling means, an area determination means, a back projection conversion means, a detection tracking means, a prediction estimation means, and a delay means.

With this configuration, by the labeling means, the video analysis device treats each person silhouette, that is, each region within the predetermined area range contained in a silhouette image, as a simply connected region, assigns a label number to it, and generates region information on the labeled simply connected region. By the area determination means, it obtains the area of each simply connected region from the region information and, for each simply connected region whose area falls within a fixed range, outputs the label number and the area. Then, by the back projection conversion means, it outputs, together with the label number, real coordinates indicating where each simply connected region lies in three-dimensional space, on the basis of the region information generated by the labeling means and the projection center of the camera that captured the sports video.

Then, by the detection tracking means, the video analysis device receives the delayed predicted coordinates and delayed predicted velocity, which are the predicted coordinates and predicted velocity of the tracking region output after a delay of a preset unit time, together with the label number, the area of the simply connected region output from the area determination means, and the real coordinates. On this basis it outputs an identification number associated with the label number and observed coordinates obtained by associating the real coordinates with the delayed predicted coordinates, and it outputs the area of the simply connected region output from the area determination means as the area of the tracking region. Then, by the prediction estimation means, it filters and predicts, in the time direction, the observed coordinates output by the detection tracking means, and outputs the estimated coordinates and estimated velocity and the predicted coordinates and predicted velocity, each together with the identification signal. By the delay means, it delays the predicted coordinates and predicted velocity output from the prediction estimation means by the predetermined unit time and outputs them, together with the identification number, to the detection tracking means.

The video analysis device according to claim 6 is the video analysis device according to claim 4 or claim 5, wherein the feature vector is composed of at least one feature quantity among a person region count estimate, a person region group distribution quantification value, an inter-person-region distance quantification value, a person group centroid quantification value, a person region speed quantification value, a person region velocity quantification value, and a judgment value.

With this configuration, among the feature quantities (values) constituting the feature vector calculated by the feature vector calculation means, the person region count estimate is an estimate of the number of person silhouettes; the person region group distribution quantification value quantifies the degree of scatter in the distribution of a person silhouette group made up of the silhouettes of multiple persons; and the inter-person distance quantification value quantifies the distance between person silhouettes. The person group centroid quantification value quantifies coordinates representative of the distribution of the person silhouette group, and the person region speed quantification value quantifies the speed of the person silhouettes. The person region velocity quantification value quantifies the velocity of the person silhouettes, and the judgment value indicates whether a specific person silhouette, that is, the silhouette of a person specified in advance, is present at a specific location.

The video analysis program according to claim 7 causes a computer, in order to analyze an input video, to function as a silhouette video generation means, a region tracking means, a color identification means, a feature vector calculation means, an event detection means, and a post filter means.

With this configuration, the video analysis program generates a silhouette video from the video by the silhouette video generation means. By the region tracking means, it treats each silhouette contained in the silhouette images constituting the silhouette video as a tracking region, tracks the tracking region on the basis of differences between silhouette images, and outputs the estimated coordinates and estimated velocity of the tracking region and the area of the tracking region in association with an identification number that identifies the tracking region. Next, by the color identification means, the program identifies the color of the tracking region on the basis of the estimated coordinates, the silhouette video, and the video and, according to the identification result, outputs a color classification number, set in advance for classifying that color, in association with the identification number. By the feature vector calculation means, it calculates the feature quantities to be included in a feature vector on the basis of at least one of the estimated coordinates and estimated velocity, the area of the tracking region, and the color classification number, each associated with the identification number.

Then, by the event detection means, the video analysis program detects, as an event indicating an occurrence in a scene of the video, the case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputs a flag signal indicating the detection result. By the post filter means, it applies at least one of filtering in the time direction and a logical operation between flag signals to the flag signals output by the event detection means, and outputs an event output signal that specifies the event.

The video analysis program according to claim 8 causes a computer, in order to analyze an input sports video, to function as a silhouette video generation means, a region tracking means, a color identification means, a feature vector calculation means, an event detection means, and a post filter means.

With this configuration, the video analysis program generates a silhouette video from the sports video by the silhouette video generation means. By the region tracking means, it treats the silhouette of a person, that is, a region within a predetermined area range contained in the silhouette images constituting the silhouette video, as a tracking region, tracks the tracking region on the basis of differences between silhouette images, and outputs the estimated coordinates and estimated velocity of the tracking region and the area of the tracking region in association with an identification number that identifies the tracking region. Next, by the color identification means, the program identifies the color of the tracking region on the basis of the estimated coordinates, the silhouette video, and the sports video and, according to the identification result, outputs a color classification number, set in advance for classifying that color, in association with the identification number. By the feature vector calculation means, it calculates the feature quantities to be included in a feature vector on the basis of at least one of the estimated coordinates and estimated velocity, the area of the tracking region, and the color classification number, each associated with the identification number.

Then, by the event detection means, the video analysis program detects, as an event indicating a specific play that occurred in a scene of the sports video, the case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputs a flag signal indicating the detection result. By the post filter means, it applies at least one of filtering in the time direction and a logical operation between the flag signals to the flag signals output by the event detection means, and outputs an event output signal that specifies the event.

According to the inventions of claims 1 and 7, a silhouette video is generated from a video, each silhouette contained in the silhouette video is treated as a tracking region, and the estimated coordinates and estimated velocity of the tracking region, the area of the tracking region, and a color classification number indicating the color of the tracking region are output; the feature quantities to be included in a feature vector are calculated on the basis of at least one of these. Furthermore, the case where a feature quantity included in the feature vector satisfies a preset condition is detected as an event indicating an occurrence in a scene of the video, and the event is specified by applying at least one of filtering in the time direction and a logical operation between flag signals. As a result, the input video can be analyzed to detect events (occurrences) characterized by the coordinates in the actual space, the velocity, the color, and the size (area) of the subjects (persons, objects, and the like) contained in the video.

According to the invention of claim 2, each silhouette contained in the silhouette video is treated as a simply connected region and given a label number. For the simply connected regions whose area falls within a fixed range, an identification number associated with the label number and observed coordinates obtained by associating the real coordinates with the delayed predicted coordinates are output on the basis of the delayed predicted coordinates and delayed predicted velocity, the label number, the area of the simply connected region, and the real coordinates, and the area of the simply connected region output from the area determination means is output as the area of the tracking region. The output observed coordinates are then filtered and predicted in the time direction, and the estimated coordinates and estimated velocity are output. As a result, a subject (a person, an object, or the like) in the video can be tracked along the running time of the video (in the time direction), and the coordinates of the subject can be output together with the velocity of the subject and its apparent size.

According to the invention of claim 3, the feature quantities in the feature vector include at least one of an estimated silhouette count, a silhouette group distribution quantification value, a silhouette speed quantification value, a silhouette velocity quantification value, and a determination value. Low-order video features such as coordinates, velocity, area, and color classification can therefore be converted into higher-order video features related to events in the video.

According to the inventions of claims 4 and 8, a silhouette video is generated from a sports video, the silhouette of each person contained in the silhouette video is treated as a tracking region, and the estimated coordinates and estimated velocity of the tracking region, the area of the tracking region, and a color classification number indicating the color of the tracking region are output; the feature quantities to be included in a feature vector are calculated from at least one of these. When a feature quantity in the feature vector satisfies a preset condition, this is detected as an event indicating a specific play that occurred in a scene of the sports video, and the event is identified by performing at least one of temporal filtering and logical operations between flag signals. The sports video can thus be analyzed to detect specific plays and the like in it.

According to the invention of claim 5, the silhouette of each person contained in the silhouette video is treated as a single connected region and given a label number. For the single connected regions whose area lies within a fixed range, an identification number associated with the label number and observed coordinates obtained by associating the real coordinates with the delayed predicted coordinates are output on the basis of the delayed predicted coordinates and delayed predicted velocity, the label number, the area of the single connected region, and the real coordinates; the area of the single connected region output from the area determination means is also output as the area of the tracking region. The output observed coordinates are then filtered and predicted in the time direction, and the estimated coordinates and estimated velocity are output. A subject (a player, etc.) in the sports video can thus be tracked along the time axis of the sports video, and its coordinates, velocity, and apparent size can be output.

According to the invention of claim 6, the feature quantities in the feature vector include at least one of an estimated person count, a person group distribution quantification value, a person speed quantification value, a person velocity quantification value, and a determination value. Low-order video features such as coordinates, velocity, area, and color classification can therefore be converted into higher-order video features related to events (specific plays, etc.) in the sports video.

Next, embodiments of the present invention will be described in detail with reference to the drawings as appropriate.
(Configuration of video analysis device)
FIG. 1 is a block diagram of the video analysis device. As shown in FIG. 1, the video analysis device 1 analyzes an input video and outputs the events represented by that video. It comprises silhouette video generation means 2, person tracking means (region tracking means) 3, color identification means 4, feature vector extraction means (feature vector calculation means) 5, event detection means 6, and post filter means 7. In particular, the video analysis device 1 analyzes a sports video and outputs the events (specific plays, etc.) of that sports video.

In the video analysis device 1, each image making up the input video is denoted as an input image I(x, y), and the pixel value at image coordinates (x, y) of the input image I(x, y) is denoted i(x, y). The pixel value i(x, y) may be represented by a luminance value alone, or by a vector of two or more dimensions representing color. For example, the pixel value i(x, y) can be a three-dimensional color vector consisting of red, green, and blue components, or a three-dimensional color vector consisting of a luminance value and two color difference values. This embodiment uses the latter: a luminance value and two color difference values.

The silhouette video generation means 2 generates, from the input image I(x, y), a binary (0 or 1) silhouette image S(x, y) that separates person regions (tracking regions) from non-person regions. The silhouette video consists of a plurality of silhouette images S(x, y). Here, for example, S(x, y) = 1 is assigned to person regions (person silhouettes), which are regions within a predetermined area range, and S(x, y) = 0 is assigned to non-person regions. A silhouette here means a region of arbitrary area whose color differs from the color set as the background color; it may correspond to a person, an object, scenery, and so on.

One way for the silhouette video generation means 2 to generate the silhouette image S(x, y) is a method based on background color information, typified by a hard chroma key. In this method, a function K(c) is first defined on color vectors c. K(c) is preset so that K(c) = 0 when the color vector c represents a background color and K(c) = 1 when it does not.

Such a function K(c) can be implemented as a table of elements (data) indexed by the color vector c, that is, a lookup table (corresponding to the preset color classification numbers). In this case, the lookup-table elements corresponding to background colors are set (registered) to “0” in advance and all other elements to “1”. For example, for input video of a sport played on grass, the elements corresponding to grass-like colors (for example, green) would be set to “0” and the elements for all other colors to “1”.

More specifically, the hard chroma key using the function K(c) is realized by the following equation (1).
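Equation (1) itself is not reproduced in this text. The following is a minimal Python sketch of a lookup-table chroma key, assuming equation (1) has the form S(x, y) = K(i(x, y)). The function names, the set-backed table, and the sample color values are illustrative assumptions and do not come from the source.

```python
# Hypothetical names throughout; the text only specifies K(c) as a lookup table.

def make_background_lut(background_colors):
    """Build K: K(c) = 0 for registered background colors, 1 otherwise."""
    registered = set(background_colors)

    def K(c):
        return 0 if c in registered else 1

    return K


def hard_chroma_key(image, K):
    """Assumed form of equation (1): S(x, y) = K(i(x, y)) for every pixel."""
    return [[K(pixel) for pixel in row] for row in image]


GRASS = (96, 110, 90)      # illustrative quantized (Y, Cb, Cr) value
PLAYER = (180, 120, 140)   # illustrative non-background value

K = make_background_lut([GRASS])
image = [[GRASS, PLAYER, GRASS],
         [GRASS, PLAYER, PLAYER]]
silhouette = hard_chroma_key(image, K)
```

A practical table would register a whole range of grass-like colors rather than a single value; the single entry here only keeps the example short.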

The silhouette video generation means 2 can also generate the silhouette image S(x, y) by, for example, background subtraction. In this case, an image in which no person appears (hereinafter, the background image) B(x, y) is prepared in advance, its difference from the input image I(x, y) is calculated, and the result is thresholded to produce the silhouette image S(x, y). For example, with T(c) as the thresholding function, the silhouette image S(x, y) is generated by the following equation (2).

In equation (2), when the pixel values of the input image I(x, y) are luminance values only, the thresholding function T(c) can be given by, for example, the following equation (3), which is defined in terms of a threshold θ_0 that is less than zero (or at most zero) and a threshold θ_1 that is at least zero (or greater than zero).

When a multidimensional color vector is used as the pixel value of the input image I(x, y), T(c) = 0 if the color vector c lies within a fixed region of the color vector space and T(c) = 1 otherwise. For example, when the color vector c consists of a red component c_R, a green component c_G, and a blue component c_B, the following equation (4) can be used; it is defined in terms of three thresholds θ_R0, θ_G0, and θ_B0 that are less than zero (or at most zero) and three thresholds θ_R1, θ_G1, and θ_B1 that are at least zero (or greater than zero).
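Equations (2) to (4) are not reproduced in this text. The sketch below assumes that equation (2) is S(x, y) = T(i(x, y) - b(x, y)), where b(x, y) is the pixel value of the background image, and that equations (3) and (4) return 0 when the difference lies inside the band bounded by the thresholds and 1 otherwise. Whether the band boundaries are inclusive is an assumption, as are all function names and sample values.

```python
def threshold_scalar(c, theta0, theta1):
    """Assumed form of equation (3): 0 inside [theta0, theta1], 1 outside."""
    return 0 if theta0 <= c <= theta1 else 1


def threshold_vector(c, lows, highs):
    """Assumed form of equation (4): 0 only if every component is in its band."""
    inside = all(lo <= ci <= hi for ci, lo, hi in zip(c, lows, highs))
    return 0 if inside else 1


def background_subtraction(image, background, theta0, theta1):
    """Assumed form of equation (2): S(x, y) = T(i(x, y) - b(x, y))."""
    return [
        [threshold_scalar(i - b, theta0, theta1) for i, b in zip(irow, brow)]
        for irow, brow in zip(image, background)
    ]


background = [[100, 100, 100], [100, 100, 100]]   # luminance-only example
frame = [[102, 180, 99], [100, 30, 101]]
silhouette_bs = background_subtraction(frame, background, theta0=-10, theta1=10)
```

Small differences caused by noise fall inside the band and map to 0, while pixels that differ strongly from the background in either direction map to 1.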

The person tracking means 3 tracks the silhouette (tracking region) of each person in the silhouette video, which consists of the silhouette images generated by the silhouette video generation means 2, associates the silhouettes across time, and outputs information on the coordinates, velocity, and size of each person in real space (the actual three-dimensional space). The detailed configuration of the person tracking means 3 is shown in FIG. 2.

As shown in FIG. 2, the person tracking means 3 comprises labeling means 31, area determination means 32, backprojection conversion means 33, detection/tracking means 34, prediction/estimation means 35, and delay means 36.

The labeling means 31 assigns a distinct label number l ∈ {1, 2, ..., N} to each silhouette contained in a silhouette image S(x, y) of the input silhouette video. Each silhouette in the silhouette image S(x, y) is a single connected region of pixels with S(x, y) = 1.
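The labeling step can be sketched as connected-component labeling. The source does not specify the algorithm or the connectivity, so the breadth-first search and the 4-neighbor connectivity below are assumptions, as are the function name and the use of the pixel count as the silhouette area.

```python
from collections import deque


def label_silhouettes(S):
    """Return (labels, areas): labels[y][x] is 0 or a label 1..N,
    and areas[l] is the pixel count of the region with label l."""
    height, width = len(S), len(S[0])
    labels = [[0] * width for _ in range(height)]
    areas = {}
    next_label = 0
    for y in range(height):
        for x in range(width):
            if S[y][x] != 1 or labels[y][x] != 0:
                continue
            # Start a new region and flood-fill it (4-connectivity assumed).
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(x, y)])
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and S[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            areas[next_label] = count
    return labels, areas


S = [[1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 1]]
labels, areas = label_silhouettes(S)
```

In this example the image contains two regions, so N = 2, and the returned `areas` dictionary provides one candidate definition of the silhouette area used by the area determination step.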

The area determination means 32 obtains the area A(l) of each silhouette labeled by the labeling means 31, determines whether the silhouette area A(l) is plausible as the size of a person, and extracts only the silhouettes of plausible area. Denoting the set of extracted label numbers by L, this set is given by the following equation (5). In other words, L is the set of silhouettes whose area is plausible for a person.
L = {l | A(l) is plausible as the size of a person, l ∈ {1, 2, ..., N}}
... (5)

The area determination means 32 then outputs the set L to the backprojection conversion means 33 as region information, and outputs the label numbers l and silhouette areas A(l) to the detection/tracking means 34. The region information only needs to describe the shape of the silhouette (single connected region); for example, it may include the coordinates of the silhouette, distances (lengths) between coordinates, and so on.

In the area determination means 32, whether the area A(l) of each silhouette is plausible as the size of a person is determined here by setting upper and lower thresholds on the area regarded as plausible for a person, thereby fixing a predetermined range, and judging the area A(l) plausible when it falls within that range.

A method for obtaining the set L will now be described with reference to FIGS. 3 and 4.
As shown in FIG. 3, the centroid G(l) of the silhouette with label number l (the centroid of the silhouette region in the image) is obtained, and a half-line is drawn from the projection center of the camera (not shown) that captured the input video through the centroid G(l) on the silhouette image plane. The point H(l) at which this half-line intersects the plane at height h above the ground in real space is then obtained. For example, if h is set to about half a person's height (for example, half the average adult height), then because the centroid G(l) lies at roughly half the height of the silhouette, the point H(l) approximately coincides with the person's centroid in real space. This point H(l) corresponds to the real coordinates.
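The intersection of the half-line with the plane at height h can be sketched as follows. This assumes a world coordinate frame with the ground at z = 0 and the z axis pointing up, and assumes that the direction of the half-line through G(l) has already been expressed in world coordinates using the camera calibration, which the text above does not detail. The function name and the sample values are illustrative.

```python
def back_project_to_height_plane(camera_center, ray_direction, h):
    """Intersect the half-line C + s * d (s > 0) with the plane z = h.

    Returns the intersection point H, or None if the half-line never
    reaches the plane.
    """
    cx, cy, cz = camera_center
    dx, dy, dz = ray_direction
    if dz == 0:
        return None          # half-line is parallel to the plane
    s = (h - cz) / dz
    if s <= 0:
        return None          # plane lies behind the projection center
    return (cx + s * dx, cy + s * dy, h)


# Camera 10 units above the ground, looking forward and down at 45 degrees.
H = back_project_to_height_plane((0.0, 0.0, 10.0), (1.0, 0.0, -1.0), h=1.0)
```

With these values the half-line descends 9 units to reach z = 1, so it also advances 9 units along x.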

Next, as shown in FIG. 4, a solid about the size of a person (a solid matching a person's volume, such as a rectangular parallelepiped or a cylinder) is assumed to be placed at the point H(l), and this solid is perspective-transformed to obtain its image on the silhouette image plane. When a rectangular parallelepiped is used, the outline of its image on the projection plane (the display surface of a display device, not shown) is a quadrilateral, pentagon, or hexagon.

Then, from the resulting image of the rectangular parallelepiped, the area of the face facing the projection plane, or the area of the rectangle circumscribing the image (its bounding box), is obtained and denoted A_est(l). The set L may also be obtained by thresholding with the following equation (6), which is defined in terms of constants satisfying 0 < k_0 ≤ k_1.
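Equation (6) is not reproduced in this text. One plausible reading, used in the sketch below, is that a label l is kept when k_0 · A_est(l) ≤ A(l) ≤ k_1 · A_est(l). This form, the function name, and the sample areas are assumptions.

```python
def plausible_labels(areas, expected_areas, k0, k1):
    """Assumed form of equation (6):
    keep l when k0 * A_est(l) <= A(l) <= k1 * A_est(l)."""
    if not 0 < k0 <= k1:
        raise ValueError("expected 0 < k0 <= k1")
    return {
        l for l, area in areas.items()
        if k0 * expected_areas[l] <= area <= k1 * expected_areas[l]
    }


areas_px = {1: 450, 2: 40, 3: 900}        # measured silhouette areas A(l)
expected_px = {1: 500, 2: 480, 3: 300}    # projected solid areas A_est(l)
L_set = plausible_labels(areas_px, expected_px, k0=0.5, k1=1.5)
```

Here label 2 is too small and label 3 too large relative to the area expected at their positions, so only label 1 remains in the set.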

Because the method for obtaining the set L described with reference to FIGS. 3 and 4 evaluates an area normalized according to the distance from the camera (not shown) to the person, only silhouettes (regions) of person-like size are extracted regardless of that distance.

Returning to FIG. 2, the description of the person tracking means 3 of the video analysis device 1 continues.
The backprojection conversion means 33 calculates the coordinates of each person in real space by back-projecting into real space the centroid, on the silhouette image plane, of each silhouette (person region) belonging to the set L output from the area determination means 32 (the set of label numbers assigned to silhouettes of plausible person area). Here, the backprojection conversion means 33 outputs the point H(l) shown in FIG. 3, together with the label number l, to the detection/tracking means 34 as the person's real coordinates.

The detection/tracking means 34 associates each label number l with the identification number m assigned to a person silhouette (person region) that has already been tracked and predicted up to one unit time earlier. It does so by comparing the distances between the real coordinates H(l) of the silhouette (hereinafter, person region) of each label number l belonging to the set L (l ∈ L) and the coordinates, output from the delay means 36, of the silhouettes tracked and predicted up to one unit time (a predetermined unit time) earlier.

For example, based on the distance between the real-space coordinates H(l) of the person region with label number l and the coordinates x_{t|t-1}(m) of the already-tracked person region with identification number m (the predicted coordinates output from the delay means 36), the detection/tracking means 34 obtains the label number l_match(m) corresponding to the identification number m using the following equation (7). Obtaining l_match(m) establishes the association between label number l and identification number m.

Alternatively, when obtaining the label number l_match(m), the detection/tracking means 34 may treat the association as failed if the distance between the real coordinates H(l) and the predicted coordinates x_{t|t-1}(m) is not within a threshold, and set l_match(m) to a special value. When l_match(m) takes this special value, the identification number m and the label numbers l are regarded as having no correspondence. This special value can be, for example, “0”. When the real coordinates H(l) and the predicted coordinates x_{t|t-1}(m) are associated using a threshold in this way, the following equation (8) is used.
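Equations (7) and (8) are not reproduced in this text. The sketch below assumes that equation (7) selects, for each identification number m, the label l in L whose real coordinates H(l) are closest to the predicted coordinates x_{t|t-1}(m), and that equation (8) additionally returns the special value 0 when that smallest distance exceeds a threshold. The Euclidean distance, the function name, and the sample coordinates are assumptions.

```python
import math

NO_MATCH = 0  # the special value suggested in the text


def match_labels(real_coords, predicted_coords, max_distance=None):
    """real_coords: {l: (X, Y)} for l in L; predicted_coords: {m: (X, Y)}.

    Returns {m: l_match(m)}. With max_distance=None this follows the assumed
    equation (7); with a threshold it follows the assumed equation (8).
    """
    matches = {}
    for m, x_pred in predicted_coords.items():
        best_label, best_dist = NO_MATCH, math.inf
        for l, h in real_coords.items():
            dist = math.dist(h, x_pred)
            if dist < best_dist:
                best_label, best_dist = l, dist
        if max_distance is not None and best_dist > max_distance:
            best_label = NO_MATCH
        matches[m] = best_label
    return matches


real = {1: (0.0, 0.0), 2: (10.0, 0.0)}
predicted = {7: (0.5, 0.0), 8: (9.0, 1.0), 9: (50.0, 50.0)}
unbounded = match_labels(real, predicted)
bounded = match_labels(real, predicted, max_distance=3.0)
```

This per-track nearest-neighbor rule does not prevent two identification numbers from selecting the same label; the text above does not say how such conflicts are resolved.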

The detection/tracking means 34 then obtains the real-space coordinates y_t(m) (hereinafter, observed coordinates) of the person region with identification number m from the coordinates, on the silhouette image, of the person region with label number l_match(m), which is the result of associating the real coordinates with the predicted coordinates. Here, the coordinates of the real-space centroid H(l_match(m)) of the person region with label number l_match(m) are used as the observed coordinates y_t(m).

When a label number l is found that exists in the set L but has no corresponding identification number m, the detection/tracking means 34 may assign a new identification number m to that label number l, set the observed coordinates y_t(m) from the real-space coordinates H(l) of the person region given the new identification number m, and output them.

When an identification number m has no corresponding label number l, the detection/tracking means 34 may invalidate that identification number m and refrain from outputting the observed coordinates y_t(m) of the person region to which it is assigned.

Furthermore, from the areas A(l) input for each label number l from the backprojection conversion means 33, the detection/tracking means 34 outputs the one for label number l_match(m) as the area α(m); that is, α(m) = A(l_match(m)). The area α(m) and the identification number m are output to the feature vector extraction means 5 (FIG. 1).

The prediction/estimation means 35 receives the identification number m and the observed coordinates y_t(m) output from the detection/tracking means 34 and filters the observed coordinates y_t(m) to output the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), predicted coordinates x_{t+1|t}(m), and predicted velocity v_{t+1|t}(m).

For example, the prediction/estimation means 35 may output the observed coordinates y_t(m) unchanged as the estimated coordinates x_{t|t}(m). The estimated velocity v_{t|t}(m) can be calculated from the history of the estimated coordinates x_{t|t}(m); for example, using the following equation (9), it can be calculated from the difference between the current estimated coordinates x_{t|t}(m) and the estimated coordinates x_{t-1|t-1}(m) one unit time earlier.

When calculating the estimated coordinates x_{t|t}(m), the prediction/estimation means 35 can also take, for example, a weighted sum of the history of the observed coordinates y_t(m) for label number l_match(m) and the history of the estimated coordinates x_{t|t}(m) as the current estimated coordinates x_{t|t}(m).

The prediction/estimation means 35 may also obtain at least one of the estimated coordinates x_{t|t}(m) and the estimated velocity v_{t|t}(m) with, for example, a Kalman filter or an extended Kalman filter. When a Kalman filter is used, the state vector z_{t|t} is defined by the following equation (10).

Likewise, the state vector z_{t|t-1}, composed of the delayed predicted coordinates x_{t|t-1}(m) and the delayed predicted velocity v_{t|t-1}(m) output from the delay means 36 (described in detail later), is defined by the following equation (11).

With the state vectors z_{t|t} and z_{t|t-1} defined in this way, applying the following equation (12) (the Kalman filter observation update equations) yields the current estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m) from the delayed predicted coordinates x_{t|t-1}(m) and delayed predicted velocity v_{t|t-1}(m) output from the delay means 36.

In equation (12), P_{t|t-1} is the error covariance matrix of the state vector at time t, estimated from information up to time t-1; its initial value P_{0|-1} is set to, for example, an arbitrary 4 × 4 positive semidefinite matrix. K_t is a matrix indicating the output of the Kalman filter at time t (the estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m)). Also in equation (12), A is the observation matrix, which is defined by the following equation (13), where I_{2×2} is the 2 × 2 identity matrix and O_{2×2} is the 2 × 2 zero matrix.

In equation (13), the observation matrix A is a 2 × 4 matrix. R_t is the covariance matrix of the observation noise at time t and is defined by the following equation (14).

In equation (14), the observation-noise covariance matrix R_t is a 2 × 2 diagonal matrix. ρ_x and ρ_y in equation (14) can be set to suitable positive constants or to positive variables that change over time. For example, ρ_x or ρ_y may be varied according to the distance between the coordinates H(l) used in calculating equation (7) or (8) and the predicted coordinates x_{t|t-1}(m). When the prediction/estimation means 35 uses a Kalman filter, the following equation (15) is used instead of equation (9) for an identification number m that has no corresponding label number l (that is, when no observed coordinates y_t(m) were input to the prediction/estimation means 35).
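Equations (10) to (14) are not reproduced in this text. The following sketch shows a textbook Kalman observation update under these assumptions: the state is ordered as z = [x, y, v_x, v_y], the observation matrix is A = [I_{2×2} O_{2×2}], and R_t = diag(ρ_x, ρ_y). The gain K in the code is the standard Kalman gain, which may differ from how the source defines K_t; all names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]   # assumed observation matrix [I O]


def kalman_update(z_pred, P_pred, y, rho_x, rho_y):
    """Standard observation update; assumed to correspond to equation (12)."""
    R = [[rho_x, 0.0], [0.0, rho_y]]
    PAt = matmul(P_pred, transpose(A))                      # 4 x 2
    S = matmul(A, PAt)                                      # 2 x 2
    S = [[S[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    K = matmul(PAt, inv2(S))                                # Kalman gain, 4 x 2
    innovation = [y[i] - sum(A[i][k] * z_pred[k] for k in range(4))
                  for i in range(2)]
    z_est = [z_pred[i] + sum(K[i][j] * innovation[j] for j in range(2))
             for i in range(4)]
    KA = matmul(K, A)                                       # 4 x 4
    I_KA = [[(1.0 if i == j else 0.0) - KA[i][j] for j in range(4)]
            for i in range(4)]
    P_est = matmul(I_KA, P_pred)
    return z_est, P_est


identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
z_est, P_est = kalman_update([0.0, 0.0, 1.0, 1.0], identity4, (2.0, 4.0), 1.0, 1.0)
```

With equal prediction and observation variances, the estimated position lands halfway between the predicted position and the observation, and the position variance is halved.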

Furthermore, from the estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m), the prediction/estimation means 35 obtains and outputs the predicted coordinates x_{t+1|t}(m) and predicted velocity v_{t+1|t}(m) at the next time t+1 using the following equation (16).

Equation (16) gives the predicted coordinates x_{t+1|t}(m) and predicted velocity v_{t+1|t}(m) under a constant-velocity model. In equation (16), Δt is the processing time interval; for example, Δt = 1 when processing is performed once per unit time (period T).
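Equations (9) and (16) are not reproduced in this text. The sketch below assumes that equation (9) is the finite difference v_{t|t} = (x_{t|t} - x_{t-1|t-1}) / Δt and that equation (16) is the constant-velocity prediction x_{t+1|t} = x_{t|t} + Δt · v_{t|t}, v_{t+1|t} = v_{t|t}. The function names and sample values are illustrative.

```python
def estimate_velocity(x_now, x_prev, dt=1.0):
    """Assumed form of equation (9): difference of successive estimates."""
    return tuple((a - b) / dt for a, b in zip(x_now, x_prev))


def predict_constant_velocity(x_est, v_est, dt=1.0):
    """Assumed form of equation (16): position advances, velocity is kept."""
    x_pred = tuple(x + dt * v for x, v in zip(x_est, v_est))
    return x_pred, tuple(v_est)


x_prev = (1.0, 2.0)     # x_{t-1|t-1}(m)
x_now = (3.0, 1.5)      # x_{t|t}(m)
v_now = estimate_velocity(x_now, x_prev)
x_next, v_next = predict_constant_velocity(x_now, v_now)
```

The predicted coordinates are what the delay means would hand back to the detection/tracking step one unit time later.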

When a Kalman filter is used, the prediction/estimation means 35 can obtain the predicted coordinates x_{t+1|t}(m) and predicted velocity v_{t+1|t}(m) using the following equation (17) (a recurrence relation).

In equation (17), F is a matrix representing the state transition model over one unit time (the state transition matrix); for a constant-velocity model, F is defined by the following equation (18).

Also in equation (17), Q is the covariance matrix of the process noise, which models the error newly introduced by prediction with the state transition matrix F; Q is set to, for example, a 4 × 4 positive semidefinite matrix.
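Equations (17) and (18) are not reproduced in this text. The sketch below uses the textbook Kalman time update, z_{t+1|t} = F z_{t|t} and P_{t+1|t} = F P_{t|t} Fᵀ + Q, with the usual constant-velocity transition matrix for a state ordered as [x, y, v_x, v_y]. The state ordering, the form of F, and all names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def transition_matrix(dt=1.0):
    """Assumed form of equation (18) for a constant-velocity model."""
    return [[1.0, 0.0, dt, 0.0],
            [0.0, 1.0, 0.0, dt],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def kalman_predict(z_est, P_est, Q, dt=1.0):
    """Assumed form of equation (17): propagate state and covariance."""
    F = transition_matrix(dt)
    z_pred = [sum(F[i][k] * z_est[k] for k in range(4)) for i in range(4)]
    FPFt = matmul(matmul(F, P_est), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(4)] for i in range(4)]
    return z_pred, P_pred


identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zero4 = [[0.0] * 4 for _ in range(4)]
z_pred, P_pred = kalman_predict([1.0, 2.0, 1.0, 1.0], identity4, zero4)
```

Even with Q set to zero, the position variance grows in the prediction because the velocity uncertainty feeds into the position.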

Furthermore, in equation (17), when some component of the state-vector error covariance matrix P_{t|t} or P_{t+1|t}, or either matrix itself, or its trace enters a certain numerical range, that is, when the estimation error is presumed to have grown large, the prediction/estimation means 35 invalidates the corresponding identification number m and does not output the corresponding estimated coordinates x_{t|t}(m), predicted coordinates x_{t+1|t}(m), estimated velocity v_{t|t}(m), or predicted velocity v_{t+1|t}(m).

That is, when the estimation error is presumed to have grown large, certain components of the state-vector error covariance matrix P_{t|t} or P_{t+1|t} are known to become large. Attention is therefore paid to certain components of these matrices, in particular the diagonal components, and when a diagonal component exceeds a threshold, the corresponding estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m), or predicted coordinates x_{t+1|t}(m) and predicted velocity v_{t+1|t}(m), are invalidated as unreliable. For example, as in the following equation (19), when the trace of the state-vector error covariance matrix P_{t|t} is greater than or equal to a threshold θ_P, the prediction/estimation means 35 can invalidate the corresponding identification number m.
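Equation (19) is not reproduced in this text. Based on the description above, the sketch assumes the condition tr(P_{t|t}) ≥ θ_P for invalidating an identification number; the function names and values are illustrative.

```python
def trace(P):
    """Sum of the diagonal components of a square matrix."""
    return sum(P[i][i] for i in range(len(P)))


def is_track_valid(P_est, theta_P):
    """Assumed form of equation (19): the track is invalid when tr(P) >= theta_P."""
    return trace(P_est) < theta_P


P_example = [[0.5, 0.0, 0.0, 0.0],
             [0.0, 0.5, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
still_valid = is_track_valid(P_example, theta_P=5.0)
now_invalid = not is_track_valid(P_example, theta_P=3.0)
```

A track that receives no observations for several steps accumulates process noise at every prediction, so its trace grows until this check drops it.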

The delay means 36 delays the predicted coordinates x_{t+1|t}(m) and the predicted velocity v_{t+1|t}(m) output from the prediction/estimation means 35 by one unit time, and outputs the delayed predicted coordinates x_{t|t-1}(m) and the delayed predicted velocity v_{t|t-1}(m) to the detection/tracking means 34.

Returning to FIG. 1, the description of the configuration of the video analysis device 1 is continued.
The color identification means 4 identifies the clothing color (the color of the tracking region) of each person's silhouette (each tracking region, identification number m) in the input video, based on the input video, the silhouette video (a plurality of silhouette images S(x, y)) generated by the silhouette video generation means 2, and the identification number m, estimated coordinates x_{t|t}(m), and estimated velocity v_{t|t}(m) output from the prediction/estimation means 35 of the person tracking means 3, and outputs a color classification number C(m).
For the color identification means 4, for example, the technique described in "Color Identification Device and Color Identification Program" of Japanese Patent Application No. 2005-11959 can be used.

For example, when the input video supplied to the device 1 is a soccer match video, the color classification number C(m) can be assigned as follows: "0" for the field players of one team, "1" for the goalkeeper of that team, "2" for the field players of the other team, "3" for the goalkeeper of the other team, "4" for the referees, and "5" for others (ball boys, managers, spectators, etc.).

The feature vector extraction means 5 calculates and outputs the features to be included in the feature vector, based on the identification number m and area α(m) output from the detection/tracking means 34 of the person tracking means 3; the identification number m, estimated coordinates x_{t|t}(m), and estimated velocity v_{t|t}(m) output from the prediction/estimation means 35 of the person tracking means 3; and the identification number m and color classification number C(m) output from the color identification means 4.

The feature vector extraction means 5 calculates the features to be included in the feature vector from at least one of the following inputs associated with the identification number m: the area α(m), the estimated coordinates x_{t|t}(m), the estimated velocity v_{t|t}(m), and the color classification number C(m). As the number of kinds of input information increases, the number of kinds of features included in the feature vector also increases, and as a result the accuracy of event detection by the event detection means 6 improves. FIG. 5 shows the detailed configuration of the feature vector extraction means 5.

As shown in FIG. 5, the feature vector extraction means 5 includes a people counting means 51, a person group dispersion measuring means 52, an inter-person distance measuring means 53, a person group centroid measuring means 54, an average speed measuring means 55, an average velocity measuring means 56, and a specific area monitoring means 57.

The people counting means 51 estimates the number of persons N (estimated number of silhouettes, estimated number of persons) contained in the input video, based on at least one of the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), color classification number C(m), and area α(m) input for the identification number m of each person (each tracking region). As the simplest method, the people counting means 51 can obtain N by counting the total number of input identification numbers m. Alternatively, it may obtain N with the configuration shown in FIG. 6.

As shown in FIG. 6, the people counting means 51 includes a clock means 511, a sequential area selection means 512, a sequential coordinate selection means 513, a person image area estimation means 514, a division means 515, and a sum calculation means 516.

The clock means 511 generates a sequentially increasing or decreasing numerical time series and outputs it to the sequential area selection means 512, the sequential coordinate selection means 513, and the sum calculation means 516. This numerical time series serves as the reference for synchronizing the timing at which the means 512, 513, and 516 process their input information.

The sequential area selection means 512 receives the identification number m, the area α(m), and the color classification number C(m). At the timing given by the numerical time series output from the clock means 511, it extracts, from the sequentially designated identification numbers m, those whose color classification number C(m) has a specific value (an integer from 0 to 5 in the example given in the description of the color identification means 4), and outputs the area α(m) of each match to the division means 515.

For example, as described above, suppose the input video supplied to the device 1 is a soccer match video and the color classification number C(m) is set to "0" for the field players of one team, "1" for the goalkeeper of that team, "2" for the field players of the other team, "3" for the goalkeeper of the other team, "4" for the referees, and "5" for others (ball boys, managers, spectators, etc.). In that case, extracting only the entries whose color classification number C(m) is "0" through "3" selects the players only.
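The selection by color classification number can be sketched as follows. This is only an illustration under assumed data structures: the dictionary keys `id`, `color`, and `area`, and the names `PLAYER_CLASSES` and `select_by_color`, are hypothetical and not taken from the specification.

```python
# Color classification numbers of the soccer example:
# 0/1 = field players/goalkeeper of one team, 2/3 = the other team,
# 4 = referees, 5 = others.
PLAYER_CLASSES = frozenset({0, 1, 2, 3})


def select_by_color(tracks, wanted=PLAYER_CLASSES):
    """Keep only the tracked regions whose C(m) is one of the wanted values."""
    return [t for t in tracks if t["color"] in wanted]


tracks = [
    {"id": 0, "color": 0, "area": 120.0},  # field player
    {"id": 1, "color": 4, "area": 110.0},  # referee
    {"id": 2, "color": 3, "area": 95.0},   # goalkeeper
    {"id": 3, "color": 5, "area": 60.0},   # spectator
]
players = select_by_color(tracks)
print([t["id"] for t in players])
```

Passing a different `wanted` set, for example `{0, 1}`, would restrict the selection to a single team.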

The sequential coordinate selection means 513 receives the identification number m, the color classification number C(m), the estimated coordinates x_{t|t}(m), and the estimated velocity v_{t|t}(m). At the timing given by the numerical time series output from the clock means 511, it extracts, from the sequentially designated identification numbers m, those whose color classification number C(m) has a specific value (an integer from 0 to 5 in the example given in the description of the color identification means 4), and outputs the estimated coordinates x_{t|t}(m) of each match to the person image area estimation means 514. Like the sequential area selection means 512, the sequential coordinate selection means 513 extracts the entries whose color classification number C(m) has a specific value and, using the identification number m as the key, outputs the estimated coordinates x_{t|t}(m) associated with that identification number. As with the sequential area selection means 512, when the input video is a soccer match video, setting the specific values to "0" through "3" selects the players only.

The person image area estimation means 514 assumes that a solid about the size of a person (a solid corresponding to the volume of a person, such as a rectangular parallelepiped or a cylinder) is placed at the estimated coordinates x_{t|t}(m) output from the sequential coordinate selection means 513, and estimates its image on the image plane by applying a perspective transformation. As shown in FIG. 7, when a rectangular parallelepiped is used as the solid, the outline of its image when mapped onto the image plane (the projection plane, i.e., the display surface of a display device (not shown)) is a quadrilateral, pentagon, or hexagon.

The person image area estimation means 514 then obtains, from the resulting image of the rectangular parallelepiped, either the area of the face that faces the image plane (projection plane) or the area of the rectangle circumscribing the image (bounding box), and takes it as the estimated area α_est(m).
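The bounding-box variant of the estimated area α_est(m) can be illustrated with a minimal sketch. It assumes an ideal pinhole camera at the origin looking along +z, with the cuboid already expressed in camera coordinates; in practice the real-space coordinates x_{t|t}(m) would first have to be converted using the camera's position and orientation, which is outside this sketch. The function name, the default person dimensions, and the focal length are all hypothetical.

```python
from itertools import product


def projected_bbox_area(cx, cz, width=0.5, depth=0.5, height=1.8,
                        base_y=0.0, focal=1.0):
    """Area of the bounding box of a cuboid's perspective projection.

    The cuboid's footprint is centered at (cx, cz) in camera coordinates,
    its base is at y = base_y, and it extends `height` along y.
    Projection: u = focal * x / z, v = focal * y / z.
    """
    if cz - depth / 2 <= 0:
        raise ValueError("cuboid must lie entirely in front of the camera")
    us, vs = [], []
    # Enumerate the 8 corners of the cuboid and project each one.
    for sx, sy, sz in product((-0.5, 0.5), (0.0, 1.0), (-0.5, 0.5)):
        x = cx + sx * width
        y = base_y + sy * height
        z = cz + sz * depth
        us.append(focal * x / z)
        vs.append(focal * y / z)
    return (max(us) - min(us)) * (max(vs) - min(vs))


near = projected_bbox_area(0.0, 10.0)
far = projected_bbox_area(0.0, 20.0)
print(near > far)  # a more distant person projects to a smaller area
```

Because the projected area shrinks with distance in the same way as the observed silhouette area does, dividing one by the other yields a ratio that is roughly independent of distance.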

Returning to FIG. 6, the description of the configuration of the people counting means 51 of the feature vector extraction means 5 of the video analysis device 1 is continued.
The division means 515 divides the area α(m) output from the sequential area selection means 512 by the estimated area α_est(m) output from the person image area estimation means 514, and outputs the resulting area ratio r(m) to the sum calculation means 516. The area ratio r(m) output from the division means 515 is expressed by equation (20) below.

When there is no occlusion (no overlap) between the persons shown in the input image I(x, y) and each silhouette in the silhouette image S(x, y) contains exactly one person, the area ratio r(m) given by equation (20) takes an almost constant value regardless of the distance from the shooting point of the camera capturing the input video (the camera's center of projection) to the person.

When a single silhouette in the silhouette image S(x, y) contains a plurality of persons, the value of the area ratio r(m) can normally be expected to increase as the number of persons contained increases.

Therefore, the area ratio r(m) may be obtained using equation (21) below instead of equation (20).

In equation (21), k is a constant, preferably chosen so that the area ratio r(m) approximately equals the number of persons contained in one silhouette.

The sum calculation means 516 accumulates the area ratios r(m) output from the division means 515 at the timing given by the numerical time series output from the clock means 511, takes the accumulated result as the number of persons N, and outputs it as one of the features constituting the feature vector.
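The division and accumulation performed by the means 515 and 516 can be sketched as below. The sketch assumes the plain quotient r(m) = α(m) / α_est(m) described for equation (20); the variant of equation (21) with the constant k is not reproduced because its exact form is not recoverable from the text. The function name and the sample areas are hypothetical.

```python
def estimate_headcount(areas, estimated_areas):
    """Estimate N as the sum of area ratios r(m) = alpha(m) / alpha_est(m).

    areas: observed silhouette areas alpha(m) of the selected regions.
    estimated_areas: areas alpha_est(m) expected for one person at the
    same positions (same order as `areas`).
    """
    if len(areas) != len(estimated_areas):
        raise ValueError("areas and estimated_areas must have equal length")
    total = 0.0
    for alpha, alpha_est in zip(areas, estimated_areas):
        if alpha_est <= 0:
            raise ValueError("estimated area must be positive")
        total += alpha / alpha_est  # area ratio r(m)
    return total


# The second silhouette is about twice the expected size,
# suggesting two overlapping persons.
n = estimate_headcount([100.0, 210.0, 95.0], [100.0, 100.0, 100.0])
print(round(n, 2))
```

Unlike simply counting identification numbers, this estimate can exceed the number of silhouettes when persons overlap.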

The person group dispersion measuring means 52 shown in FIG. 5 quantifies the degree to which the distribution of the persons contained in the input video is spread out, based on the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), and color classification number C(m) input for the identification number m of each person (each tracking region), and outputs it as a person group distribution area V (silhouette group distribution quantified value, person group distribution quantified value). FIG. 8 shows the detailed configuration of the person group dispersion measuring means 52.

As shown in FIG. 8, the person group dispersion measuring means 52 includes a clock means 521, a sequential coordinate selection means 522, a covariance matrix calculation means 523, and a distribution area calculation means 524. The clock means 521 and the sequential coordinate selection means 522 are the same as the clock means 511 and the sequential coordinate selection means 513 shown in FIG. 6, so their description is omitted.

The covariance matrix calculation means 523 obtains the covariance matrix of the estimated coordinates x_{t|t}(m) output from the sequential coordinate selection means 522. For example, it obtains the covariance matrix D by accumulating the estimated coordinates x_{t|t}(m) and the products x_{t|t}(m)[x_{t|t}(m)]^T (T denotes vector transposition) at the timing given by the numerical time series output from the clock means 521, while counting the number of estimated coordinates x_{t|t}(m) output from the sequential coordinate selection means 522 (the number of samples).

Each component of the covariance matrix D obtained by the covariance matrix calculation means 523 is defined by equation (22) below.

The distribution area calculation means 524 obtains, based on the covariance matrix D obtained by the covariance matrix calculation means 523, the person group distribution area V, which quantifies the range over which the persons are distributed in real space, and outputs it as one of the features constituting the feature vector. The person group distribution area V has the same dimension as an area in real space and can be obtained using equation (23) below. In equation (23), β is a constant equal to or greater than 0.
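The accumulation scheme described for the covariance matrix D can be sketched as follows. The sketch assumes a population covariance (division by the sample count), since the exact definition in equation (22) is not reproduced here. The quantity `spread_area`, the square root of det D, is shown only as one example of a value with the dimension of an area; it is an assumption, not the specification's equation (23), and the role of β is not modeled. All names are hypothetical.

```python
import math


def covariance_2d(points):
    """Covariance matrix of 2-D points from running sums.

    Accumulates sum(x), sum(x x^T), and the sample count in one pass,
    then forms D = E[x x^T] - E[x] E[x]^T.
    """
    n = 0
    sx = sy = sxx = sxy = syy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
        syy += y * y
    if n == 0:
        raise ValueError("at least one point is required")
    mx, my = sx / n, sy / n
    cxy = sxy / n - mx * my
    return [[sxx / n - mx * mx, cxy], [cxy, syy / n - my * my]]


def spread_area(d):
    """Assumed area-like measure: square root of det(D), clamped at 0."""
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    return math.sqrt(max(det, 0.0))


d = covariance_2d([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print(d, spread_area(d))
```

For coordinates in meters, the components of D are in square meters, so the square root of the determinant is also in square meters, consistent with the statement that V has the dimension of an area.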

The inter-person distance measuring means 53 shown in FIG. 5 quantifies the degree to which the persons contained in the input video are close to one another, based on the estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m) input for the identification number m of each person (each tracking region), and outputs it as an inter-person distance d (inter-silhouette distance quantified value, inter-person distance quantified value). FIG. 9 shows the detailed configuration of the inter-person distance measuring means 53.

As shown in FIG. 9, the inter-person distance measuring means 53 includes a counter means 531, a first number-corresponding coordinate selection means 532, a second number-corresponding coordinate selection means 533, a distance calculation means 534, a minimum value calculation means 535, and an average value calculation means 536.

The counter means 531 comprises two counters (not shown) and, by combining their outputs m₁ and m₂, scans all combinations of two identification numbers m₁ and m₂. "Scanning" here refers to a so-called raster scan, meaning that all combinations of (m₁, m₂) are generated, such as (0, 0), (1, 0), (2, 0), ..., (0, 1), (1, 1), (2, 1), ..., (0, 2), (1, 2), (2, 2), and so on.

The first number-corresponding coordinate selection means 532 receives the estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m) of each identification number m, and outputs the estimated coordinates x_{t|t}(m₁) corresponding to the identification number m₁ output from the counter means 531 to the distance calculation means 534.

The second number-corresponding coordinate selection means 533 receives the estimated coordinates x_{t|t}(m) and estimated velocity v_{t|t}(m) of each identification number m, and outputs the estimated coordinates x_{t|t}(m₂) corresponding to the identification number m₂ output from the counter means 531 to the distance calculation means 534.

The distance calculation means 534 obtains the distance d_{m1,m2} between the estimated coordinates x_{t|t}(m₁) of identification number m₁ output from the first number-corresponding coordinate selection means 532 and the estimated coordinates x_{t|t}(m₂) of identification number m₂ output from the second number-corresponding coordinate selection means 533, and outputs it to the minimum value calculation means 535. This distance d_{m1,m2} is expressed by equation (24) below.

Based on the distance d_{m1,m2} output from the distance calculation means 534 and the identification number m₁ output from the counter means 531, the minimum value calculation means 535 obtains the distance d_min(m₂) from the estimated coordinates x_{t|t}(m₂) of identification number m₂ to the nearest estimated coordinates x_{t|t}(m₁), using equation (25) below, and outputs it to the average value calculation means 536.

Based on the distance d_min(m₂) output from the minimum value calculation means 535 and the identification number m₂ output from the counter means 531, the average value calculation means 536 averages the distance d_min(m₂) over the identification numbers m₂ using equation (26) below, takes the resulting average value as the inter-person distance d, and outputs it as one of the features constituting the feature vector.
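The scan over pairs (m₁, m₂), the per-person minimum, and the final average can be sketched as below. The sketch assumes Euclidean distance for d_{m1,m2} and assumes that the pair m₁ = m₂ is excluded when taking the minimum (otherwise every minimum would be zero); neither detail is confirmed by the text available here. The function name and the coordinates are hypothetical.

```python
import math


def mean_nearest_distance(coords):
    """Average, over all persons, of the distance to the nearest other person.

    coords: dict mapping identification number m -> (x, y).
    """
    ids = list(coords)
    if len(ids) < 2:
        raise ValueError("at least two persons are required")
    total = 0.0
    for m2 in ids:
        # d_min(m2): minimum of d_{m1,m2} over all m1 different from m2.
        total += min(
            math.dist(coords[m1], coords[m2]) for m1 in ids if m1 != m2
        )
    return total / len(ids)


positions = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (3.0, 0.0)}
print(mean_nearest_distance(positions))
```

A small value of d indicates that the persons are clustered, while a large value indicates that they are spread apart.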

The person group centroid measuring means 54 shown in FIG. 5 outputs, as a person group centroid g (silhouette group centroid quantified value, person group centroid quantified value), the centroid of the distribution of the persons (person group) contained in the input video, based on the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), and color classification number C(m) input for the identification number m of each person (each tracking region). FIG. 10 shows the detailed configuration of the person group centroid measuring means 54.

As shown in FIG. 10, the person group centroid measuring means 54 includes a clock means 541, a sequential coordinate selection means 542, and an average value calculation means 543. The clock means 541 and the sequential coordinate selection means 542 are the same as the clock means 511 and the sequential coordinate selection means 513 shown in FIG. 6, so their description is omitted.

The average value calculation means 543 obtains the average value (a vector) of the estimated coordinates x_{t|t}(m) output from the sequential coordinate selection means 542, using equation (27) below, at the timing given by the numerical time series output from the clock means 541, takes it as the person group centroid g, and outputs it as one of the features constituting the feature vector.
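The centroid computation amounts to a component-wise mean of the selected coordinates, as in the following minimal sketch. The function name `centroid` and the sample coordinates are hypothetical.

```python
def centroid(points):
    """Component-wise mean of 2-D points, i.e. the person group centroid g."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


g = centroid([(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)])
print(g)
```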

The average speed measuring means 55 shown in FIG. 5 outputs, as an average speed s (silhouette speed quantified value, person speed quantified value), the speed of the centroid of the distribution of the persons (person group) contained in the input video, based on the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), and color classification number C(m) input for the identification number m of each person (each tracking region). FIG. 11 shows the detailed configuration of the average speed measuring means 55.

As shown in FIG. 11, the average speed measuring means 55 includes a clock means 551, a sequential velocity selection means 552, an absolute value calculation means 553, and an average value calculation means 554. The clock means 551 is the same as the clock means 511 shown in FIG. 6, so its description is omitted.

The sequential velocity selection means 552 receives the identification number m, the color classification number C(m), the estimated coordinates x_{t|t}(m), and the estimated velocity v_{t|t}(m). At the timing given by the numerical time series output from the clock means 551, it extracts, from the sequentially designated identification numbers m, those whose color classification number C(m) has a specific value, and outputs the estimated velocity v_{t|t}(m) of each match to the absolute value calculation means 553. As described above, when the input video is a soccer match video, setting the specific values to "0" through "3" selects the estimated velocities v_{t|t}(m) of the players only.

The absolute value calculation means 553 obtains the absolute value (magnitude) of the estimated velocity v_{t|t}(m) output from the sequential velocity selection means 552. The absolute value of the estimated velocity v_{t|t}(m) is defined by equation (28) below.

The average value calculation means 554 obtains the average of the absolute values of the estimated velocities v_{t|t}(m) output from the absolute value calculation means 553, using equation (29) below, takes it as the average speed s, and outputs it as one of the features constituting the feature vector.

The average velocity measuring means 56 shown in FIG. 5 outputs, as an average velocity u (silhouette velocity quantified value, person velocity quantified value), the velocity of the centroid of the distribution of the persons (person group) contained in the input video, based on the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), and color classification number C(m) input for the identification number m of each person (each tracking region). FIG. 12 shows the detailed configuration of the average velocity measuring means 56.

As shown in FIG. 12, the average velocity measuring means 56 includes a clock means 561, a sequential velocity selection means 562, and an average value calculation means 563. The clock means 561 is the same as the clock means 511 shown in FIG. 6, and the sequential velocity selection means 562 is the same as the sequential velocity selection means 552 shown in FIG. 11, so their description is omitted.

The average value calculation means 563 obtains the average value of the estimated velocities v_{t|t}(m) output from the sequential velocity selection means 562, using equation (30) below, at the timing given by the numerical time series output from the clock means 561, takes it as the average velocity u, and outputs it as one of the features constituting the feature vector.
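The difference between the average speed s (mean of the velocity magnitudes) and the average velocity u (mean of the velocity vectors) can be illustrated with a minimal sketch. It assumes the magnitude in equation (28) is the Euclidean norm of a two-dimensional velocity; the function names and sample values are hypothetical.

```python
import math


def average_speed(velocities):
    """s: mean of the magnitudes |v(m)| of the selected velocities."""
    if not velocities:
        raise ValueError("at least one velocity is required")
    return sum(math.hypot(vx, vy) for vx, vy in velocities) / len(velocities)


def average_velocity(velocities):
    """u: component-wise mean of the selected velocity vectors."""
    if not velocities:
        raise ValueError("at least one velocity is required")
    n = len(velocities)
    return (sum(vx for vx, _ in velocities) / n,
            sum(vy for _, vy in velocities) / n)


# Two players running at the same speed in opposite directions.
vels = [(3.0, 4.0), (-3.0, -4.0)]
print(average_speed(vels), average_velocity(vels))
```

In this example s is large while u cancels to zero, which shows why the two features carry different information: s reflects how actively the persons move, whereas u reflects the net direction in which the group as a whole moves.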

The specific area monitoring means 57 shown in FIG. 5 determines whether a person contained in the input video is within a specific area, based on the estimated coordinates x_{t|t}(m), estimated velocity v_{t|t}(m), and color classification number C(m) input for the identification number m of each person (each tracking region), and outputs the result as a monitoring result w (determination value). FIG. 13 shows the detailed configuration of the specific area monitoring means 57.

As shown in FIG. 13, the specific area monitoring means 57 includes a clock means 571, a sequential number-corresponding coordinate selection means 572, and a color-classification-specific two-dimensional threshold calculation means 573. The clock means 571 is the same as the clock means 511 shown in FIG. 6, so its description is omitted.

The sequential number-corresponding coordinate selection means 572 receives the identification number m, the estimated coordinates x_{t|t}(m), and the estimated velocity v_{t|t}(m), and, at the timing given by the numerical time series output from the clock means 571, outputs the estimated coordinates x_{t|t}(m) of the sequentially designated identification number m to the color-classification-specific two-dimensional threshold calculation means 573.

The color-classification-specific two-dimensional threshold calculation means 573 receives the identification number m and the color classification number C(m) and, based on the estimated coordinates x_{t|t}(m) of identification number m output from the sequential number-corresponding coordinate selection means 572, determines whether the estimated coordinates x_{t|t}(m) of the person to whom the color classification number C(m) has been assigned (hereinafter, the estimated coordinates x_{t|t}(m) of the person corresponding to the color classification number C(m)) lie within the region W(C) preset for each color classification number C(m).

Using equation (31) below, the color-classification-specific two-dimensional threshold calculation means 573 then outputs, as one of the features constituting the feature vector, the monitoring result w = TRUE when the estimated coordinates x_{t|t}(m) of the person corresponding to the color classification number C(m) lie within the region W(C), and the monitoring result w = FALSE when they do not.

For example, when the input video is a soccer match video, the color-classification-specific two-dimensional threshold calculation means 573 sets region W(1) to the union of two circles of radius q meters centered on the two right-hand corners of the soccer field (the far-right and near-right corners), and region W(3) to the union of two circles of radius q meters centered on the two left-hand corners of the soccer field (the far-left and near-left corners).

The numbers "1" and "3" in the parentheses of W(1) and W(3) are color classification numbers. Here, the color classification numbers are assigned as follows: "1" for the field players of the team attacking toward the right, "2" for the goalkeeper of the team attacking toward the right, "3" for the field players of the team attacking toward the left, "4" for the goalkeeper of the team attacking toward the left, "5" for the referees, and "6" for all others (ball boys, managers, spectators, and so on).

That is, the monitoring result w = TRUE is obtained under W(1) when a player with color classification number 1 (a field player of the team attacking toward the right) is at the far-right or near-right corner, and under W(3) when a player with color classification number 3 (a field player of the team attacking toward the left) is at the far-left or near-left corner. Because the goalkeeper of the attacking team, whether it attacks toward the right or the left, does not normally come to a corner on the defending side, W(2) (the case where the goalkeeper with color classification number 2, of the team attacking toward the right, is at the far-right or near-right corner) and W(4) (the case where the goalkeeper with color classification number 4, of the team attacking toward the left, is at the far-left or near-left corner) are set to the empty set φ.

W(5) (the case where a referee, color classification number 5, is at any corner) and W(6) (the case where a ball boy, manager, spectator, or other person with color classification number 6 is at any corner) are likewise set to the empty set φ. This makes it possible to determine whether an attacking player (field player) is at a corner on the defending side.
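The region test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the pitch size (105 m by 68 m), the radius q = 5 m, the coordinate origin at the near-left corner, and the names `REGIONS`, `in_region`, and `monitor` are all assumptions, and combining the per-person results with `any` is one possible reading of how w is formed.

```python
import math

# Assumed pitch size in metres and corner radius q; the text gives neither value.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
Q = 5.0

# Corner positions, with the origin at the near-left corner (assumed convention).
RIGHT_CORNERS = [(PITCH_LENGTH, 0.0), (PITCH_LENGTH, PITCH_WIDTH)]
LEFT_CORNERS = [(0.0, 0.0), (0.0, PITCH_WIDTH)]

# W(C): disc centres per color classification number; an empty list is the empty set.
REGIONS = {1: RIGHT_CORNERS, 2: [], 3: LEFT_CORNERS, 4: [], 5: [], 6: []}


def in_region(position, color_class, radius=Q):
    """Return True if position lies in W(color_class), a union of discs."""
    x, y = position
    return any(math.hypot(x - cx, y - cy) <= radius
               for cx, cy in REGIONS.get(color_class, []))


def monitor(people):
    """people: iterable of (estimated position, color classification number).

    Returns the monitoring result w, taken here to be TRUE if any person
    lies inside the region assigned to that person's own color class.
    """
    return any(in_region(pos, c) for pos, c in people)
```

Keeping W(2), W(4), W(5), and W(6) as empty lists means goalkeepers, referees, and bystanders can never set w, which matches the intent of detecting only attacking field players at a defending-side corner.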

When implementing the feature vector extraction means 5 (see FIGS. 1 and 5), the number of means may be reduced by sharing the clock means 511 (see FIG. 6), 521 (see FIG. 8), 541 (see FIG. 10), 551 (see FIG. 11), 561 (see FIG. 12), and 571 (see FIG. 13); by sharing the sequential coordinate selection means 513 (see FIG. 6), 522 (see FIG. 8), and 542 (see FIG. 10); and by sharing the sequential speed selection means 552 (see FIG. 11) and 562 (see FIG. 12).

Returning to FIG. 1, the description of the configuration of the video analysis apparatus 1 continues.
The event detection means 6 detects an event that characterizes a video scene contained in the input video, based on at least one component (feature quantity) of the feature vector output from the feature vector extraction means 5, and outputs the detection result to the post filter means 7 as a flag E (flag signal). As many event detection means 6 (6-1), 6 (6-2), ..., 6 (6-n) as needed may be provided (n is an arbitrary integer).

In other words, the event detection means 6 detects, as an event, the case where the feature quantities contained in the feature vector satisfy preset conditions, and outputs a flag E (flag signal) indicating the detection result. Here, taking the input video to be a soccer video, the preset conditions are those given in equations (36) through (42) below. FIG. 14 shows the detailed configuration of the event detection means 6.

As shown in FIG. 14, the event detection means 6 comprises inter-feature calculation means 61 (61-1), 61 (61-2), ..., 61 (61-n), whose number corresponds to the number n of feature quantities constituting the feature vector; threshold calculation means 62 (62-1), 62 (62-2), ..., 62 (62-n); a logical operation means 63; and a delay means 64.

The inter-feature calculation means 61 performs a calculation among the multiple feature quantities constituting the feature vector and outputs the resulting value, γ, to the threshold calculation means 62. Hereinafter, γ is called a composite feature quantity, and subscripts are attached where needed to distinguish individual ones.

For example, the inter-feature calculation means 61 can select just one of the several kinds of feature quantity in the input feature vector and output its value unchanged. Equation (32) below gives examples of the composite feature quantities γ_pop, γ_dist, and γ_spd for the number of people N, the inter-person distance d, and the average speed s. (Because a single feature quantity is selected and output as is, nothing is actually combined, but the term composite feature quantity is used for convenience.)

In this case, the event detection means 6 may omit the inter-feature calculation means 61 and feed the single selected feature quantity directly to the threshold calculation means 62.
The inter-feature calculation means 61 can also select, for example, one vector-valued feature quantity from the input feature vector, apply a linear or nonlinear transformation to it, and output the result. Equation (33) below gives examples of the composite feature quantities γ_grav and γ_velo, obtained by extracting the first (horizontal) component of the person group centroid g and of the average velocity u, respectively.

Further, the inter-feature calculation means 61 can output, as a composite feature quantity γ, the result of a calculation among several kinds of feature quantity in the input feature vector. For example, the composite feature quantity γ_pd, which corresponds to population density, can be calculated from the number of people N and the person group distribution area V using equation (34) below.

Furthermore, the inter-feature calculation means 61 can select a single truth-valued feature quantity from the input feature vector and output a different value depending on whether that truth value is TRUE or FALSE. Equation (35) below gives an example of the composite feature quantity γ_rgn, which depends on the monitoring result w output from the specific region monitoring means 57.

For the composite feature quantity γ_rgn, which depends on the monitoring result w output from the specific region monitoring means 57, the event detection means 6 may omit the inter-feature calculation means 61 and the threshold calculation means 62 and feed the monitoring result w (a truth-valued feature quantity) directly to the logical operation means 63.
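The four kinds of composite feature quantity described for equations (32) through (35) can be illustrated together. The sketch below is an assumption-laden reading of the prose: the function name `composite_features`, the form N / V for the population density, the treatment of a zero area, and the output values 1.0 and 0.0 for γ_rgn are not taken from the text.

```python
def composite_features(N, d, s, g, u, V, w):
    """Build composite feature quantities from raw feature quantities.

    N: number of people, d: inter-person distance, s: average speed (scalar),
    g: person group centroid (x, y), u: average velocity (x, y),
    V: person group distribution area, w: region monitoring result (bool).
    """
    return {
        # Single feature quantities passed through unchanged (cf. equation (32)).
        "pop": N,
        "dist": d,
        "spd": s,
        # First (horizontal) component of a vector feature (cf. equation (33)).
        "grav": g[0],
        "velo": u[0],
        # Population density (cf. equation (34)); N / V and the zero-area
        # fallback are assumptions.
        "pd": N / V if V > 0 else 0.0,
        # Value chosen by a truth value (cf. equation (35)); 1.0 / 0.0 assumed.
        "rgn": 1.0 if w else 0.0,
    }
```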

The threshold calculation means 62 outputs to the logical operation means 63 the truth value L = TRUE when the composite feature quantity γ output from the inter-feature calculation means 61 lies within a preset range, and the truth value L = FALSE when it does not. For example, given the average speed γ_s as the composite feature quantity γ, and a range γ_s ≥ θ_s defined by a preset threshold θ_s (hereinafter, a subscript on θ identifies the threshold it corresponds to), the threshold calculation means 62 can output L = TRUE when γ_s satisfies the range and L = FALSE when it does not.

The logical operation means 63 performs a preset logical operation on the truth values L_1, L_2, ..., L_n output from at least one threshold calculation means 62 (hereinafter, subscripts on L distinguish the individual truth values) and on the truth value J output from the delay means 64 (described below), and outputs the result as a flag E indicating an event (hereinafter, subscripts on E distinguish the individual flags).

When a truth value J output from the delay means 64 is present, the logical operation means 63 also applies to J a logical operation that is the same as, or different from, the one applied to the truth values L, and outputs the result to the delay means 64 as a truth value e (a vector consisting of multiple truth values).

The delay means 64 delays the truth value e output from the logical operation means 63 (each component, when e is a vector) by a predetermined time and outputs it to the logical operation means 63 as the truth value J. The delay means 64 may be omitted from the event detection means 6, in which case the output of the truth value e from the logical operation means 63 and the output of the truth value J are both omitted.

The processing of the event detection means 6 when the input video is a soccer video is described next.
To detect a corner kick event in a soccer video, it suffices to detect, for example, a state in which one attacking player is at a corner, the attacking and defending players are moving little, many players are in the penalty area, and the population density is high. This situation can be expressed by equation (36) below, which yields the flag E_CK for the corner kick event.

In equation (36), L_corner1 is a truth value indicating whether a player is in any of the corner regions (W(1), the union of two discs of radius q meters set at the two right-hand corners; W(3), the union of two discs of radius q meters set at the two left-hand corners; and W(2), W(4), W(5), and W(6), which are the empty set φ). Here it is set to the monitoring result w, so TRUE indicates that a player is at a corner. L_static1 is a truth value obtained by comparing the average speed s with its threshold θ_static1; here the condition is s ≤ θ_static1, so it indicates that the average speed s is at or below θ_static1. L_many1 is a truth value obtained by comparing the number of people N with its threshold θ_many1; here the condition is N ≥ θ_many1, so it indicates that N is at or above θ_many1. L_dense1 is a truth value obtained by comparing the population density γ_pd with its threshold θ_dense1; here the condition is γ_pd ≥ θ_dense1, so it indicates that γ_pd is at or above θ_dense1. E_CK1 is one of the flags for the corner kick event and is set when L_corner1, L_static1, L_many1, and L_dense1 all hold.

Here, the event detection means 6 (6-1) sets the monitoring result w output from the specific region monitoring means 57 (see FIGS. 5 and 13) directly as the truth value L_corner1, bypassing the inter-feature calculation means 61 and the threshold calculation means 62. The event detection means 6 (6-1) also feeds the number of people N output from the person measurement means 51 (see FIGS. 5 and 6) and the average speed s output from the average speed measurement means 55 (see FIGS. 5 and 11) directly to the threshold calculation means 62, bypassing the inter-feature calculation means 61, and sets the results of the threshold calculation means 62 as the truth values L_many1 and L_static1.
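Read from the description of equation (36), the corner kick flag is a conjunction of one monitoring result and three threshold tests. The following sketch shows that structure; the function name `corner_kick_flag` and any threshold values passed to it are illustrative assumptions, since the text does not give numeric thresholds.

```python
def corner_kick_flag(w, s, N, gamma_pd, theta_static, theta_many, theta_dense):
    """Return E_CK1 as described for equation (36).

    w: region monitoring result, s: average speed, N: number of people,
    gamma_pd: population density; the theta_* thresholds are caller-supplied.
    """
    L_corner = bool(w)               # monitoring result used directly
    L_static = s <= theta_static     # players are moving little
    L_many = N >= theta_many         # many players in view
    L_dense = gamma_pd >= theta_dense  # players are densely grouped
    return L_corner and L_static and L_many and L_dense
```

For instance, `corner_kick_flag(True, 0.3, 10, 0.05, theta_static=0.5, theta_many=8, theta_dense=0.04)` evaluates all four conditions as satisfied, while raising s above `theta_static` alone is enough to clear the flag.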

Similarly, to detect a free kick event in a soccer video, the flag E_FK for the free kick event can be obtained using equation (37) below.

In equation (37), L_corner2 is a truth value indicating whether a player is in any of the corner regions (W(1), the union of two discs of radius q meters set at the two right-hand corners; W(3), the union of two discs of radius q meters set at the two left-hand corners; and W(2), W(4), W(5), and W(6), which are the empty set φ). Here it is set to the monitoring result w, so TRUE indicates that a player is at a corner. L_static2 is a truth value obtained by comparing the average speed s with its threshold θ_static2; here the condition is s ≤ θ_static2, so it indicates that the average speed s is at or below θ_static2. L_many2 is a truth value obtained by comparing the number of people N with its threshold θ_many2; here the condition is N ≥ θ_many2, so it indicates that N is at or above θ_many2. L_dense2 is a truth value obtained by comparing the population density γ_pd with its threshold θ_dense2; here the condition is γ_pd ≥ θ_dense2, so it indicates that γ_pd is at or above θ_dense2. E_FK2 is one of the flags for the free kick event. Because L_corner2 carries a logical negation (¬), the flag is set when L_corner2 does not hold and L_static2, L_many2, and L_dense2 all hold.
In other words, a free kick differs from a corner kick in that no player can be assumed to be in a corner area, while the other conditions are nearly the same as for a corner kick.
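The only structural difference from the corner kick condition is the negated corner term. A minimal sketch under the same caveats (hypothetical function name, caller-supplied thresholds):

```python
def free_kick_flag(w, s, N, gamma_pd, theta_static, theta_many, theta_dense):
    """Return E_FK2 as described for equation (37): no player at a corner,
    little movement, many players, and high population density."""
    L_corner = bool(w)
    L_static = s <= theta_static
    L_many = N >= theta_many
    L_dense = gamma_pd >= theta_dense
    # The corner term enters with a logical negation.
    return (not L_corner) and L_static and L_many and L_dense
```

With identical thresholds, this flag and the corner kick flag can never both be TRUE for the same frame, because they require opposite values of the monitoring result w.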

To detect an event in which play is attacking toward the left in a soccer video, the flag E_left for that event can be obtained using equation (38) below.

In equation (38), L_left3 is a truth value obtained by comparing -γ_velo, the sign-inverted first component of the average velocity u (its horizontal component, with rightward taken as positive; that is, the players' average velocity toward the left), with the threshold θ_left3; here the condition is -γ_velo ≥ θ_left3, so it indicates that -γ_velo is at or above θ_left3. L_many3 is a truth value obtained by comparing the number of people N with its threshold θ_many3; here the condition is N ≥ θ_many3, so it indicates that N is at or above θ_many3. E_left3 is one of the flags for the event of attacking toward the left and is set when L_left3 and L_many3 both hold.

To detect an event in which play is attacking toward the right in a soccer video, the flag E_right for that event can be obtained using equation (39) below.

In equation (39), L_right4 is a truth value obtained by comparing γ_velo, the first component of the average velocity u (its horizontal component, with rightward taken as positive; that is, the players' average velocity toward the right), with the threshold θ_right4; here the condition is γ_velo ≥ θ_right4, so it indicates that γ_velo is at or above θ_right4. L_many4 is a truth value obtained by comparing the number of people N with its threshold θ_many4; here the condition is N ≥ θ_many4, so it indicates that N is at or above θ_many4. E_right4 is one of the flags for the event of attacking toward the right and is set when L_right4 and L_many4 both hold.
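The two direction flags described for equations (38) and (39) differ only in the sign applied to γ_velo. The sketch below computes both at once; the function name is hypothetical, and using a single `theta_many` for both flags is a simplification of the separate thresholds θ_many3 and θ_many4 in the text.

```python
def attack_direction_flags(gamma_velo, N, theta_left, theta_right, theta_many):
    """Return (E_left, E_right) for the horizontal average velocity gamma_velo
    (rightward positive) and the number of people N."""
    L_many = N >= theta_many
    E_left = (-gamma_velo >= theta_left) and L_many    # cf. equation (38)
    E_right = (gamma_velo >= theta_right) and L_many   # cf. equation (39)
    return E_left, E_right
```

With positive thresholds, at most one of the two flags can be TRUE at a time, and neither is TRUE while the players are nearly stationary or too few are in view.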

Further, to detect an event in which play is attacking toward the left near the left goal in a soccer video, the flag E_left_goal for that event can be obtained using equation (40) below.

In equation (40), L_left5 is a truth value obtained by comparing -γ_velo, the sign-inverted first component of the average velocity u (its horizontal component, with rightward taken as positive; that is, the players' average velocity toward the left), with the threshold θ_left5; here the condition is -γ_velo ≥ θ_left5, so it indicates that -γ_velo is at or above θ_left5. L_goal5 is a truth value obtained by comparing -γ_grav, the sign-inverted first component of the person group centroid g (its horizontal component, with rightward taken as positive), with the threshold θ_goal5; here the condition is -γ_grav ≥ θ_goal5, so it indicates that -γ_grav is at or above θ_goal5. L_many5 is a truth value obtained by comparing the number of people N with its threshold θ_many5; here the condition is N ≥ θ_many5, so it indicates that N is at or above θ_many5. E_left_goal5 is one of the flags for the event of attacking toward the left near the left goal and is set when L_left5, L_goal5, and L_many5 all hold.

Likewise, to detect an event in which play is attacking toward the right near the right goal in a soccer video, the flag E_right_goal for that event can be obtained using equation (41) below.

In equation (41), L_right6 is a truth value obtained by comparing γ_velo, the first component of the average velocity u (its horizontal component, with rightward taken as positive; that is, the players' average velocity toward the right), with the threshold θ_right6; here the condition is γ_velo ≥ θ_right6, so it indicates that the first component γ_velo is at or above θ_right6. L_goal6 is a truth value obtained by comparing γ_grav, the first component of the person group centroid g (its horizontal component, with rightward taken as positive), with the threshold θ_goal6; here the condition is γ_grav ≥ θ_goal6, so it indicates that γ_grav is at or above θ_goal6. L_many6 is a truth value obtained by comparing the number of people N with its threshold θ_many6; here the condition is N ≥ θ_many6, so it indicates that N is at or above θ_many6. E_right_goal6 is one of the flags for the event of attacking toward the right near the right goal and is set when L_right6, L_goal6, and L_many6 all hold.
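The goal-side conditions described for equations (40) and (41) add a centroid test to the direction test. The sketch below assumes that the horizontal coordinate is measured from the pitch center with rightward positive, so that a large positive γ_grav means the group is near the right goal; that convention, the function name, and the shared thresholds are assumptions made for brevity.

```python
def goal_side_attack_flags(gamma_velo, gamma_grav, N,
                           theta_dir, theta_goal, theta_many):
    """Return (E_left_goal, E_right_goal).

    gamma_velo: horizontal average velocity (rightward positive),
    gamma_grav: horizontal centroid position (assumed centered on the pitch),
    N: number of people.
    """
    L_many = N >= theta_many
    # cf. equation (40): moving left and grouped near the left goal
    E_left_goal = (-gamma_velo >= theta_dir) and (-gamma_grav >= theta_goal) and L_many
    # cf. equation (41): moving right and grouped near the right goal
    E_right_goal = (gamma_velo >= theta_dir) and (gamma_grav >= theta_goal) and L_many
    return E_left_goal, E_right_goal
```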

Alternatively, in a soccer video, the flag E_turn for the event at the moment the attack direction changes from right to left, or from left to right, can be obtained using equation (42) below.

In equation (42), L_right7 is a truth value obtained by comparing γ_velo, the first component of the average velocity u (the players' average velocity toward the right), with the threshold θ_right7; here the condition is γ_velo ≥ θ_right7, so it indicates that γ_velo is at or above θ_right7. L_left7 is a truth value obtained by comparing -γ_velo, the sign-inverted first component of the average velocity u (the players' average velocity toward the left), with the threshold θ_left7; here the condition is -γ_velo ≤ θ_left7, so it indicates that -γ_velo is at or below θ_left7. L_many7 is a truth value obtained by comparing the number of people N with its threshold θ_many7; here the condition is N ≥ θ_many7, so it indicates that N is at or above θ_many7. e_right7 is one of the truth values input to the delay means 64 in order to detect the flag E_turn, and is set when L_right7 and L_many7 both hold.
e_left7 is likewise one of the truth values input to the delay means 64 in order to detect the flag E_turn, and is set when L_left7 and L_many7 both hold. E_turn7 is one of the flags for the event at the moment the attack direction changes from right to left or from left to right. It is set when either L_left7 and the value of e_right7 from one unit time earlier (e_right7 delayed by the delay means 64) both hold, or L_right7 and the value of e_left7 from one unit time earlier (e_left7 delayed by the delay means 64) both hold, and in addition L_many7 holds.
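The feedback through the delay means 64 can be illustrated with a small stateful object. This sketch takes the truth values L_right7, L_left7, and L_many7 as inputs rather than recomputing them, so it makes no claim about the threshold comparisons themselves; the class name `TurnDetector` and the initial delayed values of FALSE are assumptions.

```python
class TurnDetector:
    """Detects a change of attack direction using one-unit-time delayed
    truth values, following the description of equation (42)."""

    def __init__(self):
        # J values: e_right7 and e_left7 from one unit time earlier.
        # Starting from FALSE is an assumption.
        self.J_right = False
        self.J_left = False

    def step(self, L_right, L_left, L_many):
        """Process one unit time and return E_turn7."""
        E_turn = bool(((L_left and self.J_right) or
                       (L_right and self.J_left)) and L_many)
        # e values go into the delay and come back as J on the next step.
        self.J_right = bool(L_right and L_many)
        self.J_left = bool(L_left and L_many)
        return E_turn
```

Calling `step(True, False, True)` and then `step(False, True, True)` on a fresh detector returns FALSE and then TRUE: the second call sees leftward motion now and rightward motion one unit time earlier.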

Returning to FIG. 1, the description of the configuration of the video analysis apparatus 1 continues.
The post filter means 7 applies at least one of time-direction filtering and logical operations between flags E to the one or more flags E output from the event detection means 6, and obtains the final event output ε (an event output signal that identifies the event). Hereinafter, subscripts attached to ε (described below) distinguish the individual event outputs. FIG. 15 shows the detailed configuration of the post filter means 7.

As shown in FIG. 15, the post filter means 7 comprises time rate filter means 71 (71-1, 71-2, ..., 71-7), timeout processing means 72 (72-1, 72-2, ..., 72-7), and event specifying logical operation means 73 (73-2, 73-4, 73-5, 73-6).

The time rate filter means 71 applies time-direction filtering to each input flag E. FIG. 16 shows its detailed configuration.
As shown in FIG. 16, the time rate filter means 71 comprises delay means 711 (711-1, 711-2, 711-3, ..., 711-(Δ-1)), a time rate calculation means 712, and a threshold calculation means 713.

The delay means 711 consist of (Δ-1) units (Δ is an integer of 1 or more) arranged in series. Each delays its input flag E by one unit time and outputs it both to the time rate calculation means 712 and to the next delay means 711 in the series.

The time rate calculation means 712 counts the total number of TRUE values among the flag E input to the time rate filter means 71 and the (Δ-1) flags E output from the delay means 711, divides this count by Δ, and outputs the quotient (the time rate ρ) to the threshold calculation means 713.

The threshold value calculation means 713 determines whether the quotient (time rate ρ) output from the time rate calculation means 712 is equal to or greater than a preset threshold θρ. It outputs the truth value TRUE to the timeout processing means 72 (FIG. 15) if ρ is equal to or greater than θρ, and the truth value FALSE otherwise.

The processing of the time rate filter means 71 will now be described with reference to the example truth values (TRUE or FALSE) of the flag E shown in FIG. 17 (see FIG. 16 as appropriate).
First, using the (Δ−1) delay means 711, the time rate filter means 71 sets a window of width Δ, extending from time t (the present) back to time t−Δ (Δ unit times in the past), over the time series of truth values of the input flag E. Next, the time rate calculation means 712 obtains the proportion of the window in which the truth value was TRUE (the time rate ρ) and outputs it to the threshold value calculation means 713. The threshold value calculation means 713 then compares the time rate ρ with the threshold θρ. If ρ is equal to or greater than θρ, the output is TRUE; if ρ is less than θρ, the output is FALSE. This output is the input truth value supplied to the timeout processing means 72.
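The time rate filtering described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the class name `TimeRateFilter`, the method `step`, and the parameter names `delta` and `theta_rho` are hypothetical, and the sketch assumes that window slots not yet filled at start-up count as FALSE, which the text does not specify.

```python
from collections import deque


class TimeRateFilter:
    """Sliding-window time rate filter over a stream of boolean flags E.

    Hypothetical sketch: the window holds the current flag plus the
    (delta - 1) delayed flags, i.e. delta samples in total.
    """

    def __init__(self, delta: int, theta_rho: float):
        if delta < 1:
            raise ValueError("delta must be an integer of 1 or more")
        self.delta = delta
        self.theta_rho = theta_rho
        # Assumption: before delta samples have arrived, the missing
        # samples are treated as FALSE.
        self.window = deque([False] * delta, maxlen=delta)

    def step(self, flag: bool) -> bool:
        """Push one flag value and return the filtered truth value."""
        self.window.append(bool(flag))
        # Time rate rho: number of TRUE values in the window divided by delta.
        rho = sum(self.window) / self.delta
        # TRUE when rho is equal to or greater than the threshold.
        return rho >= self.theta_rho


if __name__ == "__main__":
    f = TimeRateFilter(delta=4, theta_rho=0.5)
    flags = [True, False, True, True, False, False, False, False]
    print([f.step(e) for e in flags])
```

A larger `delta` suppresses short spurious TRUE bursts at the cost of a longer response delay; `theta_rho` sets how much of the window must be TRUE before the output follows.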

The timeout processing means 72 shown in FIG. 15 applies timeout processing, based on δ unit times, to the input truth value (TRUE or FALSE) received from the time rate filter means 71, and outputs the truth value TRUE or FALSE to the event specifying logic operation means 73.

That is, when the input truth value received from the time rate filter means 71 is TRUE, the timeout processing means 72 outputs the truth value TRUE to the event specifying logic operation means 73. When the input truth value is FALSE, the timeout processing means 72 still outputs the truth value TRUE from the moment the input truth value fell from TRUE to FALSE until δ unit times have elapsed.

When the input truth value received from the time rate filter means 71 is FALSE and more than δ unit times have elapsed since it fell from TRUE to FALSE, the timeout processing means 72 outputs the truth value FALSE. A detailed configuration of the timeout processing means 72 is shown in FIG. 18.

As shown in FIG. 18, the timeout processing means 72 includes monostable multivibrator means 721 and logical OR operation means 722.
The monostable multivibrator means 721 detects falling edges of the input truth value (the moments at which it changes from TRUE to FALSE) and outputs the truth value TRUE to the logical OR operation means 722 only from the most recent falling edge until δ unit times have elapsed. After δ unit times have elapsed since the most recent falling edge, it outputs the truth value FALSE to the logical OR operation means 722.

The logical OR operation means 722 computes the logical OR of the input truth value supplied to the timeout processing means 72 and the truth value output from the monostable multivibrator means 721, and outputs the result.

The processing of the timeout processing means 72 will now be described with reference to the example input truth values shown in FIG. 19 (see FIG. 18 as appropriate).
First, over the time series of input truth values, the monostable multivibrator means 721 detects falling edges of the input truth value. It outputs the truth value TRUE to the logical OR operation means 722 only from the most recent falling edge until δ unit times have elapsed, and outputs the truth value FALSE after that. The logical OR operation means 722 then takes the logical OR of the input truth value and the truth value output from the monostable multivibrator means 721. In other words, the truth value TRUE is output whenever either the input of the timeout processing means 72 or the output of the monostable multivibrator means 721 is TRUE.
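The timeout processing can be sketched as a retriggerable hold timer combined with a logical OR. This is an illustrative sketch only: the names `TimeoutProcessor`, `step`, and `hold` (standing in for δ) are hypothetical, and the boundary convention is an assumption, namely that the hold lasts exactly `hold` samples starting with the first FALSE sample after a falling edge.

```python
class TimeoutProcessor:
    """Monostable multivibrator plus logical OR, operating per unit time."""

    def __init__(self, hold: int):
        self.hold = hold          # delta, in unit times (assumed >= 0)
        self.prev = False         # previous input truth value
        self.since_fall = None    # unit times since the most recent falling edge

    def step(self, value: bool) -> bool:
        """Push one input truth value and return the output truth value."""
        value = bool(value)
        if self.prev and not value:
            # Falling edge (TRUE -> FALSE): restart the hold timer.
            self.since_fall = 0
        elif self.since_fall is not None:
            self.since_fall += 1
        self.prev = value
        # Monostable output: TRUE only while the hold time has not elapsed.
        mono = self.since_fall is not None and self.since_fall < self.hold
        # Logical OR of the input and the monostable output.
        return value or mono


if __name__ == "__main__":
    p = TimeoutProcessor(hold=2)
    print([p.step(v) for v in [True, False, False, False]])
```

The effect is that brief dropouts of the filtered flag, shorter than δ unit times, do not interrupt the event output.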

The event specifying logic operation means 73 shown in FIG. 15 performs a logical operation based on the truth values (TRUE or FALSE) output from the timeout processing means 72, and outputs the result of the logical operation as an event output (event output signal).

For example, the event specifying logic operation means 73 (73-2) computes the logical AND of the condition that the timeout processing means 72 (72-1) outputs the truth value FALSE and the truth value output from the timeout processing means 72 (72-2), and outputs the result as the event output ε_FK.

The reason for combining the corner kick event output ε_CK and the free kick event output ε_FK through this logical AND is that, in soccer video, a corner kick and a free kick differ only in where the ball is kicked from (a corner versus an arbitrary spot). Conditions such as the players' average speed and population density are similar, so the two are hard to tell apart. Because a corner kick and a free kick cannot occur at the same time, taking this logical AND makes it possible to specify which of the two events occurred.

For example, the event specifying logic operation means 73 (73-4) computes the logical AND of the condition that the timeout processing means 72 (72-3) outputs the truth value FALSE and the truth value output from the timeout processing means 72 (72-4), and outputs the result as the event output ε_right_goal.

The reason for combining the event output ε_left_goal, which indicates an attack toward the left near the left goal, and the event output ε_right_goal, which indicates an attack toward the right near the right goal, through this logical AND is that, in soccer video, the two situations differ only in whether the ball is near the left goal or near the right goal. The players' average speed and the location of the center of gravity of the person group are similar, so the two are hard to tell apart. Because the ball cannot be near the left goal and near the right goal at the same time, taking this logical AND makes it possible to specify which of the two events occurred.

For example, the event specifying logic operation means 73 (73-5) computes the logical AND of the logical negation of the event output ε_right_goal output from the event specifying logic operation means 73 (73-4) and the truth value output from the timeout processing means 72 (72-5), and outputs the result as the event output ε_left.

The reason for combining the event output ε_right_goal, which indicates an attack toward the right near the right goal, and the event output ε_left, which indicates an attack toward the left, through this logical AND is to detect a so-called "counter" in soccer video, in which the ball is near the right goal but has been taken by a player attacking toward the left.

For example, the event specifying logic operation means 73 (73-6) computes the logical AND of the logical negation of the event output ε_left output from the event specifying logic operation means 73 (73-5) and the truth value output from the timeout processing means 72 (72-6), and outputs the result as the event output ε_right.

The reason for combining the event output ε_left, which indicates an attack toward the left, and the event output ε_right, which indicates an attack toward the right, through this logical AND is that, in soccer video, the two situations differ only in whether the player kicking the ball attacks toward the left or toward the right. The players' average speed and the location of the center of gravity of the person group are similar, so the two are hard to tell apart. Because attacking toward the right and attacking toward the left do not occur at the same time, taking this logical AND makes it possible to specify which of the two events occurred.
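The chain of logical operations in the event specifying logic operation means 73 can be sketched as follows. This is a hedged illustration under several assumptions not fixed by the text: the function name `specify_events` and the dictionary keys are hypothetical; the second input of 73-2 is taken to be the output of 72-2; and ε_CK, ε_left_goal, and ε_turn are assumed to pass through directly from 72-1, 72-3, and 72-7, since no logic operation means is listed for those paths.

```python
from typing import Dict


def specify_events(t: Dict[int, bool]) -> Dict[str, bool]:
    """Combine timeout outputs t[1]..t[7] (from 72-1..72-7) into event outputs.

    Hypothetical sketch; each line mirrors one event specifying
    logic operation means (73-2, 73-4, 73-5, 73-6).
    """
    eps_ck = t[1]                                  # assumed pass-through
    eps_fk = (not t[1]) and t[2]                   # 73-2
    eps_left_goal = t[3]                           # assumed pass-through
    eps_right_goal = (not t[3]) and t[4]           # 73-4
    eps_left = (not eps_right_goal) and t[5]       # 73-5
    eps_right = (not eps_left) and t[6]            # 73-6
    eps_turn = t[7]                                # assumed pass-through
    return {
        "CK": eps_ck,
        "FK": eps_fk,
        "left_goal": eps_left_goal,
        "right_goal": eps_right_goal,
        "left": eps_left,
        "right": eps_right,
        "turn": eps_turn,
    }


if __name__ == "__main__":
    outputs = {1: True, 2: True, 3: False, 4: True, 5: True, 6: True, 7: False}
    print(specify_events(outputs))
```

In this sketch each negated input acts as a priority rule: when two similar-looking events fire together, the earlier one in the chain wins, so at most one event of each mutually exclusive pair is reported.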

The post filter means 7 (see FIG. 15) can route the truth values of the input flags E, the truth values output from the time rate filter means 71, the truth values output from the timeout processing means 72, and the event outputs from the event specifying logic operation means 73 to any of the time rate filter means 71, the timeout processing means 72, and the event specifying logic operation means 73. In other words, the components within the post filter means 7 can be connected arbitrarily.

For example, the post filter means 7A can be configured as shown in FIG. 20. The post filter means 7A of FIG. 20 takes the truth values of the flag E_CK and the flag E_FK as inputs. It has the path of FIG. 15 that outputs the event output ε_CK through the time rate filter means 71 (71-1) and the timeout processing means 72 (72-1), and the path that outputs the event output ε_FK through the time rate filter means 71 (71-2), the timeout processing means 72 (72-2), and the event specifying logic operation means 73 (73-2). To these it adds event specifying logic operation means 73 (73-8) and event specifying logic operation means 73 (73-9).

The event specifying logic operation means 73 (73-8) outputs an event output ε to the time rate filter means 71 (71-2) based on the truth values of the flag E_CK and the flag E_FK.
The event specifying logic operation means 73 (73-9) outputs an event output ε to the timeout processing means 72 (72-2) based on the truth value output from the time rate filter means 71 (71-1) and the truth value output from the time rate filter means 71 (71-2).

According to the video analysis apparatus 1 shown in FIG. 1, the silhouette video generation means 2 generates a silhouette video from a sports video. Each person's silhouette in the silhouette video is treated as a tracking region, and the estimated coordinates and estimated speed of the tracking region, its area, and a color classification number indicating its color are output. The feature vector extraction means 5 calculates the feature quantities to be included in the feature vector based on at least one of these. The event detection means 6 detects, as an event indicating a specific play that occurred in a scene of the sports video, each case in which a feature quantity satisfies a preset condition, and the post filter means 7 specifies the event. The sports video can therefore be analyzed to detect specific plays and the like in it.

Further, according to the video analysis apparatus 1, the person tracking means 3 treats each person's silhouette in the silhouette video as a simply connected region and assigns it a label number. For each simply connected region whose area lies within a fixed range, it outputs an identification number corresponding to the label number and observed coordinates that associate the real coordinates with the delayed predicted coordinates, based on the delayed predicted coordinates and delayed predicted speed, the label number, the area of the simply connected region, and the real coordinates. It also outputs the area of the simply connected region supplied by the area determination means as the area of the tracking region. The observed coordinates are then filtered and predicted in the time direction, and the estimated coordinates and estimated speed are output. A subject (such as a player) in the sports video can therefore be tracked along the running time of the video (in the time direction), and the subject's coordinates, speed, and apparent size can be output.

Furthermore, according to the video analysis apparatus 1, the feature quantities that the feature vector extraction means 5 includes in the feature vector comprise at least one of a person count estimate, a person group distribution quantified value, an inter-person distance quantified value, a person speed quantified value, a person velocity quantified value, and a determination value. Low-order video feature quantities such as coordinates, speed, area, and color classification can therefore be converted into high-order video feature quantities related to events (specific plays and the like) in the sports video.

(Overall operation of the video analysis apparatus)
Next, the overall operation of the video analysis apparatus 1 will be described with reference to the flowchart shown in FIG. 21 (see FIG. 1 as appropriate). The flowchart of FIG. 21 outlines the operation of the video analysis apparatus 1 when a soccer video is supplied as the input video.

First, the video analysis apparatus 1 generates a silhouette video (a plurality of silhouette images S(x, y)) from the input video (a plurality of input images I(x, y)) by the silhouette video generation means 2 (step S1). The person tracking means 3 then treats each region in the silhouette video (a person's silhouette) as a tracking region and outputs the identification number m, the estimated coordinates x_{t|t}(m), the estimated speed v_{t|t}(m), and the area α(m) (step S2).

The video analysis apparatus 1 also associates the identification number m with the color classification number C(m) by the color identification means 4, using the input video, the silhouette video output from the silhouette video generation means 2, and the tracking region and identification number m output from the person tracking means 3 (step S3). Then, based on the identification number m, the estimated coordinates x_{t|t}(m), the estimated speed v_{t|t}(m), the area α(m), and the color classification number C(m), the feature vector extraction means 5 outputs a feature vector containing at least one of the number of persons N, the person distribution area V, the inter-person distance d, the person group center of gravity g, the average speed s, the average velocity u, and the monitoring result w (step S4).

Then, the video analysis apparatus 1 detects an event (a specific play) by the event detection means 6, using flags that indicate the event (step S5). The post filter means 7 then specifies the event and outputs one of the following (step S6): the event output ε_CK indicating a corner kick; the event output ε_FK indicating a free kick; the event output ε_left_goal indicating an attack toward the left near the left goal; the event output ε_right_goal indicating an attack toward the right near the right goal; the event output ε_left indicating an attack toward the left; the event output ε_right indicating an attack toward the right; and the event output ε_turn indicating that the players' direction of attack has changed (from right to left, or from left to right).

An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment. For example, the input video supplied to the video analysis apparatus 1 has been described exclusively as a sports video, but it is not limited to this and may be a video depicting scenery, a drama video, or the like.

The processing of each component of the video analysis apparatus 1 can also be implemented as a video analysis program written in a general-purpose computer language. In that case, the same effects as those of the video analysis apparatus 1 can be obtained.

FIG. 1 is a block diagram of a video analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of the person tracking means shown in FIG. 1.
FIG. 3 is a diagram explaining the concept of processing by the area determination means.
FIG. 4 is a diagram explaining the concept of processing by the area determination means and the back projection means.
FIG. 5 is a block diagram showing an example of the configuration of the feature vector extraction means.
FIG. 6 is a block diagram showing an example of the configuration of the person count measurement means.
FIG. 7 is a block diagram showing an example of the configuration of the person image area estimation means.
FIG. 8 is a block diagram showing an example of the configuration of the person group dispersion measurement means.
FIG. 9 is a block diagram showing an example of the configuration of the inter-person distance measurement means.
FIG. 10 is a block diagram showing an example of the configuration of the person group center of gravity measurement means.
FIG. 11 is a block diagram showing an example of the configuration of the average speed measurement means.
FIG. 12 is a block diagram showing an example of the configuration of the average velocity measurement means.
FIG. 13 is a block diagram showing an example of the configuration of the specific area monitoring means.
FIG. 14 is a block diagram showing an example of the configuration of the event detection means.
FIG. 15 is a block diagram showing an example of the configuration of the post filter means.
FIG. 16 is a block diagram showing an example of the configuration of the time rate filter means.
FIG. 17 is a diagram explaining processing by the time rate filter means.
FIG. 18 is a block diagram showing an example of the configuration of the timeout processing means.
FIG. 19 is a diagram explaining processing by the timeout processing means.
FIG. 20 is a block diagram showing an example of another configuration of the post filter means.
FIG. 21 is a flowchart explaining the overall operation of the video analysis apparatus shown in FIG. 1.

Explanation of symbols

1 Video analysis apparatus
2 Silhouette video generation means
3 Person tracking means
4 Color identification means
5 Feature vector extraction means (feature vector calculation means)
6 Event detection means
7 Post filter means
31 Labeling means
32 Area determination means
33 Back projection conversion means
34 Detection/tracking means (detection tracking means)
35 Prediction/estimation means (prediction estimation means)
36 Delay means
51 Person count measurement means
52 Person group dispersion measurement means
53 Inter-person distance measurement means
54 Person group center of gravity measurement means
55 Average speed measurement means
56 Average velocity measurement means
57 Specific area monitoring means
61 Inter-feature-quantity calculation means
62 Threshold value calculation means
63 Logical operation means
64 Delay means
71 Time rate filter means
72 Timeout processing means
73 Event specifying logic operation means
511 Clock means
512 Sequential area selection means
513 Sequential coordinate selection means
514 Person image area estimation means
515 Division means
516 Summation means
521 Clock means
522 Sequential coordinate selection means
523 Covariance matrix calculation means
524 Distribution area calculation means
531 Clock means
532 First number-corresponding coordinate selection means
533 Second number-corresponding coordinate selection means
534 Distance calculation means
535 Minimum value calculation means
536 Average value calculation means
541 Clock means
542 Sequential coordinate selection means
543 Average value calculation means
551 Clock means
552 Sequential speed selection means
553 Absolute value calculation means
554 Average value calculation means
561 Clock means
562 Sequential speed selection means
563 Average value calculation means
571 Clock means
572 Sequential number-corresponding coordinate selection means
573 Two-dimensional threshold value calculation means by color classification
711 Delay means
712 Time rate calculation means
713 Threshold value calculation means
721 Monostable multivibrator means
722 Logical OR operation means

Claims

A video analysis apparatus that analyzes an input video, comprising:
silhouette video generation means for generating a silhouette video from the video;
region tracking means for treating a silhouette contained in the silhouette images constituting the silhouette video generated by the silhouette video generation means as a tracking region, tracking the tracking region based on differences between the silhouette images, and outputting estimated coordinates and an estimated speed of the tracking region and an area of the tracking region in association with an identification number for identifying the tracking region;
color identification means for identifying a color of the tracking region based on the estimated coordinates, the silhouette video, and the video, and, based on the identification result, outputting a color classification number preset for classifying the color in association with the identification number;
feature vector calculation means for calculating a feature quantity to be included in a feature vector based on at least one of the estimated coordinates and the estimated speed, the area of the tracking region, and the color classification number, each associated with the identification number;
event detection means for detecting, as an event indicating an occurrence in a scene contained in the video, a case where the feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputting a flag signal indicating the detection result; and
post filter means for applying at least one of time-direction filtering and a logical operation between flag signals to the flag signal output by the event detection means, and outputting an event output signal that specifies the event.

前記領域追跡手段は、
前記シルエット画像に含まれるそれぞれのシルエットを単連結領域とし、この単連結領域に対してラベル番号を付加し、当該ラベル番号を付加した単連結領域の形状に関する領域情報を生成するラベリング手段と、
前記領域情報に基づいて、各単連結領域の面積を求め、求めた面積が一定範囲内にある単連結領域について、前記ラベル番号および前記単連結領域の面積を出力する面積判定手段と、
前記ラベリング手段で生成された領域情報と前記映像を撮影したカメラの投影中心とに基づいて、３次元空間における各単連結領域の存在場所を示す実座標を、前記ラベル番号と共に出力する逆投影変換手段と、
前記追跡領域の座標および速度の予測された予測座標および予測速度が予め設定された所定単位時間遅延されて出力された、遅延予測座標および遅延予測速度と、前記ラベル番号と、前記面積判定手段から出力された単連結領域の面積と、前記実座標とに基づいて、前記ラベル番号に対応させた前記識別番号と、前記実座標と前記遅延予測座標とを対応させた観測座標とを出力すると共に、前記面積判定手段から出力された単連結領域の面積を前記追跡領域の面積として出力する検出追跡手段と、
この検出追跡手段で出力された観測座標を、時間方向に濾波予測し、前記推定座標および前記推定速度と、前記予測座標および前記予測速度とを、前記識別番号と共にそれぞれ出力する予測推定手段と、
この予測推定手段から出力された前記予測座標および前記予測速度を、前記所定単位時間遅延して、前記識別番号と共に前記検出追跡手段に出力する遅延手段と、
を備えることを特徴とする請求項１に記載の映像解析装置。 The video analysis apparatus according to claim 1, wherein the region tracking means comprises:
labeling means for treating each silhouette included in the silhouette image as a single connected region, assigning a label number to the single connected region, and generating region information on the shape of the labeled single connected region;
area determination means for obtaining the area of each single connected region from the region information and, for each single connected region whose area falls within a fixed range, outputting the label number and the area of that single connected region;
back-projection transformation means for outputting, together with the label number, real coordinates indicating the location of each single connected region in three-dimensional space, based on the region information generated by the labeling means and the projection center of the camera that captured the video;
detection and tracking means for outputting the identification number associated with the label number, and observation coordinates obtained by associating the real coordinates with delayed predicted coordinates, based on the delayed predicted coordinates and a delayed predicted speed (the predicted coordinates and predicted speed of the tracking region, output after a delay of a preset unit time), the label number, the area of the single connected region output from the area determination means, and the real coordinates, and for outputting the area of the single connected region output from the area determination means as the area of the tracking region;
prediction and estimation means for filtering and predicting, in the time direction, the observation coordinates output by the detection and tracking means, and outputting the estimated coordinates and estimated speed, and the predicted coordinates and predicted speed, each together with the identification number; and
delay means for delaying the predicted coordinates and predicted speed output from the prediction and estimation means by the predetermined unit time and outputting them, together with the identification number, to the detection and tracking means.
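The labeling and area determination steps recited in this claim can be illustrated with a minimal sketch. It assumes a binary silhouette image given as a list of rows of 0/1 values and uses 4-connectivity; the function names `label_regions` and `filter_by_area`, the connectivity choice, and the area bounds are hypothetical and are not taken from the specification.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected foreground regions of a binary silhouette image.

    mask: list of rows, each a list of 0/1 values (1 = silhouette pixel).
    Returns a dict mapping label number -> list of (row, col) pixels.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = {}
    next_label = 1
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions[next_label] = pixels
            next_label += 1
    return regions


def filter_by_area(regions, min_area, max_area):
    """Keep regions whose pixel count lies in [min_area, max_area].

    Returns a dict mapping label number -> area (assumed bounds, in pixels).
    """
    return {
        label: len(pixels)
        for label, pixels in regions.items()
        if min_area <= len(pixels) <= max_area
    }
```

With suitable bounds, the area filter would discard regions too small to be a person (noise) or too large (merged silhouettes) while passing the label numbers and areas of the remaining regions downstream.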

前記特徴ベクトルは、前記シルエットの数を推定したシルエット数推定値、複数の前記シルエットからなるシルエット群の分布の散らばり度合いを定量化したシルエット群分布定量化値、前記シルエット間の距離を定量化したシルエット間距離定量化値、前記シルエット群の分布を代表する座標を定量化したシルエット群重心定量化値、前記シルエットの速さを定量化したシルエット速さ定量化値、前記シルエットの速度を定量化したシルエット速度定量化値および予め特定したシルエットである特定シルエットが特定の場所に存在するか否かを判定した判定値の少なくとも一つ以上の特徴量によって構成されることを特徴とする請求項１または請求項２に記載の映像解析装置。 The video analysis apparatus according to claim 1 or claim 2, wherein the feature vector is composed of at least one of the following feature quantities: a silhouette count estimate, which estimates the number of silhouettes; a silhouette group distribution quantification value, which quantifies the degree of dispersion of the distribution of a silhouette group consisting of a plurality of the silhouettes; an inter-silhouette distance quantification value, which quantifies the distance between the silhouettes; a silhouette group centroid quantification value, which quantifies coordinates representing the distribution of the silhouette group; a silhouette speed quantification value, which quantifies the speed (magnitude) of the silhouettes; a silhouette velocity quantification value, which quantifies the velocity (magnitude and direction) of the silhouettes; and a determination value indicating whether a specific silhouette, which is a silhouette specified in advance, is present at a specific location.
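As an illustration of the feature quantities listed in this claim, the following sketch computes one plausible instance of each from a set of tracked regions. The claim does not fix the formulas, so every choice here is an assumption: dispersion is taken as the mean squared distance from the centroid, inter-silhouette distance as the minimum pairwise distance, and the "specific silhouette" is identified by a hypothetical color classification number `target_color` inside a hypothetical rectangular `zone`.

```python
import math
from itertools import combinations


def feature_vector(tracks, zone, target_color):
    """Compute illustrative feature quantities for one frame.

    tracks: list of dicts with 'pos' (x, y), 'vel' (vx, vy), 'color' (int).
    zone: (xmin, ymin, xmax, ymax), an assumed region of interest.
    target_color: assumed color class of the pre-specified silhouette.
    """
    n = len(tracks)
    if n == 0:
        return {"count": 0, "centroid": None, "spread": None,
                "min_distance": None, "mean_speed": None,
                "mean_velocity": None, "target_in_zone": False}
    xs = [t["pos"][0] for t in tracks]
    ys = [t["pos"][1] for t in tracks]
    cx, cy = sum(xs) / n, sum(ys) / n
    # Dispersion: mean squared distance of each region from the centroid.
    spread = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in zip(xs, ys)) / n
    # Inter-silhouette distance: smallest pairwise distance (None if n < 2).
    min_distance = min(
        (math.dist(a["pos"], b["pos"]) for a, b in combinations(tracks, 2)),
        default=None,
    )
    # Speed is the magnitude of velocity; velocity keeps direction.
    mean_speed = sum(math.hypot(*t["vel"]) for t in tracks) / n
    mean_velocity = (sum(t["vel"][0] for t in tracks) / n,
                     sum(t["vel"][1] for t in tracks) / n)
    xmin, ymin, xmax, ymax = zone
    target_in_zone = any(
        t["color"] == target_color
        and xmin <= t["pos"][0] <= xmax
        and ymin <= t["pos"][1] <= ymax
        for t in tracks
    )
    return {"count": n, "centroid": (cx, cy), "spread": spread,
            "min_distance": min_distance, "mean_speed": mean_speed,
            "mean_velocity": mean_velocity, "target_in_zone": target_in_zone}
```

The distinction between the speed and velocity features shows up when regions move in opposite directions: the mean speed stays high while the mean velocity components cancel.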

入力されたスポーツ映像を解析する映像解析装置であって、
前記スポーツ映像からシルエット映像を生成するシルエット映像生成手段と、
このシルエット映像生成手段で生成されたシルエット映像を構成するシルエット画像に含まれる所定面積範囲の領域である人物のシルエットを追跡領域とし、前記シルエット画像間の差に基づいて当該追跡領域を追跡し、当該追跡領域を識別するための識別番号と対応付けて、当該追跡領域の推定座標および推定速度と、当該追跡領域の面積とを出力する領域追跡手段と、
前記推定座標と、前記シルエット映像と、前記スポーツ映像とに基づき、前記追跡領域の色を識別し、識別した結果に基づいて、当該色を分類するために予め設定した色分類番号と前記識別番号とを対応付けて出力する色識別手段と、
前記識別番号と対応付けられた、前記推定座標および前記推定速度と、前記追跡領域の面積と、前記色分類番号との少なくとも一つに基づき、特徴ベクトルを計算する特徴ベクトル計算手段と、
この特徴ベクトル計算手段で計算された特徴ベクトルに含まれる特徴量が、予め設定した条件を満たした場合を、前記スポーツ映像に含まれる各シーンで発生した特定のプレイを示すイベントとして検出し、検出した結果を示すフラグ信号を出力するイベント検出手段と、
このイベント検出手段で出力されたフラグ信号に、時間方向のフィルタ処理と、前記フラグ信号間の論理演算との少なくとも一方の処理を行って、前記イベントを特定するイベント出力信号を出力するポストフィルタ手段と、
を備えることを特徴とする映像解析装置。 A video analysis apparatus for analyzing an input sports video, comprising:
silhouette video generation means for generating a silhouette video from the sports video;
region tracking means for taking, as a tracking region, a silhouette of a person, which is a region within a predetermined area range included in a silhouette image constituting the silhouette video generated by the silhouette video generation means, tracking the tracking region based on differences between the silhouette images, and outputting the estimated coordinates and estimated speed of the tracking region and the area of the tracking region in association with an identification number for identifying the tracking region;
color identification means for identifying the color of the tracking region based on the estimated coordinates, the silhouette video, and the sports video and, based on the identification result, outputting a color classification number, preset for classifying the color, in association with the identification number;
feature vector calculation means for calculating a feature vector based on at least one of the estimated coordinates and estimated speed, the area of the tracking region, and the color classification number, each associated with the identification number;
event detection means for detecting, as an event indicating a specific play that occurred in a scene included in the sports video, a case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputting a flag signal indicating the detection result; and
post-filter means for applying, to the flag signal output by the event detection means, at least one of filtering in the time direction and a logical operation between flag signals, and outputting an event output signal that specifies the event.
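The event detection and post-filter steps of this claim can be sketched as follows. The claim only requires a preset condition, a time-direction filter, and a logical operation between flag signals; the specific choices below (a simple threshold, a sliding-window majority vote, and a logical AND) and all function names are assumptions made for illustration.

```python
def detect_flags(values, threshold):
    """Per-frame flag: True when the feature quantity meets the threshold."""
    return [v >= threshold for v in values]


def majority_filter(flags, window):
    """Time-direction filter: sliding-window majority vote.

    window is the (odd) number of frames considered around each frame;
    the window is truncated at the start and end of the sequence.
    """
    half = window // 2
    out = []
    for i in range(len(flags)):
        segment = flags[max(0, i - half):min(len(flags), i + half + 1)]
        # True only when strictly more than half of the frames are True.
        out.append(sum(segment) * 2 > len(segment))
    return out


def combine_and(flags_a, flags_b):
    """Logical operation between two flag signals (frame-wise AND)."""
    return [a and b for a, b in zip(flags_a, flags_b)]
```

A majority vote of this kind suppresses isolated single-frame detections and fills isolated single-frame dropouts, which is one way a post-filter could turn noisy per-frame flags into a stable event output signal.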

前記領域追跡手段は、
前記シルエット画像に含まれるそれぞれの所定面積範囲の領域である人物のシルエットを単連結領域とし、この単連結領域に対してラベル番号を付加し、当該ラベル番号を付加した単連結領域の形状に関する領域情報を生成するラベリング手段と、
前記領域情報に基づいて、各単連結領域の面積を求め、求めた面積が一定範囲内にある単連結領域について、前記ラベル番号および前記単連結領域の面積を出力する面積判定手段と、
前記ラベリング手段で生成された領域情報と前記スポーツ映像を撮影したカメラの投影中心とに基づいて、３次元空間における各単連結領域の存在場所を示す実座標を、前記ラベル番号と共に出力する逆投影変換手段と、
前記追跡領域の座標および速度の予測された予測座標および予測速度が予め設定された所定単位時間遅延されて出力された、遅延予測座標および遅延予測速度と、前記ラベル番号と、前記面積判定手段から出力された単連結領域の面積と、前記実座標とに基づいて、前記ラベル番号に対応させた前記識別番号と、前記実座標と前記遅延予測座標とを対応させた観測座標とを出力すると共に、前記面積判定手段から出力された単連結領域の面積を前記追跡領域の面積として出力する検出追跡手段と、
この検出追跡手段で出力された観測座標を、時間方向に濾波予測し、前記推定座標および前記推定速度と、前記予測座標および前記予測速度とを、前記識別番号と共にそれぞれ出力する予測推定手段と、
この予測推定手段から出力された前記予測座標および前記予測速度を前記所定単位時間遅延して、前記識別番号と共に前記検出追跡手段に出力する遅延手段と、
を備えることを特徴とする請求項４に記載の映像解析装置。 The video analysis apparatus according to claim 4, wherein the region tracking means comprises:
labeling means for treating each silhouette of a person, which is a region within a predetermined area range included in the silhouette image, as a single connected region, assigning a label number to the single connected region, and generating region information on the shape of the labeled single connected region;
area determination means for obtaining the area of each single connected region from the region information and, for each single connected region whose area falls within a fixed range, outputting the label number and the area of that single connected region;
back-projection transformation means for outputting, together with the label number, real coordinates indicating the location of each single connected region in three-dimensional space, based on the region information generated by the labeling means and the projection center of the camera that captured the sports video;
detection and tracking means for outputting the identification number associated with the label number, and observation coordinates obtained by associating the real coordinates with delayed predicted coordinates, based on the delayed predicted coordinates and a delayed predicted speed (the predicted coordinates and predicted speed of the tracking region, output after a delay of a preset unit time), the label number, the area of the single connected region output from the area determination means, and the real coordinates, and for outputting the area of the single connected region output from the area determination means as the area of the tracking region;
prediction and estimation means for filtering and predicting, in the time direction, the observation coordinates output by the detection and tracking means, and outputting the estimated coordinates and estimated speed, and the predicted coordinates and predicted speed, each together with the identification number; and
delay means for delaying the predicted coordinates and predicted speed output from the prediction and estimation means by the predetermined unit time and outputting them, together with the identification number, to the detection and tracking means.
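The feedback loop in this claim (prediction, unit-time delay, then association of new detections with the delayed predictions) can be sketched as below. The claim does not name a particular filter or association rule, so this example substitutes an alpha-beta filter and greedy nearest-neighbor matching within a distance gate; `associate`, `alpha_beta_step`, the gate, and the gains `alpha` and `beta` are all hypothetical.

```python
import math


def associate(predicted, detections, gate):
    """Match delayed predicted positions to detected real coordinates.

    predicted: dict identification number -> (x, y) delayed prediction.
    detections: dict label number -> (x, y) real coordinates.
    gate: assumed maximum distance for a valid match.
    Returns dict identification number -> observed (x, y).
    """
    pairs = sorted(
        (math.dist(p, d), tid, lab)
        for tid, p in predicted.items()
        for lab, d in detections.items()
    )
    used_tracks, used_labels, observed = set(), set(), {}
    for dist, tid, lab in pairs:
        if dist > gate:
            break  # remaining pairs are even farther apart
        if tid in used_tracks or lab in used_labels:
            continue
        observed[tid] = detections[lab]
        used_tracks.add(tid)
        used_labels.add(lab)
    return observed


def alpha_beta_step(pred_pos, pred_vel, obs, dt, alpha=0.5, beta=0.1):
    """One filter step for a single tracking region.

    Returns (estimated position, estimated velocity,
             predicted position for the next step, predicted velocity).
    """
    residual = tuple(o - p for o, p in zip(obs, pred_pos))
    est_pos = tuple(p + alpha * r for p, r in zip(pred_pos, residual))
    est_vel = tuple(v + beta / dt * r for v, r in zip(pred_vel, residual))
    next_pos = tuple(p + dt * v for p, v in zip(est_pos, est_vel))
    return est_pos, est_vel, next_pos, est_vel
```

In a frame loop, the predicted position and velocity returned for one frame would be held for one unit time and passed to `associate` at the next frame, which plays the role of the delay means in this sketch.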

前記特徴ベクトルは、前記人物のシルエットの数を推定した人物数推定値、複数の前記人物のシルエットからなる人物シルエット群の分布の散らばり度合いを定量化した人物群分布定量化値、前記人物のシルエット間の距離を定量化した人物間距離定量化値、前記人物群の分布を代表する座標を定量化した人物群重心定量化値、前記人物のシルエットの速さを定量化した人物速さ定量化値、前記人物のシルエットの速度を定量化した人物速度定量化値および予め特定した人物のシルエットである特定人物シルエットが特定の場所に存在するか否かを判定した判定値の少なくとも一つ以上の特徴量によって構成されることを特徴とする請求項４または請求項５に記載の映像解析装置。 The video analysis apparatus according to claim 4 or claim 5, wherein the feature vector is composed of at least one of the following feature quantities: a person count estimate, which estimates the number of silhouettes of persons; a person group distribution quantification value, which quantifies the degree of dispersion of the distribution of a person silhouette group consisting of a plurality of the silhouettes of persons; an inter-person distance quantification value, which quantifies the distance between the silhouettes of persons; a person group centroid quantification value, which quantifies coordinates representing the distribution of the person group; a person speed quantification value, which quantifies the speed (magnitude) of the silhouettes of persons; a person velocity quantification value, which quantifies the velocity (magnitude and direction) of the silhouettes of persons; and a determination value indicating whether a specific person silhouette, which is a silhouette of a person specified in advance, is present at a specific location.

入力された映像を解析するために、コンピュータを、
前記映像からシルエット映像を生成するシルエット映像生成手段、
このシルエット映像生成手段で生成されたシルエット映像を構成するシルエット画像に含まれるシルエットを追跡領域とし、前記シルエット画像間の差に基づいて当該追跡領域を追跡し、当該追跡領域を識別するための識別番号と対応付けて、当該追跡領域の推定座標および推定速度と、当該追跡領域の面積とを出力する領域追跡手段、
前記推定座標と、前記シルエット映像と、前記映像とに基づき、前記追跡領域の色を識別し、識別した結果に基づいて、当該色を分類するために予め設定した色分類番号と前記識別番号とを対応付けて出力する色識別手段、
前記識別番号と対応付けられた、前記推定座標および前記推定速度と、前記追跡領域の面積と、前記色分類番号との少なくとも一つに基づき、特徴ベクトルに含める特徴量を計算する特徴ベクトル計算手段、
この特徴ベクトル計算手段で計算された特徴ベクトルに含まれる特徴量が、予め設定した条件を満たした場合を、前記映像に含まれる各シーンで発生した出来事を示すイベントとして検出し、検出した結果を示すフラグ信号を出力するイベント検出手段、
このイベント検出手段で出力されたフラグ信号に、時間方向のフィルタ処理と、前記フラグ信号間の論理演算との少なくとも一方の処理を行って、前記イベントを特定するイベント出力信号を出力するポストフィルタ手段、
として機能させることを特徴とする映像解析プログラム。 A video analysis program that, in order to analyze an input video, causes a computer to function as:
silhouette video generation means for generating a silhouette video from the video;
region tracking means for taking, as a tracking region, a silhouette included in a silhouette image constituting the silhouette video generated by the silhouette video generation means, tracking the tracking region based on differences between the silhouette images, and outputting the estimated coordinates and estimated speed of the tracking region and the area of the tracking region in association with an identification number for identifying the tracking region;
color identification means for identifying the color of the tracking region based on the estimated coordinates, the silhouette video, and the video and, based on the identification result, outputting a color classification number, preset for classifying the color, in association with the identification number;
feature vector calculation means for calculating a feature quantity to be included in a feature vector based on at least one of the estimated coordinates and estimated speed, the area of the tracking region, and the color classification number, each associated with the identification number;
event detection means for detecting, as an event indicating an occurrence in a scene included in the video, a case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputting a flag signal indicating the detection result; and
post-filter means for applying, to the flag signal output by the event detection means, at least one of filtering in the time direction and a logical operation between flag signals, and outputting an event output signal that specifies the event.
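The color identification means recited in this claim maps each tracking region to a preset color classification number. One simple way to do this, shown here purely as an assumption, is to average the RGB values of the video pixels that fall inside the region's silhouette and pick the preset class whose reference color is nearest. The function `classify_color` and the palette contents are hypothetical.

```python
def classify_color(pixels, palette):
    """Return the color classification number nearest to the mean color.

    pixels: non-empty list of (r, g, b) values sampled from the video
            inside the tracking region's silhouette.
    palette: dict color classification number -> (r, g, b) reference color
             (assumed preset values, e.g. one entry per team uniform).
    """
    n = len(pixels)
    mean = tuple(sum(p[i] for p in pixels) / n for i in range(3))
    # Choose the class with the smallest squared RGB distance to the mean.
    return min(
        palette,
        key=lambda k: sum((m - c) ** 2 for m, c in zip(mean, palette[k])),
    )
```

The returned class number would then be output together with the region's identification number, so that later stages can reason about, for example, which team a tracked person belongs to.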

入力されたスポーツ映像を解析するために、コンピュータを、
前記スポーツ映像からシルエット映像を生成するシルエット映像生成手段、
このシルエット映像生成手段で生成されたシルエット映像を構成するシルエット画像に含まれる所定面積範囲の領域である人物のシルエットを追跡領域とし、前記シルエット画像間の差に基づいて当該追跡領域を追跡し、当該追跡領域を識別するための識別番号と対応付けて、当該追跡領域の推定座標および推定速度と、当該追跡領域の面積とを出力する領域追跡手段、
前記推定座標と、前記シルエット映像と、前記スポーツ映像とに基づき、前記追跡領域の色を識別し、識別した結果に基づいて、当該色を分類するために予め設定した色分類番号と前記識別番号とを対応付けて出力する色識別手段、
前記識別番号と対応付けられた、前記推定座標および前記推定速度と、前記追跡領域の面積と、前記色分類番号との少なくとも一つに基づき、特徴ベクトルに含める特徴量を計算する特徴ベクトル計算手段、
この特徴ベクトル計算手段で計算された特徴ベクトルに含まれる特徴量が、予め設定した条件を満たした場合を、前記スポーツ映像に含まれる各シーンで発生した特定のプレイを示すイベントとして検出し、検出した結果を示すフラグ信号を出力するイベント検出手段、
このイベント検出手段で出力されたフラグ信号に、時間方向のフィルタ処理と、前記フラグ信号間の論理演算との少なくとも一方の処理を行って、前記イベントを特定するイベント出力信号を出力するポストフィルタ手段、
として機能させることを特徴とする映像解析プログラム。 A video analysis program that, in order to analyze an input sports video, causes a computer to function as:
silhouette video generation means for generating a silhouette video from the sports video;
region tracking means for taking, as a tracking region, a silhouette of a person, which is a region within a predetermined area range included in a silhouette image constituting the silhouette video generated by the silhouette video generation means, tracking the tracking region based on differences between the silhouette images, and outputting the estimated coordinates and estimated speed of the tracking region and the area of the tracking region in association with an identification number for identifying the tracking region;
color identification means for identifying the color of the tracking region based on the estimated coordinates, the silhouette video, and the sports video and, based on the identification result, outputting a color classification number, preset for classifying the color, in association with the identification number;
feature vector calculation means for calculating a feature quantity to be included in a feature vector based on at least one of the estimated coordinates and estimated speed, the area of the tracking region, and the color classification number, each associated with the identification number;
event detection means for detecting, as an event indicating a specific play that occurred in a scene included in the sports video, a case where a feature quantity included in the feature vector calculated by the feature vector calculation means satisfies a preset condition, and outputting a flag signal indicating the detection result; and
post-filter means for applying, to the flag signal output by the event detection means, at least one of filtering in the time direction and a logical operation between flag signals, and outputting an event output signal that specifies the event.