JP2010171798A

JP2010171798A - Video processing device and video processing method

Info

Publication number: JP2010171798A
Application number: JP2009013139A
Authority: JP
Inventors: Toyokazu Itakura; 豊和板倉
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-01-23
Filing date: 2009-01-23
Publication date: 2010-08-05
Anticipated expiration: 2029-01-23
Also published as: JP5159654B2

Abstract

PROBLEM TO BE SOLVED: To realize more efficient viewing by extracting only scenes of high viewing value from a digest video and presenting them to a user.
SOLUTION: A specific scene extraction unit 13 extracts, from the scene information of a digest video, the scene information of stationary scenes. A feature amount extraction unit 16 extracts feature vectors for these stationary scenes, and a discriminator learning unit 18 produces a discriminator from the feature vectors. An identification target scene extraction unit 14 extracts, from the scene information of the extraction target digest video, the scene information of identification target scenes, and the feature amount extraction unit 16 extracts feature vectors for them. A non-stationary scene detection unit 19 supplies the feature vectors of each identification target scene to the discriminator for comparison and determines whether the scene is a non-stationary scene. The non-stationary scenes so identified are displayed on a result display unit 22.
COPYRIGHT: (C) 2010, JPO & INPIT

Description

The present invention relates to a video processing apparatus and a video processing method, and more particularly to an apparatus and method that electronically process video and provide only characteristic scenes to the user when video content is played back.

As the storage capacity of recording devices has grown, recording television programs and watching them later has become a common viewing style. The growing number of channels has increased the number of programs, so the number of programs an individual watches has also risen sharply. In addition, better communication environments have increased the use of large video content over the Internet, and it has become common to save video content obtained over the Internet on a personal computer and enjoy it there.

As the amount of video content grows, users want to view it efficiently by picking out only the scenes they wish to watch.

To meet this demand, television program recorders, computers, and similar devices are provided with various video processing functions. The following digest video creation technique is known as a conventional video processing function for efficient viewing.

First, the video is divided at each point where the change in image feature amount between consecutive frame images is at or above a fixed level, such a point being regarded as a cut. Each of the resulting segments is called a scene, so the video is divided into a plurality of scenes. The image feature amount is, for example, the proportion of each color among the pixels of the image, or the distribution of average pixel values over the regions obtained by dividing the image into a grid.
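The cut-detection step described above can be sketched as follows. This is only an illustration under stated assumptions: each frame is a flat sequence of 8-bit grayscale pixel values, the image feature is a coarse intensity histogram, and the change between frames is the L1 distance between histograms. The function names, bin count, and threshold are hypothetical and do not come from the patent.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram: the share of pixels falling in each bin."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def split_into_scenes(
    frames: Sequence[Sequence[int]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame-index pairs, one per scene.

    A cut is placed wherever the feature change between two consecutive
    frames is at or above the threshold (an assumed tuning parameter).
    """
    if not frames:
        return []
    scenes = []
    start = 0
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # L1 distance between histograms stands in for the
        # "change in image feature amount" between frames.
        change = sum(abs(a - b) for a, b in zip(previous, current))
        if change >= threshold:
            scenes.append((start, index - 1))
            start = index
        previous = current
    scenes.append((start, len(frames) - 1))
    return scenes
```

A real implementation would probably work on decoded color frames and tune the threshold per content type; the sketch only shows the thresholded frame-to-frame comparison.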

Next, the first frame image of each scene is taken as representative of the scene, for example, and its feature amount is calculated. Scenes with similar feature amounts are then clustered into the same group, which yields a digest video. When this digest video is displayed and the user selects only the scenes of a particular group for viewing, the video content can be viewed efficiently (see Patent Document 1).
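The grouping of similar scenes can be illustrated with a very simple greedy clustering. The patent text does not say which clustering method is used, so the method, the Euclidean distance, and the `max_distance` threshold below are all assumptions made for the sketch.

```python
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_scenes(
    first_frame_features: Sequence[Sequence[float]], max_distance: float = 0.3
) -> List[List[int]]:
    """Group scene indices whose first-frame features are similar.

    Each scene joins the first group whose representative (the feature of
    the group's first member) lies within max_distance; otherwise it
    starts a new group. This is a hypothetical stand-in for whatever
    clustering the conventional technique actually uses.
    """
    groups: List[List[int]] = []
    representatives: List[Sequence[float]] = []
    for scene_index, feature in enumerate(first_frame_features):
        for group, representative in zip(groups, representatives):
            if distance(feature, representative) <= max_distance:
                group.append(scene_index)
                break
        else:
            groups.append([scene_index])
            representatives.append(feature)
    return groups
```

Each resulting group corresponds to one kind of scene (for example, all infield defensive scenes) that the user could select for viewing.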

However, the conventional digest creation function cannot extract, from among the specific scenes, the unusual scenes that are highly worth viewing and present them to the user, and therefore cannot achieve still more efficient viewing.

For example, consider the case where the specific scene in a baseball broadcast is the infield defensive scene. The conventional digest creation function could create a digest video by extracting only the infield defensive scenes from the video content of the broadcast. However, the resulting digest video consists of a large majority of unremarkable scenes of low viewing value and a very small number of unusual scenes, such as fine plays and errors, of particularly high viewing value. Distinguishing the low-value scenes from the high-value scenes had not been considered. Consequently, a digest video composed only of scenes of particularly high viewing value is not created, and more efficient viewing cannot be realized.

Japanese Patent No. 4067326

An object of the present invention is to provide a video processing apparatus and method that can realize still more efficient viewing by extracting only scenes of high viewing value from a digest video and presenting them to the user.

To solve the above problem, a video processing apparatus according to the present invention comprises: a video database that stores video content information; a digest information database that stores digest information specifying a digest video composed of specific scenes included in the video content; a specific scene extraction unit that extracts, from the digest information stored in the digest information database, scene information specifying a plurality of stationary scenes each composed of a plurality of frame images; a feature amount extraction unit that extracts from the video database the plurality of stationary scenes corresponding to the scene information extracted by the specific scene extraction unit, and extracts a feature vector for each frame image taken at a predetermined time interval from each of these stationary scenes; a discriminator learning unit that creates a discriminator from the sets of feature vectors of corresponding frame images of the plurality of stationary scenes extracted by the feature amount extraction unit; an identification target scene extraction unit that extracts, from extraction target digest information stored in the digest information database, scene information specifying an identification target scene composed of a plurality of frame images; the feature amount extraction unit, which further extracts from the video database the identification target scene corresponding to the scene information extracted by the identification target scene extraction unit, and extracts a feature vector for each frame image constituting the identification target scene; a non-stationary scene detection unit that supplies the feature vectors extracted by the feature amount extraction unit to the discriminator for comparison and thereby identifies whether the identification target scene is a non-stationary scene; and a result display unit that displays the non-stationary scenes identified by the non-stationary scene detection unit.

A video processing method according to the present invention comprises: a step of extracting, in a specific scene extraction unit, scene information specifying a plurality of stationary scenes each composed of a plurality of frame images, from digest information that is stored in a digest information database and specifies a digest video composed of specific scenes included in video content; a step of extracting, in a feature amount extraction unit, the plurality of stationary scenes corresponding to the scene information extracted in the preceding step from the video database, and extracting a feature vector for each frame image taken at a predetermined time interval from each of these stationary scenes; a step of creating, in a discriminator learning unit, a discriminator from the sets of feature vectors of corresponding frame images of the plurality of stationary scenes extracted in the feature vector extraction step; a step of extracting, in an identification target scene extraction unit, scene information specifying an identification target scene composed of a plurality of frame images from extraction target digest information stored in the digest information database; a step of extracting, in the feature amount extraction unit, the identification target scene corresponding to the scene information extracted in the preceding step from the video database, and extracting a feature vector for each frame image constituting the identification target scene; a step of identifying, in a non-stationary scene extraction unit, whether the identification target scene is a non-stationary scene on the basis of the feature vectors extracted for the identification target scene; and a step of displaying, on a result display unit, the non-stationary scenes identified in the identifying step.

According to the present invention, it is possible to provide a video processing apparatus and method that can realize still more efficient viewing by extracting only scenes of high viewing value from a digest video and presenting them to the user.

A block diagram showing the configuration of the video processing apparatus according to the first embodiment.
A table showing the video content data recorded in the video DB.
A schematic diagram showing a digest video.
A table showing the digest video data recorded in the digest information DB.
An explanatory diagram showing how feature vectors are extracted from video units.
An explanatory diagram showing the relationship between the feature vectors and the identification surface.
An explanatory diagram showing the method of creating a discriminator.
A schematic diagram showing how an identification target scene is identified as a stationary scene.
A schematic diagram showing how an identification target scene is identified as a non-stationary scene.
A schematic diagram showing how an identification target scene is identified as a non-stationary scene.
A flowchart showing the operation of the video processing apparatus according to the first embodiment.
A flowchart showing the operation of the specific scene extraction unit.
A flowchart showing the operation of the feature amount extraction unit.
A flowchart showing the operation of the discriminator learning unit.
A flowchart showing the operation of the identification target scene extraction unit.
A flowchart showing the operation of the non-stationary scene detection unit.
A block diagram showing the configuration of the video processing apparatus according to the second embodiment.
A flowchart showing the operation of the video processing apparatus according to the second embodiment.
A block diagram showing the configuration of the video processing apparatus according to the second embodiment.
A flowchart showing the operation of the video processing apparatus according to the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. The video processing apparatus according to each embodiment takes ordinary scenes from existing digest videos and has a discriminator learn the feature amounts of these scenes as learning data. It then uses this discriminator to pick out and display the unusual scenes in the digest video that the user wants to watch. In the following description, the digest video to be watched is called the extraction target digest video.

(First embodiment)
First, the video processing apparatus according to the first embodiment is described with reference to FIGS. 1 to 12. The video processing apparatus according to this embodiment has the minimum number of components.

First, the configuration for creating learning data from existing digest videos and training the discriminator is described. This configuration consists of: specific scene extraction means that extracts, from the digest information stored in the digest information DB 12, scene information specifying a plurality of stationary scenes each composed of a plurality of frame images; means that extracts from the video DB 11 the stationary scenes corresponding to the extracted scene information and extracts a feature vector for each frame image constituting each of those scenes; and a discriminator that holds, as reference vectors, the sets of feature vectors of corresponding frame images of the extracted stationary scenes. These components are described in detail below with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram showing the configuration of the video processing apparatus according to the first embodiment. As shown in FIG. 1, in the video processing apparatus according to the first embodiment, a plurality of video contents are recorded in a video database 11 (hereinafter, "database" is abbreviated as DB) in the form of the table shown in FIG. 2.

Each record in the table shown in FIG. 2 consists of: a video number identifying the video content; a video name; a type, giving the kind of video content; a creation date and time; a time, giving the running time of the video content; and a video data link, giving the link to the recording area (not shown) where the video data is actually stored. In this example the video DB 11 stores only the link to the storage area that holds the video content, but the video content itself may instead be stored in the video DB 11.
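One way to picture a record of the video DB 11 is the following sketch. The field names are English renderings of the columns described above; the class name, the Python types, and the sample values in the usage note are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class VideoRecord:
    """One row of the video DB table (hypothetical in-memory form)."""
    video_number: int        # identifies the video content
    video_name: str          # name of the video content
    kind: str                # type of video content
    created_at: datetime     # creation date and time
    duration: timedelta      # running time of the video content
    video_data_link: str     # link to where the video data is stored


def build_video_db(records: Iterable[VideoRecord]) -> Dict[int, VideoRecord]:
    """Index records by video number so scenes can look up their source video."""
    return {record.video_number: record for record in records}
```

For instance, `build_video_db([VideoRecord(1, "Professional Baseball All-Star Game, Game 1", "baseball", datetime(2009, 1, 1), timedelta(hours=3), "videos/0001.mpg")])` would map video number 1 to that record; the date, duration, and path here are made up.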

The digest information DB 12 shown in FIG. 1 stores digest information specifying digest videos, each composed of specific scenes included in the video content recorded in the video DB 11.

FIG. 3A is a schematic diagram showing, as an example, a digest video created from "Professional Baseball All-Star Game, Game 1" recorded in the video DB 11. The digest video in FIG. 3A opens with a batter-versus-pitcher scene; when the batter hits the ball it moves to an infield defensive scene, then to a close-up of the face of the batter who hit, and then returns to a pitcher-versus-batter scene.

The digest video shown in FIG. 3A is recorded in the digest information DB 12 in the form of the table shown in FIG. 3B. Each record of this table holds the scene information of one scene.

The scene information has a scene number, assigned consecutively to the scenes, and a scene label. It also has position information indicating where the scene lies in the video content, and information that distinguishes whether the scene is a stationary scene of low viewing value or a non-stationary scene of high viewing value. Stationary and non-stationary scenes are described in detail later.

The scene label is the name of the video shown in the scene. In the baseball digest example above, the scene labels are "pitcher vs. batter", "defense scene", "player close-up", and so on.

The position information consists of a video number, which corresponds to the video number of the video content recorded in the video DB 11, and a start frame number and an end frame number, which correspond to frame numbers of that video content and indicate the first and last frames of the scene, respectively.

The information that distinguishes stationary scenes from non-stationary scenes consists of additional information 1 to N. In the example of FIG. 3B, N = 2. Additional information 1 records, for example, which half of which inning the play belongs to. Additional information 2 records scorebook information for a defense scene, such as "1ゴ" (a ground ball to first base) or "3失" (an error by the third baseman). Whether each scene is a stationary scene or a non-stationary scene is determined from this additional information.
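The scene information described above (scene number, scene label, position information, and additional information 1 to N) could be held in a structure like the one below. This is a sketch only: the class and field names are hypothetical, and treating the end frame number as inclusive is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneInfo:
    """One row of the digest information table (hypothetical in-memory form)."""
    scene_number: int            # consecutive number within the digest video
    scene_label: str             # e.g. "pitcher vs. batter", "defense scene"
    video_number: int            # refers to a video number in the video DB
    start_frame: int             # first frame of the scene in that video
    end_frame: int               # last frame of the scene (assumed inclusive)
    additional_info: List[str] = field(default_factory=list)  # items 1..N

    def frame_count(self) -> int:
        """Number of frames in the scene, counting both end points."""
        return self.end_frame - self.start_frame + 1
```

A defense scene that ended in a groundout to first might then be written as `SceneInfo(2, "defense scene", 1, 1200, 1349, ["top of the 1st", "1ゴ"])`, where the frame numbers and inning are invented for the example.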

A stationary scene is a scene that occurs with high probability in the ordinary course of events and is of particularly low viewing value. In a baseball broadcast, for example, a stationary scene is a defense scene that ends in an out; that is, a scene whose scene label is "defense scene" and whose additional information reads, for example, "1ゴ".

Conversely, a non-stationary scene is a scene that ordinarily occurs with low probability and is of particularly high viewing value. In a baseball broadcast, for example, a non-stationary scene is a defense scene in which an error occurs; that is, a scene whose scene label is "defense scene" and whose additional information reads, for example, "3失".

The additional information may be entered by a viewer, for example, or may be obtained from outside, for example over the Internet, as in the third embodiment described later.

Next, the specific scene extraction means, which extracts from the digest information stored in the digest information DB 12 the scene information specifying a plurality of stationary scenes each composed of a plurality of frame images, is described.

As shown in FIG. 1, a specific scene extraction unit 13 is connected to the digest information DB 12. From the scene information of at least one designated digest video among the existing digest videos recorded in the digest information DB 12, the specific scene extraction unit 13 extracts only the scene information of the stationary scenes. The extracted scene information is recorded in a specific scene information RAM 15.

The stationary scenes are extracted when, for example, a viewer designates a scene label and additional information.

For example, the viewer designates "defense scene" as the scene label and, as the additional information, scorebook entries for plays that end in an out, such as "1ゴ".

The specific scene information RAM 15 shown in FIG. 1 holds, in the form of a table, the scene information of the stationary scenes extracted by the specific scene extraction unit 13, each given a new consecutive stationary scene number in the order of extraction.
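The extraction and renumbering performed by the specific scene extraction unit 13 can be sketched as a filter over scene records. The dictionary keys, the function name, and the rule that a scene matches when any of its additional-information items is in the designated set are assumptions made for this illustration.

```python
from typing import Dict, Iterable, List, Set


def extract_stationary_scenes(
    scenes: Iterable[Dict], scene_label: str, stationary_codes: Set[str]
) -> List[Dict]:
    """Select scenes with the given label and a designated additional-info
    entry, and number them consecutively in the order of extraction."""
    selected: List[Dict] = []
    for scene in scenes:
        if scene["label"] != scene_label:
            continue
        # Keep the scene only if some additional-info item is one of the
        # designated "ordinary outcome" codes (e.g. a groundout).
        if not stationary_codes.intersection(scene["additional_info"]):
            continue
        renumbered = dict(scene)  # copy so the digest table is not modified
        renumbered["stationary_scene_number"] = len(selected) + 1
        selected.append(renumbered)
    return selected
```

The original scene number is kept alongside the new stationary scene number, so each extracted entry can still be traced back to its row in the digest information table.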

Next, the means that extracts from the video DB 11 the stationary scenes corresponding to the scene information extracted as described above, and extracts a feature vector for each frame image taken at a predetermined time interval from each of these stationary scenes, is described.

As shown in FIG. 1, a feature amount extraction unit 16 is connected to the specific scene information RAM 15. The feature amount extraction unit 16 refers to the scene information of the stationary scenes recorded in the specific scene information RAM 15 and extracts the corresponding stationary scenes from the video DB 11, to which it is also connected. It then calculates a feature vector for each video unit, taken at a predetermined time interval, of each stationary scene, and records the calculated feature vectors in a feature vector RAM 17 connected to the feature amount extraction unit 16.

Here, the video units are the time-series frame images obtained by sampling an extracted scene at a predetermined time interval, such as every frame or every few frames. FIG. 4 schematically shows how a feature vector is calculated for each of N video units sampled in time order from one extracted scene. As shown in FIG. 4, if there are N video units, N feature vectors v1 to vN are calculated. The sampling interval may be constant or may vary.
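Sampling a scene into video units at a constant interval amounts to stepping through its frame range. A minimal sketch, assuming an inclusive end frame and a hypothetical `interval` parameter measured in frames:

```python
from typing import List


def sample_video_units(start_frame: int, end_frame: int, interval: int = 5) -> List[int]:
    """Frame numbers of the video units sampled from one scene.

    The scene spans start_frame..end_frame inclusive (an assumption);
    interval=1 samples every frame, larger values every few frames.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(range(start_frame, end_frame + 1, interval))
```

With this sketch, a scene covering frames 100 to 120 sampled every 5 frames yields N = 5 video units, so five feature vectors v1 to v5 would be calculated for it.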

A feature vector is the feature amount calculated for a video unit, expressed as a vector. For example, suppose the video content is recorded in the video DB 11 as MPEG-1 data, and the video units of one scene of this content are the I-th frame image (I frame) and the P-th frame image (P frame) from the beginning of the scene. A motion vector is calculated from the I frame and the P frame, and parameters describing the motion of the broadcast camera are estimated from this motion vector. These camera-motion parameters are taken as the feature vector of the I frame. That is, if K parameters $c_{j,1}, c_{j,2}, \ldots, c_{j,K}$ describing the camera motion are calculated from the j-th video unit sampled from a scene, the feature vector of the j-th video unit is $V_j = (c_{j,1}, c_{j,2}, \ldots, c_{j,K})$.
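The patent text does not specify which camera-motion parameters are estimated or how. As one possible illustration with K = 3, the sketch below fits a pan, tilt, and zoom model to block motion vectors by least squares. The model, the input layout (block center position paired with its displacement), and the function name are all assumptions.

```python
from typing import List, Sequence, Tuple

Block = Tuple[Tuple[float, float], Tuple[float, float]]  # ((x, y), (dx, dy))


def camera_motion_feature(blocks: Sequence[Block]) -> Tuple[float, float, float]:
    """Estimate (pan, tilt, zoom) from block motion vectors.

    Assumed model: displacement = (pan, tilt) + zoom * (position - centroid).
    Pan and tilt are the mean displacement; zoom is the least-squares
    scale of the remaining displacement against the centered position.
    """
    n = len(blocks)
    if n == 0:
        raise ValueError("at least one motion vector is required")
    mean_x = sum(position[0] for position, _ in blocks) / n
    mean_y = sum(position[1] for position, _ in blocks) / n
    pan = sum(motion[0] for _, motion in blocks) / n
    tilt = sum(motion[1] for _, motion in blocks) / n
    numerator = 0.0
    denominator = 0.0
    for (x, y), (dx, dy) in blocks:
        qx, qy = x - mean_x, y - mean_y
        numerator += qx * (dx - pan) + qy * (dy - tilt)
        denominator += qx * qx + qy * qy
    zoom = numerator / denominator if denominator else 0.0
    return (pan, tilt, zoom)
```

The returned tuple plays the role of the feature vector of one video unit; a different parameterization (for example, adding rotation) would simply change K.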

The feature vector RAM 17 shown in FIG. 1 holds, in the form of a table, the feature vectors of the stationary scenes extracted by the feature amount extraction unit 16, together with the stationary scene number and the video unit number of each video unit of each scene. The video unit numbers are consecutive numbers assigned in order from the beginning of each scene.

Next, the discriminator, which is learned from learning data consisting of the sets of feature vectors of corresponding frame images of the stationary scenes extracted as described above, is described.

As shown in FIG. 1, a discriminator learning unit 18 is connected to the feature vector RAM 17. The discriminator learning unit 18 creates a discriminator from the set of feature vectors of the stationary scenes recorded in the feature vector RAM 17. As described below, a discriminator here means an identification surface learned from learning data that is a set of feature vectors. The discriminator learning unit 18 records in a discriminator DB 20 the discriminator information, which consists at least of a discriminator number assigned to each discriminator and the identification surface.

As schematically shown in FIG. 5A, the identification surface 181 is a surface in the feature vector space learned using, as learning data, the feature vectors $(V_i^1, V_i^2, \ldots, V_i^M)$ of the video units at the same position (for example, the i-th) in each of the stationary scenes extracted from existing digest videos. This set of feature vectors $(V_i^1, V_i^2, \ldots, V_i^M)$ serves as the reference vectors against which the non-stationary scene detection unit 19 later compares the feature vectors of the identification target scene, and the surface these reference vectors form in the feature vector space is the identification surface.

When the identification surface is formed, any vector among $(V_i^1, V_i^2, \ldots, V_i^M)$ that lies at a point entirely different from the other vectors is excluded before the surface is learned. The range of feature vectors to exclude can be set freely according to the accuracy with which the non-stationary scene detection unit 19, described later, is to identify non-stationary scenes among the identification target scenes.

図５Ｂは、時系列に沿って並べられた全部で$N_M$個の特徴量ベクトルの集合である学習データから、識別器ｆｉ（ｉ＝１〜$N_M$）を学習させる様子を模式的に示す説明図である。図５Ｂに示すように、特徴量ベクトルＲＡＭ１７から、シーン番号が１、２、・・・、Ｍであるそれぞれの定常シーンを構成するそれぞれの映像単位のうち、ｉ番目の映像単位がそれぞれ有する特徴量ベクトル$V_i^{1}, V_i^{2}, \ldots, V_i^{M}$を抽出する場合、特徴量ベクトル空間において、特徴量ベクトル$V_i^{1}, V_i^{2}, \ldots, V_i^{M}$の全てからなる集合１８２を囲むことによって、図５Ａに示すような識別面１８１が形成される。この識別面１８１を学習させることで、識別器ｆｉを作成する。なお、図５Ａに示すような識別面１８１は、抽出された全ての特徴量ベクトルから、識別する識別精度に応じて任意に指定した範囲内の特徴量ベクトルからなる集合によって形成されてもよい。 FIG. 5B is an explanatory diagram schematically showing how the classifiers fi (i = 1 to $N_M$) are trained from learning data, which is a set of $N_M$ feature vectors in total arranged in time series. As shown in FIG. 5B, when the feature vectors $V_i^{1}, V_i^{2}, \ldots, V_i^{M}$ of the i-th video unit of each of the stationary scenes with scene numbers 1, 2, ..., M are extracted from the feature vector RAM 17, the identification surface 181 shown in FIG. 5A is formed by enclosing, in the feature vector space, the set 182 consisting of all of the feature vectors $V_i^{1}, V_i^{2}, \ldots, V_i^{M}$. The classifier fi is created by learning this identification surface 181. Note that the identification surface 181 shown in FIG. 5A may instead be formed from a set consisting only of those extracted feature vectors that fall within a range designated arbitrarily according to the required identification accuracy.

例えば、学習する識別器は、１−ｃｌａｓｓＳＶＭである。学習データの要素である特徴量ベクトルとして、前述のように中継カメラの動きを示すパラメータを用いた場合、１−ｃｌａｓｓＳＶＭは、特徴量ベクトル空間において、それぞれの定常シーンにおける中継カメラの動きを示す複数の特徴量ベクトルを囲むことで形成される識別面を学習する。 For example, the classifier to be trained is a 1-class SVM. When parameters indicating the motion of the relay camera are used, as described above, as the feature vector that is an element of the learning data, the 1-class SVM learns an identification surface formed by enclosing, in the feature vector space, the plurality of feature vectors that indicate the motion of the relay camera in the respective stationary scenes.
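The idea of learning a surface that encloses the stationary-scene feature vectors can be sketched as follows. This is a minimal illustration only, not the 1-class SVM named in the text: it substitutes a much simpler model, a hypersphere around the centroid of the training vectors, and the names `fit_enclosing_sphere`, `is_inside`, and `keep_ratio` as well as the sample camera-motion values are assumptions introduced here. The `keep_ratio` parameter loosely mirrors the option of excluding vectors that differ strongly from the rest.

```python
import math


def fit_enclosing_sphere(vectors, keep_ratio=1.0):
    """Return (centre, radius) of a sphere enclosing the training vectors.

    With keep_ratio < 1.0 the farthest vectors are ignored when the radius
    is fixed, so strongly deviating vectors do not inflate the surface.
    """
    dim = len(vectors[0])
    centre = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    dists = sorted(math.dist(v, centre) for v in vectors)
    kept = max(1, math.ceil(keep_ratio * len(dists)))
    return centre, dists[kept - 1]


def is_inside(model, vector):
    """True if the vector lies on or inside the learned surface."""
    centre, radius = model
    return math.dist(vector, centre) <= radius


# Hypothetical (pan, tilt) camera-motion parameters from stationary scenes.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
model = fit_enclosing_sphere(training)
```

A real 1-class SVM learns a more flexible boundary than a sphere, but the inside/outside test it provides is used in the same way.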

以上の構成により、既存のダイジェスト映像から学習データを作成して識別器に学習させることができる。 With the above configuration, learning data can be created from an existing digest video and can be learned by a discriminator.

次に、学習データを学習した識別器を用いて、抽出対象ダイジェスト映像から、通常とは異なるシーン、すなわち、非定常シーンを取り出して表示するための構成について説明する。この構成は、ダイジェスト情報ＤＢ１２に保存された抽出対象ダイジェスト情報から、複数のフレーム画像からなる識別対象シーンを特定するシーン情報を抽出する手段と、抽出された識別対象シーンのシーン情報に対応した識別対象シーンを映像ＤＢ１１から抽出し、この識別対象シーンを構成する所定の時間間隔のフレーム画像毎に特徴量ベクトルを抽出する手段と、抽出された識別対象シーンを構成するフレーム画像毎の特徴量ベクトルを識別器に供給することにより、この識別器が有する学習データである基準ベクトルと、識別対象シーンの特徴量ベクトルとを比較して非定常シーンを出力する手段と、からなる。以下に、これらの構成について、図１及び、図６Ａ乃至図６Ｃを参照して説明する。 Next, a configuration for extracting and displaying a scene different from normal, that is, an unsteady scene from the digest video to be extracted using the discriminator that has learned learning data will be described. This configuration includes means for extracting scene information for identifying an identification target scene composed of a plurality of frame images from extraction target digest information stored in the digest information DB 12, and identification corresponding to the scene information of the extracted identification target scene. Means for extracting the target scene from the video DB 11 and extracting a feature vector for each frame image of a predetermined time interval constituting the identification target scene; and a feature vector for each frame image constituting the extracted identification target scene Is supplied to the discriminator, and a reference vector that is learning data of the discriminator is compared with a feature vector of the scene to be discriminated to output a non-stationary scene. Hereinafter, these configurations will be described with reference to FIGS. 1 and 6A to 6C.

なお、この非定常シーンの取り出しは、上述の特定シーン抽出部１３において付加情報を参照することで定常シーンであるか、非定常シーンであるかを区別したのと同様に実行できるとも思われる。しかし、例えば、抽出対象ダイジェスト映像が、新たにダイジェスト情報ＤＢ１２に記録されたばかりのダイジェスト映像であった場合には、各シーンに付加情報は付されておらず、定常シーンであるか、非定常シーンであるかを区別することはできない。また、抽出対象ダイジェスト映像が既存のダイジェスト映像であり、各シーンに付加情報が付されていた場合であっても、ファインプレーシーン等のスコア上に表現されないが視聴する価値の高いシーン等は、付加情報を参照するのみでは区別することができない。従って、上述した識別器を用いて、抽出対象ダイジェスト映像の各シーンが、定常シーンであるか、非定常シーンであるかを識別する必要がある。 It might seem that unsteady scenes could be picked out in the same way that the specific scene extraction unit 13 described above distinguishes stationary scenes from unsteady scenes, namely by referring to the additional information. However, if, for example, the extraction target digest video has only just been recorded in the digest information DB 12, no additional information is attached to its scenes, so stationary and unsteady scenes cannot be distinguished in that way. Moreover, even when the extraction target digest video is an existing digest video whose scenes carry additional information, scenes that are well worth viewing but are not reflected in the score, such as fine plays, cannot be distinguished by referring to the additional information alone. It is therefore necessary to use the classifier described above to identify whether each scene of the extraction target digest video is a stationary scene or an unsteady scene.

初めに、ダイジェスト情報ＤＢ１２に保存された抽出対象ダイジェスト情報から、複数のフレーム画像からなる識別対象シーンを特定するシーン情報を抽出する手段について説明する。 First, a means for extracting scene information for identifying an identification target scene composed of a plurality of frame images from extraction target digest information stored in the digest information DB 12 will be described.

まず、上述のダイジェスト情報ＤＢ１２に記録されたダイジェスト情報によって特定されるダイジェスト映像の中から、例えば視聴者が、抽出対象ダイジェスト映像を指定する。このように、抽出対象ダイジェスト映像を指定すると、このダイジェスト情報ＤＢ１２に接続された図１に示す識別対象シーン抽出部１４において、抽出対象ダイジェスト映像を特定するダイジェスト情報の中から、識別対象シーンを特定するシーン情報が抽出される。このシーン情報によって特定される識別対象シーンは、特定シーン抽出部１３で指定したシーンラベルと同一シーンラベルを有するシーンである。例えば、上述の野球中継の例において、特定シーン抽出部１３で指定したシーンラベルが守備シーンである場合、識別対象シーンとは、抽出対象ダイジェスト映像における守備シーンである。 First, the viewer, for example, designates the extraction target digest video from among the digest videos specified by the digest information recorded in the digest information DB 12 described above. Once the extraction target digest video has been designated, the identification target scene extraction unit 14 shown in FIG. 1, which is connected to the digest information DB 12, extracts the scene information specifying the identification target scenes from the digest information that specifies the extraction target digest video. An identification target scene specified by this scene information is a scene having the same scene label as the one designated in the specific scene extraction unit 13. For example, in the baseball broadcast example above, if the scene label designated in the specific scene extraction unit 13 is the defensive scene, the identification target scenes are the defensive scenes in the extraction target digest video.

このような識別対象シーンは、特定シーン抽出部１３で指定したシーンラベルと同一のシーンラベルを指定することで抽出される。抽出された識別対象シーンのシーン情報は、この識別対象シーン抽出部１４に接続された特定シーン情報ＲＡＭ１５に記録される。 Such identification target scenes are extracted by designating the same scene label as the one designated in the specific scene extraction unit 13. The scene information of the extracted identification target scenes is recorded in the specific scene information RAM 15 connected to the identification target scene extraction unit 14.

特定シーン情報ＲＡＭ１５には、特定シーン抽出部１３で抽出した定常シーンのシーン情報とともに、識別対象シーン抽出部１４で抽出した識別対象シーンのシーン情報が記録されている。 The specific scene information RAM 15 thus holds the scene information of the identification target scenes extracted by the identification target scene extraction unit 14 together with the scene information of the stationary scenes extracted by the specific scene extraction unit 13.

次に、上述の手段により抽出された識別対象シーンのシーン情報に対応した識別対象シーンを映像ＤＢ１１から抽出し、この識別対象シーンを構成するフレーム画像毎に特徴量ベクトルを抽出する手段について説明する。 Next, means for extracting an identification target scene corresponding to the scene information of the identification target scene extracted by the above-described means from the video DB 11 and extracting a feature vector for each frame image constituting the identification target scene will be described. .

特定シーン情報ＲＡＭ１５には、上述したように、特徴量抽出部１６が接続されており、定常シーンに対する特徴量ベクトルの算出と同様にして、識別対象シーンを構成する各単位映像の特徴量ベクトルを算出し、この算出された特徴量ベクトルを、特徴量ベクトルＲＡＭ１７に記録する。 As described above, the feature amount extraction unit 16 is connected to the specific scene information RAM 15, and the feature amount vector of each unit video constituting the identification target scene is obtained in the same manner as the calculation of the feature amount vector for the stationary scene. The calculated feature quantity vector is recorded in the feature quantity vector RAM 17.

特徴量ベクトルＲＡＭ１７には、定常シーンの特徴量ベクトルとともに、識別対象シーンの特徴量ベクトルが記録されている。 The feature quantity vector RAM 17 stores the feature quantity vector of the identification target scene together with the feature quantity vector of the stationary scene.

次に、上述の手段により抽出された識別対象シーンを構成するフレーム画像毎の特徴量ベクトルを識別器に供給することにより、この識別器が有する基準ベクトルと、識別対象シーンの特徴量ベクトルとを比較して非定常シーンを出力する手段について説明する。 Next, by supplying to the discriminator a feature vector for each frame image constituting the scene to be identified extracted by the above-described means, the reference vector of the discriminator and the feature vector of the scene to be discriminated are obtained. A means for outputting an unsteady scene in comparison will be described.

特徴量ベクトルＲＡＭ１７には、図１に示すように、非定常シーン検出部１９が、上述した識別器学習部１８とともに接続されている。この非定常シーン検出部１９は、識別対象シーン抽出部１４で抽出したシーン情報によって特定される識別対象シーンが、非定常シーンであるかどうかを識別する。そして、非定常シーンであると識別された識別対象シーンのシーン情報は、非定常シーン検出部１９に接続された非定常シーン情報ＤＢ２１に記録される。 As shown in FIG. 1, an unsteady scene detection unit 19 is connected to the feature vector RAM 17 together with the classifier learning unit 18 described above. The unsteady scene detection unit 19 identifies whether or not the identification target scene specified by the scene information extracted by the identification target scene extraction unit 14 is an unsteady scene. Then, the scene information of the scene to be identified that is identified as an unsteady scene is recorded in the unsteady scene information DB 21 connected to the unsteady scene detection unit 19.

ここで、識別対象シーンが非定常シーンであるかどうかの識別は、識別対象シーンの特徴量ベクトルと、非定常シーン検出部１９に接続された識別器ＤＢ２０に記録された識別器の情報とを用いて識別する。すなわち、識別器ＤＢ２０に記録された識別器の情報から、対象の識別器を特定し、この特定された識別器に、識別対象シーンの特徴量ベクトルを供給し、これと識別器が有する学習データとを比較することで識別される。 Here, whether an identification target scene is an unsteady scene is identified using the feature vectors of the identification target scene and the classifier information recorded in the classifier DB 20 connected to the unsteady scene detection unit 19. That is, the relevant classifier is specified from the classifier information recorded in the classifier DB 20, the feature vector of the identification target scene is supplied to that classifier, and the scene is identified by comparing this feature vector with the learning data held by the classifier.

図６Ａ、図６Ｂ、図６Ｃは、定常シーンであるか非定常シーンであるかを識別したい識別対象シーンの特徴量ベクトルｖｉ（ｉ＝１〜Ｎ）を、それぞれ識別器ｆｉ（ｉ＝１〜Ｎ）に時系列順に供給することで、識別対象シーンの識別を行う様子を示した模式図である。図６Ａ、図６Ｂ、図６Ｃに示すように、ｉ＝１から順にＮまで特徴量ベクトルｖｉに対して識別器ｆｉを用いて識別テストを行う。この識別テストの結果、識別対象シーンの特徴量ベクトルｖｉが、識別器ｆｉの識別面内に存在すると判断されれば（図６Ａ、図６Ｂ、図６Ｃにおいて○で示す）、次の特徴量ベクトルの識別に進む。 FIG. 6A, FIG. 6B, and FIG. 6C show the feature quantity vectors vi (i = 1 to N) of the scenes to be identified for identifying whether the scenes are stationary or non-stationary scenes, respectively. It is the schematic diagram which showed a mode that the recognition object scene is identified by supplying to N) in time series order. As shown in FIG. 6A, FIG. 6B, and FIG. 6C, the discrimination test is performed on the feature quantity vector vi from i = 1 to N in order using the discriminator fi. If it is determined as a result of this discrimination test that the feature quantity vector vi of the scene to be discriminated exists in the discrimination plane of the discriminator fi (indicated by a circle in FIGS. 6A, 6B, and 6C), the next feature quantity vector Proceed to identification.

図６Ａに示すように、識別対象シーンが有する全ての特徴量ベクトルｖｉが、対応する識別器ｆｉの識別面内に存在すれば、その識別対象シーンを定常シーンとして識別する。 As shown in FIG. 6A, if all the feature vectors vi of the identification target scene are present in the identification plane of the corresponding classifier fi, the identification target scene is identified as a stationary scene.

反対に、図６Ｂに示すように、特徴量ベクトルｖｉが、識別器ｆｉの識別面外に存在すると判断されれば（図６Ｂにおいて×で示す）、その時点で識別対象シーンを非定常シーンとして識別する。 Conversely, as shown in FIG. 6B, if a feature vector vi is determined to lie outside the identification surface of the classifier fi (indicated by × in FIG. 6B), the identification target scene is identified as an unsteady scene at that point.

なお、ｉ＝１からＮまで順に上述の識別テストを行い、特徴量ベクトルｖｉが、識別器ｆｉの識別面外に存在するとの判定が２回以上連続すれば、非定常シーンとして識別してもよい。図６Ｃは、識別面外に存在するとの判定が２回連続した場合に、非定常シーンとして識別する例である。 Alternatively, the identification test described above may be performed in order from i = 1 to N, and the scene may be identified as an unsteady scene only when the feature vector vi is determined to lie outside the identification surface of the classifier fi two or more times in succession. FIG. 6C shows an example in which the scene is identified as an unsteady scene when this determination occurs twice in succession.

例えば野球の守備シーンにおいて、映像単位の特徴量ベクトルを、中継カメラの動きを示すパラメータとした場合、識別したい守備シーンのすべての映像単位の識別結果が○であればそのシーンは定常シーン、つまり、中継カメラの動きに異常がなく、アウトになったシーンであると考えられる。一方、途中で識別結果の×が少なくとも１つ以上連続すれば、その時点で中継カメラの動きに何らかの異常が起きたと考えられ、そのシーンは非定常シーンであると識別される。 For example, in a baseball defensive scene where the feature vector of each video unit consists of parameters indicating the motion of the relay camera, if the identification results for all video units of the defensive scene are ○, the scene is considered to be a stationary scene, that is, a scene that ended in an out with no abnormality in the motion of the relay camera. On the other hand, if one or more × results occur in succession partway through, some abnormality is considered to have occurred in the motion of the relay camera at that point, and the scene is identified as an unsteady scene.
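The sequential identification test of FIGS. 6A to 6C can be sketched as follows. This is a minimal illustration under stated assumptions: each classifier fi is represented as a plain Python predicate that returns True when a feature vector lies inside its identification surface, and the names `classify_scene`, `max_consecutive_outliers`, and `make_interval` are hypothetical. Setting `max_consecutive_outliers=1` corresponds to FIG. 6B, and `2` to FIG. 6C.

```python
def classify_scene(feature_vectors, classifiers, max_consecutive_outliers=1):
    """Apply classifier f_i to vector v_i in time-series order.

    Returns "unsteady" as soon as the number of consecutive out-of-surface
    results reaches max_consecutive_outliers; otherwise "stationary".
    """
    run = 0  # current run of consecutive out-of-surface results
    for v, inside in zip(feature_vectors, classifiers):
        if inside(v):
            run = 0
        else:
            run += 1
            if run >= max_consecutive_outliers:
                return "unsteady"
    return "stationary"


def make_interval(lo, hi):
    """Hypothetical 1-D classifier: inside means lo <= v <= hi."""
    return lambda v: lo <= v <= hi


# Four identical toy classifiers, one per video unit.
classifiers = [make_interval(0.0, 1.0)] * 4
```

With these toy classifiers, `classify_scene([0.5, 2.0, 0.9, 0.1], classifiers)` stops at the second video unit, whereas the same input with `max_consecutive_outliers=2` runs to the end because the outlier is isolated.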

以上に説明した非定常シーン抽出手段によって抽出された非定常シーンのシーン情報は、非定常シーン検出部１９に接続された非定常シーン情報ＤＢ２１に記録される。 The scene information of the unsteady scene extracted by the unsteady scene extraction unit described above is recorded in the unsteady scene information DB 21 connected to the unsteady scene detection unit 19.

図１に示す非定常シーン情報ＤＢ２１には、非定常シーンとして識別された識別対象シーンのシーン情報が、テーブルの形で記録されている。 In the unsteady scene information DB 21 shown in FIG. 1, the scene information of the identification target scene identified as the unsteady scene is recorded in the form of a table.

この非定常シーン情報ＤＢ２１には、結果表示部２２が接続されている。結果表示部２２は、上述した映像ＤＢ１１も接続されており、非定常シーン情報ＤＢ２１に記録された非定常シーンのシーン情報を参照することで、結果表示部２２に接続された映像ＤＢ１１から対応するシーンが抽出され、この抽出されたシーンが表示される。表示形態は、例えばサムネイル表示であるが、本実施形態において限定されるものではない。 A result display unit 22 is connected to the unsteady scene information DB 21. The result display unit 22 is also connected to the video DB 11 described above, and corresponds to the video DB 11 connected to the result display unit 22 by referring to the scene information of the non-stationary scene recorded in the non-stationary scene information DB 21. A scene is extracted, and the extracted scene is displayed. The display form is, for example, a thumbnail display, but is not limited in this embodiment.

このように、非定常シーン抽出手段によって、抽出対象ダイジェスト映像から非定常シーンを抽出して表示された映像は、視聴する価値の高いシーンのみが抽出されているため、効率的な視聴を実現することができる。 As described above, since the unsteady scene is extracted from the digest video to be extracted by the unsteady scene extraction unit and displayed, only the scenes worth viewing are extracted, so that efficient viewing is realized. be able to.

以上に、第１の実施形態に係る映像処理装置の構成について説明した。 The configuration of the video processing apparatus according to the first embodiment has been described above.

次に、上述の映像処理装置の動作について、図７乃至図１２を参照して説明する。 Next, the operation of the above-described video processing apparatus will be described with reference to FIGS. 7 to 12.

まず、図７を参照して、第１の実施形態に係る映像処理装置による映像処理方法を簡単に説明する。図７は、第１の実施形態に係る映像処理装置による映像処理方法を示すフローチャートである。 First, with reference to FIG. 7, a video processing method by the video processing apparatus according to the first embodiment will be briefly described. FIG. 7 is a flowchart illustrating a video processing method by the video processing apparatus according to the first embodiment.

図７に示すように、本実施形態の映像処理方法においては、まず、既存のダイジェスト映像から学習データを作成して識別器に学習させる。この方法は、まず初めに、既存のダイジェスト映像の中から、少なくとも１つのダイジェスト映像を指定し、続いて、特定のシーンラベルと付加情報とを指定することにより、特定シーン抽出部１３において、ダイジェスト情報ＤＢ１２に記録された既存のダイジェスト映像のシーン情報から定常シーンのシーン情報が抽出される（Ｓ１０１）。抽出された定常シーンのシーン情報は、特定シーン情報ＲＡＭ１５に記録される。 As shown in FIG. 7, in the video processing method of the present embodiment, first, learning data is created from an existing digest video and is learned by a discriminator. In this method, first, at least one digest video is specified from existing digest videos, and then a specific scene label and additional information are specified, whereby the specific scene extraction unit 13 performs the digest. The scene information of the stationary scene is extracted from the scene information of the existing digest video recorded in the information DB 12 (S101). The extracted scene information of the steady scene is recorded in the specific scene information RAM 15.

次に、定常シーンが抽出され、特定シーン情報ＲＡＭ１５に記録されると、特徴量抽出部１６において、抽出された定常シーンを構成するそれぞれの映像単位の特徴量ベクトルが算出される（Ｓ１０２）。算出された特徴量ベクトルは、特徴量ベクトルＲＡＭ１７に記録される。 Next, when a stationary scene is extracted and recorded in the specific scene information RAM 15, the feature amount extraction unit 16 calculates a feature amount vector for each video unit constituting the extracted stationary scene (S102). The calculated feature quantity vector is recorded in the feature quantity vector RAM 17.

次に、定常シーンを構成するそれぞれの映像単位の特徴量ベクトルが算出され、特徴量ベクトルＲＡＭ１７に記録されると、識別器学習部１８において、図５Ｂに示すように、特徴量ベクトルＲＡＭ１７に記録された特徴量ベクトルの集合１８２を学習データとし、識別器ｆｉとして図５Ａに示すような識別面１８１を学習させる（Ｓ１０３）。識別器ｆｉの情報は、識別器ＤＢ２０に記録される。 Next, when the feature vector of each video unit constituting the stationary scene is calculated and recorded in the feature vector RAM 17, the classifier learning unit 18 records the feature vector in the feature vector RAM 17 as shown in FIG. 5B. The set 182 of feature quantity vectors thus obtained is used as learning data, and a discriminating surface 181 as shown in FIG. 5A is learned as a discriminator fi (S103). Information on the discriminator fi is recorded in the discriminator DB 20.

以上のＳ１０１〜Ｓ１０３のステップにより、既存のダイジェスト映像から学習データを作成して識別器に学習させる。続いて、抽出対象ダイジェスト映像から非定常シーンを抽出し、抽出された非定常シーンを表示する。 Through the above steps S101 to S103, learning data is created from the existing digest video, and the discriminator is trained. Subsequently, the unsteady scene is extracted from the extraction target digest video, and the extracted unsteady scene is displayed.

この方法は、まず、ダイジェスト情報ＤＢ１２に記録されたダイジェスト映像の中から、抽出対象ダイジェスト映像を指定すると、識別対象シーン抽出部１４において、指定された抽出対象ダイジェスト映像のシーン情報から、特定のシーンラベルと同一のシーンラベルを有する識別対象シーンのシーン情報が抽出される。抽出された識別対象シーンのシーン情報は、特定シーン情報ＲＡＭ１５に記録される。そして、特定シーン情報ＲＡＭ１５に記録された識別対象シーンの特徴量ベクトルを算出する（Ｓ１０４）。この算出された識別対象シーンの特徴量ベクトルは、特徴量ベクトルＲＡＭ１７に記録される。 In this method, first, when an extraction target digest video is specified from the digest video recorded in the digest information DB 12, the identification target scene extraction unit 14 determines a specific scene from the scene information of the specified extraction target digest video. The scene information of the scene to be identified having the same scene label as the label is extracted. The extracted scene information of the scene to be identified is recorded in the specific scene information RAM 15. Then, the feature amount vector of the identification target scene recorded in the specific scene information RAM 15 is calculated (S104). The calculated feature vector of the scene to be identified is recorded in the feature vector RAM 17.

次に、識別対象シーンを抽出し、このシーンの特徴量ベクトルを算出して特徴量ベクトルＲＡＭ１７に記録すると、非定常シーン検出部１９において、特徴量ベクトルＲＡＭ１７に記録された識別対象シーンの特徴量ベクトルと、識別器ＤＢ２０に記録された識別器の情報とを用いて、識別対象シーンが非定常シーンであるか否かを識別する（Ｓ１０５）。識別された非定常シーンのシーン情報は、非定常シーン情報ＤＢ２１に記録される。 Next, when an identification target scene is extracted, a feature vector of this scene is calculated and recorded in the feature vector RAM 17, the unsteady scene detection unit 19 records the feature quantity of the identification target scene recorded in the feature vector RAM 17. Using the vector and the information of the classifier recorded in the classifier DB 20, it is identified whether or not the scene to be identified is an unsteady scene (S105). The scene information of the identified unsteady scene is recorded in the unsteady scene information DB 21.

以上のＳ１０４、Ｓ１０５のステップにより、抽出対象ダイジェスト映像から非定常シーンが抽出される。 By the steps of S104 and S105, an unsteady scene is extracted from the extraction target digest video.

最後に、非定常シーンのシーン情報が抽出され、非定常シーン情報ＤＢ２１に記録されると、このシーン情報を参照することで、映像ＤＢ１１から、非定常シーンを抽出し、この非定常シーンを結果表示部２２に表示する（Ｓ１０６）。 Finally, when the scene information of the unsteady scene is extracted and recorded in the unsteady scene information DB 21, the unsteady scene is extracted from the video DB 11 by referring to this scene information, and the unsteady scene is obtained as a result. The information is displayed on the display unit 22 (S106).

次に、上述の各ステップについて、図８乃至図１２を参照して詳細に説明する。 Next, each of the steps described above will be described in detail with reference to FIGS. 8 to 12.

Ｓ１０１のステップによって、既存のダイジェスト映像から定常シーンを抽出する方法について説明する。 A method for extracting a stationary scene from an existing digest video in step S101 will be described.

図８は、特定シーン抽出部１３において、定常シーンを抽出する方法を示すフローチャートである。図８に示すように、定常シーンの抽出は、まず、ｉ＝１として（Ｓ２０１）、このｉが、ダイジェスト情報ＤＢ１２に記録された既存のダイジェスト映像のシーン数を超えているかどうかを判断する（Ｓ２０２）。もし超えていれば、抽出動作は終了する。ここでシーン数とは、ダイジェスト映像に含まれるシーンの数をいう。ダイジェスト映像を構成する各シーンには、時系列順に１番から連続した番号がそれぞれ付与されているため、シーン数とは、実質的にはシーン番号の最大値である。 FIG. 8 is a flowchart showing a method for extracting a stationary scene in the specific scene extracting unit 13. As shown in FIG. 8, in the extraction of a stationary scene, first, i = 1 is set (S201), and it is determined whether or not i exceeds the number of existing digest video scenes recorded in the digest information DB 12 (see FIG. 8). S202). If so, the extraction operation ends. Here, the number of scenes refers to the number of scenes included in the digest video. Since each scene constituting the digest video is assigned a continuous number from the first in chronological order, the number of scenes is substantially the maximum value of the scene number.

これとは反対に、ｉがシーン数を超えていなければ、既存のダイジェスト映像におけるシーン番号がｉのシーン（以下、ｉ番目のシーンと称す）において、指定されたシーンラベルと付加情報を参照し、ｉ番目のシーンが抽出対象の定常シーンであれば、ｉ番目のシーンのシーン情報を、特定シーン情報ＲＡＭ１５に記録する（Ｓ２０３）。なお、特定シーン情報ＲＡＭ１５に記録されるシーン情報には、記録される順に、新たに連続したシーン番号が付与される。 On the other hand, if i does not exceed the number of scenes, the specified scene label and additional information are referred to in the scene with the scene number i in the existing digest video (hereinafter referred to as the i-th scene). If the i-th scene is a stationary scene to be extracted, the scene information of the i-th scene is recorded in the specific scene information RAM 15 (S203). The scene information recorded in the specific scene information RAM 15 is given a new consecutive scene number in the order of recording.

次に、ｉ＝ｉ＋１として（Ｓ２０４）、次のシーン、すなわち、ｉ＋１番目のシーンの参照に移る。 Next, i = i + 1 is set (S204), and the next scene, that is, the i + 1th scene is referred to.

以上に説明したＳ２０２〜Ｓ２０４の動作を、ｉが、抽出対象のダイジェスト映像におけるシーン数を超えるまで繰り返すことで、既存のダイジェスト映像から定常シーンが抽出される。 A stationary scene is extracted from an existing digest video by repeating the operations of S202 to S204 described above until i exceeds the number of scenes in the digest video to be extracted.
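The loop of S201 to S204 can be sketched as follows. This is an illustration only: the `SceneInfo` fields (`number`, `label`, `additional_info`, `start_sec`) and the function name are assumptions introduced here, since the exact layout of the scene information is not fixed by this passage. The sample labels follow the baseball example, where a defensive scene that ended in an out is treated as stationary.

```python
from dataclasses import dataclass


@dataclass
class SceneInfo:
    number: int            # scene number within its container
    label: str             # scene label, e.g. "defense"
    additional_info: str   # additional information, e.g. "out"
    start_sec: float       # hypothetical start time within the video


def extract_stationary_scenes(digest_scenes, label, additional_info):
    """Collect scenes matching the designated label and additional info."""
    selected = []
    i = 1                                      # S201
    while i <= len(digest_scenes):             # S202: stop once i exceeds the scene count
        scene = digest_scenes[i - 1]
        if scene.label == label and scene.additional_info == additional_info:
            # S203: record the scene under a new consecutive scene number
            selected.append(SceneInfo(len(selected) + 1, scene.label,
                                      scene.additional_info, scene.start_sec))
        i += 1                                 # S204
    return selected


digest = [
    SceneInfo(1, "defense", "out", 0.0),
    SceneInfo(2, "attack", "hit", 30.0),
    SceneInfo(3, "defense", "hit", 60.0),
    SceneInfo(4, "defense", "out", 90.0),
]
stationary = extract_stationary_scenes(digest, "defense", "out")
```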

次に、Ｓ１０２のステップによって、定常シーンから特徴量ベクトルを算出する方法について説明する。 Next, a method for calculating a feature vector from a steady scene in step S102 will be described.

図９は、特徴量抽出部１６において、定常シーンを構成するそれぞれの映像単位の特徴量ベクトルを算出する方法を示すフローチャートである。 FIG. 9 is a flowchart showing a method for calculating the feature quantity vector for each video unit constituting the steady scene in the feature quantity extraction unit 16.

図９に示すように、特徴量ベクトルの算出は、まず、ｉ＝１として（Ｓ３０１）、ｉが、特定シーン情報ＲＡＭ１５に記録されている定常シーンのシーン数を超えているかどうかを判断する（Ｓ３０２）。もし超えていれば、抽出動作は終了する。 As shown in FIG. 9, in calculating the feature vector, first, i = 1 is set (S301), and it is determined whether i exceeds the number of scenes of the stationary scene recorded in the specific scene information RAM 15 (see FIG. 9). S302). If so, the extraction operation ends.

これとは反対に、ｉが、シーン数を超えていなければ、ｊ＝１として（Ｓ３０３）、このｊが、特定シーン情報ＲＡＭ１５に記録されているｉ番目のシーンを構成する映像単位の映像単位数を超えているかどうかを判断する（Ｓ３０４）。もし超えていれば、ｉ＝ｉ＋１として（Ｓ３０５）、Ｓ３０２のステップに戻る。例えば、ある１つの定常シーンからサンプリングされた映像単位が１００であれば、映像単位数は１００であるため、ｊが１００より大きければ、ｉ＝ｉ＋１として、Ｓ３０２のステップに戻る。 On the other hand, if i does not exceed the number of scenes, j = 1 is set (S303), and it is determined whether j exceeds the number of video units constituting the i-th scene recorded in the specific scene information RAM 15 (S304). If it does, i = i + 1 is set (S305) and the process returns to step S302. For example, if 100 video units were sampled from a given stationary scene, the number of video units is 100, so when j is larger than 100, i = i + 1 is set and the process returns to step S302.

これとは反対に、ｊが、ｉ番目のシーンの映像単位数を超えていなければ、映像ＤＢ１１を参照し、ｉ番目のシーンにおけるｊ番目の映像単位の特徴量ベクトルを算出する。算出された特徴量ベクトルは、特徴量ベクトルＲＡＭ１７に記録される（Ｓ３０６）。 On the other hand, if j does not exceed the number of video units in the i-th scene, the feature vector of the j-th video unit in the i-th scene is calculated with reference to the video DB 11. The calculated feature vector is recorded in the feature vector RAM 17 (S306).

次に、特徴量ベクトルを算出し、これを特徴量ベクトルＲＡＭ１７に記録すると、ｊ＝ｊ＋１として（Ｓ３０７）、Ｓ３０４のステップに戻る。 Next, when a feature quantity vector is calculated and recorded in the feature quantity vector RAM 17, j = j + 1 is set (S307), and the process returns to the step of S304.

以上に説明したＳ３０２〜Ｓ３０８の動作を、ｊがｉ番目のシーンの映像単位数を超え、ｉが特定シーン情報ＲＡＭ１５に記録されているシーン数を超えるまで繰り返すことで、既存のダイジェスト映像から抽出された定常シーンを構成するそれぞれの映像単位の特徴量ベクトルが算出される。 Extracting from the existing digest video by repeating the operations of S302 to S308 described above until j exceeds the number of video units of the i-th scene and i exceeds the number of scenes recorded in the specific scene information RAM 15. A feature quantity vector for each video unit constituting the steady scene is calculated.
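The nested loop of FIG. 9 can be sketched as follows, assuming each scene is given as a list of video units and that a caller-supplied function computes the feature vector of one video unit. The names `extract_feature_vectors` and `compute_feature`, and the toy feature used in the example, are hypothetical; only the steps named in the text (S301 to S307) are annotated.

```python
def extract_feature_vectors(scenes, compute_feature):
    """Return {(i, j): feature vector} for video unit j of scene i (1-based)."""
    table = {}
    i = 1                                    # S301
    while i <= len(scenes):                  # S302: stop once i exceeds the scene count
        j = 1                                # S303
        while j <= len(scenes[i - 1]):       # S304: compare j with the unit count of scene i
            unit = scenes[i - 1][j - 1]
            table[(i, j)] = compute_feature(unit)   # S306: compute and record
            j += 1                           # S307
        i += 1                               # S305
    return table


# Toy data: each video unit is a (dx, dy) pair; the feature is their sum.
scenes = [[(1, 2), (3, 4)], [(5, 6)]]
features = extract_feature_vectors(scenes, lambda unit: (unit[0] + unit[1],))
```

Keying the result by (scene number, video unit number) makes it straightforward to later gather the i-th video unit of every scene.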

次に、Ｓ１０３のステップによって、定常シーンから算出された特徴量ベクトルから学習データを作成し、この学習データを識別器に学習させる方法について説明する。 Next, a description will be given of a method in which learning data is created from the feature amount vector calculated from the steady scene and the discriminator learns this learning data in step S103.

図１０は、識別器学習部１８において、図５Ａに示すような識別面１８１を学習させて識別器ｆｉを作成する方法を示すフローチャートである。なお、以下の説明において、特徴量ベクトルＲＡＭ１７に記録されている定常シーン数をＭとし、ｊ番目の定常シーンの映像単位数をＮｊとする点は、上述した通りである。ここではさらに、ｊ番目の定常シーンを構成する単位映像のうち、ｉ番目の映像単位の特徴量ベクトルをｖ（ｊ，ｉ）とする。 FIG. 10 is a flowchart showing a method of creating the discriminator fi by learning the discriminant surface 181 as shown in FIG. 5A in the discriminator learning unit 18. In the following description, the number of stationary scenes recorded in the feature vector RAM 17 is M, and the number of video units of the j-th stationary scene is Nj, as described above. Here, the feature quantity vector of the i-th video unit among the unit videos constituting the j-th stationary scene is assumed to be v (j, i).

図１０に示すように、特徴量ベクトルＲＡＭ１７に記録された学習データから識別面１８１を学習して識別器ｆｉを構築する方法は、まず、ｉ＝１として（Ｓ４０１）、このｉが、特徴量ベクトルＲＡＭ１７に記録された定常シーンの映像単位数Ｎｊ（ｊ＝１〜Ｍ）を超えているかどうかを判断する（Ｓ４０２）。もし超えていれば、特徴量ベクトル（学習データ）の抽出動作を終了する。 As shown in FIG. 10, the method of learning the discriminant plane 181 from the learning data recorded in the feature quantity vector RAM 17 and constructing the discriminator fi first sets i = 1 (S401). It is determined whether or not the number of video units Nj (j = 1 to M) of the stationary scene recorded in the vector RAM 17 is exceeded (S402). If so, the feature quantity vector (learning data) extraction operation is terminated.

反対に、ｉが、特徴量ベクトルＲＡＭ１７に記録された定常シーンの映像単位数Ｎｊ（ｊ＝１〜Ｍ）を超えていなければ、特徴量ベクトルＲＡＭ１７に記録されているＭ個の定常シーンの全てについて、それぞれｉ番目の映像単位から算出されたＭ個の特徴量ベクトルｖ（ｊ，ｉ）（ｊ＝１〜Ｍ）を集める。そして、集められたＭ個の特徴量ベクトルを学習データとして識別面１８１を学習し、識別器ｆｉを作成する（Ｓ４０３）。 On the other hand, if i does not exceed the number of video units Nj (j = 1 to M) of the stationary scene recorded in the feature vector RAM 17, all of the M stationary scenes recorded in the feature vector RAM 17 are used. , M feature quantity vectors v (j, i) (j = 1 to M) calculated from the i-th video unit are collected. Then, the discriminating surface 181 is learned using the collected M feature quantity vectors as learning data, and the discriminator fi is created (S403).

次に、ｉ＝ｉ＋１とし（Ｓ４０４）、次の単位映像の識別器ｆｉ＋１の学習に移る。 Next, i = i + 1 is set (S404), and the process moves on to training the classifier fi+1 for the next video unit.

以上のＳ４０２〜Ｓ４０４の動作を、ｉが、特徴量ベクトルＲＡＭ１７に記録された定常シーンの映像単位数Ｎｊ（ｊ＝１〜Ｍ）を超えるまで繰り返すことで、Ｎｊ個の識別器ｆのそれぞれを学習する。 By repeating the operations of S402 to S404 until i exceeds the number of video units Nj (j = 1 to M) of the stationary scenes recorded in the feature vector RAM 17, each of the Nj classifiers f is trained.
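The per-position training of FIG. 10 can be sketched as follows. Two assumptions are made here and are not taken from the text: first, because the stationary scenes may contain different numbers of video units Nj, the loop runs only up to the smallest Nj so that every classifier sees one vector from each of the M scenes; second, each classifier is a simple axis-aligned bounding box around its M training vectors, used as a stand-in for the learned identification surface 181. The function names are hypothetical.

```python
def train_position_classifiers(scenes):
    """Train one classifier per video-unit position i.

    scenes[j][i] is the feature vector v(j+1, i+1). Each classifier is
    returned as a (lows, highs) bounding box over its M training vectors.
    """
    n_units = min(len(scene) for scene in scenes)   # assumption: shortest scene sets the bound
    classifiers = []
    for i in range(n_units):                        # S401, S402, S404
        column = [scene[i] for scene in scenes]     # v(j, i) for j = 1..M
        dim = len(column[0])
        lows = tuple(min(v[d] for v in column) for d in range(dim))
        highs = tuple(max(v[d] for v in column) for d in range(dim))
        classifiers.append((lows, highs))           # S403
    return classifiers


def inside_box(classifier, vector):
    """True if the vector lies within the classifier's bounding box."""
    lows, highs = classifier
    return all(lo <= x <= hi for lo, x, hi in zip(lows, vector, highs))


# Three toy stationary scenes with 2-D feature vectors.
scenes = [[(0, 0), (1, 1)], [(2, 2), (3, 1)], [(1, 0), (2, 5), (9, 9)]]
classifiers = train_position_classifiers(scenes)
```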

次に、Ｓ１０４のステップによって、ダイジェスト情報ＤＢ１２に記録された抽出対象ダイジェスト映像から、識別対象シーンを抽出し、このシーンの特徴量ベクトルを算出する方法について説明する。 Next, a description will be given of a method of extracting an identification target scene from the extraction target digest video recorded in the digest information DB 12 in step S104 and calculating a feature vector of this scene.

図１１は、抽出対象ダイジェスト映像から、識別対象シーンを抽出する方法を示すフローチャートである。 FIG. 11 is a flowchart illustrating a method for extracting an identification target scene from an extraction target digest video.

図１１に示すように、識別対象シーンの抽出は、まず抽出対象ダイジェスト映像が指定されると、ｉ＝１として（Ｓ５０１）、このｉが、ダイジェスト情報ＤＢ１２に記録された抽出対象ダイジェスト映像が有するシーン数を超えているかどうかを判断する（Ｓ５０２）。もし超えていれば、識別対象シーンの抽出を終了する。 As shown in FIG. 11, in the extraction of a scene to be identified, first, when an extraction target digest video is designated, i = 1 is set (S501), and this i is included in the extraction target digest video recorded in the digest information DB12. It is determined whether the number of scenes has been exceeded (S502). If so, the extraction of the scene to be identified is terminated.

反対に、ｉが、抽出対象ダイジェスト映像が有するシーン数を超えていなければ、抽出対象ダイジェスト映像のｉ番目のシーンを参照し、ｉ番目のシーンが、Ｓ１０１のステップで指定したシーンラベルと同一の指定したシーンラベルを有していれば、そのシーンを識別対象シーンとして、そのシーンが有するシーン情報を特定シーン情報ＲＡＭ１５に記録する（Ｓ５０３）。なお、特定シーン情報ＲＡＭ１５に記録されるシーン情報には、記録される順に、新たに連続したシーン番号が付与される。 On the contrary, if i does not exceed the number of scenes included in the extraction target digest video, the i-th scene of the extraction target digest video is referred to, and the i-th scene is the same as the scene label specified in step S101. If it has the designated scene label, the scene is recorded in the specific scene information RAM 15 with the scene as an identification target scene (S503). The scene information recorded in the specific scene information RAM 15 is given a new consecutive scene number in the order of recording.

次に、特定シーン情報ＲＡＭ１５に記録された識別対象シーンのシーン情報から、特徴量抽出部１６において、識別対象シーンの特徴量ベクトルを算出し、結果を特徴量ベクトルＲＡＭ１７に記録する（Ｓ５０４）。ここで、識別対象シーンの特徴量ベクトルの算出は、図９に示す動作と同様に実行する。 Next, the feature quantity extraction unit 16 calculates the feature quantity vector of the identification target scene from the scene information of the identification target scene recorded in the specific scene information RAM 15, and records the result in the feature quantity vector RAM 17 (S504). Here, the feature amount vector of the identification target scene is calculated in the same manner as the operation shown in FIG.

次に、ｉ＝ｉ＋１とし、次のシーンの参照に移る（Ｓ５０５）。 Next, i = i + 1 is set, and the next scene is referred to (S505).

以上のＳ５０２〜Ｓ５０５の動作を、ｉが、抽出対象ダイジェスト映像が有するシーン数を超えるまで繰り返すことで、抽出対象ダイジェスト映像から、識別対象シーンが抽出され、このシーンの特徴量ベクトルが算出される。 By repeating the operations of S502 to S505 until i exceeds the number of scenes included in the extraction target digest video, the identification target scene is extracted from the extraction target digest video, and the feature vector of this scene is calculated. .

次に、Ｓ１０５のステップによって、識別対象シーンから、非定常シーンを抽出する方法について説明する。 Next, a method for extracting an unsteady scene from the scene to be identified in step S105 will be described.

図１２は、識別対象シーンから、非定常シーンを抽出する方法を示すフローチャートである。 FIG. 12 is a flowchart showing a method for extracting a non-stationary scene from a scene to be identified.

図１２に示すように、非定常シーンの抽出は、まず、ｉ＝１として（Ｓ６０１）、ｉが、識別対象シーンの特徴量ベクトルの数Ｎ、すなわち、識別対象シーンを構成する映像単位数を超えているかどうかを判断する（Ｓ６０２）。もし超えていれば、非定常シーンの検出を終了する。 As shown in FIG. 12, in extracting a non-stationary scene, first, i = 1 is set (S601), and i is the number N of feature quantity vectors of the identification target scene, that is, the number of video units constituting the identification target scene. It is determined whether or not it exceeds (S602). If so, the detection of the unsteady scene is terminated.

Conversely, if i does not exceed the number N of video units constituting the identification target scene, the i-th feature vector vi of the identification target scene recorded in the feature vector RAM 17 is applied to the classifier fi recorded in the classifier DB 20 (S603), and identification is performed.

Here, as shown in FIGS. 6A to 6C, identification is performed by plotting the i-th feature vector vi in the feature space and comparing it with the decision boundary 181 of the trained classifier fi. The decision boundary 181 is learned from the feature vectors of a plurality of steady scenes. If the comparison shows that the i-th feature vector vi of the identification target scene lies outside the decision boundary, the identification target scene is identified as a non-steady scene (S604), and its scene information is recorded in the non-steady scene information DB 21 (S605).

On the other hand, if the i-th feature vector vi lies inside the decision boundary of the classifier fi, i is incremented (i = i + 1) and processing moves on to the next feature vector (S606).

By repeating the operations of S602 to S606 until i exceeds the number N of feature vectors, non-steady scenes are extracted from the identification target scenes. Conversely, if applying the feature vectors vi of an identification target scene to the classifiers fi never results in the scene being identified as non-steady, that identification target scene is identified as a steady scene.
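The detection loop of S601 to S606 can be illustrated with a small sketch. The specification does not fix the form of the classifier, so the example assumes, purely for illustration, that each classifier fi is a hypersphere (a center and a radius) in the feature space; the names `outside_boundary` and `is_non_steady` are likewise hypothetical.

```python
import math


def outside_boundary(vector, classifier):
    """Return True if the vector lies outside the classifier's region.
    Assumption: a classifier is a (center, radius) pair, not the patent's actual model."""
    center, radius = classifier
    return math.dist(vector, center) > radius


def is_non_steady(feature_vectors, classifiers):
    """Compare the i-th feature vector with the i-th classifier, in order."""
    i = 1                                              # S601
    while i <= len(feature_vectors):                   # S602: stop once i exceeds N
        vector = feature_vectors[i - 1]
        classifier = classifiers[i - 1]
        if outside_boundary(vector, classifier):       # S603: apply vi to fi
            return True                                # S604: non-steady (caller records it, S605)
        i += 1                                         # S606: next feature vector
    return False                                       # never outside: steady scene
```

In this sketch the caller would record the scene information in the non-steady scene information DB 21 whenever `is_non_steady` returns `True`.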

In the identification method described above, a scene is treated as a non-steady scene if a feature vector is judged even once to lie outside the decision boundary that the classifier has learned from the training data. Alternatively, the scene may be treated as a non-steady scene only when feature vectors are judged to lie outside the decision boundary multiple times in succession. That is, a scene is treated as a non-steady scene if the judgment that a feature vector lies outside the learned decision boundary is made at least once.
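The variant that requires several consecutive out-of-boundary judgments might look like the sketch below. As before, the hypersphere classifier (center, radius) and the parameter name `min_run` are assumptions introduced for the example; with `min_run=1` the function reduces to the single-judgment rule.

```python
import math


def is_non_steady_consecutive(feature_vectors, classifiers, min_run=1):
    """Treat the scene as non-steady only when at least `min_run` consecutive
    feature vectors fall outside their corresponding classifier regions.
    Assumption: each classifier is a (center, radius) pair."""
    run = 0
    for vector, (center, radius) in zip(feature_vectors, classifiers):
        if math.dist(vector, center) > radius:
            run += 1                 # extend the current run of outside judgments
            if run >= min_run:
                return True
        else:
            run = 0                  # an inside judgment breaks the run
    return False
```

Raising `min_run` makes the detector less sensitive to a single noisy frame, at the cost of missing very short deviations.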

In the case described above, the number of feature vectors equals the number of classifiers. When the number of feature vectors exceeds the number of classifiers, steady or non-steady may be determined by evaluating only as many feature vectors as there are classifiers. Alternatively, the scene may be determined to be a non-steady scene as soon as the number of feature vectors is recognized to exceed the number of classifiers. The latter determination is reasonable because, when there are more feature vectors than classifiers, it is highly likely that some abnormality has occurred in the camera movement.
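The two ways of handling a surplus of feature vectors can be contrasted in a short sketch. The policy names `"truncate"` and `"flag"`, the function name, and the (center, radius) classifier form are all assumptions made for illustration.

```python
import math


def classify_with_mismatch(feature_vectors, classifiers, policy="truncate"):
    """Return True if the scene is judged non-steady.

    policy="truncate": evaluate only as many feature vectors as there are classifiers.
    policy="flag":     judge the scene non-steady as soon as there are more
                       feature vectors than classifiers.
    Assumption: each classifier is a (center, radius) pair.
    """
    if policy not in ("truncate", "flag"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "flag" and len(feature_vectors) > len(classifiers):
        return True
    # zip stops at the shorter sequence, so surplus feature vectors are ignored here.
    return any(
        math.dist(vector, center) > radius
        for vector, (center, radius) in zip(feature_vectors, classifiers)
    )
```

The `"flag"` policy reflects the reasoning that an unusually long scene already suggests abnormal camera movement, while `"truncate"` judges the scene only on the portion the classifiers cover.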

Once non-steady scenes have been extracted from the identification target scenes in this way and their scene information has been recorded in the non-steady scene information DB 21, the non-steady scenes are displayed on the result display unit 22 in accordance with step S106.

According to the video processing apparatus and method of the first embodiment described above, the learning means trains the classifier on the feature vectors of steady scenes extracted from existing digest videos. The non-steady scene extraction means then uses this classifier to extract only the non-steady scenes from the identification target scenes extracted from the extraction target digest video. Finally, the extracted non-steady scenes are displayed. An efficient digest video can therefore be created that contains only the non-steady scenes of high viewing value extracted from the extraction target digest video.

(Second Embodiment)
Next, as another embodiment of the present invention, a video processing apparatus according to a second embodiment will be described with reference to the drawings.

FIG. 13 is a block diagram showing the configuration of a video processing apparatus according to the second embodiment. As shown in FIG. 13, the video processing apparatus according to the second embodiment differs from that of the first embodiment in that it has a digest creation unit 23 connected to the video DB 11 and the digest information DB 12.

FIG. 14 is a flowchart showing the operation of the video processing apparatus according to the second embodiment. In the second embodiment, a digest video is created from the video recorded in the video DB 11, for example by the conventional digest video creation technique described above, and the created digest video is stored in the digest information DB 12 in the form of the table shown in FIG. 3B (S700). Apart from this step, the operation is the same as that of the video processing apparatus according to the first embodiment, so a detailed description is omitted.

As described above, the video processing apparatus according to the second embodiment also provides the same effects as the first embodiment.

(Third Embodiment)
Next, as another embodiment of the present invention, a video processing apparatus according to a third embodiment will be described with reference to the drawings.

FIG. 15 is a block diagram showing the configuration of a video processing apparatus according to the third embodiment. As shown in FIG. 15, the video processing apparatus according to the third embodiment differs from the second embodiment in that the additional information attached to the digest video created by the digest creation unit 23 can be obtained from outside through a network such as the Internet 24.

Specifically, an additional information adding unit 25 for adding additional information to the digest video is connected to the digest information DB 12. The additional information is stored in an additional information DB 26 connected to the additional information adding unit 25 and is obtained from this DB 26. The additional information DB 26 is connected to a network such as the Internet 24 via an additional information acquisition unit 27, so the additional information stored in the additional information DB 26 can be obtained from outside via the Internet 24.

FIG. 16 is a flowchart showing the operation of the video processing apparatus according to the third embodiment. The steps from creating a digest video in the digest creation unit (S800) and extracting steady scenes from the digest video created in S800 (S801) through displaying the result (S806) are the same as in the second embodiment, so a detailed description is omitted. Only the step of adding additional information to the scene information of the scenes recorded in the digest information DB 12 is described here.

First, once a digest video has been created by the digest creation unit 23 (S800), the additional information acquisition unit 27 obtains additional information through the Internet 24 and stores it in the additional information DB 26 (S8001). For example, score notations such as 「２ゴ」 (a ground-ball out) and 「３失」 (a fielding error) are obtained from baseball score information on the Internet.

Next, the additional information adding unit 25 adds the additional information obtained through the Internet 24 and stored in the additional information DB 26 to the scene information of the scenes recorded in the digest information DB 12 (S8002). For example, score information such as 「２ゴ」 or 「３失」 is added as additional information to the scene information of each baseball defensive scene.

After additional information has been added to the digest video in steps S8001 and S8002 described above, the operation is the same as in the first embodiment.

Thus, the video processing apparatus according to the third embodiment also provides the same effects as the first embodiment.

The video processing apparatus according to the embodiments of the present invention has been described above. The video processing method according to each of the first, second, and third embodiments can be stored, as a program executable by a computer, on a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, DVD, etc.), or a semiconductor memory, and distributed.

The present invention is not limited to the embodiments exactly as described above, and can be embodied by freely modifying or omitting components within a scope that does not depart from its gist.

Description of reference numerals: 11: video DB, 12: digest information DB, 13: specific scene extraction unit, 14: identification target scene extraction unit, 15: specific scene information RAM, 16: feature amount extraction unit, 17: feature vector RAM, 18: classifier learning unit, 181: decision boundary, 182: set of feature vectors, 19: non-steady scene detection unit, 20: classifier DB, 21: non-steady scene information DB, 22: result display unit, 23: digest creation unit, 24: Internet, 25: additional information adding unit, 26: additional information DB, 27: additional information acquisition unit.

Claims

A video processing apparatus comprising:
a video database that stores video content information;
a digest information database that stores digest information specifying a digest video composed of specific scenes included in the video content;
a specific scene extraction unit that extracts, from the digest information stored in the digest information database, scene information specifying a plurality of steady scenes each composed of a plurality of frame images;
a feature amount extraction unit that extracts, from the video database, the plurality of steady scenes corresponding to the scene information extracted by the specific scene extraction unit, and extracts a feature vector for each frame image at predetermined time intervals constituting each of the plurality of steady scenes;
a classifier learning unit that creates a classifier from the set of feature vectors extracted by the feature amount extraction unit for each set of corresponding frame images of the plurality of steady scenes;
an identification target scene extraction unit that extracts, from extraction target digest information stored in the digest information database, scene information specifying an identification target scene composed of a plurality of frame images;
the feature amount extraction unit further extracting, from the video database, the identification target scene corresponding to the scene information extracted by the identification target scene extraction unit, and extracting a feature vector for each of the frame images constituting the identification target scene;
a non-steady scene detection unit that identifies whether the identification target scene is a non-steady scene by supplying the feature vectors extracted by the feature amount extraction unit to the classifier for comparison; and
a result display unit that displays the non-steady scene identified by the non-steady scene detection unit.

The video processing apparatus according to claim 1, further comprising a digest creation unit that is connected to the video database and the digest information database and creates the digest video from the video content recorded in the video database.

The video processing apparatus according to claim 2, further comprising additional information adding means that obtains additional information via a network and adds the obtained additional information to the digest information database that stores the digest information.

The video processing apparatus according to any one of claims 1 to 3, wherein the means for identifying a non-steady scene by comparing the classifier with the feature vectors of the identification target scene identifies the identification target scene as the non-steady scene when at least one of the feature vectors of the identification target scene lies outside the region that the corresponding classifier forms in the feature vector space.

A video processing method comprising:
a step of extracting, in a specific scene extraction unit, scene information specifying a plurality of steady scenes each composed of a plurality of frame images, from digest information that is stored in a digest information database and specifies a digest video composed of specific scenes included in video content;
a step of extracting, in a feature amount extraction unit, the plurality of steady scenes corresponding to the scene information extracted in the preceding step from the video database, and extracting a feature vector for each frame image at predetermined time intervals constituting each of the plurality of steady scenes;
a step of creating, in a classifier learning unit, a classifier from the set of feature vectors, extracted in the step of extracting the feature vector, for each set of corresponding frame images of the plurality of steady scenes;
a step of extracting, in an identification target scene extraction unit, scene information specifying an identification target scene composed of a plurality of frame images from extraction target digest information stored in the digest information database;
a step of extracting, in the feature amount extraction unit, the identification target scene corresponding to the scene information extracted in the step of extracting scene information specifying the identification target scene from the video database, and extracting a feature vector for each of the frame images constituting the identification target scene;
a step of identifying, in a non-steady scene extraction unit, whether the identification target scene is a non-steady scene on the basis of the step of extracting the feature vector of the identification target scene; and
a step of displaying, in a result display unit, the non-steady scene identified in the step of identifying whether the scene is a non-steady scene.