JPH0816610A

JPH0816610A - Animation retrieving system and animation retrieving data extraction method

Info

Publication number: JPH0816610A
Application number: JP6144907A
Authority: JP
Inventors: Kenji Kawasaki; 健治川崎; Yoshiaki Morimoto; 義章森本; Tetsuo Tanaka; 哲雄田中; Akira Tanaka; 晶田中
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-06-27
Filing date: 1994-06-27
Publication date: 1996-01-19
Anticipated expiration: 2019-10-06
Also published as: JP3573493B2

Abstract

PURPOSE: To extract only characteristic frames by using scene changes together with the level of, and changes in, sound volume as the criteria for extracting search data from AV data that comprises video data and audio data. CONSTITUTION: A scene change detection part 1041 detects scene changes in the AV data, and a sound volume detection part 1042 detects the sound volume of the AV data. A search data selection management part 1043 then extracts, as search data, the frames of each detected scene whose sound volume exceeds a fixed level.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a video search system and method with which a user retrieves video of a given scene from a medium on which video is stored.

[0002]

[Description of the Related Art] In recent years, with the spread of multimedia systems, systems that handle AV data have been developed one after another. Here, AV data means data that combines video data, which represents moving images, with audio data. The predominant AV data format manages the video data and the audio data in parallel on a common time base. For example, QuickTime from Apple Computer stores video data and audio data in separate tracks and plays back video with sound by accessing those tracks simultaneously.

[0003] First, the encoding of video data is described. Video data carries far more information per unit time than audio data. In continuous video, however, the image of interest generally closely resembles the images before and after it, so the data is highly redundant. High-efficiency digital coding can therefore greatly reduce the amount of information required. At present, the mainstream high-efficiency coding method is MPEG1, the draft international standard defined by MPEG (Moving Picture Coding Experts Group) of ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission). Because this patent concerns the retrieval of AV data, it can be expected to handle enormous amounts of AV data, and it is quite likely that such AV data has been coded with a high-efficiency method. The MPEG1 coding scheme is therefore described first, with reference to FIG. 10.

[0004] As noted above, in continuous video the image of interest generally closely resembles the images before and after it. In MPEG1, therefore, when image 10040 is to be encoded, for example, the difference from the temporally preceding image 10010 is taken and that difference is encoded to give P picture (forward predictive coded picture) 10080. When image 10020 is to be encoded, the difference is taken from the temporally preceding image 10010, from the following image 10040, or from an image interpolated from the preceding and following images, and that difference is encoded to give B picture (bidirectionally predictive coded picture) 10060. Likewise, when image 10030 is to be encoded, B picture 10070 is produced from images 10010 and 10040. Encoding in this way reduces redundancy along the time axis and so reduces the amount of information. The difference is not always what is encoded: input image 10010, for example, may be encoded as is to give I picture (intra-frame coded picture) 10050.

[0005] MPEG1 also uses motion compensation rather than simply taking the difference at the same position. For each macroblock, the encoder searches the previous image for the block that differs least from the block to be encoded and takes the difference from that block, which further reduces the data that must be sent. A macroblock here is a block of 16 × 16 pixels. In practice, for a P picture, the encoder chooses per macroblock whichever of two options yields less data: the difference from the motion-compensated prediction, or no difference at all. Even so, a large amount of data must still be sent for regions that emerge from behind a moving object. For a B picture, therefore, the encoder chooses per macroblock whichever of four options yields the least data: the difference from the already-decoded, motion-compensated temporally preceding image, from the following image, or from an image interpolated from both, or no difference at all. In this way most of the data need not be sent.
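The per-macroblock choice described in this paragraph can be illustrated with a short sketch. It is not the MPEG1 bitstream procedure: the function names, the dictionary of candidate predictions, and the use of the sum of absolute values as a stand-in for coded data size are all assumptions made for the example.

```python
def cost(block):
    """Crude proxy for coded size: sum of absolute values (an assumption)."""
    return sum(abs(p) for row in block for p in row)


def residual(block, prediction):
    """Pixel-wise difference between a block and its prediction."""
    return [[b - p for b, p in zip(rb, rp)] for rb, rp in zip(block, prediction)]


def choose_mode(block, predictions):
    """Pick the cheapest way to code one macroblock.

    predictions maps a mode name to an already motion-compensated block.
    A P picture would pass one candidate ("forward"); a B picture would
    pass "forward", "backward" and "interpolated". Coding the block with
    no difference taken ("intra") is always a candidate.
    """
    best_mode, best_cost = "intra", cost(block)
    for name, pred in predictions.items():
        c = cost(residual(block, pred))
        if c < best_cost:
            best_mode, best_cost = name, c
    return best_mode, best_cost
```

With one candidate the function chooses between two options, as for a P picture; with three candidates it chooses among four, as for a B picture.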

[0006] MPEG1 likewise stores video data and audio data in separate tracks and plays back video with sound by accessing those tracks simultaneously. It is therefore possible, for example, to extract the video data that corresponds to the audio data at a given time.

[0007] In systems that use such AV data, and especially in authoring systems that combine text, graphics, sound, and other data to create applications such as presentations, retrieving AV data and displaying it is an important problem. This is because, compared with other data, AV data takes longer to load for playback, and the playback time of the AV data itself may also be long.

[0008] In conventional systems for retrieving AV data, one mechanism for finding the desired data has been browsing search, in which browsing data registered in advance is displayed, one item at a time or as a list of several items, at the user's instruction. Because the content of a video can be grasped quickly by looking at the browsing data, this is an excellent method for retrieving AV data.

[0009] A problem with browsing search is that the search data must be set up in advance, and that setup work is very laborious. To solve this problem, Japanese Unexamined Patent Publication No. H3-292572 describes a technique that detects where scene changes occur, extracts the frame either before or after each change into a search list as a search image, and displays those images as a list at search time. Here, a frame is a still image, the smallest unit that makes up video data. A scene is defined as a set of frames in which the subject is shot continuously. A scene change is the point at which two different scenes join. In ordinary television video, for example, one second consists of 30 still-image frames. The technique can also exclude from extraction any scene change that is followed by the next detected scene change within a given time or less. With this technique, one search image frame can be extracted per scene and used as search data.

[0010]

[Problems to Be Solved by the Invention] With the technique described in the above publication, however, when the AV data contains many scenes the number of images to be searched grows accordingly, so the search takes time. In other words, simply using the frames at which scene changes occur as search images is not sufficient for a video search system and method.

[0011] An object of the present invention is to extract search data in finer detail by extracting search images of multiple frames per scene for important scenes.

[0012] Another object of the present invention is to reduce the amount of search data by not extracting search images from unimportant scenes.

[0013]

[Means for Solving the Problems] To solve the above problems, the present invention can provide a video search system that receives video data representing a moving image made up of a plurality of sequential image frames, together with sequential audio data associated with each of the frames, and that extracts from the input video data, for each scene contained in the moving image, the video data corresponding to at least one of the frames as search data. The system comprises: scene change detection means that, based on the content of the input video data, detects for each scene the frames that make up that scene; volume detection means that detects the volume of the input audio data for each corresponding frame; and search data extraction means that, for each scene detected by the scene change detection means, extracts as the search data for that scene the video data corresponding to at least one frame for which the volume detection means detected a volume at or above a predetermined value.

[0014] Alternatively, a video search system that receives video data representing a moving image made up of a plurality of sequential image frames, together with sequential audio data associated with each of the frames, and that extracts from the input video data, for each scene contained in the moving image, the video data corresponding to at least one of the frames as search data, can comprise: scene change detection means that, based on the content of the input video data, detects for each scene the frames that make up that scene; volume detection means that detects the volume of the input audio data for each corresponding frame; and search data extraction means that, for each scene detected by the scene change detection means, uses the per-frame volumes detected by the volume detection means to extract, as the search data for that scene, the video data corresponding to at least one frame whose volume change relative to the frame a predetermined interval away is at or above a predetermined value.

[0015]

[Operation] First, AV data is described. AV data is data that combines video data representing a moving image with audio data representing the corresponding sound.

[0016] For the AV data, the points at which scene changes occur and the volume of each frame are detected. Based on this detected information, the required frames are extracted from the AV data, and search data is created from them and displayed.

[0017] That is, audio data as well as scene changes is used as a criterion for generating the video search data, because passages that are loud or whose volume changes sharply are considered the more characteristic parts of the AV data. The video search data is generated by selecting from the frames the parts needed for searching and joining them together.

[0018] AV data normally consists of multiple scenes, and the importance of each scene's content varies with its playback time. Video search data is therefore extracted according to each scene's playback time, as follows.

[0019] A scene with a short playback time is basically judged to carry no meaning, and no search data is extracted from it. If the scene's average volume is at or above a given value, however, the scene is taken to carry some meaning, and the whole scene is extracted as search data.

[0020] A scene with a long playback time is judged to carry some meaning, and one characteristic part is extracted from it. To avoid ending up with only one characteristic part even when the scene is very long, however, multiple characteristic parts can be extracted from the same scene at the searcher's instruction. In this case the searcher can choose either of the following two functions.

[0021] (1) To extract multiple characteristic parts, all frames at or above a given volume are extracted.

[0022] (2) To extract only one characteristic part, the frame with the maximum volume in the scene is extracted.

[0023] For a scene of intermediate playback time, if the average volume is at or above a given value, the scene is taken to carry some meaning and the part with the maximum volume in the scene is extracted. Even if the average volume is below that value, if the scene contains a volume change at or above a given volume difference, the scene is judged to carry some meaning and the part with the largest volume change in the scene is extracted. Any other scene is judged to carry no particular meaning and is not extracted.
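The rules for short, long, and intermediate scenes can be summarized in a small sketch. It is only an illustration under stated assumptions: the `Scene` class, the parameter names, the inclusive comparisons against `short_len` and `long_len`, and the comparison of each frame with its immediate predecessor are not taken from the specification, and scenes are assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    start: int            # absolute index of the scene's first frame
    volumes: List[float]  # per-frame volume within the scene (non-empty)


def select_frames(scene, short_len, long_len, avg_thresh, diff_thresh,
                  loud_thresh, multiple=False):
    """Return the absolute frame indices chosen as search data for one scene."""
    v = scene.volumes
    n = len(v)
    avg = sum(v) / n
    loudest = max(range(n), key=lambda i: v[i])

    if n <= short_len:
        # Short scene: skipped unless loud on average, then kept whole.
        picked = list(range(n)) if avg >= avg_thresh else []
    elif n >= long_len:
        # Long scene: one peak, or every loud frame if the searcher asks.
        if multiple:
            picked = [i for i in range(n) if v[i] >= loud_thresh]
        else:
            picked = [loudest]
    else:
        # Intermediate scene: loud on average, else a large volume change.
        diffs = [v[i] - v[i - 1] for i in range(1, n)]
        if avg >= avg_thresh:
            picked = [loudest]
        elif diffs and max(diffs) >= diff_thresh:
            picked = [1 + max(range(len(diffs)), key=lambda i: diffs[i])]
        else:
            picked = []
    return [scene.start + i for i in picked]
```

The function returns only the characteristic frames themselves; the frames taken on either side of each one are a separate setting.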

[0024] AV data consists of multiple frames, and each frame has volume characteristics such as a maximum, minimum, and average level. To examine volume changes within a scene, the volume of one frame must be compared with the volume of a different frame. Volume changes within a scene are therefore examined as follows.

[0025] First, if the immediately preceding frame is used as the reference for difference detection, noise may prevent the part with the largest volume difference from being identified accurately. The searcher can therefore set how many frames back the reference frame for difference detection lies.

[0026] Furthermore, detecting the largest volume change can mean either of two things: finding the part where the volume increases most, or finding the part where the volume changes most, that is, where the absolute value of the volume difference is greatest. The searcher can set which of the two is used.
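The two searcher settings just described, the distance to the reference frame and the choice between largest increase and largest absolute change, can be sketched as follows. The function name and the parameter names `lag` and `use_abs` are hypothetical.

```python
def largest_volume_change(volumes, lag=3, use_abs=False):
    """Return (frame_index, change) for the largest volume change in a scene.

    Each frame is compared with the frame `lag` frames before it.
    use_abs=False looks for the largest increase; use_abs=True looks for
    the largest change in either direction. Returns (None, None) when the
    scene is too short for any comparison.
    """
    best_index, best_score, best_change = None, None, None
    for i in range(lag, len(volumes)):
        change = volumes[i] - volumes[i - lag]
        score = abs(change) if use_abs else change
        if best_score is None or score > best_score:
            best_index, best_score, best_change = i, score, change
    return best_index, best_change
```

A larger `lag` compares frames that are further apart, which makes the result less sensitive to frame-to-frame noise.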

[0027] Search data is made up by joining the frames extracted from multiple characteristic scenes. If too much search data is taken when it is generated, however, the number of images the searcher must look through increases, and so does the time needed for the search. To avoid extracting more search data than necessary, the following is done.

[0028] First, so that the searcher can decide how many frames are extracted from each scene that contributes search data, the number of frames to extract before and after the characteristic part can be set at extraction time.

[0029] If the search data is too large, the number of images the searcher must look through increases, and so does the time needed for the search. Search data that is too large is therefore reduced as follows.

[0030] First, the searcher can set the maximum number of scenes from which search data may be extracted. If the number of scenes from which search data is extracted exceeds this setting, scenes are dropped until the number is at or below the setting. The searcher can set which is deleted first: the scene whose maximum volume is smallest, or the scene whose volume difference is smallest.
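One way to read this scene limit is sketched below. The dictionary keys `max_volume` and `max_diff`, the `drop_by` parameter, and the policy of applying the same criterion repeatedly until the limit is met are assumptions; the text only states that the searcher chooses which kind of scene is deleted first.

```python
def limit_scenes(scenes, max_scenes, drop_by="max_volume"):
    """Drop the weakest scenes until at most max_scenes remain.

    scenes is a list of dicts with 'max_volume' and 'max_diff' entries.
    drop_by selects the criterion: the scene with the smallest value of
    that entry is removed first. The original scene order is preserved.
    """
    kept = list(scenes)
    while len(kept) > max_scenes:
        weakest = min(kept, key=lambda s: s[drop_by])
        kept.remove(weakest)
    return kept
```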

[0031] The searcher can also set the maximum total number of frames allowed as search data. If the total number of frames in the search data exceeds this setting, the number of frames in the search data is reduced to the setting or below.

[0032] With the above function for reducing the number of frames in the search data to the set number or below, the search data may conversely become so small that its content cannot be understood at search time. To avoid this, the searcher can also set the minimum number of frames required per scene, so that the number of frames per scene is never cut so far that the scene's content becomes unintelligible.
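The interplay of the total-frame limit and the per-scene minimum can be sketched as below. The trimming policy (always shorten the currently longest clip, alternating between its two ends) is an assumption chosen for illustration; the text does not say how frames are removed.

```python
def limit_total_frames(clips, max_total, min_per_scene):
    """Trim clips until their total length is at most max_total.

    clips is a list of frame-index lists, one per scene. No clip is cut
    below min_per_scene frames, so the total may stay above max_total
    when every clip is already at the minimum.
    """
    clips = [list(c) for c in clips]  # work on copies
    total = sum(len(c) for c in clips)
    while total > max_total:
        longest = max(clips, key=len)
        if len(longest) <= min_per_scene:
            break  # cannot shrink further without breaking the minimum
        # Trim alternately from the end and the start to stay roughly centred.
        if len(longest) % 2 == 1:
            longest.pop()
        else:
            longest.pop(0)
        total -= 1
    return clips
```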

[0033]

[Embodiment] FIG. 1 is a block diagram of a video search data generation scheme according to one embodiment of the present invention. In FIG. 1, 1010 is a main control unit that controls the entire video search data generation scheme of this embodiment; 1020 is an input unit through which the user, using a keyboard or the like, gives instructions to search video files and enters the parameters needed when extracting search data; 1030 is a data storage management unit that records and reads video files; 1040 is a search data generation unit that extracts the characteristic parts of a video file; 1050 is a display unit that displays the search data managed by the search data generation unit 1040; and 1060 is a communication unit that exchanges data among these units.

[0034] This configuration can be built from well-known devices, for example by implementing the main control unit 1010 as a CPU, the input unit 1020 as a keyboard, the data storage management unit 1030 as a hard disk, the display unit 1050 as a CRT display, the search data generation unit 1040 as a program stored in ROM, RAM, or the like and executed by the CPU, and the communication unit 1060 as a bus.

[0035] FIG. 11 shows a device configuration that can implement FIG. 1. In FIG. 11, 11010 is a CPU (central processing unit), 11020 a keyboard, 11030 a hard disk, 11040 a ROM (read-only memory), 11050 a display, and 11060 a bus.

[0036] Returning to FIG. 1, 1031 and 1032 make up the data storage management unit 1030. 1031 is a data storage unit that stores AV data, and 1032 is a data reading unit that reads data from the data storage unit 1031 under the control of the main control unit 1010 or the search data generation unit 1040. 1041, 1042, and 1043 make up the search data generation unit 1040. 1041 is a scene change detection unit that detects scene changes in the video file read by the data reading unit 1032. 1042 is a volume detection unit that detects the per-frame volume of the video file read by the data reading unit 1032. 1043 is a search data selection management unit that performs predetermined processing based on information from the scene change detection unit 1041 and the volume detection unit 1042 and manages the result; it holds the relevant frame numbers or frame data.

[0037] First, scene change detection is described. Scene changes can be detected using, for example, the correlation between two temporally consecutive frames together with motion vector information that indicates translation of the whole image. Specifically, small two-dimensional blocks are first defined within a frame, and for two temporally consecutive frames the correlation of each block is computed (for example, as the squared error) and judged present or absent (by the size of the squared error). If the frames are correlated (the squared error is small), they are considered to belong to a continuous scene and no scene change has occurred. With frame correlation alone, however, a scene produced by translating the camera, as in a pan (left-right movement) or tilt (up-down movement), would be judged a scene change. The amount of translation of the picture, that is, the motion vector, is therefore also detected, and whether a scene change has occurred is decided from this motion vector (if a motion vector is detected, the scene is continuous). The motion vector can be found, for example, from the relationship between the spatial gradient of the frame image and the inter-frame difference. Motion vector detection methods are described in detail in "Digital Signal Processing of Images" by Takahiko Fukinuki, Nikkan Kogyo Shimbun.
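The correlation test in this paragraph can be sketched as follows, with frames represented as lists of rows of grayscale values. The block size, the error threshold, the rule that half of the blocks must be similar, and the function names are assumptions made for the example; the motion vector check is passed in as a flag rather than computed here.

```python
def block_errors(prev, curr, block=8):
    """Mean squared error of each block between two grayscale frames."""
    h, w = len(curr), len(curr[0])
    errors = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            se = 0
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    d = curr[y][x] - prev[y][x]
                    se += d * d
            errors.append(se / (block * block))
    return errors


def frames_correlated(prev, curr, block=8, mse_thresh=100.0, min_fraction=0.5):
    """True when enough blocks have a small squared error."""
    errors = block_errors(prev, curr, block)
    similar = sum(1 for e in errors if e <= mse_thresh)
    return similar >= min_fraction * len(errors)


def is_scene_change(prev, curr, global_motion_found, **kwargs):
    """A change needs both: no frame correlation and no global motion vector."""
    return not frames_correlated(prev, curr, **kwargs) and not global_motion_found
```

Passing `global_motion_found=True` models the pan or tilt case, where the frames differ block by block but still belong to one scene.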

[0038] Here, a method of detecting motion vectors is described. A representative method is block matching, described below with reference to FIG. 12. Block matching compares one macroblock (a block of 16 × 16 pixels) of the image being encoded with all macroblocks of the previous image and adopts as the motion vector the vector to the macroblock with the smallest image difference. For example, to find the motion vector of macroblock 12010, the position of the macroblock read from the previous image, 12020 to 12070, is shifted a little at a time (in the figure, by 2 along the x axis). Each block read is compared with macroblock 12010 to find the block with the smallest image difference (block 12040 in the figure). Block matching is performed once for each candidate block. In the figure, the vector (6, 0) is adopted as the motion vector.
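A minimal block matching sketch follows. It assumes the image difference is measured as a sum of absolute differences and that candidates are limited to a search window around the block, with a configurable step; the paragraph itself only requires that the candidate with the smallest difference be chosen. All names are hypothetical.

```python
def sad(prev, curr, px, py, cx, cy, size):
    """Sum of absolute differences between a block of curr at (cx, cy)
    and a block of prev at (px, py)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(curr[cy + y][cx + x] - prev[py + y][px + x])
    return total


def block_match(prev, curr, cx, cy, size=16, search=8, step=1):
    """Return ((dx, dy), cost) for the best match of the block at (cx, cy).

    (dx, dy) points from the block's position to the best-matching block
    in the previous frame. Returns None if no candidate fits in the frame.
    """
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1, step):
        for dx in range(-search, search + 1, step):
            px, py = cx + dx, cy + dy
            if px < 0 or py < 0 or px + size > w or py + size > h:
                continue  # candidate block would leave the frame
            c = sad(prev, curr, px, py, cx, cy, size)
            if best is None or c < best[0]:
                best = (c, dx, dy)
    if best is None:
        return None
    return (best[1], best[2]), best[0]
```

With `step=2` the candidates are shifted by 2 pixels at a time, matching the spacing used in the figure's example.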

[0039] Accordingly, when detecting scene changes in MPEG1, for example, one can count, in a P picture, the blocks coded as a difference from the motion-compensated prediction and the blocks coded without a difference; if the number of difference-coded blocks is at or above a given value, a motion vector may be regarded as detected and the scene judged continuous.
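The counting rule for P pictures reduces to a few lines once the coding mode of each macroblock is known. The sketch assumes the modes have already been read from the bitstream by some parser (not shown) and are labelled "inter" for difference-coded blocks and "intra" for blocks coded without a difference; both labels and the function name are hypothetical.

```python
def p_picture_is_continuous(macroblock_modes, min_predicted):
    """Judge a P picture as part of a continuous scene.

    macroblock_modes is a sequence of "inter" / "intra" labels, one per
    macroblock. The scene is treated as continuous when at least
    min_predicted macroblocks were coded as a difference from the
    motion-compensated prediction.
    """
    predicted = sum(1 for mode in macroblock_modes if mode == "inter")
    return predicted >= min_predicted
```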

[0040] A scene change is thus judged to have occurred when there is no inter-frame correlation and, in addition, no motion vector is found.

[0041] The volume detection process is now described. If each frame's raw volume value were used as is, the result would be strongly affected by noise in individual frames. To remove this per-frame noise, the volume value of each frame is defined, for example, as the average over the frame in question and the three temporally adjacent frames on either side of it, seven frames in total. Noise removal methods other than this one can of course be used.
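The seven-frame average can be sketched as a centred moving average. How frames near the start and end of the data are handled is not stated in the text; shrinking the window at the edges, as done here, is an assumption, as are the names.

```python
def smooth_volumes(raw, radius=3):
    """Centred moving average of per-frame volumes.

    radius=3 averages each frame with the three frames on either side,
    seven frames in total. Near the edges the window simply shrinks to
    the frames that exist.
    """
    n = len(raw)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = raw[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```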

[0042] The method of detecting each frame's volume is described below. Digitizing an acoustic waveform normally requires sampling, quantization, and encoding, in that order, as shown in FIG. 13. Sampling divides time into fine intervals and reads the height of the waveform at each interval. Quantization reads that height in steps whose fineness is set by a binary number of a given number of digits. Encoding converts the acoustic waveform into a digital signal made up of the quantized values. An ordinary music compact disc uses a sampling frequency (the number of samples taken per second) of 44.1 kHz and a quantization bit depth (the fineness with which loudness is divided when quantizing) of 16 bits.

【００４３】したがって各フレームの音量は、例えば図
１４に示すように、１フレーム時間内の符号化数値群の
中で、例えば最大値１４０１０をそのフレームの音量と
して定義することにより求めることができる。もちろ
ん、１フレームに対応する時間内で、符号化された値の
平均値や最小値１４０２０をそのフレームの音量として
定義しても構わない。Accordingly, the volume of each frame can be obtained, for example as shown in FIG. 14, by defining the maximum value 14010 among the encoded values within one frame period as the volume of that frame. Of course, the average value or the minimum value 14020 of the encoded values within the period corresponding to one frame may instead be defined as the volume of that frame.
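
As a rough illustration of this per-frame volume definition, the sketch below groups encoded sample values into frame-sized chunks and reduces each chunk to one number. The function name `frame_volumes` and the `samples_per_frame` parameter are assumptions made for the example; at 44.1 kHz and a hypothetical 30 frames per second, one frame would cover 1470 samples.

```python
def frame_volumes(samples, samples_per_frame, reduce=max):
    """Return one volume value per video frame.

    `samples` is a sequence of encoded (quantized) audio values and
    `reduce` picks the per-frame definition: max (the default here),
    min, or a mean function, as the text allows.
    """
    volumes = []
    for start in range(0, len(samples), samples_per_frame):
        chunk = samples[start:start + samples_per_frame]
        volumes.append(reduce(chunk))
    return volumes


def mean(values):
    """Arithmetic mean, usable as the `reduce` argument."""
    return sum(values) / len(values)
```

Passing `reduce=mean` or `reduce=min` switches to the alternative definitions mentioned in the paragraph without changing the rest of the pipeline.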

【００４４】本実施例では場面切替検出方法、音量検出
方法として、上記したフレーム間相関と動きベクトル、
フレーム音量定義を用いる。また、図８は検索用データ
の生成過程で必要となる各シーンごとの情報を格納する
シーン特徴テーブルである。テーブルの各行にはそれぞ
れの場面のシーン終了フレーム番号、シーン長、平均音
量、最大音量フレームアドレス、最大音量、最大音量差
分フレームアドレス、最大音量差分を格納する。テーブ
ルをリストによって実現することにより、テーブルのサ
イズは任意に変化させることが可能である。In this embodiment, the inter-frame correlation and motion vectors described above are used as the scene switching detection method, and the frame volume definition described above is used as the volume detection method. FIG. 8 shows a scene feature table that stores the per-scene information needed in the course of generating the search data. Each row of the table stores, for one scene, the scene end frame number, scene length, average volume, maximum volume frame address, maximum volume, maximum volume difference frame address, and maximum volume difference. Implementing the table as a list allows its size to change freely.

【００４５】以下、図１のブロック図の動作例を図２の
フローチャートを使って説明する。図２は動画検索用デ
ータ生成方式の動作を示すフローチャートである。ま
ず、動画ファイルの検索処理が入力部１０２０の指示に
よって開始されると、検索者は検索したい動画ファイル
を入力部１０２０で選択する（ステップ２０１０）。主
制御部１０１０は、指示された動画ファイルを検索する
ために、検索用データ生成部１０４０に対し当該動画フ
ァイルの検索用データの作成指示を行う。検索用データ
生成部１０４０では、検索用データ選択管理部１０４３
がデータ読み出し部１０３２に当該動画ファイルの読み
出し指示を行う（ステップ２０２０）。検索用データ選
択管理部１０４３は、当該動画ファイルの特徴部分を明
らかにするために、場面切替検出部１０４１と音量検出
部１０４２に対し場面切替発生情報と各フレームの音量
情報を要求し、図８に示すシーン特徴テーブルを作成す
る（ステップ２０３０）。検索用データ選択管理部１０
４３は、シーン特徴テーブルから得られる情報を元に、
当該動画ファイル中の特徴部分を図９の抽出テーブルに
抽出する（ステップ２０４０）。抽出テーブルの各行に
は検索データを構成するそれぞれの場面の最大音量、最
大音量差分、場面の開始フレーム番号、終了フレーム番
号を格納する。テーブルをリストによって実現すること
により、テーブルのサイズは任意に変化させることが可
能である。検索用データ選択管理部１０４３は、入力部
１０２０によって与えられる条件に基づいて、抽出テー
ブルから検索用のデータを選択する（ステップ２０５
０）。最後に、主制御部１０１０は表示部１０５０に検
索用データの一覧表示を指示し、表示部１０５０は抽出
テーブルに保存される各シーンの開始フレーム番号から
終了フレーム番号までのフレームを一覧表示する（ステ
ップ２０６０）。なお、これらの各部間のデータ交換
は、通信部１０６０を介して行われる。An example of the operation of the block diagram of FIG. 1 is described below with reference to the flowchart of FIG. 2, which shows the operation of the moving image search data generation method. First, when moving image file search processing is started by an instruction from the input unit 1020, the searcher selects the moving image file to be searched using the input unit 1020 (step 2010). To search the designated moving image file, the main control unit 1010 instructs the search data generation unit 1040 to create search data for that file. In the search data generation unit 1040, the search data selection management unit 1043 instructs the data reading unit 1032 to read the moving image file (step 2020). To identify the characteristic portions of the moving image file, the search data selection management unit 1043 requests scene switching occurrence information from the scene switching detection unit 1041 and the volume information of each frame from the volume detection unit 1042, and creates the scene feature table shown in FIG. 8 (step 2030). Based on the information in the scene feature table, the search data selection management unit 1043 extracts the characteristic portions of the moving image file into the extraction table of FIG. 9 (step 2040). Each row of the extraction table stores the maximum volume, maximum volume difference, start frame number, and end frame number of one scene making up the search data. Implementing the table as a list allows its size to change freely. The search data selection management unit 1043 then selects the search data from the extraction table according to the conditions given through the input unit 1020 (step 2050). Finally, the main control unit 1010 instructs the display unit 1050 to display a list of the search data, and the display unit 1050 displays the frames from the start frame number to the end frame number of each scene stored in the extraction table (step 2060). Data exchange between these units is performed via the communication unit 1060.

【００４６】図２のフローチャート中のシーン特徴テー
ブル作成処理、特徴部分抽出処理、検索用データ選択処
理については、図３,図５,図７にそれぞれの処理内容例
を表すフローチャートを示す。さらに図３のフローチャ
ート中の場面切替検出処理については図４に、図５のフ
ローチャート中の同一場面内複数特徴部分抽出処理につ
いては図６に、それぞれの処理内容例を表すフローチャ
ートを示す。以下、図３から図７までのフローチャート
について、その処理内容例を順番に説明する。For the scene feature table creation process, the characteristic portion extraction process, and the search data selection process in the flowchart of FIG. 2, flowcharts showing examples of their contents are given in FIGS. 3, 5, and 7, respectively. In addition, FIG. 4 shows an example of the scene switching detection process in the flowchart of FIG. 3, and FIG. 6 shows an example of the same-scene multiple characteristic portion extraction process in the flowchart of FIG. 5. The flowcharts of FIGS. 3 to 7 are described below in order.

【００４７】図３に示すシーン特徴テーブル作成処理
（ステップ２０３０）の動作例を説明する。まず、２フ
レーム間の音量差分を検出する場合に、任意の相対フレ
ームとの音量差分が検出できるように、何フレーム前の
音量との差分をとるかを指定する音量差分検出対象フレ
ーム数αを、入力部１０２０により指定する。また、α
フレーム前のフレームとの音量差分が最大となるフレー
ムを検出する場合に、音量の増加が最大となるフレーム
をとるか、それとも音量の変化、すなわち音量差分の絶
対値が最大となるフレームをとるかを、入力部１０２０
により指定する。（ステップ３０１０）。動画ファイル
の最初のフレームからのフレーム数をカウントする現フ
レーム番号に０をセットする（ステップ３０１５）。指
定ファイルの１フレーム目を読み込み（ステップ３０２
０）、初期設定値として、１シーンのフレーム数をカウ
ントするフレームカウンタに１を、１シーン中の各フレ
ームの音量の合計を表す変数総音量に０を、１シーン中
で音量が最大となるフレームがどこにあるかを表す変数
最大音量フレーム番号に現フレーム番号を、そのフレー
ムの音量を表す変数最大音量に０を、１シーン中でαフ
レーム前との音量差が最大となるフレームがどこにある
かを表す変数最大音量差分フレーム番号に現フレーム番
号を、そのフレームの音量とαフレーム前のフレームの
音量の差を表す変数最大音量差分に０をセットする（ス
テップ３０３０）。現フレーム番号に１を加える（ステ
ップ３０３５）。現在読み込んでいるフレームの音量が
最大音量よりも大きいかどうかを調べ（ステップ３０４
０）、大きい場合には、最大音量フレーム番号を現フレ
ーム番号に、最大音量を現フレームの音量値に置き換え
る（ステップ３０５０）。An example of the operation of the scene feature table creation process (step 2030) shown in FIG. 3 is described below. First, so that the volume difference relative to a frame at any chosen offset can be detected, the input unit 1020 is used to specify the volume difference detection target frame count α, which determines how many frames back the comparison frame lies. The input unit 1020 is also used to specify whether the frame with the maximum volume difference from the frame α frames earlier is to be the frame with the largest volume increase or the frame with the largest volume change, that is, the largest absolute value of the volume difference (step 3010). The current frame number, which counts frames from the first frame of the moving image file, is set to 0 (step 3015). The first frame of the designated file is read (step 3020), and the following initial values are set (step 3030): the frame counter, which counts the frames in one scene, is set to 1; the variable total volume, which holds the sum of the frame volumes in one scene, is set to 0; the variable maximum volume frame number, which indicates where the loudest frame of the scene is, is set to the current frame number; the variable maximum volume, which holds the volume of that frame, is set to 0; the variable maximum volume difference frame number, which indicates where in the scene the volume difference from α frames earlier is largest, is set to the current frame number; and the variable maximum volume difference, which holds the difference between the volume of that frame and the volume of the frame α frames earlier, is set to 0. The current frame number is then incremented by 1 (step 3035). Whether the volume of the frame currently read is larger than the maximum volume is checked (step 3040); if it is larger, the maximum volume frame number is replaced with the current frame number and the maximum volume with the volume of the current frame (step 3050).

【００４８】次に、αフレーム前のフレームとの音量差
分をとる前に、αフレーム前のフレームが同一シーン内
に存在するかどうかを見る必要があるので、フレームカ
ウンタの値がα＋１以上あるかどうかを調べる（ステッ
プ３０６０）。α＋１以上ある場合には、αフレーム前
の音量と比較することができるので、αフレーム前の音
量を検出した後（ステップ３０７０）、音量の増加が最
大となるフレームを最大音量差分フレームとしてとる場
合には、最大音量差分が”（現フレーム音量）−（αフ
レーム前の音量）”よりも小さいかどうかを調べる（ス
テップ３０８０，３０９０）。音量差分の絶対値が最大
となるフレームを最大音量差分フレームとしてとる場合
には、”（現フレーム音量）−（αフレーム前の音
量）”の絶対値よりも小さいかどうかを調べる（ステッ
プ３０８０，３１００）。小さい場合には、現音量差分
が最大音量差分となるので、最大音量差分フレーム番号
に現フレーム番号を、最大音量差分に”（現フレーム音
量）−（αフレーム前の音量）”の絶対値を、それぞれ
代入する（ステップ３１１０）。さらに、最大音量差分
の値が０よりも小さければ（ステップ３１２０）、最大
音量差分フレーム番号からαを引く（ステップ３１３
０）。これは、音量減少方向の差分をとる場合に、その
下がり始めのフレームを音量減少のフレームとして抽出
したいためである。Next, before the volume difference from the frame α frames earlier is taken, it is necessary to confirm that that frame lies within the same scene, so whether the frame counter value is α + 1 or more is checked (step 3060). If it is α + 1 or more, a comparison with the volume α frames earlier is possible, so that volume is obtained (step 3070). When the frame with the largest volume increase is to be taken as the maximum volume difference frame, whether the maximum volume difference is smaller than "(current frame volume) − (volume α frames earlier)" is checked (steps 3080, 3090). When the frame with the largest absolute volume difference is to be taken as the maximum volume difference frame, whether the maximum volume difference is smaller than the absolute value of "(current frame volume) − (volume α frames earlier)" is checked (steps 3080, 3100). If it is smaller, the current volume difference becomes the maximum volume difference, so the current frame number is assigned to the maximum volume difference frame number and the absolute value of "(current frame volume) − (volume α frames earlier)" is assigned to the maximum volume difference (step 3110). Furthermore, if the value of the maximum volume difference is smaller than 0 (step 3120), α is subtracted from the maximum volume difference frame number (step 3130). This is because, when a difference in the direction of decreasing volume is taken, the frame at which the drop begins is the one to be extracted as the volume decrease frame.

【００４９】次に、総音量に現フレーム音量を加え（ス
テップ３１４０）、次フレームが存在するか否かを調べ
る。次フレームが存在する場合には、次フレームを読み
込み（ステップ３１６０）、場面切替検出処理（ステッ
プ３１７０）を経て、場面切替フラグが立っているかど
うかを見る（ステップ３１８０）。フラグが立っていな
ければ、フレームカウンタに１を加えた後（ステップ３
２１０）、再び現フレーム番号に１を加える（ステップ
３０３５）。フラグが立っている場合は、図８に示すシ
ーン特徴テーブル８０１０の新しい行に、シーン終了フ
レーム番号８０１５として現フレーム番号の値を、シー
ン長８０２０としてフレームカウンタの値を、平均音量
８０３０として”（総音量）÷（フレームカウンタ）”
の値を、最大音量フレーム番号８０４０として変数最大
音量フレーム番号の値を、最大音量８０５０として変数
最大音量の値を、最大音量差分フレーム番号８０６０と
して変数最大音量差分フレーム番号の値を、最大音量差
分８０７０として変数最大音量差分の値をそれぞれ登録
し（ステップ３１９０）、終了フラグが立っていればシ
ーン特徴テーブル作成処理を終了、立っていなければ、
再び初期設定（ステップ３０３０）に戻り、処理を継続
する。以上、図３のシーン特徴テーブル作成処理（ステ
ップ２０３０）の動作例を説明した。Next, the current frame volume is added to the total volume (step 3140), and whether a next frame exists is checked. If a next frame exists, it is read (step 3160), the scene switching detection process is performed (step 3170), and whether the scene switching flag is set is examined (step 3180). If the flag is not set, the frame counter is incremented by 1 (step 3210) and the current frame number is again incremented by 1 (step 3035). If the flag is set, a new row is added to the scene feature table 8010 shown in FIG. 8 (step 3190), registering the current frame number as the scene end frame number 8015, the frame counter value as the scene length 8020, the value of "(total volume) ÷ (frame counter)" as the average volume 8030, the value of the variable maximum volume frame number as the maximum volume frame number 8040, the value of the variable maximum volume as the maximum volume 8050, the value of the variable maximum volume difference frame number as the maximum volume difference frame number 8060, and the value of the variable maximum volume difference as the maximum volume difference 8070. If the end flag is set, the scene feature table creation process ends; otherwise, processing returns to the initialization (step 3030) and continues. This completes the description of an example of the scene feature table creation process (step 2030) of FIG. 3.
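
The table-building loop of FIG. 3 can be sketched compactly as below. This is a simplified illustration under several assumptions: frame volumes and scene boundaries are supplied as precomputed lists, frame numbers are 1-based, and the names `SceneFeature` and `build_scene_table` are hypothetical. The description stores the absolute value of the difference and then tests whether it is negative; the sketch keeps the signed difference so that the sign test (steps 3120 and 3130) is meaningful, which is an interpretation rather than something the text states.

```python
from dataclasses import dataclass


@dataclass
class SceneFeature:
    end_frame: int         # scene end frame number (8015)
    length: int            # scene length in frames (8020)
    avg_volume: float      # average volume (8030)
    max_volume_frame: int  # maximum volume frame number (8040)
    max_volume: float      # maximum volume (8050)
    max_diff_frame: int    # maximum volume difference frame number (8060)
    max_diff: float        # maximum volume difference, signed (8070)


def build_scene_table(volumes, is_scene_start, alpha=1, use_abs=False):
    """volumes[i] is the volume of frame i+1; is_scene_start[i] is True
    when frame i+1 is the first frame of a new scene."""
    table = []
    start = 0
    n = len(volumes)
    for end in range(1, n + 1):
        # A scene closes at the end of the file or just before a new scene.
        if end < n and not is_scene_start[end]:
            continue
        scene = volumes[start:end]
        max_i = max(range(len(scene)), key=lambda k: scene[k])
        best_diff, best_k = 0, 0
        # Only compare frames whose alpha-earlier frame is in the same scene.
        for k in range(alpha, len(scene)):
            diff = scene[k] - scene[k - alpha]
            score = abs(diff) if use_abs else diff
            best_score = abs(best_diff) if use_abs else best_diff
            if score > best_score:
                best_diff = diff
                # For a drop, point at the frame where the drop begins.
                best_k = k - alpha if diff < 0 else k
        table.append(SceneFeature(
            end_frame=end,
            length=len(scene),
            avg_volume=sum(scene) / len(scene),
            max_volume_frame=start + max_i + 1,
            max_volume=scene[max_i],
            max_diff_frame=start + best_k + 1,
            max_diff=best_diff,
        ))
        start = end
    return table
```

Each returned row corresponds to one row of the scene feature table of FIG. 8; setting `use_abs=True` selects the "largest absolute change" mode chosen in step 3010.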

【００５０】続いて、図３に示す場面切替検出処理（ス
テップ３１７０）の動作例を、図４を用いて説明する。
まず、前フレームと現フレームの間に相関があるか（ス
テップ４０１０）、動きベクトルがあるかどうかを調べ
（ステップ４０２０）、前フレームと現フレームの間に
相関も動きベクトルもない場合には、場面切替が発生し
たものと判断して場面切替フラグを立てて（ステップ４
０３０）、場面切替検出処理を終了する。相関あるいは
動きベクトルの内、少なくとも一つ以上が検出された場
合には、前フレームと現フレームは同一シーン内にある
と判断して、場面切替フラグを降ろして（ステップ４０
４０）、場面切替検出処理を終了する。以上、図４の場
面切替検出処理（ステップ３１７０）の動作例を説明し
た。Next, an example of the operation of the scene switching detection process (step 3170) shown in FIG. 3 is described with reference to FIG. 4. First, whether there is a correlation between the previous frame and the current frame is checked (step 4010), and whether there is a motion vector is checked (step 4020). If there is neither a correlation nor a motion vector between the previous frame and the current frame, scene switching is judged to have occurred, the scene switching flag is set (step 4030), and the scene switching detection process ends. If at least one of the two, correlation or motion vector, is detected, the previous frame and the current frame are judged to be in the same scene, the scene switching flag is cleared (step 4040), and the scene switching detection process ends. This completes the description of an example of the scene switching detection process (step 3170) of FIG. 4.
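
The decision rule of FIG. 4 reduces to a single boolean test, sketched below. How correlation and motion vectors are actually measured is outside this sketch; the two boolean inputs and the names `scene_switched` and `mark_scene_starts` are assumptions for illustration.

```python
def scene_switched(has_correlation, has_motion_vector):
    """Scene switching is flagged only when neither an inter-frame
    correlation nor a motion vector is found (steps 4010 to 4040)."""
    return not (has_correlation or has_motion_vector)


def mark_scene_starts(evidence):
    """evidence[i] is a (has_correlation, has_motion_vector) pair for the
    transition from frame i+1 to frame i+2 (1-based frame numbers).

    Returns one boolean per frame, True where a new scene begins.
    The first frame is treated as a scene start by convention.
    """
    starts = [True]
    for has_correlation, has_motion_vector in evidence:
        starts.append(scene_switched(has_correlation, has_motion_vector))
    return starts
```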

【００５１】続いて、図２に示す特徴部分抽出処理（ス
テップ２０４０）の動作例を図５を用いて説明する。ま
ず、特徴部分を選別するためのパラメータとして、シー
ン長ａ,ｂ、音量ｃ,ｄ,ｅ、最大音量差分ｆを、入力部
１０２０により設定する（ステップ５０１０）。加え
て、入力部１０２０により、特徴フレームの前後何フレ
ームを特徴部分として抽出するかを変数ｘで指定する。
また、シーン長がｂ以上の場合に複数の特徴部分を抽出
するか否かを入力部１０２０により指定する（ステップ
５０２０）。シーン特徴テーブル８０１０の１番目の行
を読み込み（ステップ５０３０）、シーンカウンタの値
を０に設定する（ステップ５０４０）。Next, an example of the operation of the characteristic portion extraction process (step 2040) shown in FIG. 2 is described with reference to FIG. 5. First, the scene lengths a and b, the volumes c, d, and e, and the maximum volume difference f are set through the input unit 1020 as parameters for selecting characteristic portions (step 5010). In addition, the input unit 1020 is used to specify, by the variable x, how many frames before and after a characteristic frame are to be extracted as the characteristic portion. Whether multiple characteristic portions are to be extracted when the scene length is b or more is also specified through the input unit 1020 (step 5020). The first row of the scene feature table 8010 is read (step 5030), and the scene counter is set to 0 (step 5040).

【００５２】この後、シーンの長さによる場合分けを行
う。まず、対象となるシーンがある程度以下の短いシー
ンであるかどうかを見るため、シーン長８０２０がａ以
下であるかどうかを調べ（ステップ５０５０）、シーン
長８０２０がａ以下でなければ、今度は対象となるシー
ンがある程度以上の長いシーンであるかどうかを見るた
め、シーン長８０２０がｂ以上であるかどうかを調べる
（ステップ５０８０）。シーン長８０２０がａ以下であ
る場合には、さらに平均音量８０３０がｄ以上であるか
どうかを調べる（ステップ５０６０）。平均音量８０３
０がｄ以上である場合には、シーン長８０２０が短くて
も特徴的な部分であると判断して、そのシーンの全フレ
ームを抽出テーブルに抽出するために、最大音量９０２
０に最大音量８０５０の値を、最大音量差分９０３０に
最大音量差分８０７０の値を、開始フレーム番号９０４
０に”（シーン終了フレーム番号）−（シーン長）＋
１”を、終了フレーム番号９０５０にシーン終了フレー
ム番号８０１５を、それぞれ登録（ステップ５０７０）
した後、シーンカウンタに１を加える（ステップ５１７
０）。平均音量８０３０がｄよりも小さい場合には、そ
の部分は特徴的ではないと判断して、何も行わない。Processing then branches on the length of the scene. First, to see whether the scene is short, whether the scene length 8020 is a or less is checked (step 5050). If the scene length 8020 is not a or less, then, to see whether the scene is long, whether the scene length 8020 is b or more is checked (step 5080). If the scene length 8020 is a or less, whether the average volume 8030 is d or more is further checked (step 5060). If the average volume 8030 is d or more, the scene is judged to be characteristic even though it is short, and all of its frames are extracted to the extraction table: the value of the maximum volume 8050 is registered as the maximum volume 9020, the value of the maximum volume difference 8070 as the maximum volume difference 9030, "(scene end frame number) − (scene length) + 1" as the start frame number 9040, and the scene end frame number 8015 as the end frame number 9050 (step 5070), after which the scene counter is incremented by 1 (step 5170). If the average volume 8030 is smaller than d, the scene is judged not to be characteristic and nothing is done.

【００５３】シーン長８０２０がｂ以上でない場合に
は、中間的なシーン長のシーンであると判断して、平均
音量８０３０がｅ以上であるかどうかを調べる（ステッ
プ５１３０）。平均音量８０３０がｅ以上である場合に
は、音量が最大となる部分がシーン中最も特徴的な部分
であると判断して、最大音量フレーム番号のフレームと
その前後ｘフレームを抽出テーブルに抽出するために、
最大音量９０２０に最大音量８０５０の値を、最大音量
差分９０３０に最大音量差分８０７０の値を、開始フレ
ーム番号９０４０に”（最大音量フレーム番号）−ｘ”
を、終了フレーム番号９０５０に”（最大音量フレーム
番号）＋ｘ”を、それぞれ登録し（ステップ５１４
０）、シーンカウンタに１を加える（ステップ５１７
０）。平均音量８０３０がｅよりも小さい場合には、今
度は最大音量差分８０７０がｆ以上であるかどうかを調
べる（ステップ５１５０）。最大音量差分８０７０がｆ
以上である場合には、音量の差分が最大となる部分がシ
ーン中最も特徴的な部分であると判断して、最大音量差
分フレーム番号のフレームとその前後ｘフレームを抽出
テーブルに抽出するために、最大音量９０２０に最大音
量８０５０の値を、最大音量差分９０３０に最大音量差
分８０７０の値を、開始フレーム番号９０４０に”（最
大音量差分フレーム番号）−ｘ”を、終了フレーム番号
９０５０に”（最大音量差分フレーム番号）＋ｘ”を、
それぞれ登録し（ステップ５１６０）、シーンカウンタ
に１を加える（ステップ５１７０）。最大音量差分がｆ
よりも小さい場合には、そのシーンは特徴的ではないと
判断して、何も行わない。If the scene length 8020 is not b or more, the scene is judged to be of intermediate length, and whether the average volume 8030 is e or more is checked (step 5130). If the average volume 8030 is e or more, the portion with the maximum volume is judged to be the most characteristic portion of the scene, and the frame with the maximum volume frame number and the x frames before and after it are extracted to the extraction table: the value of the maximum volume 8050 is registered as the maximum volume 9020, the value of the maximum volume difference 8070 as the maximum volume difference 9030, "(maximum volume frame number) − x" as the start frame number 9040, and "(maximum volume frame number) + x" as the end frame number 9050 (step 5140), and the scene counter is incremented by 1 (step 5170). If the average volume 8030 is smaller than e, whether the maximum volume difference 8070 is f or more is checked next (step 5150). If the maximum volume difference 8070 is f or more, the portion with the maximum volume difference is judged to be the most characteristic portion of the scene, and the frame with the maximum volume difference frame number and the x frames before and after it are extracted to the extraction table: the value of the maximum volume 8050 is registered as the maximum volume 9020, the value of the maximum volume difference 8070 as the maximum volume difference 9030, "(maximum volume difference frame number) − x" as the start frame number 9040, and "(maximum volume difference frame number) + x" as the end frame number 9050 (step 5160), and the scene counter is incremented by 1 (step 5170). If the maximum volume difference is smaller than f, the scene is judged not to be characteristic and nothing is done.

【００５４】一方、シーン長がｂ以上である場合には、
ある程度以上長いシーンであると判断して、まず、その
シーンから複数の特徴部分を抽出するかどうかを調べる
（ステップ５０９０）。複数の特徴部分を抽出する場合
には、それらの特徴部分の数をカウントするシーンサブ
カウンタに０をセットし（ステップ５１００）、同一場
面内複数特徴部分抽出処理（ステップ５１１０）により
そのシーンの特徴部分を抽出し、シーンカウンタに”
（シーンサブカウンタ）−１”を加えた（ステップ５１
２０）後、シーンカウンタに１を加える（ステップ５１
７０）。つまり、ステップ５１２０とステップ５１７０
を合わせて考えると、シーンカウンタにシーンサブカウ
ンタの値を加えることに等しい。同一シーン内から複数
の特徴部分を抽出しない場合には、音量が最大となる部
分がシーン中最も特徴的な部分であると判断して、最大
音量フレーム番号のフレームとその前後ｘフレームを抽
出テーブルに抽出するために、最大音量９０２０に最大
音量８０５０の値を、最大音量差分９０３０に最大音量
差分８０７０の値を、開始フレーム番号９０４０に”
（最大音量フレーム番号）−ｘ”を、終了フレーム番号
９０５０に”（最大音量フレーム番号）＋ｘ”を、それ
ぞれ登録（ステップ５１４０）、シーンカウンタに１を
加える（ステップ５１７０）。If, on the other hand, the scene length is b or more, the scene is judged to be long, and whether multiple characteristic portions are to be extracted from it is checked first (step 5090). If multiple characteristic portions are to be extracted, the scene sub-counter, which counts those portions, is set to 0 (step 5100), the characteristic portions of the scene are extracted by the same-scene multiple characteristic portion extraction process (step 5110), "(scene sub-counter) − 1" is added to the scene counter (step 5120), and the scene counter is then incremented by 1 (step 5170). Taken together, steps 5120 and 5170 are equivalent to adding the value of the scene sub-counter to the scene counter. If multiple characteristic portions are not to be extracted from the same scene, the portion with the maximum volume is judged to be the most characteristic portion of the scene, and the frame with the maximum volume frame number and the x frames before and after it are extracted to the extraction table: the value of the maximum volume 8050 is registered as the maximum volume 9020, the value of the maximum volume difference 8070 as the maximum volume difference 9030, "(maximum volume frame number) − x" as the start frame number 9040, and "(maximum volume frame number) + x" as the end frame number 9050 (step 5140), and the scene counter is incremented by 1 (step 5170).

【００５５】これらのステップのいずれかを経た後、シ
ーン特徴テーブル８０１０に次の行が存在するかどうか
を調べ（ステップ５１８０）、存在する場合はその行を
読んで、再びシーン長がａ以下であるかどうか調べ（ス
テップ５０５０）、処理を継続する。存在しない場合に
は、特徴部分抽出処理を終了する。以上、図５の特徴部
分抽出処理（ステップ２０４０）の動作例を説明した。After any of these steps, whether the scene feature table 8010 has a next row is checked (step 5180). If it does, that row is read, whether the scene length is a or less is checked again (step 5050), and processing continues. If it does not, the characteristic portion extraction process ends. This completes the description of an example of the characteristic portion extraction process (step 2040) of FIG. 5.
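
The branching of FIG. 5 can be summarized in a short sketch. It covers only the case where multiple characteristic portions are not extracted from long scenes, and the names `SceneRow`, `ExtractRow`, and `extract_features` are hypothetical. As in the description, the start and end frame numbers are not clamped to the scene boundaries; whether they should be is not stated in the text.

```python
from typing import NamedTuple


class SceneRow(NamedTuple):
    end_frame: int
    length: int
    avg_volume: float
    max_volume_frame: int
    max_volume: float
    max_diff_frame: int
    max_diff: float


class ExtractRow(NamedTuple):
    max_volume: float
    max_diff: float
    start_frame: int
    end_frame: int


def extract_features(scenes, a, b, d, e, f, x):
    """Return extraction-table rows; len(result) plays the role of the
    scene counter."""
    rows = []
    for s in scenes:
        if s.length <= a:
            # Short scene: keep it whole, but only if it is loud on average.
            if s.avg_volume >= d:
                first = s.end_frame - s.length + 1
                rows.append(ExtractRow(s.max_volume, s.max_diff, first, s.end_frame))
        elif s.length >= b or s.avg_volume >= e:
            # Long scene (single-portion mode) or loud mid-length scene:
            # keep the loudest frame and x frames on each side.
            c = s.max_volume_frame
            rows.append(ExtractRow(s.max_volume, s.max_diff, c - x, c + x))
        elif s.max_diff >= f:
            # Mid-length scene with a large volume change.
            c = s.max_diff_frame
            rows.append(ExtractRow(s.max_volume, s.max_diff, c - x, c + x))
    return rows
```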

【００５６】続いて、図５に示す同一場面内複数特徴部
分抽出処理（ステップ５１１０）の動作例を、図６を用
いて説明する。まず、複数の特徴部分を抽出するシーン
のフレーム数と同数のフラグを用意して、”（シーン終
了フレーム番号）−（シーン長）＋１”からシーン終了
フレーム番号までの番号を割り当て、各フレームと１対
１に対応付ける（ステップ６０１０）。抽出するシーン
の１番目のフレームを読み込み（ステップ６０２０）、
現フレームの音量が音量ｃ以上であるかどうかを調べる
（ステップ６０３０）。音量ｃ以上であれば、現フレー
ムとその前後のｘフレームに対応するフラグを立てる。
その後、次フレームが存在するかどうかを調べ、存在す
る場合にはそのフレームを読み込んで（ステップ６０６
０）、フレーム音量が音量ｃ以上であるかどうかを再び
調べる（ステップ６０３０）。次フレームが存在しない
場合には、時間的に連続するフラグの立っている部分が
合計幾つあるかを調べてシーンサブカウンタにその値を
セットし（ステップ６０７０）、最後にフラグに対応す
るフレームを抽出テーブルに抽出するために、最大音量
９０２０に最大音量８０５０の値を、最大音量差分９０
３０に最大音量差分８０７０の値を、開始フレーム番号
９０４０に連続部分の開始フラグ番号を、終了フレーム
番号９０５０に連続部分の終了番号を、全ての連続部分
について順次登録し（ステップ６０８０）、同一場面内
複数特徴部分抽出処理を終了する。以上、図６の同一場
面内複数特徴部分抽出処理（ステップ５１１０）の動作
例を説明した。Next, an example of the operation of the same-scene multiple characteristic portion extraction process (step 5110) shown in FIG. 5 is described with reference to FIG. 6. First, as many flags as there are frames in the scene are prepared, numbered from "(scene end frame number) − (scene length) + 1" to the scene end frame number, and associated one-to-one with the frames (step 6010). The first frame of the scene is read (step 6020), and whether the volume of the current frame is the volume c or more is checked (step 6030). If it is, the flags corresponding to the current frame and the x frames before and after it are set. Whether a next frame exists is then checked; if it does, that frame is read (step 6060) and whether its volume is the volume c or more is checked again (step 6030). If no next frame exists, the number of temporally contiguous runs of set flags is counted and set in the scene sub-counter (step 6070). Finally, to extract the flagged frames to the extraction table, for every contiguous run in turn, the value of the maximum volume 8050 is registered as the maximum volume 9020, the value of the maximum volume difference 8070 as the maximum volume difference 9030, the first flag number of the run as the start frame number 9040, and the last flag number of the run as the end frame number 9050 (step 6080), and the same-scene multiple characteristic portion extraction process ends. This completes the description of an example of the same-scene multiple characteristic portion extraction process (step 5110) of FIG. 6.
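
The flag-and-run procedure of FIG. 6 can be sketched as follows. The function name `extract_multiple` and its parameters are assumptions for illustration; because flags exist only for the frames of the scene, the sketch limits the ±x neighbourhood to the scene, which seems implied by step 6010 but is not stated outright.

```python
def extract_multiple(volumes, first_frame, c, x):
    """Return (start_frame, end_frame) pairs for each contiguous run of
    flagged frames in one scene.

    volumes[i] is the volume of frame first_frame + i. A frame whose
    volume is at least c flags itself and x frames on each side.
    len(result) corresponds to the scene sub-counter.
    """
    n = len(volumes)
    flags = [False] * n
    for i, v in enumerate(volumes):
        if v >= c:
            for j in range(max(0, i - x), min(n, i + x + 1)):
                flags[j] = True

    runs = []
    i = 0
    while i < n:
        if flags[i]:
            j = i
            while j + 1 < n and flags[j + 1]:
                j += 1
            runs.append((first_frame + i, first_frame + j))
            i = j + 1
        else:
            i += 1
    return runs
```

Overlapping neighbourhoods merge automatically, so two loud frames close together yield one extracted portion rather than two.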

【００５７】続いて、図２に示す検索用データ選択処理
（ステップ２０５０）の動作例を、図７を用いて説明す
る。まず、検索用データの最大場面数を指定する最大許
容限度場面数と、最大フレーム数を指定する最大許容限
度フレーム数を指定する。また、１つの特徴部分を構成
するフレームの数が小さくなり過ぎて内容が理解できな
くなることを防止するために、１シーンを構成するフレ
ーム数の下限を指定する最小必要限度フレーム数を指定
する。さらに、最大音量の大きな特徴部分と最大音量差
分の大きな特徴部分のどちらを検索用のデータとして優
先するかを指定する（ステップ７０１０）。次に、シー
ンカウンタの値が最大許容限度場面数よりも多いかどう
かを調べる（ステップ７０２０）。シーンカウンタの値
が最大許容限度場面数よりも多い場合には、抽出場面数
を減らす必要があるので、最大音量の大きな特徴シーン
を最大音量差分の大きな特徴シーンよりも検索用のデー
タとして優先するかどうかを調べ（ステップ７０３
０）、優先する場合には最大音量差分が一番小さいシー
ンを、優先しない場合には最大音量が一番小さいシーン
を抽出テーブルからそれぞれ削除（ステップ７０４０）
（ステップ７０５０）した後、シーンカウンタから１を
引いて（ステップ７０５５）、再びステップ７０２０に
戻り、シーンカウンタの値が最大許容限度場面数以下に
なるまでこれを繰り返す。この際、最大音量または最大
音量差分が一番小さいシーンが複数ある場合には、終了
フレーム番号が一番小さいシーンを削除する。Next, an example of the operation of the search data selection process (step 2050) shown in FIG. 2 is described with reference to FIG. 7. First, the maximum allowable scene count, which sets the maximum number of scenes in the search data, and the maximum allowable frame count, which sets the maximum number of frames, are specified. To prevent a characteristic portion from consisting of so few frames that its content cannot be understood, the minimum required frame count, which sets the lower limit on the number of frames making up one scene, is also specified. In addition, whether characteristic portions with a large maximum volume or those with a large maximum volume difference are to be given priority as search data is specified (step 7010). Next, whether the scene counter value exceeds the maximum allowable scene count is checked (step 7020). If it does, the number of extracted scenes must be reduced, so whether characteristic scenes with a large maximum volume take priority over those with a large maximum volume difference is checked (step 7030). If they take priority, the scene with the smallest maximum volume difference is deleted from the extraction table (step 7040); if they do not, the scene with the smallest maximum volume is deleted (step 7050). The scene counter is then decremented by 1 (step 7055) and processing returns to step 7020; this is repeated until the scene counter value is equal to or less than the maximum allowable scene count. If several scenes share the smallest maximum volume or maximum volume difference, the one with the smallest end frame number is deleted.
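
The scene-count reduction loop (steps 7020 to 7055) can be sketched as below. Representing extraction-table rows as dictionaries and the names `limit_scene_count`, `max_scenes`, and `prefer_max_volume` are assumptions made for the example.

```python
def limit_scene_count(rows, max_scenes, prefer_max_volume):
    """Delete rows until at most `max_scenes` remain.

    Each row is a dict with 'max_volume', 'max_diff', 'start_frame' and
    'end_frame'. When maximum volume has priority, the row with the
    smallest maximum volume difference is deleted; otherwise the row
    with the smallest maximum volume is deleted. Ties go to the row
    with the smallest end frame number.
    """
    rows = list(rows)
    key = "max_diff" if prefer_max_volume else "max_volume"
    while len(rows) > max_scenes:
        victim = min(
            range(len(rows)),
            key=lambda i: (rows[i][key], rows[i]["end_frame"]),
        )
        del rows[victim]
    return rows
```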

【００５８】抽出場面数が最大許容限度場面数以下の場
合には、まず、抽出テーブルの各行の”（終了フレーム
番号）−（開始フレーム番号）＋１”の総和である抽出
フレーム数を計算して（ステップ７０５８）、削除禁止
シーンカウンタに０をセットする（ステップ７０６
０）。削除禁止シーンカウンタは、１シーンを構成する
フレームの数が、最低必要限度フレーム数よりも小さく
なったシーンの数をカウントするカウンタである。次
に、削除禁止シーンカウンタの値がシーンカウンタの値
に等しいかどうかを調べる（ステップ７０７０）。等し
い場合には、削除できるシーンが存在しないことを意味
するので、検索用データ選択処理を終了する。等しくな
い場合には、抽出フレーム数が最大許容限度フレーム数
よりも大きいかどうかを調べる（ステップ７０８０）。
抽出フレーム数が最大許容限度フレーム数以下の場合に
は、フレーム削除の必要はないので、検索用データ選択
処理を終了する。When the number of extracted scenes is equal to or less than the maximum allowable scene count, the extracted frame count, which is the sum of "(end frame number) − (start frame number) + 1" over all rows of the extraction table, is first calculated (step 7058), and the deletion-prohibited scene counter is set to 0 (step 7060). The deletion-prohibited scene counter counts the scenes whose frame count has fallen below the minimum required frame count. Next, whether the deletion-prohibited scene counter value equals the scene counter value is checked (step 7070). If they are equal, no scene remains from which frames can be deleted, so the search data selection process ends. If they are not equal, whether the extracted frame count exceeds the maximum allowable frame count is checked (step 7080). If the extracted frame count is equal to or less than the maximum allowable frame count, no frames need to be deleted, so the search data selection process ends.

【００５９】抽出フレーム数が最大許容限度フレーム数
よりも大きい場合には、抽出フレーム数を削減しなけれ
ばならないので、まず、１つ目のシーンのシーン長を”
（終了フレーム番号）−（開始フレーム番号）＋１”よ
り計算して（ステップ７０９０）、シーン長が最低必要
限度フレーム数よりも大きいかどうかを調べる（ステッ
プ７１００）。シーン長が最低必要限度フレーム数より
も大きい場合には、そのシーンからのフレーム削減が可
能なので、シーンの最前後１フレームずつを削除するた
めに、シーンの開始フレーム番号に１を加え、終了フレ
ーム番号から１を引き（ステップ７１１０）、抽出フレ
ーム数から２を引く（ステップ７１１５）。シーン長が
最低必要限度フレームよりも大きくない場合には、それ
以上フレーム数を削減するとそのシーンの内容が理解で
きなくなるので、そのシーンからのフレーム削除は不可
能と見なして、削除禁止カウンタに１を加える（ステッ
プ７１２０）。次に、抽出テーブルに次シーンが存在す
るかどうかを調べる（ステップ７１３０）。存在する場
合には、次シーンのシーン長を計算して（ステップ７１
４０）、ステップ７１００から再び、次シーンが存在し
なくなるまでこれを繰り返す。次シーンが存在しなくな
った場合には、ステップ７０７０に戻り、削除禁止シー
ンカウンタがシーンカウンタに等しく、かつ抽出フレー
ム数が最大許容限度フレーム数以下になるまで、この処
理を繰り返す。以上、図７の検索用データ選択処理（ス
テップ２０５０）の動作例を説明した。When the extracted frame count exceeds the maximum allowable frame count, it must be reduced. First, the scene length of the first scene is calculated as "(end frame number) − (start frame number) + 1" (step 7090), and whether it exceeds the minimum required frame count is checked (step 7100). If it does, frames can be removed from the scene, so to delete one frame from each end of the scene, 1 is added to its start frame number and 1 is subtracted from its end frame number (step 7110), and 2 is subtracted from the extracted frame count (step 7115). If the scene length does not exceed the minimum required frame count, removing more frames would make the content of the scene impossible to understand, so frame deletion from that scene is regarded as impossible and the deletion-prohibited scene counter is incremented by 1 (step 7120). Next, whether the extraction table contains a next scene is checked (step 7130). If it does, the scene length of the next scene is calculated (step 7140) and processing repeats from step 7100 until no next scene remains. When no next scene remains, processing returns to step 7070, and this is repeated until the deletion-prohibited scene counter equals the scene counter and the extracted frame count is equal to or less than the maximum allowable frame count. This completes the description of an example of the search data selection process (step 2050) of FIG. 7.
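
The frame-trimming loop (steps 7058 to 7140) might look like the sketch below. It is an interpretation, not the flowchart verbatim: rather than maintaining a running deletion-prohibited scene counter, it stops when a full pass over the table trims nothing, which corresponds to every scene being deletion-prohibited. The names `trim_to_frame_budget`, `max_frames`, and `min_frames` are hypothetical.

```python
def trim_to_frame_budget(rows, max_frames, min_frames):
    """Shave one frame off both ends of every trimmable scene, pass by
    pass, until the total extracted frame count fits `max_frames` or no
    scene can be trimmed any further.

    Each row is a dict with 'start_frame' and 'end_frame'. A scene is
    trimmable while its length exceeds `min_frames`.
    """
    rows = [dict(r) for r in rows]  # work on copies
    total = sum(r["end_frame"] - r["start_frame"] + 1 for r in rows)
    while total > max_frames:
        trimmed_any = False
        for r in rows:
            length = r["end_frame"] - r["start_frame"] + 1
            if length > min_frames:
                r["start_frame"] += 1
                r["end_frame"] -= 1
                total -= 2
                trimmed_any = True
        if not trimmed_any:
            break  # every scene is at or below the minimum length
    return rows
```

As in the description, the budget is checked only between passes, so a pass that starts over budget trims every eligible scene once.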

【００６０】上記実施例では、検索用データ生成処理を
検索時に毎回行うものとしているが、図１の検索用デー
タ選択管理部１０４３内に検索後も抽出テーブルを保持
する機能を持たせれば、毎回行う必要はない。この際に
は、主制御部１０１０がブラウズ指示を受けた動画ファ
イルに関し検索用データ選択管理部１０４３をチェック
し、既に検索用データが生成されている場合には、その
検索用データに基づいて直接表示する。生成されていな
い場合にのみ検索用データ生成部１０４０で、検索用デ
ータの作成を行う。In the above embodiment, the search data generation process is performed every time a search is made. However, if the search data selection management unit 1043 of FIG. 1 is given a function for retaining the extraction table after a search, the process need not be performed every time. In that case, the main control unit 1010 checks the search data selection management unit 1043 for the moving image file for which a browse instruction has been received, and if search data has already been generated, the display is produced directly from that search data. Only when it has not been generated does the search data generation unit 1040 create the search data.
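
The retain-and-reuse behaviour described here amounts to a cache keyed by moving image file. A minimal sketch, assuming an in-memory dictionary and a caller-supplied generation function (the class name `SearchDataCache` and its methods are hypothetical):

```python
class SearchDataCache:
    """Keep the extraction table of each moving image file after its
    first search so that later browse requests skip regeneration."""

    def __init__(self, generate):
        # `generate` stands in for the search data generation unit: it
        # takes a file identifier and returns that file's search data.
        self._generate = generate
        self._tables = {}

    def get(self, video_file):
        if video_file not in self._tables:
            self._tables[video_file] = self._generate(video_file)
        return self._tables[video_file]
```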

[0061] Further, in the above embodiment, the user sets all of the parameters for search data generation through the input unit 1020 of FIG. 1, but the information in the scene feature table held in the search data selection management unit 1043 can also be used as parameters. For example, a value 1.5 times the average volume of the scene can be used for c in FIG. 5. In this case, c is not set in step 5010; instead, c = (average volume) × 1.5 is set at the start of the process 5110 for extracting a plurality of characteristic portions within the same scene. It is also possible, in step 5010, to use 10% of the total scene length of the entire moving image file as the scene length b, or twice the average volume of the entire moving image file as the volume e. In these cases, b = (sum of scene lengths in the scene feature table) × 0.1 and e = [{sum of (scene length × average volume)} ÷ (sum of scene lengths)] × 2 are set in step 5010 of FIG. 5.
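The three derived parameters can be worked through in a short sketch. The `SceneFeature` record and the function names are hypothetical and introduced only for illustration; the sketch assumes the scene feature table provides a length (in frames) and an average volume for each scene.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneFeature:
    """Assumed shape of one row of the scene feature table."""
    length: int            # scene length in frames
    average_volume: float  # average volume over the scene


def per_scene_c(scene: SceneFeature) -> float:
    # c = (average volume of the scene) x 1.5
    return scene.average_volume * 1.5


def file_wide_b(scenes: List[SceneFeature]) -> float:
    # b = (sum of scene lengths) x 0.1
    return sum(s.length for s in scenes) * 0.1


def file_wide_e(scenes: List[SceneFeature]) -> float:
    # e = [sum of (scene length x average volume) / sum of scene lengths] x 2,
    # i.e. twice the length-weighted average volume of the whole file.
    total_length = sum(s.length for s in scenes)
    if total_length == 0:
        raise ValueError("scene feature table contains no frames")
    weighted = sum(s.length * s.average_volume for s in scenes)
    return weighted / total_length * 2
```

For two scenes of 100 and 300 frames with average volumes 10 and 30, the length-weighted file average is 25, so e is 50, whereas an unweighted mean of the two scene averages would give 20 and an e of 40. The weighting by scene length is what makes e reflect the file as a whole.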

[0062] At present, when AV data is handled, tape such as VTR tape is the mainstream medium for storing video. As a storage medium, tape is sequential-access, low-speed mass storage. In a system that uses such a medium, the search can be made more effective by having the search data selection management unit 1043 store the search data on a separate storage medium capable of high-speed random access (a hard disk or the like). With this method, moving image files can be searched at high speed, and the user can play back only the required scenes from the tape.

[0063] In a digital video editing apparatus or the like, there is a task (download) in which the content of an analog tape is A/D-converted and stored on digital media (a hard disk or the like) in the apparatus. The search data generation process of this embodiment may be performed during this task. High-speed search of the moving image file then becomes possible immediately after the download finishes.

[0064] As described above, according to this embodiment, images (moving images) for search are extracted based on the frames at which scene switching occurs and on the audio information of the frames, so that a search of arbitrary AV data (files) can be realized.
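A minimal sketch of extraction based on scene switching and per-frame volume is given below. It is not the embodiment's procedure: the function names, the representation of scene cuts as frame indices, the use of an absolute volume difference, and the choice to compare each frame only with an earlier frame in the same scene are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def split_into_scenes(cut_frames: Sequence[int], n_frames: int) -> List[range]:
    """Build scenes from the frame indices at which a new scene begins.

    Assumption: cut_frames excludes frame 0 and contains indices below n_frames.
    """
    starts = [0] + sorted(cut_frames)
    ends = starts[1:] + [n_frames]
    return [range(s, e) for s, e in zip(starts, ends)]


def extract_by_level(
    scenes: List[range], volume: Sequence[float], threshold: float
) -> Dict[int, List[int]]:
    """Per scene, keep the frames whose volume is at or above the threshold."""
    return {
        i: [f for f in scene if volume[f] >= threshold]
        for i, scene in enumerate(scenes)
    }


def extract_by_change(
    scenes: List[range], volume: Sequence[float], interval: int, threshold: float
) -> Dict[int, List[int]]:
    """Per scene, keep the frames whose volume differs from the frame
    `interval` frames earlier (within the same scene) by at least the threshold."""
    result: Dict[int, List[int]] = {}
    for i, scene in enumerate(scenes):
        result[i] = [
            f
            for f in scene
            if f - interval >= scene.start
            and abs(volume[f] - volume[f - interval]) >= threshold
        ]
    return result
```

Scenes for which the returned list is empty contribute no search data under the chosen criterion, which matches the idea of skipping scenes without a characteristic frame.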

[0065] This eliminates the need, present in the prior art, for the user to specify the characteristic portions to be searched and to create search data before performing a search, and thus reduces the user's workload.

[0066]

[Effects of the Invention] As described above, according to the present invention, a plurality of frames per scene can be extracted as search data for important (characteristic) scenes, so that the search data can be extracted in fine detail.

[0067] Further, according to the present invention, no search data is extracted from scenes that are not important, so that the amount of search data can be reduced.

[Brief Description of the Drawings]

[FIG. 1] FIG. 1 is a block diagram showing the overall configuration of a moving image search data generation system according to an embodiment of the present invention.

[FIG. 2] FIG. 2 is a flowchart showing the operation of the embodiment of FIG. 1.

[FIG. 3] FIG. 3 is a flowchart showing the operation of the scene feature table creation process in the flowchart of FIG. 2.

[FIG. 4] FIG. 4 is a flowchart showing the operation of the scene switching detection process in the flowchart of FIG. 3.

[FIG. 5] FIG. 5 is a flowchart showing the operation of the characteristic scene extraction process in the flowchart of FIG. 2.

[FIG. 6] FIG. 6 is a flowchart showing the operation of the process for extracting a plurality of characteristic portions within the same scene in the flowchart of FIG. 5.

[FIG. 7] FIG. 7 is a flowchart showing the operation of the search data selection process in the flowchart of FIG. 2.

[FIG. 8] FIG. 8 shows the scene feature table created in the course of search data generation.

[FIG. 9] FIG. 9 shows the extraction table created in the course of search data generation.

[FIG. 10] FIG. 10 is an explanatory diagram of the MPG1 encoding method.

[FIG. 11] FIG. 11 is a configuration diagram of an apparatus in which the present invention can be implemented.

[FIG. 12] FIG. 12 is an explanatory diagram of the block matching method.

[FIG. 13] FIG. 13 is an explanatory diagram of a method of detecting the volume of each frame.

[FIG. 14] FIG. 14 is an explanatory diagram of a method of detecting the volume of each frame.

[Explanation of Symbols]

1010: main control unit
1020: input unit
1030: data storage management unit
1031: data storage unit
1032: data reading unit
1040: search data generation unit
1041: scene switching detection unit
1042: volume detection unit
1043: search data selection management unit
1050: display unit
1060: communication unit

Continuation of front page (72) Inventor: Akira Tanaka, c/o Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

[Claim 1] A moving image search system that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the system comprising: scene switching detection means for detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; volume detection means for detecting the volume of the input audio data for each corresponding frame; and search data extraction means for extracting, for each scene detected by the scene switching detection means, moving image data corresponding to at least one frame for which the volume detection means has detected a volume equal to or greater than a predetermined value, as search data for that scene.

[Claim 2] A moving image search system that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the system comprising: scene switching detection means for detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; volume detection means for detecting the volume of the input audio data for each corresponding frame; and search data extraction means for extracting, for each scene detected by the scene switching detection means and based on the volume of each frame detected by the volume detection means, moving image data corresponding to at least one frame whose volume change relative to a frame a predetermined interval away is equal to or greater than a predetermined value, as search data for that scene.

[Claim 3] The moving image search system according to claim 1 or 2, wherein the search data extraction means comprises calculation means for calculating, for the scenes detected by the scene switching detection means, the average volume within each scene based on the volume detected by the volume detection means, and extracts search data only for scenes whose average volume calculated by the calculation means is equal to or greater than a predetermined volume.

[Claim 4] The moving image search system according to claim 1, wherein the search data extraction means comprises detection means for detecting, for each scene detected by the scene switching detection means, the frame having the maximum volume, and extracts moving image data corresponding to the frame detected by the detection means as search data for each scene.

[Claim 5] The moving image search system according to claim 2, wherein the search data extraction means extracts, for each scene detected by the scene switching detection means, moving image data corresponding to the frame whose volume change relative to a frame a predetermined interval away is the largest, as search data for each scene.

[Claim 6] A moving image search system that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the system comprising: scene switching detection means for detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; volume detection means for detecting the volume of the input audio data for each corresponding frame; first search data extraction means for extracting, for each scene detected by the scene switching detection means, moving image data corresponding to at least one frame for which the volume detection means has detected a volume equal to or greater than a predetermined value, as search data for that scene; second search data extraction means for extracting, for each scene detected by the scene switching detection means and based on the volume of each frame detected by the volume detection means, moving image data corresponding to at least one frame whose volume change relative to a frame a predetermined interval away is equal to or greater than a predetermined value, as search data for that scene; and means for selectively enabling either the first search data extraction means or the second search data extraction means in response to an external instruction.

[Claim 7] The moving image search system according to claim 1 or 2, wherein the search data extraction means comprises instruction means through which the maximum number of frames to be extracted from each scene detected by the scene switching detection means is specified externally, and means for limiting the number of frames extracted for each scene to no more than the number specified through the instruction means.

[Claim 8] The moving image search system according to any one of claims 1 to 7, wherein the search data extraction means comprises instruction means through which the maximum number of scenes from which search data is extracted is specified externally, and means for limiting the number of scenes from which search data is extracted to no more than the number specified through the instruction means.

[Claim 9] A moving image search system that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the system comprising: scene switching detection means for detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; volume detection means for detecting the volume of the input audio data for each corresponding frame; and search data extraction means for extracting, for each scene detected by the scene switching detection means, moving image data corresponding to a frame for which the volume detection means has detected a volume equal to or greater than a predetermined value and to frames in its vicinity, as search data for each scene.

[Claim 10] A moving image search system that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the system comprising: scene switching detection means for detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; volume detection means for detecting the volume of the input audio data for each corresponding frame; and search data extraction means for extracting, for each scene detected by the scene switching detection means, moving image data corresponding to a frame whose volume change relative to a frame a predetermined interval away is equal to or greater than a predetermined value and to frames in its vicinity, as search data for each scene.

[Claim 11] A moving image search data extraction method that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the method comprising: detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; detecting the volume of the input audio data for each corresponding frame; and extracting, for each detected scene, moving image data corresponding to at least one frame for which a volume equal to or greater than a predetermined value has been detected, as search data for that scene.

[Claim 12] A moving image search data extraction method that receives moving image data representing a moving image composed of a plurality of sequential image frames and sequential audio data associated with each of the frames, and that extracts from the input moving image data, for each scene included in the moving image, moving image data corresponding to at least one of the frames as search data, the method comprising: detecting, for each scene, the frames constituting that scene based on the content of the input moving image data; detecting the volume of the input audio data for each corresponding frame; and extracting, for each detected scene and based on the detected volume of each frame, moving image data corresponding to at least one frame whose volume change relative to a frame a predetermined interval away is equal to or greater than a predetermined value, as search data for that scene.