JP2004363643A - Edit method, edit system, and program for stream data

Info

Publication number: JP2004363643A
Application number: JP2003155893A
Authority: JP
Inventors: Miyoshi Fukui; 美佳福井; Takayuki Miyazawa; 隆幸宮澤; Masaru Suzuki; 優鈴木; Hiroko Hayama; 寛子羽山; Koji Urata; 耕二浦田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-05-30
Filing date: 2003-05-30
Publication date: 2004-12-24
Anticipated expiration: 2023-05-30
Also published as: JP3816901B2

Abstract

PROBLEM TO BE SOLVED: To provide an editing system that makes it easier to edit stream data such as audio and video.

SOLUTION: In the system, a meaning/role analysis section 14 attaches meaning/role identification information to each piece of partial stream data in the stream data received from a stream data input section 11. A reproduction control information generating section 15 generates, on the basis of the meaning/role identification information, reproduction control information that controls whether each piece of partial stream data is reproduced and in what order, and stores it in a storage section 16. The received stream data is reproduced according to the reproduction control information stored in the storage section 16. A reproduction control information edit section 18 presents an edit screen on which the time range of each piece of partial stream data and information denoting its meaning/role are displayed in cross-reference with each other, and edits the reproduction control information according to commands entered by the user.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

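[0001]
[Technical Field of the Invention]
The present invention relates to a stream data editing method, editing system, and program that take stream data such as video and audio as input and edit it.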
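[0002]
[Prior Art]
In recent years, with the spread of inexpensive computers and the Internet, computer-based distance education, the so-called e-learning system, has become common as a replacement for, or a supplement to, group learning by a teacher and several students in a classroom. Because there are no constraints of time or place, anyone can study independently at home or at work.
[0003]
For home use, for example, correspondence courses have been proposed for purposes such as retraining middle-aged and older workers, lifelong education such as language study, and home study for children who do not attend school, and many self-study materials have been created for them. In the workplace there is individual learning matched to each person's needs and level, such as rapid personnel development in response to changing conditions inside and outside the company, acquisition of fast-advancing specialist skills, and learning to operate the latest office automation (OA) equipment, and e-learning systems are being introduced one after another for such learning.
[0004]
To obtain learning results that suit individual goals from an e-learning system, high-quality learning materials are required. If the material is simple text-only content, an educator who is an expert in the subject can create it alone. Recently, however, so-called multimedia learning materials that also include images, video, and audio have become common, because they improve learning effectiveness.
[0005]
Creating and editing multimedia learning materials is difficult for ordinary educators. Educators therefore often entrust the work to editors skilled in multimedia content creation (authoring), and the educator and editor produce the material jointly. As a result, creating multimedia materials takes a great deal of money and time, and materials cannot be supplied quickly.
[0006]
Meanwhile, the explosive spread of digital video cameras and mobile phones with video recording functions has created an environment in which anyone can casually capture video and share it with others over a network. As large amounts of video accumulate, there is a growing need to search for desired video easily and to edit and reuse it.
[0007]
In offices, knowledge management systems have been introduced in which each person's knowledge and know-how is stored as documents for users to draw on. Similar systems are used, for example, at customer support desks: an operator records the content of an answer to a customer's question as text, and when another operator receives a similar question, that text is retrieved and reused. In these systems the information is converted to text by hand and recorded, and users reuse it by means of natural language search technology.
[0008]
Information recorded as stream data such as video or audio can be searched in the same way if text information is attached to it. However, to search directly for a desired scene in stream data, a description for searching must be attached to every scene as text, following a scene description scheme such as MPEG7 (Moving Picture Experts Group phase 7), an international standard for describing multimedia information, which is tedious work. Editing that extracts only the important scenes and rearranges them in a meaningful order would be even more useful, but such work is very laborious and takes an ordinary user a great deal of effort.
[0009]
To automate this kind of editing for stream data search, several techniques have been developed that automatically attach descriptive text such as search keys to stream data. Techniques that analyze news video to detect scene boundaries, recognize caption text, or apply speech recognition to an announcer's read speech to extract important keywords and attach them as search keys have been prototyped or implemented in video archive systems and video recording summarization systems.
[0010]
For example, in the video annotation editor described in "Advanced Use of Digital Content Based on Annotation (Part 2)", Katashi Nagao (長尾確), Journal of the Information Processing Society of Japan, Vol. 42, No. 8, Aug. 2001, pp. 787-792 (Non-Patent Document 1), in particular on page 789, speech recognition of news audio and detection of scene changes in the video are performed automatically, and everything else is specified manually by a human operator.
[0011]
[Non-Patent Document 1]
"Advanced Use of Digital Content Based on Annotation (Part 2)", Katashi Nagao (長尾確), Journal of the Information Processing Society of Japan, Vol. 42, No. 8, Aug. 2001, pp. 787-792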
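[Problems to Be Solved by the Invention]
The technique described in Non-Patent Document 1 presupposes automating keyword attachment for material such as news video, searching it, and extracting important scenes. It does not support manual editing of content that conveys knowledge using video as its material, such as the e-learning materials mentioned above. Consequently, changing the results that the system analyzed and created automatically means falling back on a conventional editing system. That is, manually changing or adding keywords and descriptions, re-cutting only the appropriate scenes, or swapping video material still requires the same tedious work as before.
[0012]
Thus, with conventional stream data editing techniques, creating and editing knowledge-transfer content based on video and audio is laborious, and rapid knowledge transfer and education cannot be supported.
[0013]
An object of the present invention is to provide a stream data editing method, editing system, and program that make it easier to edit stream data such as audio and video.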
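[0014]
[Means for Solving the Problems]
To solve the above problems, in one aspect of the present invention, the semantic role in information transfer of each piece of partial stream data within stream data containing at least one of audio and video is analyzed, and semantic role identification information representing that role is attached to the partial stream data. Based on the semantic role identification information and predetermined rules, playback control information that controls whether each piece of partial stream data is played and in what order is created and stored. The time range of each piece of partial stream data is displayed in association with its semantic role, and the stored playback control information is edited according to user instructions entered against that display. The input stream data is played back according to the stored playback control information.
[0015]
According to another aspect of the present invention, a program can be provided that causes a computer to perform: a process of inputting stream data containing at least one of audio and video; a process of analyzing the semantic role in information transfer of each piece of partial stream data in the input stream data and attaching semantic role identification information representing that role to the partial stream data; a process of creating, based on the semantic role identification information, playback control information that controls whether each piece of partial stream data is played and in what order; a process of storing the playback control information; a process of displaying the time range of each piece of partial stream data in association with its semantic role and editing the stored playback control information according to user instructions entered against that display; and a process of playing back the input stream data according to the stored playback control information.
[0016]
Displaying the time range of each piece of partial stream data in association with its semantic role, and editing the playback control information according to user instructions entered against that display, makes stream data easy to edit.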
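[0017]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to the drawings.
As shown in FIG. 1, in the stream data editing system of this embodiment, stream data such as video and audio is input through a stream data input unit 11. The stream data input unit 11 may be a video and audio capture device such as a digital video camera, a device that receives stream data transmitted over a network such as the Internet or a LAN, or a device that plays back stream data stored on a storage medium such as a DVD.
[0018]
The input stream data is accumulated in a stream data storage unit 12 and is also input to a stream data processing unit 13. The stream data processing unit 13 has a semantic role analysis unit 14, a playback control information creation unit 15, a playback control information storage unit 16, a stream playback unit 17, and a playback control information editing unit 18. The stream data processing unit 13 is, concretely, a CPU that performs its processing by software, namely an editing program. An output unit 19 that outputs video and audio is connected to the stream playback unit 17 and the playback control information editing unit 18.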
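[0019]
An outline of the processing procedure in this embodiment is described with reference to FIG. 2, together with the components shown in FIG. 1.
First, stream data such as video and audio is input through the stream data input unit 11 (step S21). The input stream data is stored in the stream data storage unit 12 (step S22).
[0020]
The input stream data is also passed to the semantic role analysis unit 14 in the stream data processing unit 13, where semantic role analysis is performed (step S23). The semantic role analysis unit 14 extracts the partial stream data contained in the input stream data, analyzes the semantic role of each piece, and attaches semantic role identification information to it.
[0021]
The semantic role analysis unit 14 preferably also analyzes correspondences between pieces of partial stream data and, when it extracts a correspondence, includes information indicating that correspondence in the semantic role identification information. For example, a question and the answer to it correspond to each other, so a correspondence is extracted for the partial stream data of the question and of the answer.
[0022]
The stream data with semantic role identification information attached by the semantic role analysis unit 14 is input to the playback control information creation unit 15, which creates playback control information for controlling playback of the stream data by the stream playback unit 17, based on the semantic role identification information and predetermined rules (step S24). Playback control information is described later; concretely it is, for example, information that controls whether each piece of partial stream data is played and in what order. The created playback control information is stored in the playback control information storage unit 16 (step S25).
[0023]
Based on the playback control information stored in the playback control information storage unit 16, the stream playback unit 17 reads from the stream data storage unit 12 the partial stream data, within the stream data input through the stream data input unit 11, that corresponds to the playback control information, and plays it back as video and audio through the output unit 19 (step S26). The output unit 19 includes a display for showing video and a speaker for outputting audio. The output unit 19 may also record the edited stream data played back by the stream playback unit 17 on a disk medium such as a CD-R, CD-RW, DVD-R, DVD-RW, DVD-RAM, or HDD, or on a tape medium such as video tape.
[0024]
The playback control information editing unit 18 presents, through the output unit 19, an editing screen for the playback control information, based on the playback control information stored in the playback control information storage unit 16. It also accepts the user's editing instructions entered on the editing screen and edits the playback control information (step S27). The edited playback control information is stored again in the playback control information storage unit 16.
Semantic role analysis is not limited to the method described in this specification; other methods may be used.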
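[0025]
Next, the semantic role analysis unit 14 is described in detail. Taking as an example the case where the input stream data is video of a dialogue, the semantic role analysis unit 14 segments the spoken utterances in the video at suitable points such as pauses and applies speech recognition. From the recognized utterance content it extracts utterance patterns, such as "ありがとう" ("thank you"), that have been registered in a pattern dictionary in advance, and from where those patterns occur it obtains, for each utterance, the likelihood of semantic roles such as "greeting", "question", and "answer".
[0026]
Next, the likelihood of the semantic role of each utterance is corrected based on transition probabilities between utterance roles obtained in advance (the probability of one role following another, for example that a greeting tends to be followed by a greeting). In this way, the stream data of the dialogue video is cut into partial stream data in units of utterances, and the semantic role information obtained is attached to each piece of partial stream data.
[0027]
Next, a concrete example of the semantic role analysis procedure is described with reference to FIG. 3. This procedure is as described in detail in Japanese Patent Application No. 2003-54427. First, speech recognition text is read through the stream data input unit 11 or the stream data storage unit 12, and morphological analysis is performed (steps S31 to S32). In the example morphological analysis result 101 shown in FIG. 4, the underlined portions labeled 102, 103, and 104 are morphologically analyzed speech recognition text. For example, portion 102 is the result of morphologically analyzing the text "よろしくお願いします" (a polite greeting).
[0028]
Next, pattern rules prepared in advance are applied to analyze the morphological analysis result (step S33). A pattern rule associates feature identification information, which indicates the meaning of a feature, with a morphological pattern. The feature identification information is defined in advance and represents, for example, the meaning of each utterance.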
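[0029]
FIG. 5 shows a pattern rule table 200 as an example of how pattern rules are described. Here it is assumed that seven kinds of semantic role identification information representing the meaning of an utterance are defined in advance: "greeting", "backchannel", "question", "answer", "confirmation", "demonstration", and "other". The pattern rule table 200 of FIG. 5 shows in which semantic role identification information 201 each morphological pattern 202 tends to occur. The weighting coefficient (score) 203 expresses numerically how readily a morphological pattern, when it occurs, corresponds to each kind of semantic role identification information. In the example of FIG. 5, a larger weighting coefficient (score) 203 indicates that the utterance is more likely to have the corresponding semantic role. The morphological patterns 202 are, for example, characteristic fragments extracted from conversation data that are thought to determine the meaning of an utterance. The part enclosed in ＜＞, which is added by morphological analysis, indicates the part of speech.
[0030]
In FIG. 5, the semantic role identification information 201 divides each of the seven kinds above into the case where the utterance is the questioner's and the case where it is the answerer's. An identifier such as "greeting" or "backchannel" followed by (Q) belongs to the questioner, and one followed by (A) belongs to the answerer. In other words, the semantic role identification information 201 shown in FIG. 5 also carries the speaker role, questioner or answerer.
[0031]
The example of FIG. 5 indicates that when a morphologically analyzed utterance contains the morphological pattern "こんにちは＜感＞" ("hello", tagged as an interjection) from the morphological patterns 202, the utterance is likely to have the semantic role "greeting" whether it is the questioner's or the answerer's. An utterance containing the pattern "なんですが＜付＞" is likely to be a "question" if it is the questioner's and an "answer" if it is the answerer's. This is why the semantic role identification information 201 in FIG. 5 is split by speaker role: the role helps determine the meaning of the utterance.
[0032]
In the pattern rule application step S33, the morphological analysis result of each utterance is analyzed according to the pattern rule table 200 of FIG. 5, and the semantic role identification information corresponding to the utterance is estimated. For example, if the speech recognition result contains the text "こんにちは", it matches the morphological pattern "こんにちは＜感＞" in the pattern rules. If the questioner spoke it, the score of "greeting (Q)" in the pattern rules is added to "greeting", one of the seven candidate semantic roles for that text.
[0033]
A single pattern match may add scores to more than one kind of semantic role identification information. A single speech recognition text may also match several morphological patterns, in which case the score of each matching pattern is added in turn. When the speaker is the questioner, only the scores for the questioner's semantic role identification information (those marked Q) are added; when the speaker is the answerer, only the scores for the answerer's (those marked A) are added.
[0034]
Next, semantic role identification information is assigned (step S34). In step S34, each speech recognition text is assigned the semantic role identification information with the highest score. An utterance whose semantic role is unknown, for example because it matched no morphological pattern, need not be assigned any. Alternatively, the assignment in step S34 may be skipped and the assignment made in the assignment correction step S36, after the transition probability application step S35 has been processed.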
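The scoring and assignment of steps S33 and S34 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table layout, the function names, and the score values are assumptions (only the two example patterns come from the text), and matching is reduced to a plain substring test.

```python
from collections import defaultdict

# Hypothetical pattern rule table: (morphological pattern, {role(speaker): score}).
# The two patterns are the examples quoted in the text; the scores are invented.
PATTERN_RULES = [
    ("こんにちは＜感＞", {"greeting(Q)": 1.0, "greeting(A)": 1.0}),
    ("なんですが＜付＞", {"question(Q)": 0.6, "answer(A)": 0.5}),
]


def score_utterance(morphemes: str, speaker: str) -> dict:
    """Add up role scores from every matching pattern (step S33).

    Only the columns for the given speaker ("Q" or "A") contribute.
    """
    suffix = f"({speaker})"
    scores = defaultdict(float)
    for pattern, role_scores in PATTERN_RULES:
        if pattern in morphemes:
            for role, score in role_scores.items():
                if role.endswith(suffix):
                    scores[role[: -len(suffix)]] += score
    return dict(scores)


def assign_role(scores: dict):
    """Pick the highest-scoring role, or None if nothing matched (step S34)."""
    return max(scores, key=scores.get) if scores else None
```

Returning `None` for an unmatched utterance mirrors the option in the text of leaving such utterances unassigned until the transition probabilities have been applied.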
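[0035]
In the transition probability application step S35, the semantic role of each utterance is estimated not from morphological patterns but from the context of the conversation. Existing semantic role analysis for text data performs only the pattern matching described above and assigns the highest-scoring role (see, for example, "Application of the Knowledge Information Sharing System (KIDS) to Help Desk Operations", Proceedings of the 13th Annual Conference of the Japanese Society for Artificial Intelligence, pp. 484-487 (1999)).
[0036]
Because speech recognition results may contain recognition errors, the correspondence between morphological patterns and semantic role identification information alone may not be accurate enough. On the other hand, dialogue can be expected to constrain the transitions between semantic roles, for example "a question precedes its answer". In the transition probability application step S35, therefore, the score of each kind of semantic role identification information obtained for each speech recognition result in the pattern rule application step S33 is corrected using a semantic role transition probability table, which defines, for each semantic role, the probability of a transition to every other semantic role.
[0037]
The transition probability table defines, for example, for all the semantic role identification information that can be assigned to utterances, separately for questioner and answerer, the probability that each kind is followed by each other kind. As noted above, the semantic role identification information carries the speaker role (questioner or answerer), so the table in effect contains transition probabilities based on those roles.
[0038]
FIG. 6 shows an example transition probability table 300. It gives the probability of a transition from the semantic role identification information 301 of the preceding utterance to the semantic role identification information 302 of the following utterance. Besides the semantic role identification information, the table 300 also contains "start", which marks the beginning of the dialogue, and "end", which marks its end. This makes it possible to use the probability that each semantic role appears at the beginning of a dialogue and at its end. The table 300 shows, for example, that the first utterance of a dialogue is a greeting by the questioner with probability 0.56, and that the utterance following a greeting by the questioner is a greeting by the answerer with probability 0.54.
[0039]
The Viterbi algorithm, for example, is used to correct the scores with the transition probabilities. An utterance that matched no morphological pattern has a score of 0 for every role, so preprocessing may be applied before the correction, such as giving every role an equal score of, for example, 1 / (number of kinds of semantic role identification information).
[0040]
Next, in the assignment correction step S36, the optimal semantic role identification information derived in the transition probability application step S35 is assigned to the text of each speech recognition result. Using transition probabilities makes it possible to assign semantic role identification information even to utterances whose role could not be determined by morphological pattern analysis.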
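The correction of steps S35 and S36 can be sketched with a small Viterbi decoder. This is an illustration under stated assumptions rather than the patent's implementation: the pattern scores are treated as if they were emission likelihoods, unseen transitions get a small floor value `TINY`, and the names `viterbi_roles`, `"START"`, and `"END"` are made up for the example. Utterances with all-zero scores receive the equal score 1 / (number of roles) suggested in the text.

```python
import math

TINY = 1e-9  # floor that keeps log() defined for unseen transitions or zero scores


def viterbi_roles(scores_per_utterance, transition, roles):
    """Return the most probable role sequence for a list of per-utterance scores.

    transition maps (previous role, next role) to a probability and may use
    the pseudo-roles "START" and "END".
    """
    if not scores_per_utterance:
        return []

    def log_trans(prev, nxt):
        return math.log(transition.get((prev, nxt), TINY))

    def log_score(scores, role):
        if not any(scores.get(r, 0.0) > 0 for r in roles):
            return math.log(1.0 / len(roles))  # equal score for unmatched utterances
        return math.log(scores.get(role, 0.0) or TINY)

    # best[t][role] = (log score of the best path ending in role, previous role)
    best = [{r: (log_trans("START", r) + log_score(scores_per_utterance[0], r), None)
             for r in roles}]
    for scores in scores_per_utterance[1:]:
        column = {}
        for r in roles:
            prev = max(roles, key=lambda p: best[-1][p][0] + log_trans(p, r))
            column[r] = (best[-1][prev][0] + log_trans(prev, r) + log_score(scores, r), prev)
        best.append(column)

    # Choose the final role, then follow the back-pointers to the first utterance.
    last = max(roles, key=lambda r: best[-1][r][0] + log_trans(r, "END"))
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]
```

In the usage below, the middle utterance matched no pattern, and the transition table alone decides that it is a question; the probabilities other than 0.56 are invented.

```python
roles = ["greeting", "question", "answer"]
transition = {("START", "greeting"): 0.56, ("greeting", "question"): 0.6,
              ("question", "answer"): 0.7, ("answer", "END"): 0.5}
decoded = viterbi_roles([{"greeting": 1.0}, {}, {"answer": 0.5}], transition, roles)
```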
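[0041]
If the transition probability application step S35 cannot find optimal semantic role identification information from the transition probability table, the semantic role identification information assigned in the assignment step S34 may be used. If no assignment was made in step S34, the semantic role identification information that scored highest in the pattern rule application step S33 is used.
[0042]
As a result of the semantic role analysis described above, each speech recognition result of the dialogue is output preceded by the start time and end time of the utterance, the name of the speaker (Q, A, and so on), and semantic role identification information such as "question", "answer", or "backchannel", as shown for example in FIG. 7.
[0043]
As shown in FIG. 8, the video of the questioner and of the answerer may be input through the stream data input unit 11 as separate video streams, for example from separate video cameras. In that case, as shown in FIG. 9, the questioner's and answerer's video streams are each divided into partial streams, speech recognition is performed, and the results are output as separate data.
[0044]
Next, using information such as the time-series data of each utterance in the two speech recognition results, the utterances are arranged in spoken order and merged into a single set of dialogue data. Semantic role analysis is applied to the merged result as shown in FIG. 8, producing a semantic role analysis result like that of FIG. 10. FIG. 10 differs from FIG. 7 in that there are multiple video stream IDs, as shown at the top of FIG. 10.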
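Merging the two per-stream recognition results into one dialogue can be sketched as an ordered merge on start time. The `Utterance` record and its field names are assumptions made for this example; the text only says that the time-series data of each utterance is used to put them in spoken order.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start: float    # seconds from the start of recording
    end: float
    stream_id: str  # ID of the video stream the utterance was cut from
    speaker: str    # "Q" (questioner) or "A" (answerer)
    text: str


def merge_dialogue(questioner, answerer):
    """Merge two utterance lists, each already sorted by start time, into one."""
    return list(heapq.merge(questioner, answerer, key=lambda u: u.start))


q_side = [Utterance(3, 6, "V1", "Q", "えっと、代官山で、、、"),
          Utterance(11, 12, "V1", "Q", "はい。")]
a_side = [Utterance(7, 10, "V2", "A", "そうですね。あのー、洋風と、")]
dialogue = merge_dialogue(q_side, a_side)
```

Each merged utterance keeps its `stream_id`, which is what allows the later playlist to refer back to the right video stream.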
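[0045]
Next, playback control information for the video is created from the semantic role analysis result, for example by applying playback control information creation rules as follows. When the roles of questioner and answerer are fixed in advance, the playback control information is created by lining up only the video of the questioner's question utterances, as questioner video, and the video of the answerer's answer utterances, as answerer video, as shown in FIG. 11. Under this rule, the other greetings, backchannels, questions by the answerer, and answers by the questioner in the video are left out of the playback control information.
[0046]
An example of how the playback control information creation unit 15 creates playback control information is described with reference to FIG. 12.
First, one piece of utterance data, as partial stream data contained in the stream data from the stream data input unit 11, is input in order of recording time (step S41). From the semantic role identification information attached to the input utterance data by semantic role analysis, it is determined whether the semantic role of the utterance data is a question by the questioner (step S42). If it is not a question by the questioner, it is then determined whether it is an answer by the answerer (step S43).
[0047]
If the semantic role of the utterance data is a question by the questioner or an answer by the answerer, a new scene ID is given to the utterance data, a playlist, which is the playback control information, is generated, and it is stored in the playback control information storage unit 16 (step S44). Steps S41 to S44 are then repeated.
[0048]
FIG. 13 shows an example of a playlist, that is, playback control information, generated by the procedure of FIG. 12 from the semantic role analysis result in FIG. 11. The playback time of each piece of scene data (partial stream data) is set to the number of seconds obtained by subtracting the start time of the utterance from its end time, both of which are attached ahead of the semantic role identification information. Because the playlist of FIG. 13 involves two sets of stream data, the questioner's video and the answerer's video, the ID of the questioner's stream data and the ID of the answerer's stream data are both stored as playback control information, and when the content is played back, video is played from the two sets of stream data according to the time information of the corresponding video.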
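The procedure of FIG. 12 and the playlist of FIG. 13 can be sketched as a filter over role-labeled utterances. The dictionary keys, the four-digit scene ID format, and the sample times are assumptions for illustration; the text specifies only that questions by the questioner and answers by the answerer are kept and that playback time is end time minus start time.

```python
def build_playlist(utterances):
    """Keep questioner questions and answerer answers, one scene per utterance."""
    playlist = []
    for u in utterances:  # assumed to arrive in order of recording time (S41)
        is_question = u["speaker"] == "Q" and u["role"] == "question"  # S42
        is_answer = u["speaker"] == "A" and u["role"] == "answer"      # S43
        if not (is_question or is_answer):
            continue
        order = len(playlist) + 1
        playlist.append({                                              # S44
            "scene_id": f"{order:04d}",
            "order": order,
            "stream_id": u["stream_id"],
            "start": u["start"],
            "duration": u["end"] - u["start"],  # end time minus start time
            "summary": u["text"],
            "role": u["role"],
        })
    return playlist


sample = [
    {"speaker": "Q", "role": "greeting", "stream_id": "V1", "start": 0, "end": 2, "text": "こんにちは"},
    {"speaker": "Q", "role": "question", "stream_id": "V1", "start": 3, "end": 6, "text": "えっと、代官山で、、、"},
    {"speaker": "Q", "role": "backchannel", "stream_id": "V1", "start": 11, "end": 12, "text": "はい。"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 7, "end": 10, "text": "そうですね。あのー、洋風と、"},
]
playlist = build_playlist(sample)
```

Storing `stream_id` per scene is one way to cover the two-stream case described above, where playback pulls each scene from the questioner's or the answerer's stream.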
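[0049]
In the playlist of FIG. 13 the questioner's video and the answerer's video are separate stream data, but they may be a single stream. Also, in FIG. 13 playback order information is attached to each piece of scene data so that scenes play in order from number 1, but the playback order may be omitted and the scenes played in the order in which they are listed. Furthermore, the playback start time of each scene may be specified as the time from the start of playback until that scene is played. This makes it possible to create gaps between scenes in which no video plays, or to specify periods in which two videos play at the same time.
[0050]
Next, the playback control information editing unit 18 reads the playback control information stored in the playback control information storage unit 16 as described above and performs editing. FIG. 14 shows an example of the editing screen that the playback control information editing unit 18 uses as an editing tool. When the playback control information editing unit 18 reads the playback control information of FIG. 13, it is displayed on the output unit 19 as an editing screen like that of FIG. 14. In the example of FIG. 14, a playback display area for the video, that is, the stream data played back by the stream playback unit 17, is built into the left part of the editing window, but the editing screen may be displayed on a screen separate from the playback display.
[0051]
In FIG. 14, a display for editing the playback control information (hereinafter, the editing display) 400 is located in the lower part of the editing window. In this example the editing display 400 has a horizontal bar 401 that represents the time range of each piece of partial stream data, and a semantic role display 402 that shows labels such as "question" and "answer" as text directly beneath the bar 401. The bar 401 has dividing lines at the positions corresponding to the time boundaries between pieces of partial stream data, which show the time range of each piece. The semantic role display 402 beneath the bar 401 shows the semantic role of each piece. The editing display 400 also has a timeline 403, showing times, directly above the bar 401.
[0052]
Instead of being shown as text, semantic roles may be expressed by color coding, for example blue for questions and red for answers, or by changing the font or character attributes.
[0053]
From the editing display 400 in the example of FIG. 14, it can be seen that the stream data whose playback is controlled by the playback control information is content structured as a sequence of questions and answers. If the user, for example, uses the arrow cursor to select on the bar 401 a position where the semantic role display 402 reads "answer", the answer portion of the stream data is played back, so the content of the answer can be checked.
[0054]
The utterance content of each scene may also be shown as a summary and its semantic role as a scene type, for the user to edit. For example, titles and summaries may be edited by hand as shown in FIG. 15. After manual editing, choosing "Update" in the editing window, for example, stores the edited content in the playback control information storage unit 16 as new playback control information. FIG. 16 shows an example of the playlist after the playlist of FIG. 13 has been updated by an update instruction on the editing screen of FIG. 15.
[0055]
Displaying the time range of each piece of partial stream data in association with its semantic role, and editing the playback control information according to user instructions entered against that display, makes stream data easy to edit. The structure of the video or audio being edited can be understood at a glance, which reduces trial and error in editing. In particular, editing tasks such as playing back and checking only the partial streams with a particular semantic role, as described above, or selecting which partial stream data to keep, changing semantic roles, and changing the range cut out for a piece of partial stream data, as described later, can be carried out efficiently.
[0056]
Moreover, editing on the basis of the semantic roles of the material makes it possible to create content that is easy for others to understand and learn from. Even ordinary users who are not skilled in video editing can thus easily create and edit video and audio that conveys knowledge efficiently.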
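[0057]
Next, another procedure for creating playback control information is described with reference to FIG. 17, using the semantic role analysis result shown in FIG. 10 as an example.
The utterance data of FIG. 10 is input one piece at a time (step S51), and it is determined whether the semantic role of the utterance data is a question by the questioner (step S52). In the example of FIG. 10, the utterance "えっと、代官山で、、、" ("Um, in Daikanyama...") is a question by the questioner, so when it is input the procedure advances to step S54 and checks whether any scene data is registered in the playlist. No scene data is registered yet, so new scene data (scene ID: 0001) is created and registered in the playlist, as shown in FIG. 18 (step S58).
[0058]
The next utterance in FIG. 10, "そうですね。あのー、洋風と、" ("Let me see. Um, Western-style, and"), is an answer by the answerer, so the procedure advances from step S52 through step S53 to step S54 and checks whether scene data is registered in the playlist. Scene data is already registered in the playlist of FIG. 18, so the procedure advances to step S55 and checks whether the utterance belongs to the same video stream as the preceding scene data. Here, as FIG. 18 shows, the video stream ID of the preceding scene data (scene ID: 0001) is the questioner's, a different video stream from the answerer's, so the procedure advances to step S56, and new scene data (scene ID: 0002) is created and registered in the playlist, as shown in FIG. 19.
[0059]
The utterance after that, "はい。" ("Yes."), is a backchannel, so it is not included in the playlist. The next utterance, "アジア料理とかもあるんですけど、" ("there is also Asian food and so on, but"), is an answer by the answerer, so the procedure advances from step S54 to step S55. The preceding scene data is also the answerer's and has the same video stream ID, so the procedure advances from step S55 to step S56 and determines whether the time gap from the preceding scene data is within 2 seconds.
[0060]
As FIG. 10 shows, the utterance "アジア料理とかもあるんですけど、" starts at 00:15. The preceding scene data, the utterance "そうですね。あのー、洋風と、", starts at 00:07 and ends at 00:10, so the gap determined in step S56 is 5 seconds. The procedure therefore advances to step S58, and new scene data (scene ID: 0005) is created and registered in the playlist, as shown in FIG. 20.
[0061]
Continuing in this way, the playlist just before the utterance "ま、定番になるんですけど" ("well, it's the usual choice, but") is analyzed is as shown in FIG. 21. As FIG. 10 shows, this utterance starts at 00:22 and the preceding utterance "アジアですと、" ("for Asian food,") ends at 00:21, so the gap is 1 second. The procedure therefore advances from step S56 to step S57, and the utterance is appended to the summary of the preceding scene data (scene ID: 0004), as shown in FIG. 22. The playback time is set to 4 seconds, obtained by subtracting the start time of the preceding scene data, 00:20, from the end time of the appended utterance, 00:24.
[0062]
The answerer's subsequent answer data is added to the scene data with scene ID: 0004 in the same way, producing the playlist shown in FIG. 23. Viewed in the editing tool, the playlist of FIG. 23 appears as shown in FIG. 24. As FIG. 24 shows, the scene data with scene ID: 0004 is placed as one continuous video stream rather than in the units produced by semantic role analysis.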
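The merging rule of FIG. 17 (same video stream and a gap of at most 2 seconds) can be sketched as below. The dictionary layout and function names are assumptions. The times 00:07, 00:10, 00:15, 00:20, 00:21, 00:22, and 00:24 come from the text; the remaining times in the sample are invented so that the example runs, and scene IDs are simply numbered in creation order.

```python
MAX_GAP_SECONDS = 2  # merge threshold used in the text


def is_kept(u):
    """True for a question by the questioner or an answer by the answerer."""
    return (u["speaker"], u["role"]) in {("Q", "question"), ("A", "answer")}


def build_merged_playlist(utterances):
    playlist = []
    for u in utterances:  # assumed sorted by start time
        if not is_kept(u):
            continue
        last = playlist[-1] if playlist else None
        same_stream = last is not None and last["stream_id"] == u["stream_id"]
        if same_stream and u["start"] - (last["start"] + last["duration"]) <= MAX_GAP_SECONDS:
            # Close enough to the preceding scene: extend it instead of adding one.
            last["summary"] += u["text"]
            last["duration"] = u["end"] - last["start"]
        else:
            playlist.append({
                "scene_id": f"{len(playlist) + 1:04d}",
                "stream_id": u["stream_id"],
                "start": u["start"],
                "duration": u["end"] - u["start"],
                "summary": u["text"],
            })
    return playlist


dialogue = [
    {"speaker": "Q", "role": "question", "stream_id": "V1", "start": 3, "end": 6, "text": "えっと、代官山で、、、"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 7, "end": 10, "text": "そうですね。あのー、洋風と、"},
    {"speaker": "Q", "role": "backchannel", "stream_id": "V1", "start": 11, "end": 12, "text": "はい。"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 15, "end": 17, "text": "アジア料理とかもあるんですけど、"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 20, "end": 21, "text": "アジアですと、"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 22, "end": 24, "text": "ま、定番になるんですけど"},
]
merged = build_merged_playlist(dialogue)
```

With this sample the last scene starts at 00:20 and lasts 4 seconds, matching the arithmetic in the walk-through above.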
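[0063]
Several concrete examples of editing stream data by editing the playback control information on the editing screen of FIG. 14 are described below.
(Splitting stream data)
First, an example of splitting continuous stream data partway through. Suppose that in the scene data with scene ID: 0002 in FIG. 25, the video and audio for "そうですね。あのー、" is to be split off from the utterance "そうですね。あのー、洋風と". The user plays and pauses while watching the video and pauses playback between "あのー、" and "洋風と". When the user then selects "Split", for example from a pop-up menu as shown in FIG. 25, the playlist is updated to a data structure like that of FIG. 26.
[0064]
In FIG. 26, the playback time of the scene data with scene ID: 0002 becomes 2 seconds, and new scene data with scene ID: 0005 is inserted after it. The scene data with scene ID: 0005 starts immediately after the playback time of the scene data with scene ID: 0002. The inserted scene data with scene ID: 0005 has playback order 3, and the playback order of each subsequent piece of scene data is moved down by one. In FIG. 26 the summary text and semantic role tag of scene ID: 0002 are automatically copied into scene ID: 0005, but the audio of each video stream may instead be run through speech recognition again and the summary text changed accordingly.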
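The split operation can be sketched as a pure function on a playlist kept in playback order. The dictionary layout, the function name, and the rule for choosing the new scene ID (one more than the largest existing ID) are assumptions; the sample values are chosen so that the result mirrors the description of FIG. 26.

```python
def split_scene(playlist, scene_id, offset):
    """Split one scene `offset` seconds after its start and renumber the order."""
    new_id = f"{max(int(s['scene_id']) for s in playlist) + 1:04d}"
    result = []
    for scene in playlist:  # the list is assumed to be in playback order
        if scene["scene_id"] != scene_id:
            result.append(dict(scene))
            continue
        if not 0 < offset < scene["duration"]:
            raise ValueError("split point must fall inside the scene")
        head = dict(scene, duration=offset)
        # The tail copies the summary and role of the original scene.
        tail = dict(scene, scene_id=new_id,
                    start=scene["start"] + offset,
                    duration=scene["duration"] - offset)
        result += [head, tail]
    for order, scene in enumerate(result, start=1):
        scene["order"] = order  # later scenes move down by one
    return result


before = [
    {"scene_id": "0001", "order": 1, "start": 3, "duration": 3, "summary": "えっと、代官山で、、、", "role": "question"},
    {"scene_id": "0002", "order": 2, "start": 7, "duration": 3, "summary": "そうですね。あのー、洋風と", "role": "answer"},
    {"scene_id": "0003", "order": 3, "start": 15, "duration": 2, "summary": "アジア料理とかもあるんですけど、", "role": "answer"},
    {"scene_id": "0004", "order": 4, "start": 20, "duration": 4, "summary": "アジアですと、", "role": "answer"},
]
after = split_scene(before, "0002", 2)
```

Returning a new list instead of mutating the input keeps the stored playlist unchanged until the user chooses to update it.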
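[0065]
The summary and semantic role can also be changed by hand. For example, the user can manually change the semantic role of scene ID: 0002 to "backchannel" and its summary to "そうですね。あのー", change the summary of scene ID: 0005 to "洋風と", and update.
[0066]
By splitting stream data in this way, the user can change the range cut out for a semantic role.
[0067]
(Deleting unwanted scene data)
Next, an example of deleting unwanted scene data. To delete the scene data with scene ID: 0002 from the playlist of FIG. 26 created by the split above, the user selects the scene on the bar that shows the time ranges of the partial stream data and selects "Delete", for example from a pop-up menu, as shown in FIG. 27. As FIG. 28 shows, the scene data with scene ID: 0002 is removed, and stream data is created in which the playback order of each subsequent piece of scene data is moved up by one. If the summary of scene ID: 0005 is then corrected to just "洋風と" as shown in FIG. 29 and the data is updated, the playlist becomes as shown in FIG. 30.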
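Deletion with renumbering can be sketched in a few lines. As before, the dictionary layout and function name are assumptions; the sample corresponds loosely to the playlist after the split described earlier.

```python
def delete_scene(playlist, scene_id):
    """Remove one scene and move the playback order of later scenes up by one."""
    remaining = [dict(s) for s in playlist if s["scene_id"] != scene_id]
    for order, scene in enumerate(remaining, start=1):
        scene["order"] = order
    return remaining


split_result = [
    {"scene_id": "0001", "order": 1, "summary": "えっと、代官山で、、、"},
    {"scene_id": "0002", "order": 2, "summary": "そうですね。あのー"},
    {"scene_id": "0005", "order": 3, "summary": "洋風と"},
    {"scene_id": "0003", "order": 4, "summary": "アジア料理とかもあるんですけど、"},
]
trimmed = delete_scene(split_result, "0002")
```

Only the playlist entry is removed; the underlying stream data is untouched, which is what makes this kind of edit cheap to undo.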
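[0068]
(Replacing stream data)
Next, the process of replacing the stream data (for example, the video stream) of one scene in a playlist with other data is described. Suppose that, in a playlist like the one shown on the left of FIG. 38, the video of the second scene, an answer, is to be replaced with other video. When the user selects "Open", for example from a pull-down menu as shown in FIG. 31, a dialog for entering the file name of the playlist to take the replacement from is displayed, as shown in FIG. 32. Entering a file name in the dialog of FIG. 32 and choosing "Open" displays a second editing window showing the specified playlist.
[0069]
Clicking the "Find" button in FIG. 32, on the other hand, displays a search window like that of FIG. 33. If the user enters a question such as "代官山の和食のお店" ("Japanese restaurants in Daikanyama") and chooses "Search", the text information of the playlists is searched using natural language search technology and the results are listed in the search window. The numbers 1, 2, ... in the search window give the ranking of the results by score, highest first. The score may also be indicated by marks such as black stars. Beside the score, the title, summary, and so on of the matching scene in the retrieved playlist are displayed. Below that, rectangles showing the length of each scene in the playlist may be displayed. The matching scene is indicated separately, for example by a thick border. The first frame of each scene (a thumbnail) may be displayed for each scene. As shown in FIG. 34, semantic roles such as "question" and "answer" may be displayed beneath the rectangles that represent scenes.
[0070]
When the user selects the relevant scene or file in the search window of FIG. 33 or FIG. 34, the playlist is displayed in another editing window, as shown in FIG. 35. The second scene of the opened playlist contains video of an answer about Japanese food in Daikanyama. When the user selects this scene on the bar showing the time ranges of the partial stream data at the bottom of FIG. 35 and selects "Copy", for example from a pop-up menu, the playlist information for the selected scene is copied to a buffer.
[0071]
Next, in the window of the playlist opened first, as in FIG. 31, the user selects the scene to be replaced on the bar showing the time ranges of the partial stream data, as shown in FIG. 36, and selects "Replace", for example from a pop-up menu. The scene data with scene ID: 0002 in the playlist is then replaced with the playlist information that was copied to the buffer, as in FIG. 38. The video stream ID, start time, playback time, title, summary, semantic role, speaker role, and so on are changed. FIG. 37 shows what the summary field displays when the scene to be replaced is selected on the bar.
[0072]
(Inserting stream data)
To insert new scene data rather than replace existing scene data, the user stops playback at the position where the insertion is wanted, as in the example of FIG. 39, and selects "Insert", for example from a pop-up menu, whereupon new scene data is inserted as shown in FIG. 40. As FIG. 41 shows, new scene data with scene ID: 0005 is inserted partway through the playlist, with a playback order immediately after the preceding scene. The playback order of each subsequent piece of scene data is moved down by one.
[0073]
(Replacement recording of stream data)
Next, an example of replacing the video or audio of scene data on the spot by after-recording. As shown in FIG. 42, the user selects, on the bar showing the time ranges of the partial stream data, the scene whose video or audio is to be replaced and chooses "Replacement recording". A window for capturing and recording video and audio from a camera is then displayed, as shown for example in FIG. 43. When the user chooses "Start recording" in this window, video and audio of the user speaking to the camera are captured on the spot. The camera need not capture only the user's face; it may capture documents at hand or demonstration footage in which the user explains while operating a device.
[0074]
Selecting "Pause", for example, suspends recording, and selecting "Finish" ends it. When the user then selects "Perform replacement" or the like, the scene data of the scene selected as in FIG. 42 is replaced with the newly recorded video data, as shown in FIG. 44.
[0075]
At this point the summary, scene type, and so on from before the replacement may be kept as they are. Alternatively, the recorded audio may be run through speech recognition during or after recording and the summary replaced with the recognition result. Replacement recording of this kind changes the playlist data as shown, for example, in FIG. 45.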
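[0076]
FIG. 46 shows another example of an editing screen. Here scene information and video information are on separate tracks, so that several pieces of partial stream data can be grouped into a single scene. Scene information can thus be attached to larger units of meaning, without being bound to the units of utterance-level semantic role analysis.
[0077]
For example, if a question and its answer are grouped into one scene and given a title and summary, as shown in FIG. 46, the user has less editing to do. Searches can also be made at the level of scenes as well as at the fine-grained level of semantic roles, which can be expected to make search results easier to read.
[0078]
In this case, as described earlier, the semantic role analysis unit 14 analyzes correspondences between pieces of partial stream data along with their semantic roles and, when it extracts a correspondence, includes information indicating that correspondence in the semantic role identification information. The playback control information creation unit 15, for its part, creates playback control information (a playlist) that controls, as a group, whether the pieces of partial stream data for which a correspondence was extracted are played and in what order.
[0079]
FIG. 47 shows the structure of the playlist that corresponds to the editing screen of FIG. 46. The playlist data has a two-level structure of scene data and shot data. From the semantic role analysis result it is generated automatically, for example by the procedure of FIG. 48, so that a question and answer pair with the correspondence described above belongs to the same scene. The procedure of FIG. 48 resembles that of FIG. 17.
[0080]
First, utterance data is input one piece at a time (step S61), and it is determined whether its semantic role is a question by the questioner (step S62). If it is not, step S63 determines whether its semantic role is an answer by the answerer. If the input utterance data is a question by the questioner or an answer by the answerer, the procedure advances to step S64 and checks whether any shot data is registered in the playlist. If the input utterance data is neither a question by the questioner nor an answer by the answerer, the procedure returns to step S61.
[0081]
If shot data is registered in the playlist, the procedure advances to step S65 and checks whether the input utterance data belongs to the same video stream as the preceding shot; if so, it advances to step S66 and checks whether the time gap from the preceding shot is within 2 seconds. If the utterance data belongs to the same video stream as the preceding shot and the gap is within 2 seconds, step S67 appends the utterance text to the preceding shot data and lengthens its playback time.
[0082]
If shot data is registered in the playlist but the input utterance data does not belong to the same video stream as the preceding shot, or belongs to the same video stream but the gap from the preceding shot is not within 2 seconds, the procedure advances to step S68 and creates new shot data. Step S69 then checks whether the new shot data belongs to the same video stream as the preceding shot; if it does, step S71 attaches the new shot data under the scene to which the preceding shot belongs.
[0083]
If the new shot data does not belong to the same video stream as the preceding shot, the procedure advances to step S70 and determines whether the semantic role of the utterance data is an answer by the answerer; if so, it advances to step S71. If, in step S70 or the earlier step S63, the semantic role of the utterance data is not an answer by the answerer, the procedure advances to step S72, creates new scene data, and attaches the shot data under it. When step S71 or step S72 finishes, the procedure returns to step S61 and the above is repeated.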
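The two-level scene and shot structure of FIG. 47 and the grouping procedure of FIG. 48 can be sketched as follows. This is a simplified reading, not the flowchart itself: the dictionary layout and names are assumptions, the very first shot is assumed to open a new scene, and the sample times are invented.

```python
def build_scenes(utterances, max_gap=2):
    """Group shots into scenes so that a question and its answers share a scene."""
    scenes = []       # each scene: {"scene_id": str, "shots": [shot, ...]}
    last_shot = None
    for u in utterances:  # assumed sorted by start time
        is_question = u["speaker"] == "Q" and u["role"] == "question"
        is_answer = u["speaker"] == "A" and u["role"] == "answer"
        if not (is_question or is_answer):
            continue
        same_stream = last_shot is not None and last_shot["stream_id"] == u["stream_id"]
        if same_stream and u["start"] - last_shot["end"] <= max_gap:
            # Same stream and a short gap: extend the preceding shot.
            last_shot["text"] += u["text"]
            last_shot["end"] = u["end"]
            continue
        shot = {"stream_id": u["stream_id"], "start": u["start"], "end": u["end"],
                "text": u["text"], "role": u["role"]}
        if scenes and (same_stream or is_answer):
            scenes[-1]["shots"].append(shot)  # stays in the preceding shot's scene
        else:
            scenes.append({"scene_id": f"{len(scenes) + 1:04d}", "shots": [shot]})
        last_shot = shot
    return scenes


talk = [
    {"speaker": "Q", "role": "question", "stream_id": "V1", "start": 3, "end": 6, "text": "q1"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 7, "end": 10, "text": "a1"},
    {"speaker": "Q", "role": "backchannel", "stream_id": "V1", "start": 11, "end": 12, "text": "はい。"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 15, "end": 17, "text": "a2"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 18, "end": 19, "text": "a3"},
    {"speaker": "Q", "role": "question", "stream_id": "V1", "start": 30, "end": 33, "text": "q2"},
    {"speaker": "A", "role": "answer", "stream_id": "V2", "start": 34, "end": 36, "text": "a4"},
]
scenes = build_scenes(talk)
```

Because a new scene opens only when a question arrives on a different stream, deleting or moving one scene carries the question and all of its answers together.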
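[0084]
Grouping into scenes in this way makes it easier for the user to delete or reorder whole scenes. For example, by selecting the bar that represents a scene in FIG. 46, deleting it from a pop-up menu or the like, and inserting it at the destination, the user can operate on several corresponding pieces of partial stream data at once. Also, if the user selects and deletes a bar representing an "answer" in the video of FIG. 46, the scene shrinks accordingly and is automatically changed to a scene long enough to hold the remaining one question and one answer.
[0085]
In FIG. 47 there are two levels and a shot always sits under a scene, but this constraint may be removed. Scenes and shots may each hold their own playback start time as separate data, so that, for example, the scene can change while the video continues. This makes it possible to create playlists in which, as in karaoke, the video stays the same and only the captions change.
[0086]
FIG. 49 shows an example of a dialogue viewer for checking the semantic role analysis result from which a playlist is made. Here it displays the semantic role analysis result of FIG. 10. When playback starts, the questioner's and answerer's videos play simultaneously, the semantic role analysis data and speech recognition data of the utterances are displayed, and the utterance currently playing is indicated on the timeline below, for example by a change of color.
[0087]
As in the dialogue viewer of FIG. 50, the semantic role and content of each utterance may be displayed beneath the bar that represents it on the timeline. Playback may also start from an utterance when the user points directly at its bar.
[0088]
When the user calls up the dialogue viewer from the editing screen or elsewhere while a playlist is displayed, whether each utterance is included in the playlist may be shown explicitly, for example by changing the color of the bars for the spans of video data included in the playlist, as in the dialogue viewer of FIG. 51. The user can listen to the whole dialogue and check the context of the utterances included in the playlist. The user can also check for errors in the semantic role analysis result and for important utterances missing from the playlist.
[0089]
FIG. 52 shows another example of a dialogue viewer. The user can choose whether to play only the selected utterances or all of the dialogue data. Rather than selecting utterances one by one by hand, an interface may be provided that selects them in bulk by specifying the speaker role and semantic role, such as "questions by the questioner" or "answers by the answerer", with check boxes or the like. This reduces the instructions the user must give and allows the dialogue data to be checked efficiently.
[0090]
As shown in FIG. 53, when the user reselects utterance data and then chooses "New" from a pop-up menu or the like, a new playlist containing the selected utterance data may be created. Also, by selecting "Copy" and then choosing "Insert" or "Replace" at any point in a playlist opened in the editing tool, important video data that was left out of the automatically generated playlist can be brought into it.
[0091]
As shown in FIG. 54, scenes may be represented by images or the like instead of bars. For example, an editing screen may be provided that displays a characteristic image (thumbnail) of each scene with its semantic role attached. The user can select scenes and easily reorder, copy, or delete them with operations such as drag and drop.
[0092]
In this embodiment the information stored in the playlist is the title, summary, semantic role, speaker role, speech recognition result, and so on, but it is not limited to these. For example, the summary here is written for a general audience, but the editing tool may allow a summary to be written for each of several user levels, such as beginner and intermediate, or age group. The summary displayed can then be varied to suit the user viewing the finished content.
[0093]
For video in which an object is shown to the camera and explained while its operation is demonstrated, the name of the object being explained, the names of its functions, and so on may be entered in detail. These names may be extracted automatically from the speech recognition result by information extraction technology or entered by hand by the user operating the editing tool. This makes it possible to find the appropriate scene and show it to the user in response to a question such as "Tell me about operation XX of OO".
[0094]
In this embodiment video and audio are handled on the same track, but a separate audio track may be provided. This enables a playback mode in which, for example, the answerer's audio data is always played along with the video.
[0095]
It also becomes possible to after-record only the video or only the audio, or to replace it with other stream data. Furthermore, by providing several audio tracks rather than just one, the answerer's and questioner's audio can be played simultaneously, or commentary and background music can be overlaid.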
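[0096]
The present invention is not limited to the embodiment exactly as described; at the implementation stage the components can be modified without departing from the gist of the invention. Various inventions can also be formed by suitably combining the components disclosed in the embodiment. For example, some components may be removed from the full set shown in the embodiment, and components from different embodiments may be combined as appropriate.
[0097]
[Effects of the Invention]
As described above, the present invention allows editing of stream data, which has conventionally been very tedious, to be carried out efficiently.
[0098]
For example, when automatically created content is corrected by hand, the intent with which the system created the content is presented clearly, so the user can make corrections without trial and error. Even ordinary users unfamiliar with video editing can therefore edit video and audio themselves and quickly create and edit knowledge-transfer content that matches their intent.
[0099]
It also becomes easy to replace part of the video or audio with other video or audio. For example, editing in which only the video explaining an answer is prepared to suit the user's level, and the video with only that part replaced is presented to suit the user, can be carried out easily.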
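[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of a stream data editing system according to an embodiment of the present invention
[FIG. 2] Flowchart showing the stream data editing procedure in the embodiment
[FIG. 3] Flowchart showing the semantic role analysis procedure in the embodiment
[FIG. 4] Diagram showing an example of a morphological analysis result in semantic role analysis
[FIG. 5] Diagram showing an example of the pattern rule table used in semantic role analysis
[FIG. 6] Diagram showing an example of the semantic role identification information transition probability table used in semantic role analysis
[FIG. 7] Diagram showing an example of how semantic role identification information is attached to partial stream data by semantic role analysis
[FIG. 8] Diagram outlining stream data editing when the questioner's and answerer's videos are input as separate video streams
[FIG. 9] Diagram showing how the questioner's and answerer's video streams are divided into partial streams, run through speech recognition, and output as separate stream data
[FIG. 10] Diagram showing an example of a semantic role analysis result
[FIG. 11] Diagram showing an example of creating playback control information
[FIG. 12] Flowchart showing an example of the playback control information creation procedure
[FIG. 13] Diagram showing an example of playback control information created from the semantic role analysis result in FIG. 11
[FIG. 14] Diagram showing an example of an editing screen for editing playback control information
[FIG. 15] Diagram showing an example of the editing screen on which an update instruction is given after the playback control information is edited
[FIG. 16] Diagram showing an example of the playback control information after updating
[FIG. 17] Flowchart showing another example of the playback control information creation procedure
[FIG. 18] Diagram showing the playback control information after the first new scene data is created in the creation procedure
[FIG. 19] Diagram showing the playback control information after the second new scene data is created in the creation procedure
[FIG. 20] Diagram showing the playback control information after the third new scene data is created in the creation procedure
[FIG. 21] Diagram showing the playback control information before a particular piece of utterance data is analyzed in the creation procedure
[FIG. 22] Diagram showing the playback control information after a particular piece of utterance data is appended in the creation procedure
[FIG. 23] Diagram showing an example of the final playback control information in the creation procedure
[FIG. 24] Diagram showing an example of an editing screen reflecting the playback control information of FIG. 23
[FIG. 25] Diagram showing an editing screen for explaining stream data splitting, a concrete example of stream data editing
[FIG. 26] Diagram showing the updated playback control information after stream data splitting
[FIG. 27] Diagram showing an editing screen for explaining deletion of unwanted scene data, a concrete example of stream data editing
[FIG. 28] Diagram showing the updated playback control information after deletion of unwanted scene data
[FIG. 29] Diagram showing the editing screen on which the summary data has been corrected after deletion of unwanted scene data
[FIG. 30] Diagram showing the playback control information after being updated by the correction of the summary data
[FIG. 31] Diagram showing a first editing screen for explaining stream data replacement, a concrete example of stream data editing
[FIG. 32] Diagram showing a second editing screen for explaining stream data replacement
[FIG. 33] Diagram showing a third editing screen for explaining stream data replacement
[FIG. 34] Diagram showing a fourth editing screen for explaining stream data replacement
[FIG. 35] Diagram showing a fifth editing screen for explaining stream data replacement
[FIG. 36] Diagram showing a sixth editing screen for explaining stream data replacement
[FIG. 37] Diagram showing a seventh editing screen for explaining stream data replacement
[FIG. 38] Diagram showing the updated playback control information after stream data replacement
[FIG. 39] Diagram showing a first editing screen for explaining stream data insertion, a concrete example of stream data editing
[FIG. 40] Diagram showing a second editing screen for explaining stream data insertion
[FIG. 41] Diagram showing the updated playback control information after stream data insertion
[FIG. 42] Diagram showing a first editing screen for explaining stream data replacement, a concrete example of stream data editing
[FIG. 43] Diagram showing a second editing screen for explaining stream data replacement
[FIG. 44] Diagram showing a third editing screen for explaining stream data replacement
[FIG. 45] Diagram showing the updated playback control information after stream data replacement
[FIG. 46] Diagram showing another example of an editing screen for editing playback control information
[FIG. 47] Diagram showing the playback control information that realizes the editing screen of FIG. 46
[FIG. 48] Flowchart showing another example of the playback control information creation procedure
[FIG. 49] Diagram showing an example of a dialogue viewer for checking the semantic role analysis result
[FIG. 50] Diagram showing a modification of the dialogue viewer shown in FIG. 49
[FIG. 51] Diagram showing another modification of the dialogue viewer shown in FIG. 49
[FIG. 52] Diagram showing another example of a dialogue viewer for checking the semantic role analysis result
[FIG. 53] Diagram showing a modification of the dialogue viewer shown in FIG. 52
[FIG. 54] Diagram showing yet another example of an editing screen for editing playback control information
[Explanation of Reference Signs]
11: stream data input unit, 12: stream data storage unit, 13: stream data processing unit, 14: semantic role analysis unit, 15: playback control information creation unit, 16: playback control information storage unit, 17: stream playback unit, 18: playback control information editing unit, 19: output unit.
[0001]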
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a stream data editing method, an editing system, and a program for inputting and editing stream data such as video and audio.
[0002]
[Prior art]
In recent years, with the spread of inexpensive computers and the Internet, computer-based distance learning, the so-called e-learning system, has become increasingly common as a replacement for, or a supplement to, group learning by a teacher and a plurality of students in a classroom. Because there are no restrictions on time or place, anyone can study independently at home or at work.
[0003]
For home use, for example, correspondence courses have been proposed for purposes such as re-education of middle-aged and elderly workers, lifelong education such as language study, and home study for children who refuse to attend school, and many self-study materials have been created for that purpose. In the workplace, there is individual learning tailored to the needs and level of each employee, such as rapid human resource development in response to changes in internal and external situations, the acquisition of rapidly advancing technical skills, and learning to operate the latest OA equipment, and e-learning systems are being introduced one after another for such learning.
[0004]
In order to obtain learning results that meet individual needs by the e-learning system, high-quality learning materials are required. If the teaching material is simple text-only content, it can be created by educators who are experts in the learning field. However, in recent years, so-called multimedia learning materials including images, videos, sounds, and the like have become popular in order to further enhance the learning effect.
[0005]
Creating and editing multimedia learning materials is a difficult task for ordinary educators. Therefore, there are many cases where an educator entrusts an editing worker who has mastered the creation work (authoring) of multimedia contents to create a multimedia learning teaching material jointly by the educator and the editing worker. Therefore, it takes a great deal of cost and time to create multimedia teaching materials, and a rapid supply of the teaching materials is delayed.
[0006]
On the other hand, due to the explosive spread of digital video cameras and mobile phones with a moving image shooting function, an environment has been established in which anyone can easily acquire images and share the images with others via a network or the like. As a large number of videos are accumulated, there is a growing need to easily search for, edit, and reuse desired videos.
[0007]
In offices, knowledge management systems have been introduced in which each person's knowledge and know-how is stored as documents for users to draw on. A similar system is used, for example, at a customer consultation desk: an operator records the content of an answer to a customer's question as text information, and when another operator receives a similar question, that text information is searched for and reused. In these systems, information is manually converted into text information and recorded, and the user reuses the information using a natural language search technique.
[0008]
Information recorded as stream data such as video and audio can be similarly searched if text information is added. However, in order to directly search for a desired scene of stream data, it is necessary to perform the complicated operation of adding, for each scene, an explanatory sentence for searching as text information, according to a scene description method such as MPEG7 (Moving Picture Experts Group phase 7), which is an international standard for describing multimedia information. It is even more effective to perform an editing operation such as extracting only important scenes and rearranging them in a meaningful order. However, such an operation is very complicated and takes a great deal of time if performed by a general user.
[0009]
In order to automate editing work for searching such stream data, several techniques for automatically adding explanatory text such as a search key to stream data have been developed. Techniques that analyze news video to detect scene breaks, recognize subtitle characters, or perform speech recognition on the speech read aloud by an announcer to extract important keywords and add them as search keys have been prototyped or implemented in video archive systems and video recording summarization systems.
[0010]
For example, in the video annotation editor described in "Advanced Use of Digital Content Based on Annotation (Part 2)", Katashi Nagao, Journal of the Information Processing Society of Japan, Vol. 42, No. 8, Aug. 2001, pp. 787-792 (Non-Patent Document 1), in particular on page 789, speech recognition of news speech and detection of scene changes in the video are performed automatically, and everything else is specified manually by a human operator.
[0011]
[Non-patent document 1]
"Advanced Use of Digital Content Based on Annotation (Part 2)", Katashi Nagao, Journal of the Information Processing Society of Japan, Vol. 42, No. 8, Aug. 2001, pp. 787-792
[Problems to be solved by the invention]
The technology described in Non-Patent Document 1 is premised on automating the work of adding keywords to material such as news video, searching it, and extracting important scenes. It does not support the task of manually editing content that conveys knowledge using video as its material, such as the learning materials for e-learning mentioned above. Therefore, when a result that the system analyzed and created automatically is to be changed, a conventional editing system is used. That is, when manually changing or adding a keyword or an explanatory sentence, re-cutting only an appropriate scene, or replacing video material, it is necessary to perform the same complicated operations as before.
[0012]
As described above, with the conventional stream data editing technology, it takes time and effort to create and edit the content for knowledge transfer using video and audio materials, and cannot cope with rapid knowledge transfer and education.
[0013]
An object of the present invention is to provide a stream data editing method, an editing system, and a program that make it easier to edit stream data such as audio and video.
[0014]
[Means for Solving the Problems]
In order to solve the above-described problem, according to an aspect of the present invention, the semantic role in information transmission of each piece of partial stream data in stream data including at least one of audio and video is analyzed, and semantic role identification information representing the semantic role is added to the partial stream data. Based on the semantic role identification information and a predetermined rule, playback control information for controlling whether each piece of partial stream data is played and in what order is created and stored. The time range and the semantic role of each piece of partial stream data are displayed in association with each other, and the stored playback control information is edited according to a user's instruction input for the display. The input stream data is played back according to the stored playback control information.
[0015]
According to another aspect of the present invention, there is provided a program for causing a computer to execute: a process of inputting stream data including at least one of audio and video; a process of analyzing the semantic role in information transmission of each partial stream data in the input stream data and adding semantic role identification information representing the semantic role to the partial stream data; a process of creating, based on the semantic role identification information, reproduction control information for controlling whether each partial stream data is reproduced and the reproduction order; a process of storing the reproduction control information; a process of displaying the time range and the semantic role of each partial stream data in association with each other and editing the stored reproduction control information according to a user's instruction input on the display; and a process of reproducing the input stream data according to the stored reproduction control information.
[0016]
As described above, the time range and the semantic role of each partial stream data are displayed in association with each other, and the reproduction control information is edited according to the user's instruction input on this display, so that the stream data can be edited easily.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
As shown in FIG. 1, in the stream data editing system of the present embodiment, stream data such as video and audio is input by the stream data input unit 11. The stream data input unit 11 may be a video / audio acquisition device such as a digital video camera, a device that receives stream data transmitted via a network such as the Internet or a LAN, or a device that reproduces stream data stored in a storage medium such as a DVD.
[0018]
The input stream data is stored in the stream data storage unit 12 and input to the stream data processing unit 13. The stream data processing unit 13 includes a semantic role analysis unit 14, a playback control information creation unit 15, a playback control information storage unit 16, a stream playback unit 17, and a playback control information editing unit 18. The stream data processing unit 13 is specifically a CPU, and performs processing using software, that is, an editing program. An output unit 19 that outputs video and audio is connected to the stream reproduction unit 17 and the reproduction control information editing unit 18.
[0019]
The outline of the processing procedure in this embodiment will be described with reference to FIG. 2 together with the configuration of each unit in FIG.
First, stream data such as video and audio is input by the stream data input unit 11 (step S21). The input stream data is stored in the stream data storage unit 12 (Step S22).
[0020]
The input stream data is also passed to the semantic role analysis unit 14 in the stream data processing unit 13, and the semantic role analysis is performed (step S23). The semantic role analysis unit 14 extracts partial stream data included in the input stream data, analyzes the semantic role of the partial stream data, and adds semantic role identification information to the partial stream data.
[0021]
The semantic role analysis unit 14 preferably has a function of analyzing the correspondence between a plurality of partial streams, and when the correspondence is extracted, information indicating that there is a correspondence is included in the semantic role identification information. For example, since the question and the answer to it correspond to each other, the correspondence is extracted for each partial stream data of the question and the answer.
[0022]
The stream data to which the semantic role identification information has been added by the semantic role analysis unit 14 is input to the reproduction control information creation unit 15, which creates, based on the semantic role identification information and a predetermined rule, reproduction control information for controlling how the stream reproduction unit 17 reproduces the stream data (step S24). The reproduction control information is described later; specifically, it is, for example, information for controlling whether each partial stream data is reproduced and the reproduction order. The created reproduction control information is stored in the reproduction control information storage unit 16 (step S25).
[0023]
Based on the reproduction control information stored in the reproduction control information storage unit 16, the stream reproduction unit 17 reads from the stream data storage unit 12 the partial stream data that corresponds to the reproduction control information, out of the stream data input from the stream data input unit 11, and reproduces it as video and audio via the output unit 19 (step S26). The output unit 19 includes a display for displaying video and a speaker for outputting audio. The output unit 19 may also record the edited stream data reproduced by the stream reproduction unit 17 on a disk medium such as a CD-R, CD-RW, DVD-R, DVD-RW, DVD-RAM, or HDD, or on a tape medium such as a video tape.
[0024]
The reproduction control information editing unit 18 presents an editing screen for reproduction control information via the output unit 19 based on the reproduction control information stored in the reproduction control information storage unit 16. The reproduction control information editing unit 18 further edits the reproduction control information in response to an input of an editing instruction on the editing screen from the user (step S27). The edited reproduction control information is stored again in the reproduction control information storage unit 16.
The semantic role analysis is not limited to the method described above, and another method may be used.
[0025]
Next, the semantic role analysis unit 14 will be described in detail. When the input stream data is, for example, a conversational video, the semantic role analysis unit 14 divides the speech in the video at appropriate positions such as pauses and performs speech recognition. From the recognized speech content it extracts utterance patterns, such as "Thank you", that are registered in advance in a pattern dictionary, and from the positions where the utterance patterns appear it calculates the likelihood of semantic roles such as "greeting", "question", and "answer" for each utterance.
[0026]
Next, the likelihood of the semantic role of each utterance is corrected based on predetermined transition probabilities between the semantic roles of utterances (for example, the probability of a particular sequence of semantic roles, such as a greeting being likely to follow a greeting). As a result, the stream data of the conversational video is cut into partial stream data in units of utterances, and information on the determined semantic role is added to each partial stream data.
[0027]
Next, a specific example of the processing procedure of the semantic role analysis will be described with reference to FIG. The processing procedure of this semantic role analysis is as described in detail in Japanese Patent Application No. 2003-54427. First, the speech recognition text is read via the stream data input unit 11 or the stream storage unit 12, and morphological analysis is performed (steps S31 to S32). In the example of the morphological analysis result 101 shown in FIG. 4, the underlined portions indicated by the symbols 102, 103 and 104 are the speech recognition text portions subjected to the morphological analysis. For example, the part of the speech recognition text 102 that has been morphologically analyzed is a result of morphological analysis of the text "Thank you".
[0028]
Next, the morphological analysis result is analyzed by applying pattern rules prepared in advance (step S33). A pattern rule describes feature information identification information, which indicates the meaning of feature information, in association with a morphological analysis pattern. The feature information identification information is defined in advance and represents, for example, the meaning of each utterance.
[0029]
FIG. 5 shows a pattern rule table 200, which is an example of a pattern rule description. Here, it is assumed that seven meanings, “greeting”, “backchannel”, “question”, “answer”, “confirmation”, “demonstration”, and “other”, are defined in advance as semantic role identification information representing the meaning of each utterance. The pattern rule table 200 shown in FIG. 5 indicates in which of the semantic role identification information 201 each morpheme pattern 202 is likely to appear. The weighting coefficient (score) 203 is a numerical value representing, when a certain morpheme pattern appears, how readily that pattern corresponds to each semantic role identification information. In the example of FIG. 5, the larger the weighting coefficient (score) 203, the more readily the pattern takes the corresponding semantic role. The morpheme pattern 202 is, for example, a pattern obtained by extracting from conversation data a characteristic part that is likely to determine the meaning of an utterance. The part between the symbols <> added by the morphological analysis indicates the part of speech.
[0030]
The semantic role identification information 201 in FIG. 5 distinguishes, for each of the seven types of semantic role identification information described above, whether the utterance belongs to the questioner or to the respondent. Identification information with the symbol (Q) after a semantic role such as “greeting” or “backchannel” is identification information for the questioner, and identification information with the symbol (A) is identification information for the respondent. That is, the semantic role identification information 201 shown in FIG. 5 also includes information on the roles of the questioner and the respondent.
[0031]
In the example of FIG. 5, an utterance whose morphological analysis result contains the morpheme pattern "Hello <feeling>" among the morpheme patterns 202 is likely to take the semantic role "greeting", whether the utterance belongs to the questioner or to the respondent. An utterance containing the morpheme pattern "whatever <attachment" is likely to be a "question" if it belongs to the questioner and an "answer" if it belongs to the respondent. The semantic role identification information 201 shown in FIG. 5 is therefore classified by role, such as questioner or respondent, in order to determine the meaning of the utterance.
[0032]
In the pattern rule application step S33, the morphological analysis result of each utterance is analyzed according to the pattern rule table 200 shown in FIG. 5, and the semantic role identification information corresponding to the utterance is estimated. For example, if the speech recognition result contains the text "Hello", it matches the morpheme pattern "Hello <feeling>" in the pattern rule. If the text was spoken by the questioner, then among the seven candidate semantic roles for the text "Hello" ("greeting", "backchannel", "question", "answer", "confirmation", "demonstration", and "other"), the score of "greeting (Q)" in the pattern rule is added.
[0033]
A single pattern match may add a score to a plurality of semantic role identification information, and a plurality of morpheme patterns may match one speech recognition text. In the latter case, the score of each matched morpheme pattern is added each time. When the speaker is the questioner, only the scores of the semantic role identification information for the questioner (those with the symbol Q) are added; when the speaker is the respondent, only the scores of the semantic role identification information for the respondent (those with the symbol A) are added.
[0034]
Next, semantic role identification information is assigned (step S34). In step S34, the semantic role identification information having the highest score is assigned to each speech recognition result text. Identification information need not be assigned to an utterance whose semantic role is unknown because it matches no morpheme pattern. Alternatively, the assignment may be skipped in the semantic role identification information assignment step S34 and performed in the semantic role identification information assignment modification step S36, after the processing of the semantic role identification information transition probability applying step S35.
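The scoring of steps S33 and S34 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the entries and scores in `PATTERN_RULES` are hypothetical (they are not the values of FIG. 5), the function names are invented for this sketch, and plain substring matching stands in for matching against a morphological analysis result.

```python
from collections import defaultdict

# Hypothetical pattern rules: (pattern, semantic role with speaker symbol, score).
# The values are illustrative only and are not taken from FIG. 5.
PATTERN_RULES = [
    ("Hello", "greeting(Q)", 1.0),
    ("Hello", "greeting(A)", 1.0),
    ("what", "question(Q)", 0.8),
    ("what", "answer(A)", 0.3),
]


def score_utterance(text, speaker):
    """Step S33 sketch: add the score of every matching pattern.

    Only roles carrying the symbol of `speaker` ("Q" or "A") receive scores.
    """
    suffix = f"({speaker})"
    scores = defaultdict(float)
    lowered = text.lower()
    for pattern, role, score in PATTERN_RULES:
        if pattern.lower() in lowered and role.endswith(suffix):
            scores[role] += score
    return dict(scores)


def assign_role(scores):
    """Step S34 sketch: pick the highest-scoring role, or None if nothing matched."""
    if not scores:
        return None
    return max(scores, key=scores.get)
```

An utterance that matches no pattern yields an empty score table and is left unassigned, which corresponds to the case that the text leaves to the later transition probability step.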
[0035]
In the semantic role identification information transition probability application step S35, the semantic role of each utterance is estimated not from the morpheme pattern but from the context of the conversation. Existing semantic role analysis for text data performs only the pattern matching described above and assigns the semantic role with the highest score (for example, “Application of Knowledge Information Sharing System (KIDS) to Help Desk Business”, Proceedings of the 13th Annual Conference of the Japanese Society for Artificial Intelligence, pp. 484-487 (1999)).
[0036]
Because the speech recognition result may contain recognition errors, the correspondence between morpheme patterns and semantic role identification information alone may not give sufficient accuracy. On the other hand, a dialogue is expected to constrain the transitions between semantic role identification information, for example “a question precedes an answer”. Therefore, in the semantic role identification information transition probability applying step S35, the score of each semantic role identification information obtained for each speech recognition result in the pattern rule applying step S33 is corrected using the semantic role identification information transition probability table, which defines, for each semantic role identification information, the probability of a transition to each other semantic role identification information.
[0037]
The semantic role identification information transition probability table defines, for every piece of semantic role identification information that can be assigned to an utterance of the questioner or the respondent, the probability of each piece of semantic role identification information appearing next. As described above, the semantic role identification information includes information on the roles of the questioner and the respondent; as a result, the table includes transition probabilities that reflect those roles.
[0038]
FIG. 6 shows an example of the semantic role identification information transition probability table 300. The meaning role identification information transition probability table 300 in this example indicates the transition probability from the meaning role identification information 301 of the preceding utterance to the meaning role identification information 302 of the subsequent utterance. In addition to the semantic role identification information, the semantic role identification information transition probability table 300 also includes “start” indicating the start of the dialog and “end” indicating the end of the dialog. In this manner, the probability that each semantic role identification information appears at the beginning of the conversation and the probability that it appears at the end of the conversation can also be used. In the semantic role identification information transition probability table 300, for example, the probability that the utterance at the head of the dialogue is the greeting of the questioner is 0.56, and the probability that the utterance following the greeting of the questioner is the greeting of the respondent is 0.54.
[0039]
For example, the Viterbi algorithm is used to correct the score based on the transition probability. At the time of this correction, all the scores of an utterance that matched no morpheme pattern are 0. Therefore, before the correction, preprocessing may be performed, such as giving every score of such an utterance an equal value, for example (1 / number of semantic role identification information).
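A Viterbi-style correction of this kind might look like the sketch below. It assumes that the pattern scores are combined with the transition probabilities by multiplication, which the text does not specify; the function name `correct_roles` and the dictionary layout of `trans` (keyed by a pair of preceding and following role, with "start" and "end" as in the table of FIG. 6) are also assumptions of this sketch.

```python
def correct_roles(score_rows, trans, roles):
    """Return the most probable role sequence for a list of per-utterance scores.

    score_rows: one {role: score} dict per utterance (may be empty).
    trans:      {(previous_role, next_role): probability}; missing pairs count as 0.
    roles:      list of all semantic roles.
    """
    if not score_rows:
        return []
    uniform = 1.0 / len(roles)
    rows = []
    for row in score_rows:
        if any(row.get(r, 0.0) > 0 for r in roles):
            rows.append({r: row.get(r, 0.0) for r in roles})
        else:
            # Preprocessing: an utterance with no pattern match gets equal scores.
            rows.append({r: uniform for r in roles})

    # Initialise with the probability of each role appearing at the start.
    best = {r: trans.get(("start", r), 0.0) * rows[0][r] for r in roles}
    backpointers = []
    for row in rows[1:]:
        step_best, step_back = {}, {}
        for r in roles:
            prev = max(roles, key=lambda p: best[p] * trans.get((p, r), 0.0))
            step_best[r] = best[prev] * trans.get((prev, r), 0.0) * row[r]
            step_back[r] = prev
        best = step_best
        backpointers.append(step_back)

    # Close the path with the probability of each role appearing at the end.
    last = max(roles, key=lambda r: best[r] * trans.get((r, "end"), 0.0))
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    return path[::-1]
```

With a table in which a question is likely to be followed by an answer, an unmatched utterance between a greeting and an answer is resolved as a question, which is the kind of assignment that pattern matching alone cannot make.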
[0040]
Next, in the semantic role identification information assignment modification step S36, the optimal semantic role identification information derived in the semantic role identification information transition probability applying step S35 is assigned to the text of each speech recognition result. By using the transition probability, semantic role identification information can be assigned to an utterance for which a semantic role could not be identified by analysis using a morphological pattern.
[0041]
If the optimal semantic role identification information cannot be found from the transition probability information in the semantic role identification information transition probability table in the semantic role identification information transition probability applying step S35, the semantic role identification information assigned in the semantic role identification information assignment step S34 may be employed. When no semantic role identification information was assigned in step S34, the semantic role identification information having the highest score in the pattern rule application step S33 is adopted.
[0042]
As a result of the semantic role analysis described above, for example, as shown in FIG. 7, the start time and end time of each utterance, the name of the speaker (Q, A, etc.), and semantic role identification information indicating a semantic role such as "question", "answer", or "backchannel" are added in front of the speech recognition result of the dialogue and output.
[0043]
As shown in FIG. 8, the video of the questioner and the video of the respondent may be input by the stream data input unit 11 as separate video streams through separate video cameras or the like. In such a case, as shown in FIG. 9, the video streams of the questioner and of the respondent are each divided into partial streams, subjected to speech recognition, and output as separate data.
[0044]
Next, using the time-series data of each utterance included in the two speech recognition result data, the utterances are arranged in the order of utterance and merged as one conversation data. A semantic role analysis is performed on the merge result as shown in FIG. 8, and a semantic role analysis result as shown in FIG. 10 is generated. FIG. 10 differs from FIG. 7 in that there are a plurality of video stream IDs as shown in the upper part of FIG.
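The merge of the two speech recognition results into one conversation can be sketched as a merge of two time-ordered lists. The `Utterance` record and its field names are hypothetical and introduced only for this illustration; each input list is assumed to be already sorted by start time.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start: int       # start time in seconds
    end: int         # end time in seconds
    stream_id: str   # ID of the video stream the utterance came from
    speaker: str     # "Q" for the questioner, "A" for the respondent
    text: str        # speech recognition result


def merge_by_time(*streams):
    """Merge per-stream utterance lists into one list ordered by start time."""
    return list(heapq.merge(*streams, key=lambda u: u.start))
```

Keeping `stream_id` on every utterance preserves the information needed later to reproduce each scene from the correct video stream.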
[0045]
Next, video reproduction control information is created based on the semantic role analysis result. For example, a reproduction control information creation rule is applied as follows. When the roles of the questioner and the respondent are determined in advance, reproduction control information is created that arranges, as shown in FIG. 11, only the video of the questioner's question utterances from the questioner video and the video of the respondent's answer utterances from the respondent video. With this rule, the other greetings and backchannels, questions by the respondent, and answers by the questioner included in the video are not included in the reproduction control information.
[0046]
An example of a procedure for creating the reproduction control information in the playback control information creating unit 15 will be described with reference to FIG.
First, one piece of utterance data is input by the stream data input unit 11 as partial stream data included in the stream data, in order of shooting time (step S41). From the semantic role identification information added to the input utterance data by the semantic role analysis, it is determined whether the semantic role of the utterance data is a question of the questioner (step S42). If the semantic role of the utterance data is not a question of the questioner, it is determined whether the semantic role of the utterance data is an answer of the respondent (step S43).
[0047]
If the semantic role of the utterance data is the question of the questioner or the answer of the respondent, a new scene ID is assigned to the utterance data to generate a reproduction list as reproduction control information, which is stored in the reproduction control information storage unit 16. (Step S44). Hereinafter, the processing of steps S41 to S44 is repeated.
[0048]
FIG. 13 shows an example of a reproduction list, which is reproduction control information generated by the procedure of FIG. 12 based on the semantic role analysis result in FIG. The reproduction time of each scene data, which is a partial stream data, is set to the number of seconds obtained by subtracting the start time from the end time of the utterance, both of which are added at the head of the semantic role analysis result. In the reproduction list of FIG. 13, there are two stream data: the video of the questioner and the video of the respondent. Therefore, the ID of the questioner's stream data and the ID of the respondent's stream data are stored in the reproduction control information, and when the content is reproduced, the video is reproduced from the two stream data based on the time information of the corresponding video.
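The procedure of FIG. 12 (steps S41 to S44) and the reproduction time rule above can be sketched as follows. The dictionary keys, the four-digit scene ID format, and the function name are assumptions made for this illustration and are not the data format of FIG. 13.

```python
def build_playlist(utterances):
    """Keep only questions by the questioner and answers by the respondent.

    `utterances` is a list of dicts ordered by shooting time, each with the
    hypothetical keys: speaker, role, stream_id, start, end, text.
    """
    playlist = []
    for u in utterances:                                              # step S41
        is_question = u["speaker"] == "Q" and u["role"] == "question"  # step S42
        is_answer = u["speaker"] == "A" and u["role"] == "answer"      # step S43
        if not (is_question or is_answer):
            continue
        order = len(playlist) + 1
        playlist.append({                                             # step S44
            "scene_id": f"{order:04d}",
            "order": order,
            "stream_id": u["stream_id"],
            "start": u["start"],
            # Reproduction time = end time minus start time of the utterance.
            "duration": u["end"] - u["start"],
            "role": u["role"],
            "summary": u["text"],
        })
    return playlist
```

Because each scene records its own `stream_id`, the same list can describe either two separate video streams or a single one.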
[0049]
In the playlist of FIG. 13, the video of the questioner and the video of the respondent are different stream data, but they may be a single stream data. Further, in FIG. 13, reproduction order information is added to each scene data and the scenes are reproduced in order from the first. Alternatively, the reproduction order information may be omitted and the scenes reproduced in the order in which the scene data is arranged. Further, the time from the start of reproduction until each scene is reproduced may be designated as the reproduction start time of each scene data. This makes it possible to create an interval in which no video is shown between scenes, or to specify a period during which two videos are reproduced simultaneously.
[0050]
Next, the reproduction control information stored in the reproduction control information storage unit 16 as described above is read, and the reproduction control information editing unit 18 performs an editing operation. FIG. 14 shows an example of an editing screen used as an editing tool in the reproduction control information editing unit 18. When the reproduction control information editing unit 18 reads the reproduction control information shown in FIG. 13, it is displayed on the output unit 19 as an editing screen as shown in FIG. In the example of FIG. 14, the playback display of the video, which is the stream data played back by the stream reproduction unit 17, is incorporated in the left part of the editing window, but the editing screen may instead be displayed on a screen separate from the playback display of the stream data.
[0051]
According to FIG. 14, a display for editing the reproduction control information (hereinafter referred to as an editing display) 400 exists in the lower part of the editing window. In this example, the editing display 400 includes a bar 401 extending in the horizontal direction that represents the time range of each partial stream data, and a semantic role display 402 that shows “question”, “answer”, and so on in characters adjacent to the lower side of the bar 401. The bar 401 has a dividing line at each position corresponding to the time position of a boundary between partial stream data, so that the time range of each partial stream data can be seen. The semantic role of each partial stream data can be seen from the semantic role display 402 below the bar 401. The editing display 400 also has a timeline 403 that displays the time adjacent to the upper side of the bar 401.
[0052]
In addition to displaying the semantic roles in characters, the semantic roles may be distinguished by color, such as blue for questions and red for answers, or by changing the font or character attributes.
[0053]
From the editing display 400 in the example of FIG. 14, it can be understood that the stream data whose reproduction is controlled by the reproduction control information is content structured as a sequence of questions and answers. Here, when the user, by instruction input, selects with an arrow cursor a position on the bar 401 where the semantic role display 402 shows “answer”, the answer portion of the stream data is reproduced, for example, so that the content of the answer can be checked.
[0054]
In addition, the utterance contents of each scene may be summarized and the semantic role may be displayed as a scene type so that the user can edit the scene. For example, as shown in FIG. 15, the title and the outline may be edited manually. After performing the manual editing, for example, by instructing “update” in the editing window, the edited content is stored in the reproduction control information storage unit 16 as new reproduction control information. FIG. 16 shows an example of the playlist after updating the playlist, which is the playback control information in FIG. 13, according to the update instruction on the editing screen in FIG.
[0055]
As described above, the time range and the semantic role of each partial stream data are displayed in association with each other, and the reproduction control information is edited according to the user's instruction input on this display, so that the stream data can be edited easily. That is, the structure of the video or audio to be edited can be understood at a glance, and trial and error in the editing operation is reduced. In particular, editing work such as reproducing and checking only the partial streams to which a specific semantic role is added, selecting or removing partial stream data, changing a semantic role, and changing the extraction range of partial stream data as described later can be performed efficiently.
[0056]
Furthermore, by editing based on the semantic role of the material, it is possible to create content that is easy for other people to understand and learn. As a result, even a general user who is not particularly proficient in video editing can easily create and edit video and audio that efficiently transmit knowledge.
[0057]
Next, another procedure for creating the reproduction control information will be described with reference to FIG. 17, using the semantic role analysis result shown in FIG. 10 as an example.
The utterance data of FIG. 10 is input one at a time (step S51), and it is determined whether the semantic role of the utterance data is a question of the questioner (step S52). In the example of FIG. 10, the utterance data “Uh, Daikanyama ,,,,” is a question of the questioner. When this utterance data is input, the process proceeds to step S54, where it is checked whether scene data is registered in the playlist. Since no scene data has been registered in the playlist yet, new scene data (scene ID: 0001) is created and registered in the playlist as shown in FIG. 18 (step S58).
[0058]
Since the next utterance data in FIG. 10, "Yes, uh, Western style," is an answer of the respondent, the process proceeds from step S52 to step S54 via step S53, where it is checked whether scene data is registered in the playlist. Since scene data has already been registered in the reproduction list in FIG. 18, the flow advances to step S55 to check whether the video stream is the same as that of the previous scene data. Here, as shown in FIG. 18, the video stream ID of the immediately preceding scene data (scene ID: 0001) belongs to the questioner and is a different video stream from that of the respondent, so the process proceeds to step S56, and new scene data (scene ID: 0002) is created and registered in the playlist.
[0059]
The utterance data “Yes.” that follows the utterance data “Yes, that is, Western style,” in FIG. 10 is a backchannel and is not included in the playlist. The next utterance data, "There is also Asian cuisine," is an answer of the respondent, so the process proceeds from step S54 to step S55. Since the immediately preceding scene data belongs to the respondent and has the same video stream ID, the process proceeds from step S55 to step S56, where it is determined whether the time interval from the immediately preceding scene data is within 2 seconds.
[0060]
As shown in FIG. 10, the start time of the utterance data “There is also Asian food,” is 00:15. The immediately preceding scene data, the utterance data “yes, that is, western style”, has a start time of 00:07 and an end time of 00:10, so the time interval determined in step S56 is 5 seconds, which exceeds 2 seconds. Therefore, in step S58, new scene data (scene ID: 0005) is created as shown in FIG.
[0061]
By proceeding in this manner, the reproduction list before the analysis of the utterance data "Well, it becomes a standard" is as shown in FIG. As shown in FIG. 10, the start time of the utterance data “Well, it's going to be a standard” is 00:22 and the end time of the preceding “In Asia” is 00:21, so the time interval is one second. Therefore, the process proceeds from step S56 to step S57, and the utterance data is added to the outline of the immediately preceding scene data (scene ID: 0004) as shown in FIG. The reproduction time is set to 4 seconds, obtained by subtracting the start time 00:20 of the immediately preceding scene data from the end time 00:24 of the added utterance data.
[0062]
Hereinafter, similarly, the answer data of the respondent is added to the scene data of the scene ID: 0004, and a reproduction list as shown in FIG. 23 is generated. When the playlist shown in FIG. 23 is viewed with the editing tool, it is displayed as shown in FIG. As shown in FIG. 24, the scene data of the scene ID: 0004 is pasted as one continuous video stream instead of a unit divided by the semantic role analysis processing.
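The procedure of FIG. 17 can be sketched as follows, using the 2-second threshold and the reproduction time rule stated above. The dictionary keys and the function name are hypothetical, and the step numbers in the comments follow the description in the text rather than the flowchart itself.

```python
MAX_GAP_SECONDS = 2  # threshold used in step S56


def build_merged_playlist(utterances, max_gap=MAX_GAP_SECONDS):
    """Build a playlist, joining consecutive utterances of the same stream.

    `utterances` is a time-ordered list of dicts with the hypothetical keys:
    speaker, role, stream_id, start, end, text.
    """
    wanted = {("Q", "question"), ("A", "answer")}
    playlist = []
    for u in utterances:                                      # step S51
        if (u["speaker"], u["role"]) not in wanted:           # steps S52, S53
            continue
        prev = playlist[-1] if playlist else None             # step S54
        if (prev is not None
                and prev["stream_id"] == u["stream_id"]       # step S55
                and u["start"] - (prev["start"] + prev["duration"]) <= max_gap):  # S56
            # Step S57: extend the preceding scene instead of adding a new one.
            prev["summary"] += " " + u["text"]
            prev["duration"] = u["end"] - prev["start"]
        else:
            order = len(playlist) + 1                         # step S58
            playlist.append({
                "scene_id": f"{order:04d}",
                "order": order,
                "stream_id": u["stream_id"],
                "start": u["start"],
                "duration": u["end"] - u["start"],
                "role": u["role"],
                "summary": u["text"],
            })
    return playlist
```

For an answer that starts at 00:22 after a scene spanning 00:20 to 00:21, the gap is one second, so the scene is extended and its reproduction time becomes 00:24 minus 00:20, that is, 4 seconds, as in the example above.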
[0063]
Hereinafter, some specific examples of stream data editing by editing the playback control information using the editing screen shown in FIG. 14 will be described.
(Divide stream data)
First, an example of a process of dividing continuous stream data partway through will be described. For example, to divide the video and audio of the remark "Oh, that's the Western style," in the scene data of scene ID: 0002 in FIG. 25, the user repeats playback and pause while watching the video and pauses playback between "Ah," and "Western style." When the user then selects “split” using, for example, a pop-up menu as shown in FIG. 25, the reproduction control information is updated to a data structure like the playlist shown in FIG.
[0064]
According to FIG. 26, the reproduction time of the scene data of scene ID: 0002 becomes 2 seconds, and the scene data of scene ID: 0005 is newly inserted after it. The start time of the scene data of scene ID: 0005 immediately follows the end of the reproduction time of the scene data of scene ID: 0002. The playback order of the inserted scene data of scene ID: 0005 becomes 3, and the playback order of each subsequent scene data is moved down by one. In FIG. 26, the outline sentence and the semantic role tag of scene ID: 0005 are automatically filled with the same data as those of scene ID: 0002. Processing that changes the outline sentence may also be performed.
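The division described above only rewrites playlist entries; the stream itself is untouched. A minimal sketch, assuming a hypothetical `Scene` record and `split_scene` helper (neither is named in this document), and using an illustrative scene that starts at 00:07 and lasts 3 seconds so that a split at 00:09 leaves a 2-second first part:

```python
from dataclasses import dataclass, replace


@dataclass
class Scene:
    scene_id: str
    order: int          # playback order
    start: int          # start time in seconds
    duration: int       # reproduction time in seconds
    outline: str
    semantic_role: str


def split_scene(playlist, scene_id, split_at, new_id):
    """Split one scene at the absolute time split_at (seconds), editing playlist in place."""
    target = next(s for s in playlist if s.scene_id == scene_id)
    end = target.start + target.duration
    if not target.start < split_at < end:
        raise ValueError("split point must lie inside the scene")
    # The new scene copies the outline and semantic role of the original scene.
    tail = replace(target, scene_id=new_id, order=target.order + 1,
                   start=split_at, duration=end - split_at)
    for s in playlist:
        if s.order > target.order:
            s.order += 1            # later scenes move down by one
    target.duration = split_at - target.start
    playlist.append(tail)
    playlist.sort(key=lambda s: s.order)
    return tail


playlist = [
    Scene("0001", 1, 0, 5, "question", "question"),
    Scene("0002", 2, 7, 3, "Oh, that's the Western style", "answer"),
    Scene("0003", 3, 15, 4, "There is also Asian food", "answer"),
]
split_scene(playlist, "0002", split_at=9, new_id="0005")
print([(s.scene_id, s.order, s.start, s.duration) for s in playlist])
```

After the call, scene 0002 has a reproduction time of 2 seconds, scene 0005 has playback order 3, and the following scene has moved from order 3 to order 4.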
[0065]
In addition, the outline and the semantic role can be changed manually. For example, the user can manually change the semantic role of scene ID: 0002 to “back-channel response” and its outline to “Yes, that is”, change the outline of scene ID: 0005 to “Western style”, and then update the playlist.
[0066]
In this way, by performing the stream data dividing operation, the user can change the semantic role extraction range.
[0067]
(Deletion of unnecessary scene data)
Next, an example of deleting unnecessary scene data will be described. For example, to delete the scene data of scene ID: 0002 from the previously created reproduction list of FIG. 26, the user selects the scene on the bar indicating the time range of the partial stream data as shown in FIG., and selects "Delete" from a pop-up menu. As shown in FIG. 28, the scene data of scene ID: 0002 is erased, and a reproduction list in which the reproduction order of each subsequent scene data is moved up by one is created. Then, when the outline data of scene ID: 0005 is corrected to just "Western style" as shown in FIG. 29 and the data is updated, the reproduction list becomes as shown in FIG.
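Deletion can likewise be sketched as an edit of the reproduction list alone. The dictionary keys and the helper name `delete_scene` below are assumptions made for illustration; the sample playback orders are the ones that would follow the division example above.

```python
def delete_scene(playlist, scene_id):
    """Return a new playlist without the given scene; later scenes move up by one."""
    removed = next(s for s in playlist if s["scene_id"] == scene_id)
    result = []
    for scene in playlist:
        if scene is removed:
            continue
        order = scene["order"]
        if order > removed["order"]:
            order -= 1                      # move the playback order up by one
        result.append({**scene, "order": order})
    return result


playlist = [
    {"scene_id": "0001", "order": 1},
    {"scene_id": "0002", "order": 2},
    {"scene_id": "0005", "order": 3},
    {"scene_id": "0003", "order": 4},
]
updated = delete_scene(playlist, "0002")
print([(s["scene_id"], s["order"]) for s in updated])
```

Because the function builds a new list, the original playlist stays available, which is one simple way to support an undo in an editing tool.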
[0068]
(Replacement of stream data)
Next, processing for replacing the stream data (for example, a video stream) of some scenes in a playlist with other data will be described. For example, suppose that the video of the second answer scene in a playlist such as that shown on the left of FIG. is to be replaced with another video. When the user selects “open” using a pull-down menu or the like as shown in FIG. 31, a dialog for entering the file name of the playlist to be used for the replacement is displayed as shown in FIG. Here, in FIG. 32, when a file name is entered in the dialog and "open" is selected, another editing window displaying the specified playlist is opened.
[0069]
On the other hand, when the "Search" button is clicked in FIG. 32, a search window such as that shown in FIG. 33 is displayed. When the user enters a query such as "Japanese restaurant in Daikanyama" and selects "search", a list of the results of searching the character information of the playlists using natural language search technology is displayed in the search window. The numbers 1, 2, ... in the search window indicate the rank of each result in descending order of score. The score may be indicated by marks such as black stars. Next to the score, the title, outline, and the like of the corresponding scene in the retrieved playlist are displayed. Below that, rectangles or the like indicating the length of each scene in the playlist may be displayed, with the matching scene indicated by a bold frame or the like. The first image (thumbnail) of each scene may also be displayed. As shown in FIG. 34, semantic roles such as “question” and “answer” may be displayed below the rectangles indicating the scenes.
[0070]
When the user selects a corresponding scene or file in the search windows of FIGS. 33 and 34, the playlist is displayed in another editing window as shown in FIG. The second scene of the opened playlist contains a video of an answer about Japanese food in Daikanyama. When this scene is selected on the bar indicating the time range of the partial stream data displayed at the bottom of FIG. 35 and "Copy" is selected from a pop-up menu or the like, the playlist information of the selected scene is copied to the buffer.
[0071]
Next, in the playlist window that was opened first as shown in FIG. 31, the user selects the scene to be replaced on the bar indicating the time range of the partial stream data as shown in FIG. When "replace" is selected from a menu or the like, the scene data of scene ID: 0002 of the playlist is replaced with the playlist information copied to the buffer as shown in FIG. The video stream ID, start time, playback time, title, outline, semantic role, speaker role, and so on are changed. FIG. 37 shows the display of the outline column when the scene to be replaced is selected on the bar indicating the time range of the partial stream data.
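The copy-then-replace operation can be sketched as below. The key names, the helper names `copy_scene` and `replace_scene`, and the sample values are hypothetical. Keeping the scene ID and playback order of the replaced entry is also an assumption; the text only lists the fields that change.

```python
import copy

# Fields that the text says are changed by a replacement (key names are assumed).
CONTENT_FIELDS = ("stream_id", "start", "duration", "title",
                  "outline", "semantic_role", "speaker_role")


def copy_scene(playlist, scene_id):
    """Copy one scene entry into a buffer, independent of the source playlist."""
    return copy.deepcopy(next(s for s in playlist if s["scene_id"] == scene_id))


def replace_scene(playlist, scene_id, buffer):
    """Overwrite the content fields of the target scene with the buffered entry."""
    target = next(s for s in playlist if s["scene_id"] == scene_id)
    for key in CONTENT_FIELDS:
        target[key] = buffer[key]
    return target


searched = [{"scene_id": "0002", "order": 2, "stream_id": "V9", "start": 40,
             "duration": 12, "title": "Daikanyama", "outline": "Japanese food",
             "semantic_role": "answer", "speaker_role": "respondent"}]
editing = [{"scene_id": "0002", "order": 2, "stream_id": "V1", "start": 7,
            "duration": 3, "title": "Restaurants", "outline": "Western style",
            "semantic_role": "answer", "speaker_role": "respondent"}]

buffer = copy_scene(searched, "0002")
replace_scene(editing, "0002", buffer)
print(editing[0]["stream_id"], editing[0]["outline"])  # V9 Japanese food
```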
[0072]
(Insertion of stream data)
On the other hand, to insert new scene data rather than replace existing scene data, the user stops playback at the desired insertion position, as in the example shown in FIG. 39, and selects “Insert” from a pop-up menu or the like, whereupon new scene data is inserted as shown in FIG. As shown in FIG. 41, new scene data of scene ID: 0005 is inserted in the middle of the playback list, and its playback order immediately follows that of the preceding scene. The playback order of each subsequent scene data is moved down by one.
[0073]
(Replacement recording of stream data)
Next, an example in which the video and audio of scene data are replaced on the spot by post-recording (after-recording) will be described. As shown in FIG. 42, the user selects a scene whose video or audio is to be replaced on the bar indicating the time range of the partial stream data, and selects “replacement recording”. As a result, a window for capturing and recording video and audio from a camera is displayed, for example as shown in FIG. 43. When the user selects “start recording” in the recording window, the video and audio of the user speaking to the camera on the spot are captured. In this case, the camera may capture not only the user's face but also a document at hand or a demonstration that is explained while a device is being operated.
[0074]
Here, for example, if "interrupt" is selected, the photographing is temporarily stopped, and if "end" is selected, the photographing is ended. After that, when “execute replacement” is selected, as shown in FIG. 44, the scene data of the scene selected as shown in FIG. 42 is replaced with the newly recorded video data.
[0075]
At this time, the data before replacement may be left as it is for the outline part and the scene type. Conversely, the voice recorded during or after the photographing may be subjected to voice recognition, and the outline may be replaced with the result of voice recognition. By such replacement recording, the data of the playlist is changed, for example, as shown in FIG.
[0076]
FIG. 46 shows another example of the editing screen. Here, the scene information and the video information are divided into different tracks, and a plurality of partial stream data can be combined into one scene. This makes it possible to add scene information in a large unit of meaning, regardless of the unit of the semantic role analysis of the utterance.
[0077]
For example, as shown in FIG. 46, if a question and its answer are grouped into one scene and a title and a brief description are attached to the scene, the user's editing work is reduced and made easier. In addition, searches can be performed not only in units of fine-grained semantic roles but also in units of scenes, which can be expected to make the search results easier to view.
[0078]
In this case, the semantic role analysis unit 14 analyzes the correspondence between a plurality of partial streams together with the analysis of the semantic roles as described above, and includes information indicating any extracted correspondence in the semantic role identification information. The playback control information creation unit 15 then creates playback control information (a playlist) that collectively controls the presence/absence of playback and the playback order for the plurality of partial stream data for which the correspondence has been extracted.
[0079]
FIG. 47 shows the structure of a playlist corresponding to the editing screen of FIG. Playlist data has a two-layer structure of scene data and shot data. From the result of the semantic role analysis, the procedure shown in FIG. 48, for example, is used to automatically assign a question-answer pair whose partial stream data correspond to each other to the same scene. The procedure in FIG. 48 is similar to the procedure shown in FIG.
[0080]
First, utterance data is input one by one (step S61), and it is determined whether the semantic role of the utterance data is a question of a questioner (step S62). If the input utterance data is not the question of the questioner, it is determined in step S63 whether the meaning role of the utterance data is the answer of the respondent. If the input utterance data is the question of the questioner or the answer of the respondent, the process proceeds to step S64 to check whether shot data is registered in the play list. If the input utterance data is neither a question of the questioner nor a response of the respondent, the process returns to step S61.
[0081]
If shot data is registered in the play list, the flow advances to step S65 to check whether the input utterance data belongs to the same video stream as the immediately preceding shot, and it is then checked whether the time interval from the immediately preceding shot is within 2 seconds. If the input utterance data belongs to the same video stream as the immediately preceding shot and the time interval from the immediately preceding shot is within 2 seconds, the uttered words are added to the immediately preceding shot data in step S67 and its reproduction time is extended.
[0082]
If shot data is registered in the play list but the input utterance data does not belong to the same video stream as the immediately preceding shot, or if it belongs to the same video stream but the time interval is not within 2 seconds, the flow advances to step S68 to create new shot data. Thereafter, in step S69, it is determined whether the new shot data belongs to the same video stream as the immediately preceding shot. If so, the new shot data is connected in step S71 to the scene to which the immediately preceding shot belongs.
[0083]
If the new shot data does not belong to the same video stream as the immediately preceding shot, the process proceeds to step S70 to determine whether the semantic role of the utterance data is the answer of the respondent, and if so, the process proceeds to step S71. If it is determined in step S70, by the same determination as in the earlier step S63, that the semantic role of the utterance data is not the answer of the respondent, the process proceeds to step S72, in which new scene data is created and the shot data is connected below it. Upon completion of step S71 or step S72, the process returns to step S61, and the above processing is repeated.
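The flow of steps S61 to S72 can be sketched as a single loop over the utterance data. This is an illustrative reading of the description, not the flowchart of FIG. 48 itself. The class and field names and the function `build_scenes` are hypothetical. Two points are assumptions because the text does not spell them out: when no shot data is registered yet, the sketch creates new shot data and new scene data; and the reproduction time of an extended shot is taken as the end time of the added utterance minus the start time of the shot.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    stream_id: str
    start: int            # seconds
    end: int              # seconds
    speaker_role: str     # e.g. "questioner" or "respondent"
    semantic_role: str    # e.g. "question" or "answer"
    text: str


@dataclass
class Shot:
    stream_id: str
    start: int
    duration: int
    text: str


@dataclass
class Scene:
    shots: list = field(default_factory=list)


def build_scenes(utterances, max_gap=2):
    """Group question/answer utterances into shots, and shots into scenes."""
    scenes = []
    last_shot = None
    for u in utterances:                                                  # S61
        is_question = (u.speaker_role, u.semantic_role) == ("questioner", "question")  # S62
        is_answer = (u.speaker_role, u.semantic_role) == ("respondent", "answer")      # S63
        if not (is_question or is_answer):
            continue
        if last_shot is not None:                                         # S64
            same_stream = u.stream_id == last_shot.stream_id              # S65
            gap = u.start - (last_shot.start + last_shot.duration)
            if same_stream and gap <= max_gap:
                last_shot.text += " " + u.text                            # S67
                last_shot.duration = u.end - last_shot.start
                continue
        shot = Shot(u.stream_id, u.start, u.end - u.start, u.text)        # S68
        same_as_previous = (last_shot is not None
                            and shot.stream_id == last_shot.stream_id)    # S69
        if scenes and (same_as_previous or is_answer):                    # S69 / S70
            scenes[-1].shots.append(shot)                                 # S71
        else:
            scenes.append(Scene([shot]))                                  # S72
        last_shot = shot
    return scenes


dialogue = [
    Utterance("Q", 0, 3, "questioner", "question", "Q1"),
    Utterance("A", 3, 4, "respondent", "backchannel", "uh-huh"),
    Utterance("A", 5, 8, "respondent", "answer", "A1"),
    Utterance("A", 9, 12, "respondent", "answer", "A2"),
    Utterance("Q", 20, 22, "questioner", "question", "Q2"),
]
for scene in build_scenes(dialogue):
    print([(shot.stream_id, shot.start, shot.duration, shot.text) for shot in scene.shots])
```

In this sample, the two answers are merged into one shot because they lie in the same stream one second apart, that shot joins the scene opened by the first question, and the second question opens a new scene.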
[0084]
When scenes are grouped as described above, the user's work of deleting scenes or changing their order scene by scene becomes easy. For example, by selecting a bar representing a scene in FIG. 46, deleting it with a pop-up menu or the like, and inserting it at the destination, the user can operate on a plurality of corresponding partial stream data collectively. When a bar representing an "answer" in the video track in FIG. 46 is selected and deleted, the scene is shortened by that amount and is automatically changed to a scene containing one remaining question and one remaining answer.
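Under the two-layer structure, these scene-level operations reduce to list manipulation. A minimal sketch, assuming hypothetical helpers `scene_length`, `delete_shot`, and `move_scene`, and assuming that a scene's length is the sum of the reproduction times of its shots:

```python
def scene_length(scene):
    """Length of a scene, taken here as the sum of its shots' reproduction times."""
    return sum(shot["duration"] for shot in scene["shots"])


def delete_shot(scene, shot_index):
    """Return a copy of the scene without the shot at shot_index."""
    shots = [s for i, s in enumerate(scene["shots"]) if i != shot_index]
    return {**scene, "shots": shots}


def move_scene(scenes, src, dst):
    """Move a whole scene, with all of its shots, to a new position."""
    reordered = list(scenes)
    reordered.insert(dst, reordered.pop(src))
    return reordered


scene = {"title": "Restaurants", "shots": [
    {"role": "question", "duration": 3},
    {"role": "answer", "duration": 4},
    {"role": "answer", "duration": 5},
]}
shorter = delete_shot(scene, 1)
print(scene_length(scene), scene_length(shorter))  # 12 8
```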
[0085]
On the other hand, although in FIG. 47 shots are always placed below scenes in a two-layer structure, this constraint may be removed. Scenes and shots may be held as independent data, each with its own reproduction start time, so that the scene can change even while the video continues. This makes it possible to create a playlist in which only the subtitles change while the video stays the same, as in karaoke.
[0086]
FIG. 49 shows an example of a dialogue viewer for checking the semantic role analysis result from which a playlist is created. Here, the semantic role analysis result of FIG. 10 is displayed. When playback is started, the videos of the questioner and the respondent are played simultaneously, the semantic role analysis data and the voice recognition data of each utterance are displayed, and the utterance being played is indicated on the timeline below by, for example, a change of color.
[0087]
As shown in the dialogue viewer of FIG. 50, the semantic role and the content of each utterance may be displayed below the bar indicating the utterance on the timeline. Alternatively, when the user directly selects the bar representing an utterance, playback may be started from that utterance.
[0088]
When the user calls the dialogue viewer from the editing screen or the like while a playlist is displayed, the utterances included in the playlist may be explicitly indicated, for example by changing the color of the bars of the video data sections included in the playlist, as shown in the dialogue viewer of FIG. The user can listen to the entire conversation and check the context of the utterances included in the playlist. The user can also check for errors in the semantic role analysis result and confirm that no important utterance has been omitted from the playlist.
[0089]
FIG. 52 shows another example of the dialogue viewer. The user can choose to play only the selected utterances or to play all of the dialogue data. Instead of selecting utterances manually one by one, an interface may be provided for selecting them all at once by specifying the speaker role and semantic role of the utterances with check boxes, such as "questioner's question" and "respondent's answer." This reduces the user's instruction work and allows the dialogue data to be checked efficiently.
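Selecting utterances in bulk by check box amounts to filtering on the pair of speaker role and semantic role. A minimal sketch, in which the function name `select_utterances`, the dictionary keys, and the role strings are assumptions:

```python
def select_utterances(utterances, checked):
    """Keep the utterances whose (speaker role, semantic role) pair is checked."""
    return [u for u in utterances
            if (u["speaker_role"], u["semantic_role"]) in checked]


utterances = [
    {"speaker_role": "questioner", "semantic_role": "question", "text": "Q1"},
    {"speaker_role": "respondent", "semantic_role": "backchannel", "text": "uh-huh"},
    {"speaker_role": "respondent", "semantic_role": "answer", "text": "A1"},
]
# Corresponds to ticking "questioner's question" and "respondent's answer".
checked = {("questioner", "question"), ("respondent", "answer")}
print([u["text"] for u in select_utterances(utterances, checked)])  # ['Q1', 'A1']
```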
[0090]
As shown in FIG. 53, when the user selects utterance data and then selects “new creation” from a pop-up menu or the like, a new playlist including the selected utterance data may be created. In addition, by selecting "Copy" and then specifying "Insert" or "Replace" at any point in a playlist opened with the editing tool, important video data omitted from the automatically generated playlist can be added to the playlist.
[0091]
Further, as shown in FIG. 54, the scene may be represented by an image or the like instead of being represented by a bar. For example, an editing screen may be provided for adding a semantic role to a characteristic image (thumbnail) of each scene and displaying the image. The user can easily change the order, copy, delete, etc., by selecting a scene and performing an operation such as drag and drop.
[0092]
In the present embodiment, the information stored in the playlist is the title, the outline, the semantic role, the speaker role, the speech recognition result, and the like, but it is not limited to these. For example, although the outline here is written for a general audience, an editing tool may allow an outline to be written for each of several user categories, such as beginner, intermediate, or age group. The displayed outline can then be switched according to the user who views the completed content.
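One way to hold outlines for several user categories is a mapping from category to text on each scene entry. The sketch below is an assumption about how such data might be stored: the key `outlines`, the category names, the sample sentences, and the fallback to a general outline are all hypothetical.

```python
def outline_for(scene, level, default_level="general"):
    """Return the outline written for the viewer's level, else the general one."""
    outlines = scene["outlines"]
    return outlines.get(level, outlines[default_level])


scene = {
    "scene_id": "0002",
    "outlines": {
        "general": "Answer about restaurant styles",
        "beginner": "A simple explanation of restaurant styles",
    },
}
print(outline_for(scene, "beginner"))
print(outline_for(scene, "intermediate"))  # falls back to the general outline
```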
[0093]
In the case of a video in which an object is shown to the camera and explanations are given while performing an operation demonstration, the name of the object being explained, the name of a function, or the like may be input in detail. The name of the object or the name of the function may be automatically extracted from the speech recognition result by the information extraction technique, or may be manually input by the user who operates the editing tool. This makes it possible to search for an appropriate scene in response to a question such as "Tell me about the XX operation of XX" and show it to the user.
[0094]
Further, in the present embodiment, video and audio are handled on the same track, but an audio track may be provided separately. Thus, for example, a video can be reproduced in such a manner that the voice data of the respondent is always reproduced.
[0095]
Furthermore, it becomes possible to post-record only video or audio, or replace it with another stream data. Further, by providing not only one audio track but also a plurality of audio tracks, it is possible to simultaneously reproduce the voices of the respondent and the questioner, and to reproduce the commentary and BGM in a superimposed manner.
[0096]
Note that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying constituent elements in an implementation stage without departing from the scope of the invention. Various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiment. Further, components of different embodiments may be appropriately combined.
[0097]
[Effects of the invention]
As described above, according to the present invention, it is possible to efficiently perform stream data editing work which has been very complicated in the past.
[0098]
For example, when the automatically created content is manually corrected, the user can perform the correction work without trial and error because the intention of the system to create the content is clearly shown. As a result, even a general user who is not accustomed to video editing can edit video and audio by himself and quickly create and edit knowledge transfer content as intended.
[0099]
In addition, it is easy to replace a part of the video or audio with another video or audio. For example, it is possible to easily perform an editing operation in which only a video explaining the answer is prepared according to the level of the user, and a video in which only the part is replaced is presented to the user.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a stream data editing system according to an embodiment of the present invention.
FIG. 2 is an exemplary flowchart showing a procedure for editing stream data according to the embodiment.
FIG. 3 is a flowchart showing a processing procedure of semantic role analysis in the embodiment.
FIG. 4 is a diagram showing an example of a morphological analysis result in semantic role analysis.
FIG. 5 is a diagram showing an example of a pattern rule table used in semantic role analysis.
FIG. 6 is a diagram showing an example of a semantic role identification information transition probability table used in semantic role analysis.
FIG. 7 is a diagram illustrating an example of a state in which semantic role identification information is added to partial stream data by semantic role analysis.
FIG. 8 is a diagram showing an outline of stream data editing when video of a questioner and a respondent is input as different video streams.
FIG. 9 is a diagram showing a state in which a video stream, which is a video of a questioner and a respondent, is divided into partial streams, and the stream is recognized and output as separate stream data.
FIG. 10 is a diagram showing an example of a semantic role analysis result.
FIG. 11 is a diagram showing an example of creating reproduction control information.
FIG. 12 is a flowchart showing an example of a procedure for creating reproduction control information.
FIG. 13 is a diagram showing an example of reproduction control information created based on the semantic role analysis result in FIG.
FIG. 14 is a diagram showing an example of an editing screen for editing playback control information.
FIG. 15 is a diagram showing an example of an editing screen in which an update instruction has been issued after editing the reproduction control information.
FIG. 16 is a diagram showing an example of reproduction control information after updating.
FIG. 17 is a flowchart showing another example of a procedure for creating reproduction control information.
FIG. 18 is a diagram showing playback control information after creating first new scene data in a procedure for creating playback control information.
FIG. 19 is a diagram showing playback control information after creating second new scene data in the playback control information creating procedure.
FIG. 20 is a diagram showing playback control information after creating third new scene data in the creation procedure of the playback control information.
FIG. 21 is a diagram showing reproduction control information before analysis of specific utterance data in a reproduction control information creation procedure.
FIG. 22 is a diagram showing reproduction control information after addition of specific utterance data in a reproduction control information creation procedure.
FIG. 23 is a diagram showing an example of final reproduction control information in a procedure for generating reproduction control information.
FIG. 24 is a diagram showing an example of an editing screen on which the reproduction control information of FIG. 23 is reflected.
FIG. 25 is a diagram showing an editing screen for describing stream data division processing, which is a specific example of stream data editing.
FIG. 26 is a diagram showing playback control information after updating during stream data division processing.
FIG. 27 is a diagram showing an editing screen for describing a process of deleting unnecessary scene data, which is a specific example of stream data editing.
FIG. 28 is a diagram showing playback control information after updating during unnecessary scene data deletion processing.
FIG. 29 is a diagram showing an editing screen in which outline data has been corrected after unnecessary scene data deletion processing.
FIG. 30 is a diagram showing playback control information after being updated by correction of summary data.
FIG. 31 is a diagram showing a first editing screen for describing stream data replacement processing which is a specific example of stream data editing.
FIG. 32 is a view showing a second editing screen for explaining stream data replacement processing.
FIG. 33 is a diagram showing a third editing screen for explaining stream data replacement processing.
FIG. 34 is a view showing a fourth editing screen for explaining stream data replacement processing.
FIG. 35 is a view showing a fifth editing screen for explaining stream data replacement processing.
FIG. 36 is a view showing a sixth editing screen for explaining stream data replacement processing.
FIG. 37 is a view showing a seventh editing screen for explaining stream data replacement processing.
FIG. 38 is a diagram showing playback control information after updating by replacing stream data.
FIG. 39 is a diagram showing a first edit screen for describing stream data insertion processing which is a specific example of stream data editing.
FIG. 40 is a diagram showing a second editing screen for explaining stream data insertion processing.
FIG. 41 is a diagram showing playback control information after updating by stream data insertion.
FIG. 42 is a diagram showing a first editing screen for describing stream data replacement recording processing, which is a specific example of stream data editing.
FIG. 43 is a view showing a second editing screen for explaining stream data replacement recording processing.
FIG. 44 is a view showing a third editing screen for explaining stream data replacement recording processing.
FIG. 45 is a diagram showing playback control information after updating by replacement recording of stream data.
FIG. 46 is a diagram showing another example of an edit screen for editing reproduction control information.
FIG. 47 is a diagram showing playback control information for realizing the editing screen in FIG. 46.
FIG. 48 is a flowchart showing another example of a procedure for creating playback control information.
FIG. 49 is a diagram showing an example of a dialogue viewer for confirming the semantic role analysis result.
FIG. 50 is a diagram showing a modification of the dialogue viewer shown in FIG. 49.
FIG. 51 is a diagram showing another modification of the dialogue viewer shown in FIG. 49.
FIG. 52 is a diagram showing another example of the dialogue viewer for confirming the semantic role analysis result.
FIG. 53 is a view showing a modification of the dialogue viewer shown in FIG. 52.
FIG. 54 is a view showing still another example of an editing screen for editing playback control information.
[Explanation of symbols]
11: stream data input unit, 12: stream data storage unit, 13: stream data processing unit, 14: semantic role analysis unit, 15: playback control information creation unit, 16: playback control information storage unit, 17: stream playback unit, 18: playback control information editing unit, 19: output unit.

Claims

1. A stream data editing method comprising:
inputting stream data including at least one of audio and video;
analyzing the semantic role in information transmission of each partial stream data in the input stream data, and adding semantic role identification information representing the semantic role to the partial stream data;
creating playback control information for controlling the presence/absence of playback and the playback order of each of the partial stream data based on the semantic role identification information;
storing the playback control information;
displaying each time range of the partial stream data and the semantic role in association with each other, and editing the stored playback control information according to a user's instruction input for the display; and
playing back the input stream data according to the stored playback control information.

2. The stream data editing method according to claim 1, wherein the semantic role includes at least one of “question”, “answer”, “greeting”, “back-channel response”, “explanation”, and “report”.

3. A stream data editing system comprising:
input means for inputting stream data including at least one of audio and video;
semantic role analysis means for analyzing the semantic role in information transmission of each partial stream data in the input stream data, and adding semantic role identification information representing the semantic role to the partial stream data;
playback control information creating means for creating playback control information for controlling the presence/absence of playback and the playback order of each of the partial stream data based on the semantic role identification information;
storage means for storing the playback control information;
editing means for displaying the time range of each of the partial stream data and information indicating the semantic role in association with each other, and editing the stored playback control information according to a user's instruction input for the display; and
playback means for playing back the input stream data in accordance with the playback control information stored in the storage means.

4. The stream data editing system according to claim 3, wherein the semantic role includes at least one of “question”, “answer”, “greeting”, “back-channel response”, “explanation”, and “report”.

5. The stream data editing system according to claim 3, wherein the editing means, when displaying the time range and the information indicating the semantic role in association with each other, displays the time range as a bar and displays the information indicating the semantic role adjacent to the bar.

6. The stream data editing system according to claim 3, wherein the editing means allows the information indicating the semantic role to be changed by an instruction input from the user.

7. The stream data editing system according to claim 3, wherein the editing means performs the editing in units of semantic role type or in units of the partial stream data.

8. The stream data editing system according to claim 3, wherein the editing means performs the editing so as to replace partial stream data in the stream data stored in the storage means in response to an instruction input from the user.

9. The stream data editing system according to claim 3, wherein the semantic role analysis means analyzes correspondences between a plurality of partial streams together with the analysis of the semantic roles, and the playback control information creating means creates playback control information that collectively controls the presence/absence of playback and the playback order for the plurality of partial stream data for which the correspondence has been extracted.

10. The stream data editing system according to claim 3, wherein the semantic role analysis means analyzes the semantic roles of a plurality of partial streams and the correspondences between the partial streams, for at least two streams of stream data having related content.

11. The stream data editing system according to claim 3, wherein the playback means displays information indicating a range of the stream data that is not played back, based on the playback control information stored in the storage means.

12. The stream data editing system according to claim 3, further comprising means for recording the stream data played back by the playback means.

13. A program for causing a computer to execute:
a process of inputting stream data including at least one of audio and video;
a process of analyzing the semantic role in information transmission of each partial stream data in the input stream data, and adding semantic role identification information representing the semantic role to the partial stream data;
a process of creating playback control information for controlling the presence/absence of playback and the playback order of each of the partial stream data based on the semantic role identification information;
a process of storing the playback control information;
a process of displaying each time range of the partial stream data and the semantic role in association with each other, and editing the stored playback control information according to a user's instruction input for the display; and
a process of playing back the input stream data in accordance with the stored playback control information.