JP4342529B2

JP4342529B2 - Authoring support device, authoring support method and program, and authoring information sharing system

Info

Publication number: JP4342529B2
Application number: JP2006095943A
Authority: JP
Inventors: 秀樹筒井; 創吾坪井; 智弘山崎; 千加夫土谷
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2009-10-14
Anticipated expiration: 2026-03-30
Also published as: JP2007272975A

Description

The present invention relates to an authoring support apparatus, an authoring support method and program, and an authoring information sharing system that support a user in attaching comments to content.

In recent years, authoring techniques in which a user attaches comments to content (for example, video content) have come into wide use.

However, to attach a comment to content, the user has had no choice but to manually specify either a single point in the content (a point with no temporal width) or a start point and an end point (start time and end time) as the target of the comment.

For example, Patent Document 1 discloses a method of writing a comment at a particular time in a scene specified by the user. With this method, however, even if the user wants to comment on a scene that spans a range of time, the comment can be attached only to a single point in time within that scene.

General video authoring tools, for example, let the user delimit a scene by specifying its start point and end point, and thereby attach a comment to a scene that has temporal width. In that case, however, the user has to specify both the start point and the end point of the scene.

Non-Patent Document 1, for example, discloses a technique for communication among viewers about video. With this method, however, the range over which the user enters a comment about the video is a time range that has been delimited in advance, and the user cannot specify an arbitrary range.

Patent Document 1: JP 2004-193871 A
Non-Patent Document 1: Kazuho Yamada, Kazu Miyagawa, Masashi Morimoto, Haruhiko Kojima: Proposal of communication method between viewers using video structure information, Information Processing Society of Japan, Groupware and Network Service, Vol. 43, 2001

In conventional authoring support techniques, a user attaching a comment to content has had to specify the start point and end point of the range to be commented on. Other conventional techniques let the user designate only a single point in the content, but these either attach the comment to one point in time, so that a comment cannot be attached to a region with temporal width, or take only a start point and fix the width of the region in advance regardless of the scene.

The present invention has been made in view of the above circumstances. Its object is to provide an authoring support apparatus, an authoring support method and program, and an authoring information sharing system that can reduce the burden on the user when the user attaches a comment to a region of content that has temporal width.

An authoring support apparatus according to the present invention comprises: storage means for storing content including at least video; input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content; determination means for determining the scene type of the predetermined scene to which the comment is attached, by comparing the content of the input comment with each of one or more pieces of character information defined in advance in association with each individual scene type; extraction means for extracting, in accordance with the determined scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to identify the start point and end point of the section of the content relating to that scene type, as information indicating the section of the predetermined scene; and display means for displaying the result of the extraction.

Another authoring support apparatus according to the present invention comprises: storage means for storing content including at least video; input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content; determination means for determining the scene type of the predetermined scene to which the comment is attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with a scene type; extraction means for extracting, in accordance with the determined scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to identify the start point and end point of the section of the content relating to that scene type, as information indicating the section of the predetermined scene; and display means for displaying the result of the extraction.

The present invention also provides an authoring information sharing system including a server apparatus and a plurality of client apparatuses. Each client apparatus comprises: storage means for storing content including at least video; input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content; determination means for determining the scene type of the predetermined scene to which the comment is attached, by comparing the content of the input comment with each of one or more pieces of character information defined in advance in association with each individual scene type; extraction means for extracting, in accordance with the determined scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to identify the start point and end point of the section of the content relating to that scene type, as information indicating the section of the predetermined scene; display means for displaying the result of the extraction; and transmission means for transmitting, to the server apparatus, authoring information including the comment, the scene type, and the information indicating the section of the predetermined scene. The server apparatus comprises: reception means for receiving the authoring information from the client apparatus; and authoring information storage means for storing the received authoring information.

The present invention further provides an authoring information sharing system including a server apparatus and a plurality of client apparatuses. Each client apparatus comprises: storage means for storing content including at least video; input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content; determination means for determining the scene type of the predetermined scene to which the comment is attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with a scene type; extraction means for extracting, in accordance with the determined scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to identify the start point and end point of the section of the content relating to that scene type, as information indicating the section of the predetermined scene; display means for displaying the result of the extraction; and transmission means for transmitting, to the server apparatus, authoring information including the comment, the scene type, and the information indicating the section of the predetermined scene. The server apparatus comprises: reception means for receiving the authoring information from the client apparatus; and authoring information storage means for storing the received authoring information.

Note that the present invention relating to an apparatus also holds as an invention relating to a method, and the present invention relating to a method also holds as an invention relating to an apparatus.

The present invention relating to an apparatus or a method also holds as a program for causing a computer to execute procedures corresponding to the invention (or for causing a computer to function as means corresponding to the invention, or for causing a computer to realize functions corresponding to the invention), and as a computer-readable recording medium on which that program is recorded.

According to the present invention, the burden on the user can be reduced when the user attaches a comment to a region of content that has temporal width.

Embodiments of the present invention are described below with reference to the drawings.

(First embodiment)

The following description takes video content as an example of the content to which comments are attached. The description focuses on the video portion of the video content, but the video content may or may not include an audio portion accompanying the video portion.

FIG. 1 shows a configuration example of an authoring support apparatus according to the first embodiment of the present invention.

As shown in FIG. 1, the authoring support apparatus 1 of this embodiment includes: a comment input unit 11 that accepts comments from the user and predetermined instructions related to comment input; a scene extraction unit 12 that extracts a scene from the content according to the scene type; a display unit 13 that displays the content and the screens used for comment input; a scene type determination unit 14 that determines, based on the comment entered by the user, the scene type of the scene to be commented on; a scene type determination rule storage unit 15 that stores the rules used to determine the scene type; a content storage unit 16 that stores the content to which the user attaches comments; and a content control unit 17 that controls playback and other handling of the content.

The authoring support apparatus of FIG. 1 may be configured as a single apparatus. Alternatively, its elements may be distributed over a plurality of apparatuses connected via a network (for example, it may be built on a client-server model).

The operation of the authoring support apparatus of this embodiment is described below.

FIG. 2 shows an example of a processing procedure in the authoring support apparatus according to this embodiment.

First, to view the desired content, the user gives instructions such as play, rewind, and fast-forward (through an instruction receiving unit, not shown).

In accordance with the instruction received from the user (by the instruction receiving unit, not shown), the content control unit 17 of the authoring support apparatus performs content playback control such as play, rewind, and fast-forward (step S10). The content may be displayed on the display unit 13 at this time. If the content includes audio, the audio is output (from a speaker or the like, not shown).

Next, when a scene on which the user wants to comment comes up during viewing, the user writes a comment by operating the comment input unit 11.

Here, it is assumed that the user first inputs a comment writing instruction.

When a comment writing instruction is input by the user (step S11), the comment input unit 11 of the authoring support apparatus accepts the writing of a comment from the user (step S12).

As one example, when the user inputs a comment writing instruction (through the comment input unit 11) while the content is playing or paused, a comment input interface is displayed (here, on the display unit 13).

If the comment writing instruction is input while the content is playing, the content may be paused at the scene where the comment is to be written (for example, the location that was being played when the instruction was input) before the comment input interface is displayed, or the interface may be displayed while playback continues. Which behavior is used may be fixed in advance (for example, chosen by the user as a setting) or indicated by the user each time, together with the comment writing instruction. For instance, two kinds of comment writing instruction buttons may be provided so that the user clicks the desired one, or the choice may be indicated by single-clicking versus double-clicking the comment writing instruction button.

If the comment writing instruction is input while the content is paused, the comment input interface is displayed with the content still paused.

At this time, the comment input unit 11 holds, as information indicating the location where the user intended to write the comment (for example, the location in the content that was being played or was paused when the comment writing instruction was input), a time that identifies that location within the content (for example, the time elapsed from the beginning of the content). This time is called the "representative time". For example, if the user inputs a comment writing instruction during playback, the playback time at the moment of the instruction is held; if the user pauses the content and then operates the comment input unit 11, the playback time at the moment of the pause is held.
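The bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `CommentInput`, its fields, and the use of milliseconds from the beginning of the content are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommentInput:
    """Hypothetical stand-in for comment input unit 11."""
    representative_time_ms: Optional[int] = None
    comment: str = ""

    def on_write_instruction(self, playback_position_ms: int) -> None:
        # Whether the content is playing or paused, the playback position at
        # the moment of the instruction becomes the representative time.
        self.representative_time_ms = playback_position_ms

    def on_comment_text(self, text: str) -> None:
        # The comment text is stored alongside the representative time.
        self.comment = text


unit = CommentInput()
unit.on_write_instruction(97_000)  # instruction given 97 s into the content
unit.on_comment_text("This line is cool!")
```

Keeping the representative time and the comment text in one object makes it straightforward to hand both to the later scene type determination and scene extraction steps.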

FIG. 3 shows an example of the comment input interface.

As illustrated in FIG. 3, this comment input interface includes a comment entry field 301 in which the comment is written, a scene extraction button 302 for extracting the scene corresponding to the comment, a time bar 303 that shows the playback position, and a display area 304 that shows the content.

In the example of FIG. 3, the comment input interface includes the content display, but the content may instead be displayed on the display unit 13. In the example of FIG. 3, the content display shows the content at the representative time.

The user then writes a comment in the comment entry field 301. The example of FIG. 3 shows the case where "This line is cool!" has been written. Any method may be used to write the comment. For example, the comment input unit 11 may have a writing interface through which characters are entered with a keyboard or with a remote control, or some other method may be used.

In the example of FIG. 3, when the user finishes entering the comment and presses the scene extraction button 302, processing proceeds to the scene type determination of step S13.

In the above example, the user presses the scene extraction button 302 to proceed to the scene type determination of step S13, but other ways of giving an explicit instruction are also possible. Alternatively, even without an explicit instruction, if the user has entered no comment text for a certain time or longer, comment entry may be regarded as complete and processing may proceed to the scene type determination of step S13.

The scene type determination of step S13 is performed by the scene type determination unit 14, which uses scene type determination rules for this purpose.

FIG. 4 shows an example of the scene type determination rules stored in the scene type determination rule storage unit 15.

As illustrated in FIG. 4, each scene type determination rule has a rule ID, a type, a weight, and a notation. The "rule ID" (rule identifier) is an ID for managing (identifying) the rule. The "type" is the scene type that the rule indicates. The "weight" is the weighting used to calculate the score when determining the scene type. The "notation" is the expression that is matched against the comment entered by the user.

For the comment entered by the user, the scene type determination unit 14 calculates a score for each scene type.

The score is calculated, for example, as Score_i = SUM(Weight_i), where Score_i is the score of the i-th scene type and Weight_i denotes the weights of those rules of the i-th scene type whose notation matched the comment.

This scene type determination method is described here using a specific example.

Suppose the score is calculated for the case where the user has written the comment "このセリフは無いよね。ヒドイこと言うなぁ" ("That line is uncalled for. What a terrible thing to say.").

In this case, among the rules of FIG. 4, the notations of the rules with rule ID = 1, 2, and 3 match. Taking "dialogue" as the first scene type, "cut" as the second, "CM" as the third, and "telop" as the fourth, the scores are calculated as
Score_1 = SUM(2, 1, 1) = 4
Score_2 = SUM() = 0
Score_3 = SUM() = 0
Score_4 = SUM() = 0
so the dialogue score is 4 and all other scores are 0.

In this way, a score is calculated for each scene type for the comment entered by the user.
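The weighted scoring Score_i = SUM(Weight_i) can be sketched as below. This is a minimal sketch, not the embodiment's implementation. FIG. 4 is not reproduced in the text, so the rule table `RULES` and its notations are hypothetical; they were chosen only so that rules 1 to 3 match the example comment with weights 2, 1, and 1. Matching is assumed to be plain substring matching.

```python
# Hypothetical rules in the shape of FIG. 4:
# (rule ID, scene type, weight, notation). The notations are assumptions.
RULES = [
    (1, "dialogue", 2, "セリフ"),
    (2, "dialogue", 1, "言う"),
    (3, "dialogue", 1, "ヒドイ"),
    (4, "cut", 2, "カット"),
    (5, "CM", 2, "CM"),
    (6, "telop", 2, "テロップ"),
]


def score_scene_types(comment, rules=RULES):
    """Return {scene type: sum of weights of the rules whose notation matched}."""
    scores = {scene_type: 0 for _, scene_type, _, _ in rules}
    for _rule_id, scene_type, weight, notation in rules:
        if notation in comment:  # assumed matching criterion: substring match
            scores[scene_type] += weight
    return scores


scores = score_scene_types("このセリフは無いよね。ヒドイこと言うなぁ")
best_type = max(scores, key=scores.get)  # scene type with the highest score
```

With these assumed rules, the dialogue type collects the weights 2, 1, and 1 while the other types stay at zero, mirroring the worked example in the text.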

The scene type determination rules of FIG. 4 are preferably modified as needed so that the scene type is determined more accurately from the comments that users enter.

As another way of calculating the score for each scene type, the similarity between the notation described in a rule and the comment entered by the user can be used.

Known natural language processing techniques can be used to calculate this similarity.

For example, the number of words that appear in both the notation described in the rule and the comment entered by the user may be used as the score.

As another example, the similarity between a rule's notation and the user's comment may be defined using the vector space model, which calculates how well a document fits a search query, as described in Takenobu Tokunaga, "Language and Computation 5: Information Retrieval and Language Processing", University of Tokyo Press, Chapter 2 "Fundamentals of Information Retrieval".
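As a rough illustration of the vector space model idea, the sketch below represents a rule notation and a comment as term-frequency vectors and compares them by cosine similarity. It assumes the texts have already been split into words (Japanese text would need a morphological analyzer for that step), and it omits term weighting such as tf-idf; the function name `cosine_similarity` is a hypothetical choice, and this is not necessarily the exact formulation in the cited reference.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between the term-frequency vectors of two word lists."""
    a, b = Counter(words_a), Counter(words_b)
    # Dot product over the words the two vectors share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


rule_words = ["patent specification", "write", "difficult"]
comment_words = ["patent specification", "write", "so", "difficult"]
similarity = cosine_similarity(rule_words, comment_words)
```

Unlike a raw count of shared words, the cosine value is normalized by text length, so a long comment does not score higher merely because it contains more words.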

Because these methods calculate the score from the words that appear in both the notation described in a rule and the comment entered by the user, the rule weights are not necessarily required.

Although not illustrated, a thesaurus or a synonym dictionary can also be used so that the similarity is obtained from similarity of meaning even when the surface expressions differ.

These methods also make it possible to use the dialogue or telop text of the content itself as the notation and compare it directly with the comment entered by the user, as in FIG. 5 for example. In FIG. 5, "(dialogue)" is filled in with the line spoken at the representative time as is, and "(telop)" is filled in with the telop shown at the representative time as is.

Thus, for example, if the content contains the line "Writing a patent specification is difficult", the rule becomes as shown in FIG. 6.

Suppose the user enters the comment "Is writing a patent specification really that difficult?" and the number of words in common is used as the score. Because "patent specification", "write", and "difficult" appear in both, the similarity between the notation of this rule ID and the user's comment is 3.

The score for the scene type "dialogue" may then be taken as the maximum of the similarities calculated over the rules whose scene type is "dialogue", or as the sum of those similarities. If the maximum is used, the rule with rule ID = 9 has the highest similarity among the "dialogue" rules, so the score for the scene type "dialogue" is 3.
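The common-word similarity and the two aggregation options (maximum or sum) can be sketched as follows. This is an illustrative sketch under assumptions: the word lists are given pre-tokenized in English, rule 9 carries the dialogue line from the example, and rule 10 is a made-up extra rule added only so that maximum and sum give different results.

```python
def common_word_count(rule_words, comment_words):
    """Similarity = number of distinct words shared by the rule and the comment."""
    return len(set(rule_words) & set(comment_words))


# (rule ID, scene type, pre-tokenized notation). Rule 10 is hypothetical.
RULES = [
    (9, "dialogue", ["patent specification", "write", "difficult"]),
    (10, "dialogue", ["write", "letter"]),
]

comment_words = ["patent specification", "write", "so", "difficult"]


def scene_type_score(scene_type, rules, comment_words, aggregate=max):
    """Aggregate per-rule similarities for one scene type with max or sum."""
    similarities = [
        common_word_count(words, comment_words)
        for _rule_id, rule_type, words in rules
        if rule_type == scene_type
    ]
    return aggregate(similarities) if similarities else 0


score_by_max = scene_type_score("dialogue", RULES, comment_words, aggregate=max)
score_by_sum = scene_type_score("dialogue", RULES, comment_words, aggregate=sum)
```

Using the maximum keeps the score tied to the single best-matching rule, whereas the sum rewards scene types that have many partially matching rules.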

Next, in step S14, the scene extraction unit 12 extracts the corresponding scene based on the scores calculated by the scene type determination unit 14.

The operation of the scene extraction unit 12 is described here for the case where the scene type has been determined to be "dialogue".

The scene extraction unit 12 extracts, from the content, the span from the start time to the end time of the line spoken around the representative time.

Here, the scene extraction unit 12 has a subtitle recognition unit (not shown). The subtitle recognition unit extracts the time from the beginning of the content at which a subtitle appears and the time from the beginning of the content at which that subtitle disappears.

As for how subtitles are recognized: if the content is a broadcast, for example, the subtitles received as part of a subtitled broadcast can be used. Subtitles can likewise be recognized when the content is a DVD. Subtitles can also be recognized from the characters displayed on the screen by means of telop recognition technology.

FIG. 7 shows examples of start times, each being the time elapsed from the beginning of the content when a subtitle appears, and end times, each being the time elapsed from the beginning of the content when that subtitle disappears. In FIG. 7, the start and end times are given in milliseconds. If the content is a broadcast program, the beginning of the content can be taken to be the start time of the program for this calculation. The subtitle start and end times may be extracted only for the vicinity of the representative time when the user presses the scene extraction button 302, or the start and end times of all subtitles in the content may be obtained in advance and compiled into a table like that of FIG. 7.

Among these start-to-end intervals, the one that contains the representative time is extracted as the scene corresponding to the comment entered by the user. For example, if the representative time is 97 seconds from the beginning of the content, the interval from 96891 milliseconds to 98066 milliseconds is extracted as the corresponding scene, as can be seen from FIG. 7.

If the representative time does not fall within any of the start-to-end intervals of FIG. 7, the nearest interval may be extracted as the corresponding scene.
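The interval lookup described above, including the fallback to the nearest interval, can be sketched as follows. This is a minimal sketch under assumptions: only the interval 96891 to 98066 ms comes from the text, the other rows of `SUBTITLE_INTERVALS` are made-up values, and "nearest" is assumed to mean the smallest gap between the representative time and the closer edge of an interval.

```python
# (start_ms, end_ms) pairs in the shape of FIG. 7. Only 96891-98066 is taken
# from the text; the other rows are invented for illustration.
SUBTITLE_INTERVALS = [
    (91_200, 93_450),
    (96_891, 98_066),
    (101_300, 104_020),
]


def extract_scene(representative_ms, intervals):
    """Return the interval containing the representative time, else the nearest one."""
    if not intervals:
        return None
    for start, end in intervals:
        if start <= representative_ms <= end:
            return (start, end)

    def gap(interval):
        # Distance from the representative time to the closer edge of the interval.
        start, end = interval
        if representative_ms < start:
            return start - representative_ms
        return representative_ms - end

    return min(intervals, key=gap)


scene = extract_scene(97_000, SUBTITLE_INTERVALS)      # inside an interval
fallback = extract_scene(100_500, SUBTITLE_INTERVALS)  # between two intervals
```

A linear scan is sufficient for a table of this size; if the table covered every subtitle of a long program, a binary search over the sorted start times would be a natural refinement.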

The above describes an example in which, when the scene type is determined to be "dialogue", subtitle information is used to extract the time intervals of FIG. 7, but the extraction can also be done using speech recognition technology. In that case, the scene extraction unit 12 has a speech recognition unit (not shown), which can determine the start time and end time of a line of dialogue. For example, speech recognition is performed on the audio portion of the content around the representative time, and the dialogue section is extracted from the content according to the recognition result (for example, by extracting the time from the beginning of the content at which the line starts and the time from the beginning at which the line ends).

また、シーン抽出部１２において、字幕認識結果と音声認識結果との両方を用いることもできる。これは、例えば実際の音声と表示される字幕とがずれている場合に、そのずれを補正することにより、正確にセリフのシーンを抽出することができるので、有効である。この場合、シーン抽出部１２は、字幕認識部と音声認識部との両方を持つ。まず、字幕認識部により、字幕の時間間隔を抽出するとともに、字幕の発話内容を文字情報として抽出する。次に、その時間間隔の付近で、発話内容の文字情報を発話している時間間隔を音声認識部により探索する。これは、発話内容を文字情報として音声認識部に与えることにより、何も情報がない場合に比較して、セリフ区間の抽出精度を向上することができるためである。このようにして、字幕認識部により抽出されたセリフの時間間隔に対して、そのずれを音声認識部により補正することにより、セリフのシーンをより正確に抽出することができる。 The scene extraction unit 12 can also use both the caption recognition result and the voice recognition result. This is effective because, for example, when the actual sound and the displayed subtitles are shifted, by correcting the shift, a speech scene can be accurately extracted. In this case, the scene extraction unit 12 has both a caption recognition unit and a voice recognition unit. First, the subtitle recognition unit extracts the time interval of the subtitle and extracts the utterance content of the subtitle as character information. Next, the speech recognition unit searches for a time interval in which the character information of the utterance content is spoken near the time interval. This is because the speech segment extraction accuracy can be improved by giving the speech content to the speech recognition unit as character information as compared with the case where there is no information. In this way, by correcting the shift by the speech recognition unit with respect to the time interval of the speech extracted by the caption recognition unit, the speech scene can be extracted more accurately.

以上のことから、シーン種別が「セリフ」の場合のシーンを抽出することができるようになる。 In this way, a scene can be extracted when the scene type is "dialogue".

次に、シーン種別が「カット」と判定された場合を例にとって、シーン抽出部１２の動作について説明する。 Next, taking the case where the scene type is determined to be “cut” as an example, the operation of the scene extraction unit 12 will be described.

この場合、シーン抽出部１２は、カット検出部（図示せず）を持つ。カット検出部は、コンテンツにおけるカメラ切り替えのタイミングを抽出する。これは、例えば、コンテンツにおいて、前後のフレームでの画像を比較し、その類似度を計算することで抽出することができる。この類似度は、前後のフレームでの画像の同じ座標の画素値を比較し、その差の合計として求めることができる。この差がある閾値以上の場合、前後のフレームでの画像が大きく変化していることを意味するので、この点をカット点とすることができる。 In this case, the scene extraction unit 12 has a cut detection unit (not shown). The cut detection unit extracts the camera switching timing in the content. This can be extracted, for example, by comparing images in the previous and next frames in content and calculating their similarity. This similarity can be obtained as a sum of the differences by comparing pixel values at the same coordinates of images in the previous and subsequent frames. If this difference is greater than or equal to a certain threshold value, it means that the images in the previous and subsequent frames have changed significantly, and this point can be used as a cut point.
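The frame-comparison method above can be sketched as follows. This is an illustrative sketch only: frames are modelled as plain nested lists of grayscale pixel values, and the names `frame_difference` and `detect_cut_points` and the threshold value are assumptions. The description calls the quantity a similarity, but what it computes is a sum of pixel differences, so the code follows that.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel values, indexed as frame[y][x]


def frame_difference(previous: Frame, current: Frame) -> int:
    """Sum of absolute differences between pixels at the same coordinates."""
    return sum(
        abs(p - c)
        for row_prev, row_cur in zip(previous, current)
        for p, c in zip(row_prev, row_cur)
    )


def detect_cut_points(frames: List[Frame], threshold: int) -> List[int]:
    """Return indices i where frame i differs from frame i-1 by at least the threshold."""
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) >= threshold
    ]


# Tiny 2x2 example: two dark frames followed by two bright frames.
dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
cuts = detect_cut_points([dark, dark, bright, bright], threshold=500)
```

Converting a frame index to an elapsed time would additionally require the frame rate of the content, which is not specified here.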

また、カット検出のための他の手法として、文献２“「ゆう度比検定を用いたＭＰＥＧビットストリームからの動画像カット検出手法」、「電子情報通信学会論文誌Ｖｏｌ．Ｊ８２−Ｄ２Ｎｏ．３」、（１９９９年３月）、金子敏充、堀修著、社団法人電子情報通信学会発行、３６１頁〜３７０頁”や文献３“「動きベクトル符号量を用いたＭＰＥＧ動画像からの高速カット検出」，電子情報通信学会パターン認識・メディア理解研究会（ＰＲＭＵ），（１９９６年１１月），金子敏充，堀修，社団法人電子情報通信学会発行”のような手法を用いることもできる。 Further, as other methods for cut detection, Reference 2 ““ Moving image cut detection method from MPEG bitstream using likelihood ratio test ””, “The Institute of Electronics, Information and Communication Engineers, Vol. J82-D2 No. 3” "(March 1999), Toshimitsu Kaneko and Osamu Hori, published by the Institute of Electronics, Information and Communication Engineers, pages 361 to 370" and reference 3 "High-speed cut detection from MPEG video using motion vector code amount" "The Institute of Electronics, Information and Communication Engineers, Pattern Recognition and Media Understanding Study Group (PRMU), (November 1996), Toshimitsu Kaneko, Osamu Hori, published by The Institute of Electronics, Information and Communication Engineers" can be used.

図８に、このようにして検出したカット点でのコンテンツの先頭からの時間の例を示す。カット点はコンテンツから順に検出されるため、先行するカットの終了時間と、これに後続するカットの開始時間とが一致している（この点が、図７に示したセリフの時間間隔とは異なっている）。 FIG. 8 shows an example of the elapsed times from the beginning of the content at the cut points detected in this way. Because the cut points are detected sequentially through the content, the end time of each cut coincides with the start time of the cut that follows it (in this respect the cut intervals differ from the dialogue intervals shown in FIG. 7).

このようにして検出したカットの時間間隔（図８）は、シーン種別が「セリフ」の場合における図７の使い方と同様に用いる。 The cut intervals detected in this way (FIG. 8) are used in the same manner as the intervals of FIG. 7 are used when the scene type is "dialogue".

ユーザにより入力されたコメントのシーン種別が「カット」の場合、代表時間を含むカットの時間間隔を対応するシーンとして抽出する。このとき、字幕の場合との相違点は、必ずどこかのカット間隔が対応することである。例えば、全くカメラ切り替えのないコンテンツの場合は、コンテンツ全体が１つのカット間隔となり、どこを代表時間としてもコンテンツ全体が対応するシーンとして抽出することができる。 When the scene type of the comment input by the user is “cut”, the cut time interval including the representative time is extracted as the corresponding scene. At this time, the difference from the case of subtitles is that some cut interval always corresponds. For example, in the case of content with no camera switching, the entire content has one cut interval, and it can be extracted as a scene corresponding to the entire content regardless of where the representative time is.
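Why some cut interval always matches can be seen from a small sketch. Assuming cut points are given as elapsed milliseconds and the content length is known, the cut points partition the whole content into contiguous intervals. The names `cut_intervals` and `find_cut` and the half-open interval convention are assumptions made for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def cut_intervals(cut_points_ms: List[int], content_length_ms: int) -> List[Interval]:
    """Partition [0, content_length_ms] into contiguous intervals at the cut points."""
    inner = sorted(p for p in set(cut_points_ms) if 0 < p < content_length_ms)
    bounds = [0] + inner + [content_length_ms]
    return list(zip(bounds[:-1], bounds[1:]))


def find_cut(intervals: List[Interval], representative_ms: int) -> Interval:
    """Return the cut interval containing the representative time (start inclusive)."""
    for start, end in intervals:
        if start <= representative_ms < end:
            return (start, end)
    return intervals[-1]  # a representative time at the very end belongs to the last cut


# Content with no camera switching: the whole content is a single cut interval.
whole = cut_intervals([], 60000)
# Content with two cut points.
three = cut_intervals([20000, 45000], 60000)
```

Because the end of one interval is the start of the next, there are no gaps, so no nearest-interval fallback is needed, unlike the subtitle case.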

次に、シーン種別が「ＣＭ」（コマーシャルメッセージ）と判定された場合を例にとって、シーン抽出部１２の動作について説明する。 Next, the operation of the scene extraction unit 12 will be described by taking as an example a case where the scene type is determined to be “CM” (commercial message).

この場合、シーン抽出部１２は、ＣＭ検出部（図示せず）を持つ。ＣＭ検出部は、コンテンツにおけるＣＭを抽出する。 In this case, the scene extraction unit 12 has a CM detection unit (not shown). The CM detection unit extracts a CM in the content.

ＣＭを検出する方法としては、例えば、以下に示すようなものがある。
（１）（例えば前述した方法により）カット検出を行い、その検出時間が１５秒（または３０秒）ごとに検出される部分をＣＭ部分と判定する。
（２）ＣＭの前後で無音区間があることを利用して、無音区間が１５秒（または３０秒）毎に検出された場合、その無音区間をＣＭ部分と判定する。
（３）音声がモノラルの番組コンテンツのときにステレオ放送の部分をＣＭと判定する。
（４）ＣＭの画像パターンを記憶しておき、その画像パターンとマッチする部分をＣＭ部分と判定する。
（５）例えば特開２００３−２５７１６０号公報に開示されているような、ＴＶ信号の音声モード、ＴＶ信号における映像信号レベル、音声信号レベルパターンを利用してＣＭ区間を検出する。これは、例えば、番組コンテンツが二カ国放送の場合に、そうでない部分をＣＭ部分と判定する。
Methods for detecting a CM include, for example, the following.
(1) Cut detection is performed (for example, by the method described above), and portions where cuts are detected at 15-second (or 30-second) intervals are determined to be CM portions.
(2) Using the fact that there is a silent section before and after a CM, if silent sections are detected every 15 seconds (or 30 seconds), those silent sections are determined to be a CM portion.
(3) When the program content has monaural audio, stereo-broadcast portions are determined to be CMs.
(4) CM image patterns are stored, and portions matching those image patterns are determined to be CM portions.
(5) The CM section is detected using the audio mode of the TV signal, the video signal level in the TV signal, and the audio signal level pattern, as disclosed in, for example, Japanese Patent Laid-Open No. 2003-257160. For example, when the program content is a bilingual broadcast, the portions that are not bilingual are determined to be CM portions.

ＣＭ検出部では、上記のうちのいずれかの方法を用いて、または、上記方法のいくつか組み合わせて用いることにより、入力されたコンテンツからＣＭ部分を検出する。 The CM detection unit detects a CM portion from the input content by using any one of the above methods or by using some combination of the above methods.
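Methods (1) and (2) both rely on boundaries (cut points or silent points) spaced 15 or 30 seconds apart. A rough sketch of that idea follows; the function name `detect_cm_sections`, the tolerance of 500 ms, and the merging of adjacent matches into one section are all assumptions made for illustration and are not taken from the description.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def detect_cm_sections(
    boundaries_ms: Sequence[int],
    unit_lengths_ms: Sequence[int] = (15000, 30000),
    tolerance_ms: int = 500,
) -> List[Interval]:
    """Mark spans between consecutive boundaries as CM when they last about 15 s or 30 s."""
    times = sorted(boundaries_ms)
    sections: List[Interval] = []
    for start, end in zip(times, times[1:]):
        gap = end - start
        if any(abs(gap - unit) <= tolerance_ms for unit in unit_lengths_ms):
            if sections and sections[-1][1] == start:
                # Extend the previous section so that back-to-back CMs form one CM block.
                sections[-1] = (sections[-1][0], end)
            else:
                sections.append((start, end))
    return sections


# Hypothetical boundaries: a long program part, then 15 s + 30 s + 15 s of CMs.
boundaries = [0, 400000, 415000, 445000, 460000, 900000]
cm_sections = detect_cm_sections(boundaries)
```

A real detector would likely combine this with the other cues listed above, since ordinary program material can also contain cuts 15 seconds apart.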

このようにして検出したＣＭでのコンテンツの先頭からの時間の例を図９に示す。これも、シーン種別がセリフの場合における図７の使い方と同様に用いる。 FIG. 9 shows an example of the elapsed times from the beginning of the content for the CMs detected in this way. These are also used in the same manner as the intervals of FIG. 7 are used when the scene type is "dialogue".

次に、シーン種別が「テロップ」と判定された場合を例にとって、シーン抽出部１２の動作について説明する。 Next, the operation of the scene extraction unit 12 will be described taking the case where the scene type is determined as “telop” as an example.

この場合、シーン抽出部１２は、テロップ認識部を持つ。テロップ認識部は、コンテンツにおけるテロップが表示されたときの先頭からの時間と、テロップの表示が消えたときの先頭からの時間を抽出する。 In this case, the scene extraction unit 12 has a telop recognition unit. The telop recognition unit extracts the time from the beginning when the telop in the content is displayed and the time from the beginning when the telop display disappears.

テロップを認識する方法としては、例えば、映像認識技術を用いることができる。画面からＯＣＲによって文字を認識することにより、テロップを認識することができる。 As a method for recognizing a telop, for example, a video recognition technique can be used. A telop can be recognized by recognizing characters from the screen by OCR.

テロップ認識結果は、シーン種別が「セリフ」のときと同様に図７のような時間間隔が得られる。この結果は、シーン種別が「セリフ」の場合と同様に用いられる。 The telop recognition yields time intervals like those of FIG. 7, as in the case where the scene type is "dialogue". This result is used in the same manner as when the scene type is "dialogue".

以上のようにして、シーン種別が「セリフ」、「カット」、「ＣＭ」、「テロップ」について説明したが、これら全てをシーン抽出部１２が持ってもよいし、どれか一部を持ってもよい。 The scene types "dialogue", "cut", "CM", and "telop" have been described above. The scene extraction unit 12 may support all of these or only some of them.

また、シーン種別判定部１４が他のシーン種別を認識する場合は、それに対応した時間間隔を抽出する機能を持つことにより、対応することができる。例えば、曲のシーンを抽出する機能をシーン抽出部１２が持つことにより、ユーザが例えば「このＢＧＭ」という記入を行ったときに、シーン種別判定部１４は「曲」というシーン種別を出力し、シーン抽出部１２がそれに対応するシーンを抽出することにより、対応するシーンを抽出することも可能である。 Further, when the scene type determination unit 14 recognizes another scene type, it can be handled by having a function of extracting a time interval corresponding to the scene type. For example, since the scene extraction unit 12 has a function of extracting a scene of a song, when the user enters, for example, “this BGM”, the scene type determination unit 14 outputs the scene type “music”. It is also possible to extract the corresponding scene by the scene extraction unit 12 extracting the corresponding scene.

また、シーン区間の時間間隔を、図示しない通信手段を用いて、時間間隔を配信するサーバと通信し、対応する時間間隔を受信することも可能である。また、コンテンツ蓄積部１６にコンテンツを蓄積する場合は、放送波を受信したり、ネットワークから受信したりすることにより蓄積するが、このとき、シーン区間の時間間隔を同時に受信することも可能である。これらの場合、シーン区間の時間間隔を配信するサーバが必要となるが、これは、例えば、これまでシーン抽出部１２を用いた抽出方法により作成してもよいし、例えば、手動で抽出したシーン区間をサーバが保持し、配信してもよい。手動でシーン区間を作成する場合には、これまで記述したシーン区間だけでなく、俳優に対応するシーン区間や、車や服などのシーン区間を作成することも可能である。 The time intervals of the scene sections can also be obtained by communicating, through communication means (not shown), with a server that distributes time intervals and receiving the corresponding intervals. When content is stored in the content storage unit 16, it is stored by receiving broadcast waves or by receiving it from a network, and the time intervals of the scene sections can be received at the same time. In these cases, a server that distributes the time intervals of the scene sections is required. The intervals may be created, for example, by the extraction methods described so far using the scene extraction unit 12, or the server may hold and distribute manually extracted scene sections. When scene sections are created manually, it is possible to create not only the scene sections described so far but also scene sections corresponding to actors, or scene sections for items such as cars and clothes.

次に、ステップＳ１５では、コメントに対応するシーン抽出結果に関する表示を行う。この表示は、表示部１３で行われるようにしてもよい。 Next, in step S15, a display relating to the scene extraction result corresponding to the comment is performed. This display may be performed on the display unit 13.

図１０に、表示画面の一例を示す。 FIG. 10 shows an example of the display screen.

図１０に例示するように、この表示画面には、コメントを書き込むコメント入力部４０１と、コンテンツの時間を示すタイムバー４０３と、コメントに対応するシーン抽出結果を示す時間間隔ポインタ４０２と、コンテンツを表示する表示部４０４と、シーン種別を表示するシーン種別表示部４０５とが設けられている。 As illustrated in FIG. 10, the display screen includes a comment input unit 401 for writing a comment, a time bar 403 indicating the time of the content, a time interval pointer 402 indicating a scene extraction result corresponding to the comment, and content. A display unit 404 for displaying and a scene type display unit 405 for displaying the scene type are provided.

表示部４０４では、例えば、コンテンツ中で、ユーザが入力したコメントに対応するシーンを優先的に表示するようにしてもよい（例えば、ユーザがコメントを入力した代表時間で停止した画像を表示してもよい）。また、例えば、抽出されたシーン区間を強調した表示をしてもよい（例えば、抽出されたシーン区間だけをループ再生するようにしてもよい）。また、これら以外の表示情報も可能である。 The display unit 404 may, for example, preferentially display the scene in the content that corresponds to the comment input by the user (for example, it may display a still image paused at the representative time at which the user input the comment). It may also highlight the extracted scene section (for example, by loop-playing only the extracted scene section). Other display information is also possible.

時間間隔ポインタ４０２では、抽出されたシーン区間が幅をもって表示されている。 In the time interval pointer 402, the extracted scene section is displayed with a width.

シーン種別表示部４０５では、抽出されたシーン区間がプルダウンメニューとして表示されている。ここでは、シーン種別判定部１４がスコアを付けた順に表示される（図１０では、シーン種別の第1候補として「セリフ」が表示されている場合を例示している）。このとき、シーン抽出部１２が抽出できなかったシーン種別は表示されなくてもよい。ユーザは別のシーン種別も選択することができる。ユーザが別のシーン種別を選択すると、ユーザに選択されたシーン種別の時間間隔が適用される。 In the scene type display unit 405, the extracted scene sections are presented as a pull-down menu. The entries are listed in order of the scores assigned by the scene type determination unit 14 (FIG. 10 illustrates the case where "dialogue" is displayed as the first candidate for the scene type). Scene types for which the scene extraction unit 12 could not extract a section need not be displayed. The user can also select a different scene type, in which case the time interval of the scene type selected by the user is applied.

また、シーン抽出部１２で抽出したシーン区間以外にも、代表時間の付近から一定区間をシーン区間として抽出して表示してもよい。図１０のシーン種別表示部４０５では、代表時間から前後３秒の区間を抽出した場合と、代表時間から前１０秒の区間を抽出した場合と、代表時間から後ろ１０秒の区間を抽出した場合がシーン種別として選択できるようにシーン種別表示部４０５に表示されている。 In addition to the scene section extracted by the scene extracting unit 12, a fixed section may be extracted as a scene section from the vicinity of the representative time and displayed. In the scene type display unit 405 of FIG. 10, when a section 3 seconds before and after is extracted from the representative time, when a section 10 seconds before is extracted from the representative time, and when a section 10 seconds behind is extracted from the representative time Is displayed on the scene type display unit 405 so that the scene type can be selected.
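The fixed-length fallback sections mentioned above are simple arithmetic on the representative time. The sketch below assumes times in milliseconds and clamps the sections to the content boundaries; the clamping, the function name `fixed_windows`, and the menu labels are assumptions not stated in the description.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def fixed_windows(representative_ms: int, content_length_ms: int) -> Dict[str, Interval]:
    """Build the three fixed sections offered alongside the extracted scene types."""

    def clamp(t: int) -> int:
        # Keep the section inside the content (assumed behaviour).
        return max(0, min(t, content_length_ms))

    return {
        "3 s before and after": (clamp(representative_ms - 3000), clamp(representative_ms + 3000)),
        "10 s before": (clamp(representative_ms - 10000), representative_ms),
        "10 s after": (representative_ms, clamp(representative_ms + 10000)),
    }


windows = fixed_windows(97000, 600000)
```

These entries could be appended to the pull-down list after the scored scene types, so that the user always has a usable choice even when no scene type matches.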

シーン種別表示部４０５でユーザが別のシーン種別を選択すると、表示部４０４と時間間隔ポインタ４０２は、そのシーン区間が強調して表示される。 When the user selects another scene type on the scene type display unit 405, the display unit 404 and the time interval pointer 402 are displayed with the scene section highlighted.

ところで、図２の手順例では、ユーザによるコメント入力が完了してから（又はユーザによるコメント入力が完了したものとみなされてから）、ステップＳ１３のシーン種別判定へ進む方法をとっているが、ユーザによるコメント入力の途中であっても、随時、ステップＳ１３へ進むようにすることも可能である。 By the way, in the procedure example of FIG. 2, after the comment input by the user is completed (or after it is considered that the comment input by the user is completed), the process proceeds to the scene type determination in step S13. Even during comment input by the user, it is possible to proceed to step S13 at any time.

図１１に、この場合の処理手順の一例を示す。図２の処理手順と相違する点は、ステップＳ１５の表示の後に、コメントが追加されたかどうかの判断（ステップＳ１６）があることである。 FIG. 11 shows an example of the processing procedure in this case. The difference from the processing procedure of FIG. 2 is that there is a determination (step S16) whether or not a comment has been added after the display in step S15.

例えば、ユーザが「このセリフは無いよね。ヒドイこと言うなぁ」というコメントを入力しようとする場合、「このセリフ」まで入力した時点で、本オーサリング支援装置はシーン種別を判定することができる。よって、ユーザがコメント入力部１１を操作して、コメントの入力を始めるとき、文字を入力する毎にステップＳ１３へ進んで処理を行う。ここで、文字を入力する毎にとは、例えば、ユーザが１文字入力する毎でもよいし、かな漢変換を確定する毎でもよいし、ある一定時間（例えば３秒）入力が無かったときにステップＳ１３へ進んでもよい。さらに文字が追加されると、入力中のコメント全体についてステップＳ１３以降の処理を行う。この場合、シーン抽出ボタン３０２は不要となる。このように、ユーザのコメント入力に応じて、随時、対応シーンを抽出することにより、よりユーザの負担を軽減することができる。 For example, when the user is about to input the comment "This line is uncalled for. What an awful thing to say", the authoring support apparatus can already determine the scene type once the user has typed as far as "This line". Therefore, when the user operates the comment input unit 11 and begins inputting a comment, the process proceeds to step S13 each time characters are input. Here, "each time characters are input" may mean, for example, each time the user inputs one character or each time a kana-kanji conversion is confirmed; alternatively, the process may proceed to step S13 when there has been no input for a certain period (for example, 3 seconds). When further characters are added, the processing from step S13 onward is performed on the entire comment being input. In this case, the scene extraction button 302 is unnecessary. Extracting the corresponding scene as needed in response to the user's comment input in this way further reduces the burden on the user.
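The idea that a partial comment is already enough to determine the scene type can be illustrated with a small keyword-matching sketch. The rule table below is entirely hypothetical (the actual determination rules are those of the scene type determination rule storage unit 15, which are not reproduced here), as are the names `RULES` and `rank_scene_types` and the English keywords.

```python
from typing import Dict, List

# Hypothetical rules: lowercase keywords associated with each scene type.
RULES: Dict[str, List[str]] = {
    "dialogue": ["this line", "to say"],
    "cut": ["this cut", "this shot"],
    "CM": ["this commercial", "this cm"],
}


def rank_scene_types(comment: str) -> List[str]:
    """Score each scene type by keyword hits and return the matching types, best first."""
    text = comment.lower()
    scores = {
        scene_type: sum(text.count(keyword) for keyword in keywords)
        for scene_type, keywords in RULES.items()
    }
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [scene_type for scene_type, score in ranked if score > 0]


# Re-running the determination on the comment typed so far, as in the step S13 loop.
partial = rank_scene_types("This line")
full = rank_scene_types("This line is uncalled for. What an awful thing to say")
```

Calling such a function after each keystroke, each confirmed conversion, or a pause in typing corresponds to the three trigger options described above.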

以上のことから、本実施形態により、ユーザは入力したコメントに対応するシーン区間を容易に抽出することができるようになり、オーサリングの作業の負担を軽減することができる。 As described above, according to the present embodiment, the user can easily extract a scene section corresponding to the input comment, and the burden of authoring work can be reduced.

（第２の実施形態） (Second Embodiment)
以下では、第１の実施形態と相違する点を中心に説明する。 The following description focuses on the points that differ from the first embodiment.

図１２に、本発明の第２の実施形態に係るオーサリング情報共有システムの構成例を示す。 FIG. 12 shows a configuration example of an authoring information sharing system according to the second embodiment of the present invention.

図１２に示されるように、本実施形態のオーサリング情報共有システムには、ネットワーク５に接続されたサーバ３とクライアント２とが含まれている。なお、クライアント２は、図１２では１台のみ示しているが、複数台存在して構わない。 As shown in FIG. 12, the authoring information sharing system of this embodiment includes a server 3 and a client 2 connected to a network 5. Although only one client 2 is shown in FIG. 12, a plurality of clients 2 may exist.

クライアント２は、ユーザからのコメントの入力やコメント入力のための所定の指示等を受け付けるコメント入力部１１、シーン種別に応じてコンテンツからシーンを抽出するシーン抽出部１２、コンテンツの表示やコメント入力のための表示等を行う表示部１３、ユーザにより入力されたコメントに基づいて、コメント付与の対象となるシーンのシーン種別を判別するシーン種別判別部１４、シーン種別の判別のためのルールを蓄積したシーン種別判別ルール記憶部１５、ユーザがコメントを付与する対象であるコンテンツを蓄積したコンテンツ蓄積部１６、コンテンツの再生等を制御するコンテンツ制御部１７、サーバ３との通信を行う通信部１８を備えている。すなわち、図１２のクライアント２は、図１のオーサリング支援装置１に、通信部１８を追加したものである。 The client 2 includes a comment input unit 11 that accepts comments from the user and predetermined instructions for comment input, a scene extraction unit 12 that extracts a scene from the content according to the scene type, a display unit 13 that displays the content and the screens for comment input, a scene type determination unit 14 that determines, based on the comment input by the user, the scene type of the scene to which the comment is to be attached, a scene type determination rule storage unit 15 that stores the rules for determining the scene type, a content storage unit 16 that stores the content to which the user attaches comments, a content control unit 17 that controls playback of the content, and a communication unit 18 that communicates with the server 3. That is, the client 2 in FIG. 12 is the authoring support apparatus 1 of FIG. 1 with the communication unit 18 added.

サーバ３は、クライアント２との通信を行う通信部３３、投稿されたオーサリング情報を蓄積するオーサリング情報蓄積部３２、コメントを要約するコメント要約部３１を備えている。 The server 3 includes a communication unit 33 that communicates with the client 2, an authoring information storage unit 32 that stores posted authoring information, and a comment summary unit 31 that summarizes comments.

なお、図１２の構成例では、コメント要約部３１は、サーバ３側に設けられているが、各々のクライアント２側に設けるようにしてもよい。また、サーバ３側にコメント要約部を設けるとともに、クライアント２の全部又は一部にコメント要約部３１を設けるようにしても構わない。 In the configuration example of FIG. 12, the comment summary unit 31 is provided on the server 3 side, but may be provided on each client 2 side. Further, the comment summary unit may be provided on the server 3 side, and the comment summary unit 31 may be provided on all or part of the client 2.

ユーザは、第１の実施形態で説明した方法を用いて、対象コンテンツに付与すべきコメントを入力し、クライアント２は、第１の実施形態のオーサリング支援装置１と同様にして、そのコメントに対応するシーンを抽出する。本実施形態では、一人のユーザにより付与されたコメントを、複数のユーザ間で共有するために、そのクライアント２からサーバ３への投稿を行う。 The user inputs a comment to be given to the target content by using the method described in the first embodiment, and the client 2 responds to the comment in the same manner as the authoring support apparatus 1 in the first embodiment. The scene to be extracted. In the present embodiment, in order to share a comment given by one user among a plurality of users, posting from the client 2 to the server 3 is performed.

ユーザがクライアント２においてコメントを記入し、そのコメントに対応するシーンがクライアント２により抽出された後、例えばクライアント２に設けられた投稿ボタン（図示せず）をユーザが押すことにより、これに応答してクライアント２は、コメントと、これに対応するシーン区間の時間情報と、シーン種別とを含むオーサリング情報を、通信部１８を用いて、サーバ３へ送信する。 After the user enters a comment in the client 2 and the scene corresponding to the comment is extracted by the client 2, the user responds to this by pressing a post button (not shown) provided in the client 2, for example. Then, the client 2 transmits the authoring information including the comment, the time information of the corresponding scene section, and the scene type to the server 3 using the communication unit 18.
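The authoring information sent to the server bundles three things: the comment, the time information of the scene section, and the scene type. A minimal sketch of such a record is shown below; the field names, the use of JSON, and the helper names `to_payload` and `from_payload` are assumptions, since the description does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AuthoringInfo:
    comment: str      # text entered by the user
    start_ms: int     # start of the extracted scene section
    end_ms: int       # end of the extracted scene section
    scene_type: str   # e.g. "dialogue", "cut", "CM", "telop"


def to_payload(info: AuthoringInfo) -> str:
    """Serialize the record for posting from the client to the server."""
    return json.dumps(asdict(info), ensure_ascii=False)


def from_payload(payload: str) -> AuthoringInfo:
    """Rebuild the record on the receiving side."""
    return AuthoringInfo(**json.loads(payload))


posted = AuthoringInfo("このセリフは無いよね。", 96891, 98066, "dialogue")
received = from_payload(to_payload(posted))
```

In practice the record would presumably also need some identifier of the content it refers to; that is left out here because the description lists only the three items above.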

サーバ３は、クライアント２から投稿されたオーサリング情報を、通信部３３を用いて受信し、オーサリング情報蓄積部３２に蓄積する。 The server 3 receives the authoring information posted from the client 2 using the communication unit 33 and accumulates it in the authoring information storage unit 32.

共有されたオーサリング情報は、個々のクライアント２で表示することができる。 The shared authoring information can be displayed by each client 2.

図１３に、このときのコメント表示画面の一例を示す。 FIG. 13 shows an example of the comment display screen at this time.

図１３に例示するように、このコメント表示画面には、コメント表示部６０１、コンテンツ表示画面６０２が設けられている。クライアント２においては、オーサリング情報に含まれる「シーン区間の時間情報」をもとに、オーサリング情報に含まれる「コメント」に対応するシーンを再生するときに同期して、当該「コメント」を表示することができる。また、「コメント」と同時に、オーサリング情報に含まれる「シーン種別」を表示することができる。 As illustrated in FIG. 13, the comment display screen is provided with a comment display unit 601 and a content display screen 602. Based on the "time information of the scene section" included in the authoring information, the client 2 can display the "comment" included in the authoring information in synchronization with playback of the corresponding scene. The "scene type" included in the authoring information can be displayed together with the "comment".

サーバ３は、クライアント２からオーサリング情報の送信要求を受けると、対応するオーサリング情報をクライアント２へ送信する。 When the server 3 receives the authoring information transmission request from the client 2, the server 3 transmits the corresponding authoring information to the client 2.

また、ユーザが選択したオーサリング情報に含まれる「シーン種別」も表示することにより、簡単なコメントであっても、それが「カット」についてのコメントなのか、「セリフ」についてのコメントなのか、「ＣＭ」についてのコメントなのかなどを、ユーザは理解することが可能となる。 Furthermore, by also displaying the "scene type" included in the authoring information selected by the user, the user can understand, even for a brief comment, whether it is a comment about a "cut", about a line of "dialogue", or about a "CM".

これにより、ユーザ間でシーンに対応するコメントを共有することができる。 Thereby, the comment corresponding to a scene can be shared between users.

ところで、あるシーンに対応する共有されているオーサリング情報が多すぎると、クライアント２で表示できない場合がある。この場合、コメントが表示できるようにコメント要約部３１によりコメントを要約するようにしてもよい。 By the way, if there is too much shared authoring information corresponding to a certain scene, the client 2 may not be able to display it. In this case, the comment summary unit 31 may summarize the comment so that the comment can be displayed.

コメント要約部３１は、言語処理技術を用いて、似ているコメントを要約する。これには言語処理技術の複数文書の要約技術を用いることができる。 The comment summary unit 31 summarizes similar comments using language processing technology. For this, a multi-document summarization technique of a language processing technique can be used.

例えば、各コメントから形態素解析技術により単語を抽出する。同じシーン区間に対応する複数のコメントから、重複する単語が多いコメントを類似したコメントとして、要約する。２つのコメントを要約する方法は、例えば、片方のコメントのみを選択することで可能である。 For example, words are extracted from each comment by a morphological analysis technique. From a plurality of comments corresponding to the same scene section, comments with many overlapping words are summarized as similar comments. A method of summarizing two comments is possible, for example, by selecting only one comment.

同じシーンであるか否かの判断方法については、例えば、オーサリング情報が持つシーン区間の時間情報が一致する場合にのみ同じシーンと判断する方法や、オーサリング情報が持つシーン区間の時間情報が一致するか又は一致しなくても類似している場合に同じシーンと判断する方法などが考えられる。 As for a method for determining whether or not the scenes are the same, for example, a method of determining that the scenes are the same only when the time information of the scene sections included in the authoring information matches, or the time information of the scene sections included in the authoring information matches. Or a method of determining that the scenes are the same if they are similar even if they do not match.

また、ユーザがシーン種別を選択してオーサリング情報を作成した場合、あるユーザはセリフに関するコメントとして作成し、また別のユーザは同じ区間で同じ単語を使ってコメントを作成しても、カットに関するコメントを作成した場合には、別のコメント種別を選択することになる。この場合、コメントの表示は類似していてもユーザの意図は異なるため、要約しないことが望ましい。 When users select the scene type in creating authoring information, one user may create a comment about the dialogue while another user, writing about the same section with the same words, creates a comment about the cut and therefore selects a different type. In this case, although the comments appear similar, the users' intentions differ, so it is desirable not to summarize them together.

よって、同じシーンに対応するコメントで要約するのではなく、同じシーンでかつ同じシーン種別を持つコメントで要約処理を行うことにより、例えば、同じシーンでかつコメントに含まれる単語が似ている場合でも、シーン種別が異なる場合には要約しない処理が可能となり、ユーザの意図に沿った表示と要約が可能となる。 Therefore, instead of summarizing with comments corresponding to the same scene, for example, even if the words included in the comment are similar in the same scene by performing the summarization process with the same scene and the same scene type. When the scene types are different, processing without summarization is possible, and display and summarization according to the user's intention are possible.
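The summarization rule above (merge only comments that share both the scene section and the scene type, and that have many words in common) can be sketched as follows. Several simplifications are assumed: whitespace splitting stands in for morphological analysis, word overlap is measured with the Jaccard index, "same scene" means exactly matching time information, and the 0.5 threshold and the name `summarize` are arbitrary choices for illustration.

```python
from typing import List, Set, Tuple

# (comment, start_ms, end_ms, scene_type)
Comment = Tuple[str, int, int, str]


def words(comment: str) -> Set[str]:
    """Crude stand-in for morphological analysis: lowercase whitespace tokens."""
    return set(comment.lower().split())


def summarize(comments: List[Comment], min_overlap: float = 0.5) -> List[Comment]:
    """Keep one representative among similar comments on the same scene and scene type."""
    kept: List[Comment] = []
    for candidate in comments:
        candidate_words = words(candidate[0])
        is_duplicate = False
        for existing in kept:
            if existing[1:] != candidate[1:]:
                continue  # different scene section or scene type: never merged
            existing_words = words(existing[0])
            union = candidate_words | existing_words
            if union and len(candidate_words & existing_words) / len(union) >= min_overlap:
                is_duplicate = True  # summarize by keeping only the earlier comment
                break
        if not is_duplicate:
            kept.append(candidate)
    return kept


posted = [
    ("great line here", 96891, 98066, "dialogue"),
    ("great line here indeed", 96891, 98066, "dialogue"),
    ("great line here", 96891, 98066, "cut"),
]
summary = summarize(posted)
```

The third comment survives even though its wording is identical to the first, because its scene type differs, which is the behaviour the paragraph above argues for.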

以上のことから、ユーザは容易にシーンに対するコメントを作成することができ、また容易にコメント情報を共有・表示することが可能となる。 From the above, the user can easily create a comment for a scene, and can easily share and display comment information.

なお、以上の各機能は、ソフトウェアとして記述し適当な機構をもったコンピュータに処理させても実現可能である。 Each of the above functions can also be realized by describing it as software and having it processed by a computer with an appropriate mechanism.
また、本実施形態は、コンピュータに所定の手順を実行させるための、あるいはコンピュータを所定の手段として機能させるための、あるいはコンピュータに所定の機能を実現させるためのプログラムとして実施することもできる。加えて該プログラムを記録したコンピュータ読取り可能な記録媒体として実施することもできる。 The present embodiment can also be implemented as a program for causing a computer to execute a predetermined procedure, for causing a computer to function as predetermined means, or for causing a computer to realize a predetermined function. In addition, it can be implemented as a computer-readable recording medium on which the program is recorded.

なお、本発明は上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態にわたる構成要素を適宜組み合わせてもよい。 Note that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage. In addition, various inventions can be formed by appropriately combining a plurality of components disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, constituent elements over different embodiments may be appropriately combined.

FIG. 1: 本発明の第１の実施形態に係るオーサリング支援装置の構成例を示す図 Diagram showing a configuration example of the authoring support apparatus according to the first embodiment of the present invention
FIG. 2: 同実施形態に係るオーサリング支援装置における処理手順の一例を示すフローチャート Flowchart showing an example of the processing procedure in the authoring support apparatus according to the same embodiment
FIG. 3: 同実施形態におけるコメント入力インターフェースの一例を示す図 Diagram showing an example of the comment input interface in the same embodiment
FIG. 4: シーン種別判定ルールの一例を示す図 Diagram showing an example of a scene type determination rule
FIG. 5: シーン種別判定ルールの一例を示す図 Diagram showing an example of a scene type determination rule
FIG. 6: シーン種別判定ルールの一例を示す図 Diagram showing an example of a scene type determination rule
FIG. 7: 字幕の開始時間及び終了時間について説明するための図 Diagram for explaining the start and end times of subtitles
FIG. 8: カットの開始時間及び終了時間について説明するための図 Diagram for explaining the start and end times of cuts
FIG. 9: ＣＭの開始時間及び終了時間について説明するための図 Diagram for explaining the start and end times of CMs
FIG. 10: 表示画面の一例を示す図 Diagram showing an example of the display screen
FIG. 11: 同実施形態に係るオーサリング支援装置における処理手順の他の例を示すフローチャート Flowchart showing another example of the processing procedure in the authoring support apparatus according to the same embodiment
FIG. 12: 本発明の第２の実施形態に係るオーサリング情報共有システムの構成例を示す図 Diagram showing a configuration example of the authoring information sharing system according to the second embodiment of the present invention
FIG. 13: 表示画面の一例を示す図 Diagram showing an example of the display screen

符号の説明 Explanation of symbols

１…オーサリング支援装置、２…クライアント装置、１１…コメント入力部、１２…シーン抽出部、１３…表示部、１４…シーン種別判別部、１５…シーン種別判別ルール記憶部、１６…コンテンツ蓄積部、１７…コンテンツ制御部、１８…通信部、５…ネットワーク、３…サーバ装置、３１…コメント要約部、３２…オーサリング情報蓄積部、３３…通信部 1 ... Authoring support apparatus, 2 ... Client apparatus, 11 ... Comment input unit, 12 ... Scene extraction unit, 13 ... Display unit, 14 ... Scene type determination unit, 15 ... Scene type determination rule storage unit, 16 ... Content storage unit, 17 ... Content control unit, 18 ... Communication unit, 5 ... Network, 3 ... Server apparatus, 31 ... Comment summary unit, 32 ... Authoring information storage unit, 33 ... Communication unit

Claims

少なくとも映像を含むコンテンツを記憶する記憶手段と、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける入力手段と、
入力された前記コメントの内容と、個々のシーン種別ごとに対応付けて予め1又は複数個ずつ規定された文字情報の各々とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別手段と、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出手段と、
前記抽出の結果を表示する表示手段とを備えたことを特徴とするオーサリング支援装置。
An authoring support apparatus comprising:
storage means for storing content including at least video;
input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
determination means for determining the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with each item of character information defined in advance, one or more items being associated with each scene type;
extraction means for extracting, according to the determined scene type, from a region of the content that includes the representative time or a region near the representative time, time information that makes it possible to specify the start point and end point of a section of that scene type in the content, as information indicating the section of the predetermined scene; and
display means for displaying the result of the extraction.

少なくとも映像を含むコンテンツを記憶する記憶手段と、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける入力手段と、
入力された前記コメントの内容と、シーン種別に対応付けて予め規定された方法に従って前記コンテンツに含まれる画像データ、音声データ又はテキストデータから得られた文字情報とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別手段と、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出手段と、
前記抽出の結果を表示する表示手段とを備えたことを特徴とするオーサリング支援装置。
An authoring support apparatus comprising:
storage means for storing content including at least video;
input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
determination means for determining the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with a scene type;
extraction means for extracting, according to the determined scene type, from a region of the content that includes the representative time or a region near the representative time, time information that makes it possible to specify the start point and end point of a section of that scene type in the content, as information indicating the section of the predetermined scene; and
display means for displaying the result of the extraction.

前記入力手段は、前記コメントの入力を受け付けるのに先立って、前記コンテンツの再生中又は一時停止中にコメント書込指示の入力を受け付け、該コメント書込指示が入力された場合に、前記コメントを入力するためのコメント入力インターフェースを表示して、該コメント入力インターフェースにより前記コメントの入力を受け付けることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein, prior to accepting input of the comment, the input means accepts input of a comment writing instruction while the content is being played back or paused and, when the comment writing instruction is input, displays a comment input interface for entering the comment and accepts input of the comment through the comment input interface.

前記特定の箇所は、前記コンテンツの再生中にユーザにより前記コメント書込指示が入力された場合における、該コメント書込指示が入力されたときに再生されていた前記コンテンツ中の箇所、又は前記コンテンツの一時停止中にユーザにより前記コメント書込指示が入力された場合における、該コメント書込指示が入力されたときに一時停止されていた前記コンテンツ中の箇所であることを特徴とする請求項３記載のオーサリング支援装置。 The authoring support apparatus according to claim 3, wherein the specific location is the location in the content that was being played back when the comment writing instruction was input, in the case where the user input the comment writing instruction during playback of the content, or the location in the content at which playback was paused when the comment writing instruction was input, in the case where the user input the comment writing instruction while the content was paused.

前記抽出手段は、前記コンテンツが音声をも含むものである場合に前記コンテンツの音声を認識する音声認識手段を有するものであり、
前記抽出手段は、選択された前記シーン種別がセリフである場合に、前記所定のシーンの区間を示す情報として、前記音声認識手段による音声認識結果に応じて、前記コンテンツからセリフの区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has speech recognition means for recognizing the audio of the content when the content also includes audio, and
when the selected scene type is dialogue, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a dialogue section in accordance with the speech recognition result of the speech recognition means.

前記抽出手段は、前記コンテンツのＣＭ区間を認識するＣＭ区間認識手段を有するものであり、
前記抽出手段は、選択された前記シーン種別がＣＭである場合に、前記所定のシーンの区間を示す情報として、前記ＣＭ区間認識手段によるＣＭ区間認識結果に応じて、前記コンテンツからＣＭの区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has CM section recognition means for recognizing CM (commercial) sections of the content, and
when the selected scene type is CM, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a CM section in accordance with the CM section recognition result of the CM section recognition means.

前記抽出手段は、コンテンツのカットを認識するカット認識手段を有するものであり、
前記抽出手段は、選択された前記シーン種別がカットである場合に、前記所定のシーンの区間を示す情報として、前記カット認識手段によるカット認識結果に応じて、前記コンテンツからカットの区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has cut recognition means for recognizing cuts in content, and
when the selected scene type is cut, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a cut section in accordance with the cut recognition result of the cut recognition means.

前記抽出手段は、前記コンテンツのテロップを認識するテロップ認識手段を有するものであり、
前記抽出手段は、選択された前記シーン種別がテロップである場合に、前記所定のシーンの区間を示す情報として、前記テロップ認識手段によるテロップ認識結果に応じて、前記コンテンツからテロップの区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has telop recognition means for recognizing telops of the content, and
when the selected scene type is telop, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a telop section in accordance with the telop recognition result of the telop recognition means.

前記抽出手段は、前記コンテンツの字幕を認識する字幕認識手段を有するものであり、
前記抽出手段は、選択された前記シーン種別が字幕である場合に、前記所定のシーンの区間を示す情報として、前記字幕認識手段による字幕認識結果に応じて、前記コンテンツから字幕の区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has subtitle recognition means for recognizing subtitles of the content, and
when the selected scene type is subtitle, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a subtitle section in accordance with the subtitle recognition result of the subtitle recognition means.

前記抽出手段は、前記コンテンツの字幕を認識する字幕認識手段と、前記コンテンツが音声をも含むものである場合に前記コンテンツの音声を認識する音声認識手段とを有するものであり、
前記抽出手段は、選択された前記シーン種別が字幕である場合に、前記所定のシーンの区間を示す情報として、前記字幕認識手段により前記コンテンツの字幕から、対応する音声の内容を抽出し、次いで、前記音声認識手段により、前記コンテンツから、該抽出された内容の音声が発話された区間を示す情報を抽出することによって、前記コンテンツから字幕の区間を示す情報を抽出するものであることを特徴とする請求項１または２に記載のオーサリング支援装置。 The authoring support apparatus according to claim 1 or 2, wherein the extraction means has subtitle recognition means for recognizing subtitles of the content and speech recognition means for recognizing the audio of the content when the content also includes audio, and
when the selected scene type is subtitle, the extraction means extracts from the content, as the information indicating the section of the predetermined scene, information indicating a subtitle section, by first using the subtitle recognition means to extract the content of the corresponding speech from the subtitles of the content and then using the speech recognition means to extract from the content information indicating the section in which speech with the extracted content was uttered.

前記表示手段は、入力された前記コメントの内容に基づいて抽出された前記所定のシーンを優先的に表示することを特徴とする請求項１ないし１０のいずれか１項に記載のオーサリング支援装置。 The authoring support apparatus according to any one of claims 1 to 10, wherein the display means preferentially displays the predetermined scene extracted on the basis of the content of the input comment.

前記表示手段は、入力された前記コメントの内容に基づいて抽出された前記所定のシーンを強調して表示することを特徴とする請求項１ないし１１のいずれか１項に記載のオーサリング支援装置。 The authoring support apparatus according to any one of claims 1 to 11, wherein the display means highlights and displays the predetermined scene extracted on the basis of the content of the input comment.

少なくとも映像を含むコンテンツを記憶する記憶手段を備えたオーサリング支援装置のオーサリング支援方法において、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける受付ステップと、
入力された前記コメントの内容と、個々のシーン種別ごとに対応付けて予め1又は複数個ずつ規定された文字情報の各々とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別ステップと、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出ステップと、
前記抽出の結果を表示する表示ステップとを有することを特徴とするオーサリング支援方法。 An authoring support method for an authoring support apparatus comprising storage means for storing content including at least video, the method comprising:
an accepting step of accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
a discrimination step of discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with each item of character information, one or more items of which are defined in advance in association with each individual scene type;
an extraction step of extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene; and
a display step of displaying the result of the extraction.
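
As an illustration of the discrimination step in this claim, the sketch below compares a comment against a small table of predefined words per scene type. The table contents, the function name, and the rule of choosing the scene type with the most matching words are assumptions made for the example; the claim only requires that one or more items of character information be defined in advance for each scene type.

```python
import re

# Hypothetical keyword table; the actual character information is not given in the claim.
SCENE_KEYWORDS = {
    "dialogue": ["line", "said", "says"],
    "cm": ["commercial", "advert"],
    "telop": ["telop", "caption"],
    "subtitle": ["subtitle", "subtitles"],
    "cut": ["cut", "shot"],
}


def discriminate_by_keywords(comment, keywords=SCENE_KEYWORDS):
    """Return the scene type whose predefined words occur most often in the comment.

    Matching is done on whole words so that a short keyword does not match
    inside a longer word. Returns None when no keyword occurs.
    """
    words = set(re.findall(r"\w+", comment.lower()))
    best_type, best_hits = None, 0
    for scene_type, terms in keywords.items():
        hits = sum(1 for term in terms if term in words)
        if hits > best_hits:
            best_type, best_hits = scene_type, hits
    return best_type
```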

少なくとも映像を含むコンテンツを記憶する記憶手段を備えたオーサリング支援装置のオーサリング支援方法において、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける受付ステップと、
入力された前記コメントの内容と、シーン種別に対応付けて予め規定された方法に従って前記コンテンツに含まれる画像データ、音声データ又はテキストデータから得られた文字情報とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別ステップと、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出ステップと、
前記抽出の結果を表示する表示ステップとを有することを特徴とするオーサリング支援方法。 An authoring support method for an authoring support apparatus comprising storage means for storing content including at least video, the method comprising:
an accepting step of accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
a discrimination step of discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with the scene type;
an extraction step of extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene; and
a display step of displaying the result of the extraction.
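
The extraction step can be sketched as follows, assuming that a recognizer for the discriminated scene type has already produced a list of (start, end) sections in seconds. The function name, the `max_gap` threshold that defines "vicinity", and the rule of preferring a containing section over a nearby one are illustrative assumptions, not details stated in the claim.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def extract_section(
    representative_time: float,
    sections: List[Interval],
    max_gap: float = 5.0,  # hypothetical definition of "vicinity"
) -> Optional[Interval]:
    """Pick the section that contains the representative time, else the nearest one.

    A section that does not contain the representative time is only eligible
    when one of its boundaries lies within max_gap seconds of it.
    Returns None when no section qualifies.
    """
    containing = [s for s in sections if s[0] <= representative_time <= s[1]]
    if containing:
        return containing[0]

    def gap(section: Interval) -> float:
        return min(
            abs(section[0] - representative_time),
            abs(section[1] - representative_time),
        )

    nearby = [s for s in sections if gap(s) <= max_gap]
    return min(nearby, key=gap) if nearby else None
```

The returned pair gives the start point and end point of the section, so the user only has to supply the single representative time.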

少なくとも映像を含むコンテンツを記憶する記憶手段を備えたオーサリング支援装置としてコンピュータを機能させるためのプログラムであって、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける受付ステップと、
入力された前記コメントの内容と、個々のシーン種別ごとに対応付けて予め1又は複数個ずつ規定された文字情報の各々とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別ステップと、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出ステップと、
前記抽出の結果を表示する表示ステップとをコンピュータに実行させることを特徴とするプログラム。 A program for causing a computer to function as an authoring support apparatus comprising storage means for storing content including at least video, the program causing the computer to execute:
an accepting step of accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
a discrimination step of discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with each item of character information, one or more items of which are defined in advance in association with each individual scene type;
an extraction step of extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene; and
a display step of displaying the result of the extraction.

少なくとも映像を含むコンテンツを記憶する記憶手段を備えたオーサリング支援装置としてコンピュータを機能させるためのプログラムであって、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける受付ステップと、
入力された前記コメントの内容と、シーン種別に対応付けて予め規定された方法に従って前記コンテンツに含まれる画像データ、音声データ又はテキストデータから得られた文字情報とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別ステップと、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出ステップと、
前記抽出の結果を表示する表示ステップとをコンピュータに実行させることを特徴とするプログラム。 A program for causing a computer to function as an authoring support apparatus comprising storage means for storing content including at least video, the program causing the computer to execute:
an accepting step of accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
a discrimination step of discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with the scene type;
an extraction step of extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene; and
a display step of displaying the result of the extraction.

サーバ装置と、複数のクライアント装置とを含むオーサリング情報共有システムにおいて、
前記クライアント装置は、
少なくとも映像を含むコンテンツを記憶する記憶手段と、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける入力手段と、
入力された前記コメントの内容と、個々のシーン種別ごとに対応付けて予め1又は複数個ずつ規定された文字情報の各々とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別手段と、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出手段と、
前記抽出の結果を表示する表示手段と、
前記サーバ装置へ、前記コメントと前記シーン種別と前記所定のシーンの区間を示す情報とを含むオーサリング情報を送信する送信手段とを備え、
前記サーバ装置は、
前記クライアント装置から前記オーサリング情報を受信する受信手段と、
受信された前記オーサリング情報を記憶するオーサリング情報記憶手段とを備えたことを特徴とするオーサリング情報共有システム。 An authoring information sharing system including a server device and a plurality of client devices, wherein
the client device comprises:
storage means for storing content including at least video;
input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
discrimination means for discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with each item of character information, one or more items of which are defined in advance in association with each individual scene type;
extraction means for extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene;
display means for displaying the result of the extraction; and
transmission means for transmitting, to the server device, authoring information including the comment, the scene type, and the information indicating the section of the predetermined scene, and
the server device comprises:
receiving means for receiving the authoring information from the client device; and
authoring information storage means for storing the received authoring information.

サーバ装置と、複数のクライアント装置とを含むオーサリング情報共有システムにおいて、
前記クライアント装置は、
少なくとも映像を含むコンテンツを記憶する記憶手段と、
前記コンテンツ中の所定のシーンに付与するコメントの入力を、前記コンテンツにおける特定の箇所を示す代表時間と関連付けて受け付ける入力手段と、
入力された前記コメントの内容と、シーン種別に対応付けて予め規定された方法に従って前記コンテンツに含まれる画像データ、音声データ又はテキストデータから得られた文字情報とを比較することによって、前記コメントを付与する前記所定のシーンのシーン種別を判別する判別手段と、
判別された前記シーン種別に応じて、前記コンテンツ中の前記代表時間を含む領域又は前記代表時間の近傍の領域から、前記所定のシーンの区間を示す情報として、前記コンテンツにおける該シーン種別に係る区間の開始点と終了点とを特定可能とする時間情報を抽出する抽出手段と、
前記抽出の結果を表示する表示手段と、
前記サーバ装置へ、前記コメントと前記シーン種別と前記所定のシーンの区間を示す情報とを含むオーサリング情報を送信する送信手段とを備え、
前記サーバ装置は、
前記クライアント装置から前記オーサリング情報を受信する受信手段と、
受信された前記オーサリング情報を記憶するオーサリング情報記憶手段とを備えたことを特徴とするオーサリング情報共有システム。 An authoring information sharing system including a server device and a plurality of client devices, wherein
the client device comprises:
storage means for storing content including at least video;
input means for accepting input of a comment to be attached to a predetermined scene in the content, in association with a representative time indicating a specific location in the content;
discrimination means for discriminating the scene type of the predetermined scene to which the comment is to be attached, by comparing the content of the input comment with character information obtained from image data, audio data, or text data included in the content according to a method defined in advance in association with the scene type;
extraction means for extracting, in accordance with the discriminated scene type, from a region of the content that includes the representative time or a region in the vicinity of the representative time, time information that makes it possible to specify the start point and end point of the section of that scene type in the content, as information indicating the section of the predetermined scene;
display means for displaying the result of the extraction; and
transmission means for transmitting, to the server device, authoring information including the comment, the scene type, and the information indicating the section of the predetermined scene, and
the server device comprises:
receiving means for receiving the authoring information from the client device; and
authoring information storage means for storing the received authoring information.

前記サーバ装置は、前記オーサリング情報記憶手段に記憶されている複数の前記オーサリング情報にそれぞれ含まれる前記コメントを要約するためのコメント要約手段を更に備えたことを特徴とする請求項１７または１８に記載のオーサリング情報共有システム。 The authoring information sharing system according to claim 17 or 18, wherein the server device further comprises comment summarizing means for summarizing the comments respectively included in the plurality of items of authoring information stored in the authoring information storage means.

前記コメント要約手段は、前記オーサリング情報記憶手段に記憶されている複数の前記オーサリング情報にそれぞれ含まれる前記コメントのうち、前記シーン種別が同じであり且つ前記所定のシーンが相対応するものについて、前記要約を行うことを特徴とする請求項１９に記載のオーサリング情報共有システム。 The authoring information sharing system according to claim 19, wherein the comment summarizing means performs the summarization on those comments, among the comments respectively included in the plurality of items of authoring information stored in the authoring information storage means, that have the same scene type and whose predetermined scenes correspond to each other.
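
The selection of comments to summarize together can be sketched as a grouping step on the server side. The sketch assumes that two scenes "correspond" when their sections overlap in time, which is only one possible reading of the claim. The `AuthoringInfo` record and the function name are hypothetical, and the summarization itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuthoringInfo:
    """One stored item of authoring information (field names are assumptions)."""
    comment: str
    scene_type: str
    start: float  # section start, in seconds
    end: float    # section end, in seconds


def group_for_summary(items: List[AuthoringInfo]) -> List[List[AuthoringInfo]]:
    """Group items that share a scene type and whose sections overlap in time.

    Items are sorted by scene type and start time, then merged into the
    current group while they start before that group's latest end time.
    """
    groups: List[List[AuthoringInfo]] = []
    for item in sorted(items, key=lambda i: (i.scene_type, i.start)):
        last = groups[-1] if groups else None
        if (
            last
            and last[0].scene_type == item.scene_type
            and item.start <= max(i.end for i in last)
        ):
            last.append(item)
        else:
            groups.append([item])
    return groups
```

Each resulting group would then be passed to whatever summarizer the server uses, so that comments from different users on the same scene are condensed together.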