JP2009267693A - Moving image collaboration system and method, and computer program

Info

Publication number: JP2009267693A
Application number: JP2008113870A
Authority: JP
Inventors: Mikio Maeda; 幹夫前田; Yoji Yamato; 庸次山登
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-04-24
Filing date: 2008-04-24
Publication date: 2009-11-12

Abstract

PROBLEM TO BE SOLVED: To provide moving image collaboration in which, while viewing a moving image over a network, a user can record voice in time with the moving image and listen to the recorded voices.

SOLUTION: The moving image collaboration system comprises a server 200 and a user terminal 390. The user terminal 390 has means for transmitting to the server 200 voice (recorded) files that are recorded on the user terminal 390 and associated with a reproduction time of the moving image. The server 200 uses the voice (recorded) files received from the user terminals 390 to combine the voices in synchronization with the reproduction time of the moving image, produces a voice (combined) file, and distributes that file to the user terminal 390 together with the moving image data. By reproducing the received moving image data and voice (combined) file on the user terminal 390, a user 400 can hear the voices of other users while viewing the moving image.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a moving image collaboration system and method, and a computer program, in which users record their voices in time with the playback of a moving image file and the voices recorded by a plurality of users are played back, so that users on a network can collaborate.

Conventionally, there are services in which a moving image file is shared on a network and users carry out text communication such as chat along the time axis of the moving image while viewing it (see, for example, Non-Patent Document 1 and Non-Patent Document 2).
Non-Patent Document 1: Wikipedia, [online], [March 11, 2008], Internet <URL:http://ja.wikipedia.org/wiki/YouTube>
Non-Patent Document 2: Wikipedia, [online], [March 11, 2008], Internet <URL:http://ja.wikipedia.org/wiki/ニコニコ動画>

However, with the conventional technology, users viewing a moving image file cannot carry out voice communication in which the voices they utter are shared along the time axis of the moving image, and smooth collaboration has therefore not been possible.

In view of the above problem, an object of the present invention is to provide a moving image collaboration system and method, and a computer program, that allow voices uttered by a plurality of users to be shared along the time axis of a moving image.

The present invention is a moving image collaboration system in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network. The server comprises: voice synthesis means for using a plurality of pieces of recorded voice data received from the terminals, each being voice data associated with a playback time of the moving image, to synthesize the plurality of voices so that their playback times in the moving image coincide, thereby generating synthesized voice data; moving image transmitting means for transmitting moving image data to the terminal; and voice transmitting means for transmitting the synthesized voice data generated by the voice synthesis means to the terminal together with the moving image data transmitted by the moving image transmitting means. The terminal comprises: moving image receiving means for receiving the moving image data from the server; moving image reproducing means for playing back the moving image data received by the moving image receiving means; voice receiving means for receiving the synthesized voice data from the server together with the moving image data; voice reproducing means for playing back the voice of the synthesized voice data received by the voice receiving means; and recorded voice transmitting means for transmitting to the server recorded voice data, which is data of a voice recorded on the terminal itself and associated with a playback time of the moving image.

In the above moving image collaboration system, the terminal further comprises: search request means for transmitting a search character string to the server; list receiving means for receiving from the server a list of identifiers of moving image data corresponding to the search character string transmitted by the search request means; and identifier transmitting means for transmitting to the server an identifier selected from the list received by the list receiving means. The server further comprises: storage means for storing moving image data and synthesized voice data in association with identifiers, and for storing metadata of the moving image data in association with the identifiers; and metadata management means for searching the storage means using the search character string received from the terminal, obtaining a list of the identifiers found by the search, and returning the list to the terminal. The moving image transmitting means reads the moving image data corresponding to the identifier received from the terminal from the storage means and returns it, and the voice transmitting means reads the synthesized voice data corresponding to the identifier received from the terminal from the storage means and returns it.

In the above moving image collaboration system, the storage means further stores, in association with the identifier, erased voice data, which is audio data obtained by erasing human voices from the audio of the moving image data. The voice transmitting means reads the erased voice data corresponding to the identifier received from the terminal from the storage means and returns it; the voice receiving means receives the erased voice data from the server together with the moving image data; and the voice reproducing means mutes the audio of the moving image data and plays back the erased voice data.

In the above moving image collaboration system, the terminal further comprises registered moving image transmitting means for transmitting moving image data and metadata of that moving image data to the server. The server further comprises: registered moving image receiving means for receiving the moving image data and the metadata from the terminal; moving image registration means for writing the moving image data and metadata received by the registered moving image receiving means to the storage means in association with an identifier of the moving image data; and voice erasing means for generating separated voice data by extracting only the audio from the moving image data received by the registered moving image receiving means, generating erased voice data by removing human voices from the separated voice data, and saving the generated erased voice data in the storage means in association with the identifier.

In the above moving image collaboration system, the terminal further comprises authentication request means for transmitting authentication information of a user to the server. The server further comprises: user management means for receiving the authentication information of the user from the terminal and performing authentication using that information; and application management means for returning plug-in software corresponding to the user to the terminal when the user is authenticated by the user management means. The terminal uses the plug-in software, which is returned when the authentication information transmitted by the authentication request means is authenticated, to generate the moving image receiving means, the moving image reproducing means, the voice receiving means, the voice reproducing means, and the recorded voice transmitting means.

In the above moving image collaboration system, the server further comprises moving image conversion means for converting the moving image data into a data format that can be played back by the moving image reproducing means.

The present invention is also a moving image collaboration method used in a moving image collaboration system in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network. The method comprises, in the server: a voice synthesis step in which voice synthesis means uses a plurality of pieces of recorded voice data received from the terminals, each being voice data associated with a playback time of the moving image, to synthesize the plurality of voices so that their playback times in the moving image coincide, thereby generating synthesized voice data; a moving image transmitting step in which moving image transmitting means transmits moving image data to the terminal; and a voice transmitting step in which voice transmitting means transmits the synthesized voice data generated in the voice synthesis step to the terminal together with the moving image data transmitted in the moving image transmitting step. The method further comprises, in the terminal: a moving image receiving step in which moving image receiving means receives the moving image data from the server; a moving image reproducing step in which moving image reproducing means plays back the moving image data received by the moving image receiving means; a voice receiving step in which voice receiving means receives the synthesized voice data from the server together with the moving image data; a voice reproducing step in which voice reproducing means plays back the voice of the synthesized voice data received in the voice receiving step; and a recorded voice transmitting step in which recorded voice transmitting means transmits to the server recorded voice data, which is data of a voice recorded on the terminal itself and associated with a playback time of the moving image.

The present invention is also a computer program that causes a computer used as the server of a moving image collaboration system, in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, to operate as: voice synthesis means for using a plurality of pieces of recorded voice data received from the terminals, each being voice data associated with a playback time of the moving image, to synthesize the plurality of voices so that their playback times in the moving image coincide, thereby generating synthesized voice data; moving image transmitting means for transmitting moving image data to the terminal; and voice transmitting means for transmitting the synthesized voice data generated by the voice synthesis means to the terminal together with the moving image data transmitted by the moving image transmitting means.

The present invention is also a computer program that causes a computer used as the terminal of a moving image collaboration system, in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, to operate as: moving image receiving means for receiving moving image data from the server; moving image reproducing means for playing back the moving image data received by the moving image receiving means; voice receiving means for receiving from the server, together with the moving image data, synthesized voice data generated by synthesizing a plurality of voices, using a plurality of pieces of recorded voice data transmitted from a plurality of terminals, so that their playback times in the moving image coincide; voice reproducing means for playing back the voice of the synthesized voice data received by the voice receiving means; and recorded voice transmitting means for transmitting to the server recorded voice data, which is data of a voice recorded on the terminal itself and associated with a playback time of the moving image.

According to the present invention, viewers of a moving image distributed from a server can share their live voices, which realizes moving image collaboration. This makes it possible to share a sound field as if the viewers were at a live concert venue.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

First, moving image collaboration is explained using television (TV) and concerts as examples.
With TV, a TV station distributes broadcast content (hereinafter, "content"), and the end user's TV receiver receives the distributed content and outputs video and audio so that the end user can enjoy it. The end user only watches the content and cannot convey his or her own voice or actions to other users who are viewing the same content at the same time. This means that TV does not provide the sense of sharing found at a live concert, where the content provider (performer) and the content viewers (audience) enjoy the content together at the venue.
The present invention makes it possible to create this sense of sharing through voices uttered by a plurality of users. Here, this is called moving image collaboration, or voice sharing.

FIG. 1 is a diagram showing an overview of a moving image collaboration system according to an embodiment of the present invention. A server 200 and a user terminal 390 held by a user 400 can communicate through a network 100 that uses IP (Internet Protocol) or the like. There are a plurality of user terminals 390, but only one is shown in the figure.

The server 200 has means for executing a server application (hereinafter, "application" is abbreviated as "APL") and storage means such as a secondary storage device and an SQL device. The user terminal 390 has means for executing a browser 300 and, using a browser APL (or plug-in software) that runs on the browser 300, can store and view moving images and share voice.

FIG. 2 is a diagram showing the concept of the service provided by the moving image collaboration system according to an embodiment of the present invention. Here, users 400a, 400b, and 400c are assumed to be using the service as users 400.

On his or her own user terminal 390, the user 400a uses as the browser an existing, widely used Web browser (for example, Internet Explorer or Firefox) together with existing, widely used plug-in software (for example, Adobe Flash), and receives, plays back, and views a moving image file (in FLV format or the like) distributed from the server 200.
Meanwhile, the user 400b, while viewing the moving image file, uses the voice input function of the plug-in software to record his or her own voice on the user terminal 390. The voice data of the recorded voice (hereinafter, "voice (recorded)") is transmitted from the user terminal 390 to the server 200.
Like the user 400b, the user 400c also records voice on the user terminal 390, and the resulting voice (recorded) is transmitted from the user terminal 390 to the server 200.

The server 200 uses the voice data of the user 400b and the voice data of the user 400c to synthesize the voice of the user 400b and the voice of the user 400c so that their playback time ranges in the moving image coincide.
The moving image file to be distributed is stored in the secondary storage device of the server 200 and is distributed in response to a request received from the user terminal 390 of the user 400a. In addition, the server 200 creates in advance voice-erased data (hereinafter, "voice (erased)") by erasing the human voice (live voice) portion from the audio of the moving image file, generates voice data of a synthesized voice (hereinafter, "voice (synthesized)") by synthesizing the voice (erased) with the voice (recorded), and distributes it to the user terminal 390 together with the moving image file.

As a result, in addition to viewing the moving image file normally, the user 400a can mute the audio of the moving image file and play back the voice (synthesized) to hear the voices of the user 400b and the user 400c, or can play back the voice (erased) to make it easier to create a voice (recorded).

FIG. 3 is a functional block diagram of the server 200 according to the present embodiment.
As shown in the figure, the server 200 includes a protocol analysis unit 213, a server APL base unit 214, and a server APL main unit 215.

The server APL base unit 214 starts the server APL main unit 215 and the protocol analysis unit 213 and brings the server 200 into the service operating state. The server APL base unit 214 also provides the server APL main unit 215 with procedures for accessing the secondary storage device 216, the SQL device 217, and the hardware of the server 200 (hereinafter, "API" (Application Program Interface)).

The protocol analysis unit 213 performs a first-stage analysis of the various signals received from the browser APL of the user terminal 390 and calls the required functions provided by the server APL main unit 215. Consequently, a new function can be added easily: the function is added to the server APL main unit 215, and the protocol analysis unit 213 is given a means of accessing the new function when it receives a signal that requires it.
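
The extensibility described for the protocol analysis unit can be illustrated with a dispatch table. The following is only a minimal sketch under stated assumptions: the class name `ProtocolAnalyzer`, the signal layout (a dict with a `type` key), and the two handlers are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class ProtocolAnalyzer:
    """Routes an incoming signal to a registered handler (hypothetical sketch)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, signal_type: str, handler: Handler) -> None:
        # Adding a new function only requires one more registration here.
        self._handlers[signal_type] = handler

    def dispatch(self, signal: dict) -> dict:
        # First-stage analysis: look only at the signal type, then delegate.
        handler = self._handlers.get(signal.get("type"))
        if handler is None:
            return {"status": "error", "reason": "unknown signal type"}
        return handler(signal)


# Hypothetical stand-ins for functions of the server APL main unit.
def search_metadata(signal: dict) -> dict:
    return {"status": "ok", "ids": []}


def receive_voice(signal: dict) -> dict:
    return {"status": "ok", "bytes": len(signal.get("payload", b""))}


analyzer = ProtocolAnalyzer()
analyzer.register("search", search_metadata)
analyzer.register("voice_upload", receive_voice)
```

With this arrangement, a call such as `analyzer.dispatch({"type": "voice_upload", "payload": b"abcd"})` reaches `receive_voice` without the analyzer knowing anything about audio handling.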

The server APL main unit 215 includes a moving image transmission/reception function unit 201, a moving image conversion function unit 202, a moving image management function unit 203, a moving image/voice separation function unit 204, a user management function unit 205, a browser APL management function unit 206, a voice transmission/reception function unit 207, a voice erasure function unit 208, a voice management function unit 209, a voice synthesis function unit 210, a metadata management function unit 211, and a TimeIndex (time index) function unit 212.

The moving image transmission/reception function unit 201 has a function of receiving and transmitting moving image files. File formats that can be received include AVI and MPEG. File formats that can be transmitted include FLV and H264.
The moving image conversion function unit 202 has a function of converting a moving image file transferred from the moving image transmission/reception function unit 201 into a transmittable file format such as FLV or H264.

The moving image management function unit 203 has a function of assigning to each moving image file an identifier (ID) that uniquely identifies it. This identifier links the moving image file to its related audio files, metadata, and so on. The moving image management function unit 203 also starts the voice management function unit 209 and delegates voice-related processing to it. For a moving image transmission request from the plug-in software, it identifies the moving image file and the various audio files from the identifier and transmits the identified audio files to the plug-in software running on the user terminal 390.
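
As an illustration of how one identifier can tie a moving image file to its audio files and metadata, here is a minimal in-memory sketch. The names `VideoRecord` and `VideoRegistry`, the use of UUIDs as identifiers, and the file names are assumptions made for the example; the patent does not specify the identifier format or the storage layout.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VideoRecord:
    video_path: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # Maps an audio kind ("separated", "erased", "synthesized") to a file path.
    audio_paths: Dict[str, str] = field(default_factory=dict)


class VideoRegistry:
    """Assigns a unique ID per moving image file and links related files to it."""

    def __init__(self) -> None:
        self._records: Dict[str, VideoRecord] = {}

    def register(self, video_path: str, metadata: Dict[str, str]) -> str:
        video_id = uuid.uuid4().hex  # assumed ID scheme, not from the patent
        self._records[video_id] = VideoRecord(video_path, dict(metadata))
        return video_id

    def link_audio(self, video_id: str, kind: str, path: str) -> None:
        self._records[video_id].audio_paths[kind] = path

    def lookup(self, video_id: str) -> VideoRecord:
        return self._records[video_id]


registry = VideoRegistry()
vid = registry.register("concert.flv", {"title": "Live concert"})
registry.link_audio(vid, "erased", "concert_erased.wav")
registry.link_audio(vid, "synthesized", "concert_mix.wav")
```

A transmission request carrying `vid` can then be answered by a single `registry.lookup(vid)` call that yields both the moving image path and the linked audio paths.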

The user management function unit 205 has functions for creating user IDs, managing passwords, and managing other user information. These data are stored in the SQL device 217, and related data can be read out using any of the data items as an argument.
The browser APL management function unit 206 has a function of managing the Web pages displayed on the browser 300 of the user terminal 390 and managing the plug-in software. The browser APL management function unit 206 also has a function of determining the required Web pages and plug-in software according to the user ID. This makes it possible to provide a different service level to each user.

The voice management function unit 209 has a function of calling the moving image/voice separation function unit 204 and the voice erasure function unit 208 and instructing them to create a voice (separated) file and a voice (erased) file. The created files can be saved in the secondary storage device 216 in association with the identifier (ID). The voice management function unit 209 also has a function of reading a voice (erased) file or a voice (synthesized) file based on the identifier (ID) and transmitting it to the plug-in software running on the user terminal 390, and a function of receiving a voice (recorded) file from the plug-in software running on the user terminal 390.

The moving image/voice separation function unit 204 has a function of extracting the audio portion of a moving image file into a separate voice (separated) file.
The voice erasure function unit 208 has a function of using a band-pass filter to create, from the voice (separated) file, a voice (erased) file from which the human voice portion has been removed.
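
The patent states only that a band-pass filter is used to remove the human voice portion and gives no filter design. The sketch below is therefore an assumption-laden illustration: it suppresses one frequency band (a band-stop operation) by zeroing bins of a naive discrete Fourier transform. The 300 to 3400 Hz band, the function names `dft` and `remove_band`, and the test tones are all hypothetical. Real vocal removal is considerably harder, since voices and instruments overlap in frequency.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def remove_band(samples: Sequence[float], sample_rate: int,
                low_hz: float, high_hz: float) -> List[float]:
    """Zero every frequency bin between low_hz and high_hz, then transform back."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # For real input, bins k and n - k carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if low_hz <= freq <= high_hz:
            spectrum[k] = 0j
    return [value.real for value in dft(spectrum, inverse=True)]


# Demo: a 100 Hz "accompaniment" tone plus a 1000 Hz tone standing in for a voice.
fs = 8000
n = 400
music = [math.sin(2 * math.pi * 100 * t / fs) for t in range(n)]
voice = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
mixed = [m + v for m, v in zip(music, voice)]
filtered = remove_band(mixed, fs, 300.0, 3400.0)
```

In this demo both tones fall exactly on DFT bins, so `filtered` matches `music` up to floating-point error; with real recordings the separation would be far less clean.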

The voice synthesis function unit 210 has a function of receiving a voice (recorded) file from the plug-in software, reading the voice (synthesized) file from the secondary storage device 216, calling the TimeIndex function unit 212, synthesizing the audio waveforms so that the time axes of the voice (recorded) file and the voice (synthesized) file coincide, and creating a new voice (synthesized) file.
The TimeIndex function unit 212 has a function of reading out pointers on the time axis (in msec units) of the various audio files, and a function of notifying the voice synthesis function unit 210 of the pointer position during waveform synthesis of those audio files.
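
The time-aligned waveform synthesis can be pictured as adding each recorded clip into a base track at the sample index that corresponds to its millisecond pointer. The sketch below assumes 16-bit PCM samples held in Python lists, an 8000 Hz sample rate, and simple additive mixing with clipping; none of these details is specified in the patent, and `ms_to_index` and `mix_clips` are hypothetical names.

```python
from typing import List, Tuple

SAMPLE_RATE = 8000  # samples per second (assumed for the example)
INT16_MIN, INT16_MAX = -32768, 32767


def ms_to_index(offset_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a millisecond pointer on the time axis to a sample index."""
    return offset_ms * sample_rate // 1000


def mix_clips(base: List[int], clips: List[Tuple[int, List[int]]],
              sample_rate: int = SAMPLE_RATE) -> List[int]:
    """Add each (offset_ms, samples) clip into a copy of base at its offset."""
    mixed = list(base)
    for offset_ms, samples in clips:
        start = ms_to_index(offset_ms, sample_rate)
        end = start + len(samples)
        if end > len(mixed):
            # Pad with silence when a clip runs past the end of the base track.
            mixed.extend([0] * (end - len(mixed)))
        for i, sample in enumerate(samples):
            total = mixed[start + i] + sample
            mixed[start + i] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed


# Demo: two users' clips, both starting 1 ms (8 samples) into a silent base track.
base_track = [0] * 16
clip_b = (1, [1000] * 4)
clip_c = (1, [500] * 12)
mixed = mix_clips(base_track, [clip_b, clip_c])
```

Because each clip carries its own offset, clips from different users can arrive in any order and still land at the same position relative to the moving image.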

The voice transmission/reception function unit 207 has a function of transmitting voice (erased) files and voice (synthesized) files to the plug-in software running on the user terminal 390 and receiving voice (recorded) files from the plug-in software running on the user terminal 390.
The metadata management function unit 211 has a function of receiving an identifier (ID) and metadata and saving them in the SQL device 217. The metadata management function unit 211 also has a function of performing an SQL search using an identifier (ID) or metadata as an argument.

The secondary storage device 216 is a storage device that can store moving image files, the various audio files, and other files.
The SQL device 217 is a storage device that can store user IDs, identifiers (IDs), metadata, and other data, and can manage these data relationally.
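
A metadata search that returns a list of identifiers could look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the SQL device. The table name `video_metadata`, its columns, the sample rows, and the substring-match policy are assumptions for illustration; the patent does not describe a schema.

```python
import sqlite3
from typing import List

# In-memory database standing in for the SQL device (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video_metadata (video_id TEXT PRIMARY KEY, title TEXT, tags TEXT)"
)
conn.executemany(
    "INSERT INTO video_metadata VALUES (?, ?, ?)",
    [
        ("v001", "Live concert 2008", "music live"),
        ("v002", "Cooking show", "food"),
        ("v003", "Street performance", "music"),
    ],
)


def search_ids(connection: sqlite3.Connection, query: str) -> List[str]:
    """Return IDs of moving images whose title or tags contain the query string."""
    pattern = f"%{query}%"
    rows = connection.execute(
        "SELECT video_id FROM video_metadata "
        "WHERE title LIKE ? OR tags LIKE ? ORDER BY video_id",
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

The query string is passed as a bound parameter rather than concatenated into the SQL text, which avoids SQL injection. Note that `%` and `_` inside the query still act as `LIKE` wildcards in this sketch.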

FIG. 4 is a detailed functional block diagram of the user terminal 390.
The user terminal 390 is a computer terminal such as a personal computer or a mobile phone and, as shown in the figure, includes a browser 300, which has a protocol analysis unit 309 and a browser APL main unit 310, and a Web browser 311. The browser 300 is realized when the user terminal 390 stores the plug-in software received from the server 200 in its internal storage means and that plug-in software is read out and executed.

The Web browser 311 is an application that runs on the computer terminal, and an existing general-purpose Web browser application can be used. The Web browser 311 can launch existing general-purpose plug-in software that provides multimedia functions (for example, Adobe Flash Player).

The protocol analysis unit 309 has the same functions as the protocol analysis unit 213 and likewise makes it easy to add functions.
The browser APL main unit 310 includes a moving image transmission/reception function unit 301, a moving image playback function unit 302, a moving image capture function unit 303, a TimeIndex function unit 304, a metadata management function unit 305, a voice transmission/reception function unit 306, a voice playback function unit 307, and a voice capture function unit 308.

The moving image transmission/reception function unit 301 has a function of receiving and transmitting moving image files. File formats that can be received include FLV and H264. File formats that can be transmitted include AVI and MPEG.
The moving image playback function unit 302 has a function of playing back a moving image file received by the moving image transmission/reception function unit 301. It supports play, pause, fast forward, rewind, full-screen display, volume adjustment, and so on, and also provides an input interface for metadata.
The moving image capture function unit 303 has a function of reading a moving image file stored, for example, on the HDD (hard disk drive) of the user terminal 390, reading metadata from the metadata management function unit 305, and transferring both to the moving image transmission/reception function unit 301 for transmission to the server 200.
The metadata management function unit 305 has a function of taking in a character string that the user 400 enters through input means and transmitting it at the same time as the moving image file. It also has a function of treating a character string entered by the user 400 as metadata and transmitting that metadata to the server 200, and a function of receiving, in response, a list of data that matches or is related to the metadata.

The TimeIndex function unit 304 works with the audio capture function unit 308 and provides it with pointers that uniquely identify positions on the time axis of an audio file. Pointers can be embedded with millisecond (msec) resolution.
The audio capture function unit 308 works with the TimeIndex function unit 304 to embed these time-axis pointers in the audio file. The audio file with embedded pointers is then transmitted to the server 200.
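As a rough illustration of what embedding time-axis pointers in an audio file could look like, the following Python sketch prefixes raw audio bytes with a small header holding start and end offsets in milliseconds. The header layout, the `TIDX` tag, and the names `embed_time_index` and `read_time_index` are assumptions made for this example; the text does not specify a file format.

```python
import struct
from typing import Tuple

# Hypothetical header: 4-byte tag, then start and end offsets in milliseconds
# as little-endian unsigned 32-bit integers.
_HEADER = struct.Struct("<4sII")
_MAGIC = b"TIDX"


def embed_time_index(pcm: bytes, start_ms: int, end_ms: int) -> bytes:
    """Prefix raw audio bytes with pointers on the moving image time axis."""
    if not 0 <= start_ms <= end_ms:
        raise ValueError("pointers must satisfy 0 <= start_ms <= end_ms")
    return _HEADER.pack(_MAGIC, start_ms, end_ms) + pcm


def read_time_index(blob: bytes) -> Tuple[int, int, bytes]:
    """Return (start_ms, end_ms, pcm) from a blob made by embed_time_index."""
    magic, start_ms, end_ms = _HEADER.unpack_from(blob)
    if magic != _MAGIC:
        raise ValueError("blob does not carry a TimeIndex header")
    return start_ms, end_ms, blob[_HEADER.size:]
```

A server-side reader would call `read_time_index` on the received bytes to recover where on the moving image time axis the recording belongs.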

The audio transmission/reception function unit 306 receives the audio (erased) file and the audio (synthesized) file and passes them to the audio playback function unit 307.
The audio playback function unit 307 selectively plays back the moving image's own audio, the audio (erased) file, or the audio (synthesized) file. When it plays back the audio (erased) file or the audio (synthesized) file, it mutes the moving image's own audio.
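The selection rule above is small enough to state as code. The following sketch is illustrative only; `Track` and `playback_plan` are hypothetical names and are not part of the described system.

```python
from enum import Enum


class Track(Enum):
    MOVIE = "audio carried in the moving image file"
    ERASED = "erased audio file"
    SYNTHESIZED = "synthesized audio file"


def playback_plan(selected: Track) -> dict:
    """Return the track to play and whether the moving image's own audio is muted."""
    return {
        "play": selected,
        # The moving image's own audio is muted whenever another track is chosen.
        "mute_movie_audio": selected is not Track.MOVIE,
    }
```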

FIG. 5 is a diagram showing the types of moving image and audio data in detail.
A moving image file is an existing, general-format file consisting of moving images and audio. The audio portion of this file is called audio (moving image). Extracting the audio (moving image) from the moving image file into a standalone audio file yields the audio (separated) file (separated audio data). Removing only the human voice from the audio (separated) file with a band-pass filter yields the audio (erased) file (erased audio data). A file of the user 400's own voice, recorded on the user terminal 390, is an audio (recorded) file (recorded audio data). Combining the audio waveforms of multiple audio (recorded) files yields the audio (synthesized) file (synthesized audio data). Playing back the audio (synthesized) file therefore plays back the voices of multiple users.

FIG. 6 is a diagram showing audio synthesis in detail.
Users' voices are synthesized asynchronously with respect to real time. In other words, a voice is not played back in the Web browser 311 through this system at the same moment it is recorded. Take, for example, a moving image file that is 10 minutes long. Suppose user 400a starts watching it at 12:00 PM; viewing ends at 12:10 PM. User 400b, on the other hand, might start watching the same file at 1:00 PM, in which case viewing ends at 1:10 PM. Suppose both users create audio (recorded) files while watching. Call the file recorded by user 400a audio (recorded) file 500a and the file recorded by user 400b audio (recorded) file 500b. The TimeIndex function unit 304 embeds time-axis pointers in each of these files with msec resolution.
That is, audio (recorded) file 500a contains pointers between 00:00 and 10:00 that correspond to the time axis of the moving image. Audio (recorded) file 500b likewise contains pointers between 00:00 and 10:00.

The server 200 receives audio (recorded) file 500a and audio (recorded) file 500b. Their audio waveforms are combined so that their time axes coincide, and an audio (synthesized) file is output. This audio (synthesized) file is transmitted together with the moving image file when the server 200 sends the moving image file to the user terminal 390. When the user terminal 390 plays back the moving image file, it plays back the audio (synthesized) file so that the time axes of the two files coincide. As a result, a viewer of the moving image file hears the voices of the other users 400a and 400b as if they were speaking at the same time. This service is called moving image collaboration, or asynchronous audio playback.
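The alignment step can be sketched as follows, assuming each user's recorded file has been decoded to 16-bit mono PCM samples and carries a single start pointer in milliseconds. `Recording`, `mix`, the 8 kHz sample rate, and the clipping rule are assumptions for illustration; the text does not say how the waveforms are combined beyond making their time axes coincide.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 8000  # samples per second; an assumed value


@dataclass
class Recording:
    start_ms: int        # offset on the moving image time axis
    samples: List[int]   # 16-bit mono PCM samples


def mix(recordings: List[Recording], duration_ms: int) -> List[int]:
    """Sum recordings onto one track aligned to the moving image time axis."""
    total = duration_ms * SAMPLE_RATE // 1000
    out = [0] * total
    for rec in recordings:
        offset = rec.start_ms * SAMPLE_RATE // 1000
        for i, sample in enumerate(rec.samples):
            if offset + i >= total:
                break  # ignore audio past the end of the moving image
            out[offset + i] += sample
    # Clip to the 16-bit range so overlapping voices cannot overflow.
    return [max(-32768, min(32767, s)) for s in out]
```

Because every recording is placed by its offset into the moving image rather than by wall-clock time, it does not matter that the two users in the example watched an hour apart.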

FIG. 7 is a diagram showing the detailed processing procedure of the moving image collaboration system at login.
Login is performed with a user ID and password entered in the Web browser 311 of the user terminal 390. When the Web browser 311 receives the user ID and password that the user 400 enters through input means of the user terminal 390 such as a keyboard or buttons (step S1001), it forwards them to the server APL base unit 214 of the server 200 (step S1002). The server APL base unit 214 activates the protocol analysis unit 213 (step S1003); the protocol analysis unit 213 selects which function to activate based on the information received from the user terminal 390 and activates the selected user management function unit 205 (step S1004). The user management function unit 205 checks the received user ID and password (step S1005). It reads the stored user ID and password pairs from the SQL device 217 (steps S1006, S1007) and, if one matches the received user ID and password, judges the login to be legitimate and forwards an APL transmission instruction to the browser APL management function unit 206 (steps S1008, S1009).

The browser APL management function unit 206 receives the instruction from the user management function unit 205 and performs the APL transmission (step S1010). It reads from the SQL device 217 which version of the plug-in software the user ID requires (steps S1011, S1012) and forwards the corresponding plug-in software to the server APL base unit 214 (step S1013). The server APL base unit 214 forwards the plug-in software to the Web browser 311 of the user terminal 390 (step S1014). The Web browser 311 launches the received plug-in software (step S1015). The browser 300 is thereby running on the user terminal 390.
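A minimal sketch of the credential check and plug-in version lookup follows, using an in-memory SQLite database as a stand-in for the SQL device 217. The table layout, the function names, the sample user, and the use of a salted PBKDF2 hash rather than a stored plain password are all assumptions of this example, not details given in the text.

```python
import hashlib
import hmac
import sqlite3
from typing import Optional


def _digest(password: str, salt: bytes) -> bytes:
    # Salted key derivation; the iteration count is an assumed value.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def make_db() -> sqlite3.Connection:
    """Create a demo user table with one hypothetical user."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users (user_id TEXT PRIMARY KEY, salt BLOB, "
        "pw_hash BLOB, plugin_version TEXT)"
    )
    salt = b"demo-salt"
    db.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        ("alice", salt, _digest("secret", salt), "1.2"),
    )
    return db


def login(db: sqlite3.Connection, user_id: str, password: str) -> Optional[str]:
    """Return the plug-in version to send if the credentials match, else None."""
    row = db.execute(
        "SELECT salt, pw_hash, plugin_version FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    salt, stored, version = row
    # Constant-time comparison avoids leaking how many bytes matched.
    if hmac.compare_digest(stored, _digest(password, salt)):
        return version
    return None
```

A non-`None` result corresponds to the point at which the plug-in software would be forwarded to the terminal.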

FIGS. 8 and 9 are diagrams showing the detailed processing procedure of the moving image collaboration system during moving image capture.
In FIG. 8, moving image capture is performed by the moving image capture function unit 303 of the user terminal 390, triggered by an instruction from the user 400 or the like. The moving image capture function unit 303 reads the moving image file specified by the user from the HDD of the user terminal 390 (step S1101) and forwards it to the moving image transmission/reception function unit 301 (step S1102). The metadata management function unit 305 receives the character string that the user 400 enters through the input means of the user terminal 390 (steps S1103, S1104), takes it in as metadata, and forwards it to the moving image transmission/reception function unit 301 (step S1105). The moving image file and the metadata are forwarded from the moving image transmission/reception function unit 301, via the Web browser 311, to the server APL base unit 214 of the server 200 (steps S1106, S1107, S1108).

Next, in FIG. 9, the server APL base unit 214 activates the protocol analysis unit 213 (step S1201); the protocol analysis unit 213 selects which function to activate based on the information received from the user terminal 390 and activates the selected moving image transmission/reception function unit 201 (step S1202). The moving image transmission/reception function unit 201 activates the moving image management function unit 203 (step S1203), which assigns an identifier (ID) that uniquely identifies the moving image file (step S1204).
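Each procedure begins with the same dispatch step: the protocol analysis unit inspects the received information and selects a function unit to activate. One way to picture this is a lookup table from message type to handler, as in the sketch below. `ProtocolAnalyzer`, the `type` field, and the handler signature are hypothetical; the text does not describe the message format.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


class ProtocolAnalyzer:
    """Selects a function unit (handler) from the received information."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, message_type: str, handler: Handler) -> None:
        # Supporting a new function only requires registering another handler.
        self._handlers[message_type] = handler

    def dispatch(self, message: dict) -> str:
        message_type = message.get("type")
        handler = self._handlers.get(message_type)
        if handler is None:
            raise ValueError(f"no function unit for message type {message_type!r}")
        return handler(message)
```

For instance, registering one handler for an upload message and another for a login message would let a single entry point route both requests to different units.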

The moving image management function unit 203 notifies the metadata management function unit 211 of the ID and the metadata (step S1205), and the metadata management function unit 211 stores the metadata in the SQL device 217 in association with the ID so that it can be retrieved using the ID as a key (step S1206). The moving image management function unit 203 also notifies the moving image conversion function unit 202 of the ID and receives a response (steps S1207, S1208). The moving image conversion function unit 202 converts the moving image file into a format that the user terminal 390 can play back, such as FLV or H.264 (step S1209), and stores the converted file in the secondary storage device 216 in association with the ID so that it can be retrieved by ID (steps S1210, S1211).

Having received the response from the moving image conversion function unit 202 in step S1208, the moving image management function unit 203 activates the audio management function unit 209 (steps S1212, S1213). The audio management function unit 209 calls the moving image/audio separation function unit 204 (step S1214), which starts audio separation (step S1215). The moving image/audio separation function unit 204 reads the moving image file received from the user terminal 390, extracts the audio (moving image) from it, and creates an audio (separated) file (step S1216). It stores the audio (separated) file in the secondary storage device 216 in association with the ID so that it can be retrieved by ID (step S1217).

Next, the moving image/audio separation function unit 204 activates the audio management function unit 209 (step S1218), and the audio management function unit 209 calls the audio erasure function unit 208 (step S1219). The audio erasure function unit 208 then starts audio erasure (step S1220). Specifically, it reads the audio (separated) file corresponding to the ID from the secondary storage device 216 (steps S1221, S1222, S1223), removes the human voice portion from that file using a band-pass filter that removes the frequency band of the human voice, and creates an audio (erased) file (step S1224). It stores the audio (erased) file in the secondary storage device 216 in association with the ID so that it can be retrieved by ID (step S1225).
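To make the erasure step concrete, the sketch below removes a frequency band from a short signal by zeroing the corresponding bins of a discrete Fourier transform. This is a band-reject operation swapped in for illustration; the text names only a band-pass filter that removes the voice band and gives no cutoff frequencies. `remove_band` is a hypothetical name, and the naive transform is quadratic in the number of samples, so it suits only short demonstrations.

```python
import cmath
import math
from typing import List


def remove_band(samples: List[float], rate: int,
                low_hz: float, high_hz: float) -> List[float]:
    """Zero every frequency bin between low_hz and high_hz (naive DFT)."""
    n = len(samples)
    # Forward transform: one complex coefficient per frequency bin.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so fold them back.
        freq = min(k, n - k) * rate / n
        if low_hz <= freq <= high_hz:
            spectrum[k] = 0
    # Inverse transform; the input is real, so keep only the real part.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]
```

For example, `remove_band(samples, 8000, 300, 3400)` would suppress the conventional telephone voice band, chosen here purely as an assumption, in a signal sampled at 8 kHz while leaving lower and higher components in place. Real vocals overlap with instruments in frequency, so a filter this simple would also remove part of the accompaniment.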

FIGS. 10 and 11 are diagrams showing the detailed processing procedure of the moving image collaboration system during moving image playback.
In FIG. 10, moving image playback is performed by the moving image playback function unit 302 of the user terminal 390, triggered by an instruction from the user 400 or the like (step S1301). When the moving image playback function unit 302 takes in a character string that the user 400 enters through the input means of the user terminal 390 as a search key, it starts a moving image search (step S1302) and forwards the character string to the metadata management function unit 305 (step S1303). The character string is forwarded from the metadata management function unit 305, via the Web browser 311, to the server APL base unit 214 of the server 200 (steps S1304, S1305).

The server APL base unit 214 activates the protocol analysis unit 213 (step S1306); the protocol analysis unit 213 selects which function to activate based on the information received from the user terminal 390 and activates the selected metadata management function unit 211 (step S1307). The metadata management function unit 211 searches the SQL device 217 using the character string received from the user terminal 390 (steps S1308, S1309), and the SQL device 217 outputs a list of the IDs of moving image files associated with metadata that matches the character string, contains it, or is related to it (steps S1310, S1311). The metadata management function unit 211 notifies the user terminal 390 of the ID list via the server APL base unit 214 (steps S1312, S1313).
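The search can be pictured with an in-memory SQLite table standing in for the SQL device 217. The sketch covers only the exact and containing matches; how "related" metadata is determined is not described in the text and is left out. The table layout, the sample rows, and the names `make_index` and `search_ids` are assumptions.

```python
import sqlite3
from typing import List


def make_index() -> sqlite3.Connection:
    """Build a demo metadata table with hypothetical entries."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (video_id TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO metadata VALUES (?, ?)",
        [
            ("v001", "live concert tokyo"),
            ("v002", "cooking tutorial"),
            ("v003", "concert rehearsal"),
        ],
    )
    return db


def search_ids(db: sqlite3.Connection, query: str) -> List[str]:
    """Return IDs whose metadata equals or contains the query string."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = db.execute(
        "SELECT DISTINCT video_id FROM metadata "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY video_id",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

Passing the query as a bound parameter, rather than formatting it into the SQL string, keeps user input from being interpreted as SQL.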

The protocol analysis unit 309 of the user terminal 390 selects which function to activate based on the information received from the server 200 via the Web browser 311 and activates the selected metadata management function unit 305 (steps S1314, S1315). The metadata management function unit 305 forwards the ID list to the moving image playback function unit 302 (step S1316). The moving image playback function unit 302 outputs the ID list, for example by showing it on the display of the user terminal 390, and ends the moving image search (step S1317).

Next, in FIG. 11, the user 400 uses the input means of the user terminal 390 to specify the ID of the desired moving image from the ID list (step S1401). The moving image playback function unit 302 notifies the moving image transmission/reception function unit 301 of the specified ID (step S1402), and the moving image transmission/reception function unit 301 forwards the ID, via the Web browser 311, to the server APL base unit 214 of the server 200 (steps S1403, S1404).

The server APL base unit 214 activates the protocol analysis unit 213 (step S1405); the protocol analysis unit 213 selects which function to activate based on the received information and activates the selected moving image management function unit 203 (step S1406). The moving image management function unit 203 reads the moving image file that matches the ID received from the user terminal 390 from the secondary storage device 216 (steps S1407, S1408). The file is forwarded to the moving image transmission/reception function unit 201 (step S1409), which forwards it, via the server APL base unit 214, to the Web browser 311 of the user terminal 390 (steps S1410, S1411).

The protocol analysis unit 309 of the user terminal 390 selects which function to activate based on the information received from the server 200 via the Web browser 311 (step S1412). It forwards the moving image file to the selected moving image transmission/reception function unit 301, which in turn forwards it to the moving image playback function unit 302 (steps S1413, S1414).

Meanwhile, in the server 200, the moving image management function unit 203 activates the audio management function unit 209 and notifies it of the ID (step S1415). The audio management function unit 209 reads the audio (erased) file and the audio (synthesized) file that match the ID from the secondary storage device 216 (steps S1416, S1417, S1418) and forwards them to the audio transmission/reception function unit 207 (step S1419). The audio transmission/reception function unit 207 forwards both files, via the server APL base unit 214, to the Web browser 311 of the user terminal 390 (steps S1420, S1421).

The protocol analysis unit 309 of the user terminal 390 selects which function to activate based on the information received from the server 200 via the Web browser 311 (step S1422). It activates the selected audio transmission/reception function unit 306, which receives the audio (erased) file and the audio (synthesized) file (step S1423) and forwards them to the audio playback function unit 307 (step S1424). The moving image playback function unit 302 starts a wait timer after the ID is transmitted or after the moving image file is received (step S1425). Before the wait timer expires, it checks whether the moving image playback function unit 302 has finished receiving the moving image file and whether the audio playback function unit 307 has finished receiving the audio (erased) file and the audio (synthesized) file (step S1426).

If reception is not complete (step S1426: NO), the moving image playback function unit 302 restarts the wait timer and checks again after a fixed interval (steps S1425, S1426). If reception is complete (step S1426: YES), it plays back the moving image file (step S1427). To play back the audio (synthesized) file along with the moving image file, the moving image playback function unit 302 instructs the audio playback function unit 307 accordingly, and the audio playback function unit 307 mutes the audio (moving image) and plays back the audio (synthesized) file (steps S1428, S1429). To play back the audio (erased) file instead, the moving image playback function unit 302 instructs the audio playback function unit 307 accordingly, and the audio playback function unit 307 mutes the audio (moving image) and plays back the audio (erased) file (steps S1430, S1431). Whether the audio (synthesized) file or the audio (erased) file is played back with the moving image is either set in the user terminal 390 in advance or specified by the user 400.
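The wait-and-recheck loop in steps S1425 and S1426 can be sketched as a simple polling function. The two readiness callbacks, the interval, and the `max_checks` bound are assumptions of this example; the text does not say whether the number of retries is limited.

```python
import time
from typing import Callable


def wait_until_ready(
    video_ready: Callable[[], bool],
    audio_ready: Callable[[], bool],
    interval_s: float = 0.1,
    max_checks: int = 50,
) -> bool:
    """Poll until both downloads are complete; return False if checks run out."""
    for _ in range(max_checks):
        if video_ready() and audio_ready():
            return True  # reception complete: playback may start
        time.sleep(interval_s)  # restart the wait timer, then check again
    return False
```

Starting playback only after both checks pass means the moving image and the selected audio track begin from the same point on the time axis.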

FIG. 12 is a diagram showing the detailed processing procedure of the moving image collaboration system during voice recording. The procedure is executed while a moving image is being played back on the user terminal 390 according to the procedure shown in FIG. 11.

Triggered by an instruction from the user 400 or the like, voice recording is performed by the audio capture function unit 308 of the user terminal 390 (step S1501). When capture starts, the audio capture function unit 308 calls the TimeIndex function unit 304 (steps S1502, S1503). The TimeIndex function unit 304 counts the moving image playback time in msec (millisecond) units, taking the start of the moving image as, for example, 00:00:00:00 (hours:minutes:seconds:milliseconds). It reports the timer counter value at the moment it is called by the audio capture function unit 308, that is, the counter value at the start of recording (steps S1504, S1505). The audio capture function unit 308 captures audio with a sound pickup device such as a microphone of the user terminal 390 and holds the captured audio data in storage means such as memory until the user 400 enters an instruction to end recording. When that instruction is entered, the audio capture function unit 308 calls the TimeIndex function unit 304 to obtain the counter value at the end of recording, and generates an audio (recorded) file in which the timer counter values (TimeIndex pointers) for the start and end of recording, as reported by the TimeIndex function unit 304, are embedded in the held audio data (step S1506). The audio capture function unit 308 forwards the generated audio (recorded) file, via the Web browser 311, to the server APL base unit 214 of the server 200 (steps S1507, S1508, S1509).
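The recording procedure can be sketched as a session object that asks a playback clock for the counter value at the start and at the end, buffering audio in between. `PlaybackClock` stands in for the TimeIndex function unit; it, `RecordingSession`, `format_pointer`, and the three-digit millisecond field are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlaybackClock:
    """Milliseconds elapsed since the start of the moving image."""
    position_ms: int = 0

    def now_ms(self) -> int:
        return self.position_ms


def format_pointer(ms: int) -> str:
    """Render a counter as hours:minutes:seconds:milliseconds."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{millis:03d}"


@dataclass
class RecordingSession:
    clock: PlaybackClock
    chunks: List[bytes] = field(default_factory=list)
    start_ms: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.start_ms = self.clock.now_ms()  # counter at the start of recording

    def feed(self, chunk: bytes) -> None:
        self.chunks.append(chunk)  # held in memory until the end instruction

    def finish(self) -> dict:
        """Bundle the buffered audio with its start and end pointers."""
        return {
            "start": format_pointer(self.start_ms),
            "end": format_pointer(self.clock.now_ms()),
            "audio": b"".join(self.chunks),
        }
```

Reading the counter from the playback position rather than from the system clock is what ties the recording to the moving image time axis, so pausing or seeking during playback would not skew the pointers.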

The server APL base unit 214 activates the protocol analysis unit 213 (step S1510); the protocol analysis unit 213 selects which function to activate based on the information received from the user terminal 390 and activates the selected audio transmission/reception function unit 207 (step S1511). On receiving the audio (recorded) file, the audio transmission/reception function unit 207 activates the audio management function unit 209 (step S1512). The audio management function unit 209 forwards the audio (recorded) file to the audio synthesis function unit 210 (step S1513), which stores it in the secondary storage device 216 in association with the ID (steps S1514, S1515). The ID may be attached to the audio (recorded) file at the user terminal 390, or it may be one that the server 200 received from the user terminal 390 when the moving image was selected and has retained since.

The audio synthesis function unit 210 reads the audio (synthesized) file corresponding to the ID from the secondary storage device 216 (steps S1516, S1517, S1518) and loads the audio (recorded) files and the audio (synthesized) file into memory. It is assumed here that the audio synthesis function unit 210 has received multiple audio (recorded) files from different user terminals 390 and stored them in the secondary storage device 216.
The TimeIndex function unit 212 reads the TimeIndex of each loaded audio (recorded) file and of the audio (synthesized) file (steps S1519, S1520, S1521). The audio synthesis function unit 210 combines the audio waveforms so that the TimeIndex pointers of each audio (recorded) file and of the audio (synthesized) file coincide, creates a new audio (synthesized) file, and stores it in the secondary storage device 216 in association with the ID (steps S1522, S1523). Once registered in the secondary storage device 216, the new audio (synthesized) file is sent to any user terminal 390 that subsequently requests the moving image with that ID.

As described above, the present invention provides the following effect.
Viewers of a moving image distributed from the server share their voices with one another, realizing moving image collaboration and making it possible to share a sound field as if they were together at a live concert venue.

The user terminal 390 and the server 200 described above each contain a computer system. The operating processes of the browser 300 and the Web browser 311 of the user terminal 390, and of the protocol analysis unit 213, the server APL base unit 214, and the server APL main unit 215 of the server 200, are stored in the form of a program on a computer-readable recording medium, and the processing described above is carried out when the computer system reads out and executes this program. The computer system here includes a CPU, various memories, an OS, and hardware such as peripheral devices. If a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).

The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term also covers media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as the volatile memory inside the computer system that acts as the server or client in that case. The program may implement only part of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

A diagram showing an overview of the moving image collaboration system according to one embodiment of the present invention.
A diagram showing the service concept provided by the moving image collaboration system according to the embodiment.
A functional block diagram of the server according to the embodiment.
A functional block diagram of the user terminal according to the embodiment.
A diagram showing details of the types of moving image and audio according to the embodiment.
A diagram showing details of voice synthesis according to the embodiment.
A diagram showing the detailed login procedure of the moving image collaboration system according to the embodiment.
A diagram showing the detailed moving image capture procedure of the moving image collaboration system according to the embodiment.
A diagram showing the detailed moving image capture procedure of the moving image collaboration system according to the embodiment.
A diagram showing the detailed moving image playback procedure of the moving image collaboration system according to the embodiment.
A diagram showing the detailed moving image playback procedure of the moving image collaboration system according to the embodiment.
A diagram showing the detailed voice recording procedure of the moving image collaboration system according to the embodiment.

Explanation of symbols

100 ... Network
200 ... Server
201 ... Moving image transmission/reception function unit (moving image transmission means, registered moving image reception means)
202 ... Moving image conversion function unit (moving image registration means, moving image conversion means)
203 ... Moving image management function unit
204 ... Moving image audio separation function unit
205 ... User management function unit (user management means)
206 ... Browser APL management function unit (application management means)
207 ... Audio transmission/reception function unit (audio transmission means)
208 ... Voice erasure function unit (voice erasure means)
209 ... Voice management function unit
210 ... Voice synthesis function unit (voice synthesis means)
211 ... Metadata management function unit (metadata management means)
212 ... TimeIndex function unit
213 ... Protocol analysis unit
214 ... Server APL base unit
215 ... Server APL main unit
216 ... Secondary storage device (storage means)
217 ... SQL device (storage means)
300 ... Browser
301 ... Moving image transmission/reception function unit (moving image reception means)
302 ... Moving image playback function unit (moving image playback means, search request means, list reception means, identifier transmission means, registered moving image transmission means)
303 ... Moving image capture function unit
304 ... TimeIndex function unit
305 ... Metadata management function unit
306 ... Audio transmission/reception function unit (audio reception means, recorded-voice transmission means)
307 ... Audio playback function unit
308 ... Audio capture function unit
309 ... Protocol analysis unit
310 ... Browser APL main unit
311 ... Web browser (authentication request means)
390 ... User terminal
400, 400a, 400b, 400c ... User

Claims

A moving image collaboration system in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, wherein
the server comprises:
voice synthesis means for synthesizing a plurality of voices, using a plurality of items of recorded voice data received from the terminal, each being voice data associated with a playback time of the moving image, so that the playback times of the moving image coincide, thereby generating synthesized voice data;
moving image transmission means for transmitting moving image data to the terminal; and
audio transmission means for transmitting the synthesized voice data generated by the voice synthesis means to the terminal together with the moving image data transmitted by the moving image transmission means, and
the terminal comprises:
moving image reception means for receiving the moving image data from the server;
moving image playback means for playing back the moving image data received by the moving image reception means;
audio reception means for receiving the synthesized voice data from the server together with the moving image data;
audio playback means for playing back the voice of the synthesized voice data received by the audio reception means; and
recorded-voice transmission means for transmitting to the server recorded voice data, which is data of a voice recorded at the terminal itself and associated with a playback time of the moving image.

The moving image collaboration system according to claim 1, wherein
the terminal further comprises:
search request means for transmitting a search string to the server;
list reception means for receiving from the server a list of identifiers of moving image data corresponding to the search string transmitted by the search request means; and
identifier transmission means for transmitting to the server an identifier selected from the list received by the list reception means,
the server further comprises:
storage means for storing moving image data and synthesized voice data in association with an identifier, and for storing metadata of the moving image data in association with the identifier; and
metadata management means for searching the storage means using the search string received from the terminal, obtaining a list of the identifiers found by the search, and returning the list to the terminal,
the moving image transmission means reads from the storage means, and returns, the moving image data corresponding to the identifier received from the terminal, and
the audio transmission means reads from the storage means, and returns, the synthesized voice data corresponding to the identifier received from the terminal.
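The search-then-fetch exchange recited in this claim can be pictured with a small in-memory sketch. It is purely illustrative and assumes a substring match on free-text metadata; the claim does not specify how the search is performed, and the names `Entry`, `Catalogue`, `search`, and `fetch` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """Data stored under one identifier (hypothetical layout)."""
    video: bytes
    synthesized_voice: bytes
    metadata: str


@dataclass
class Catalogue:
    """Stand-in for the storage means plus the metadata management means."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def search(self, query: str) -> List[str]:
        # Assumed matching rule: case-insensitive substring match on metadata.
        q = query.lower()
        return sorted(i for i, e in self.entries.items() if q in e.metadata.lower())

    def fetch(self, identifier: str) -> Tuple[bytes, bytes]:
        # Return the moving image data and the synthesized voice data together.
        entry = self.entries[identifier]
        return entry.video, entry.synthesized_voice


if __name__ == "__main__":
    catalogue = Catalogue()
    catalogue.entries["v1"] = Entry(b"video-1", b"voice-1", "Live concert, Tokyo")
    catalogue.entries["v2"] = Entry(b"video-2", b"voice-2", "Cooking lesson")

    ids = catalogue.search("concert")     # terminal sends a search string
    video, voice = catalogue.fetch(ids[0])  # terminal sends the selected identifier
    print(ids, video, voice)
```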

The moving image collaboration system according to claim 2, wherein
the storage means further stores, in association with the identifier, voice-erased audio data, which is audio data obtained by erasing human voices from the audio of the moving image data,
the audio transmission means reads from the storage means, and returns, the voice-erased audio data corresponding to the identifier received from the terminal,
the audio reception means receives the voice-erased audio data from the server together with the moving image data, and
the audio playback means mutes the audio of the moving image data and plays back the voice-erased audio data.

The moving image collaboration system according to claim 3, wherein
the terminal further comprises registered moving image transmission means for transmitting moving image data and metadata of the moving image data to the server, and
the server further comprises:
registered moving image reception means for receiving the moving image data and the metadata from the terminal;
moving image registration means for writing the moving image data and the metadata received by the registered moving image reception means into the storage means in association with an identifier of the moving image data; and
voice erasure means for generating separated audio data by extracting only the audio from the moving image data received by the registered moving image reception means, generating voice-erased audio data by removing human voices from the separated audio data, and saving the generated voice-erased audio data in the storage means in association with the identifier.

The moving image collaboration system according to claim 1, wherein
the terminal further comprises authentication request means for transmitting authentication information of a user to the server,
the server further comprises:
user management means for receiving the authentication information of the user from the terminal and performing authentication using the authentication information; and
application management means for returning plug-in software corresponding to the user to the terminal when the user is authenticated by the user management means, and
the terminal generates the moving image reception means, the moving image playback means, the audio reception means, the audio playback means, and the recorded-voice transmission means by means of the plug-in software returned when the authentication information transmitted by the authentication request means is authenticated.

The moving image collaboration system according to any one of claims 1 to 5, wherein
the server further comprises moving image conversion means for converting the moving image data into a data format that can be played back by the moving image playback means.

A moving image collaboration method used in a moving image collaboration system in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, the method comprising,
in the server:
a voice synthesis step in which voice synthesis means synthesizes a plurality of voices, using a plurality of items of recorded voice data received from the terminal, each being voice data associated with a playback time of the moving image, so that the playback times of the moving image coincide, thereby generating synthesized voice data;
a moving image transmission step in which moving image transmission means transmits moving image data to the terminal; and
an audio transmission step in which audio transmission means transmits the synthesized voice data generated in the voice synthesis step to the terminal together with the moving image data transmitted in the moving image transmission step, and,
in the terminal:
a moving image reception step in which moving image reception means receives the moving image data from the server;
a moving image playback step in which moving image playback means plays back the moving image data received by the moving image reception means;
an audio reception step in which audio reception means receives the synthesized voice data from the server together with the moving image data;
an audio playback step in which audio playback means plays back the voice of the synthesized voice data received in the audio reception step; and
a recorded-voice transmission step in which recorded-voice transmission means transmits to the server recorded voice data, which is data of a voice recorded at the terminal itself and associated with a playback time of the moving image.

A computer program that causes a computer used as the server of a moving image collaboration system, in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, to operate as:
voice synthesis means for synthesizing a plurality of voices, using a plurality of items of recorded voice data received from the terminal, each being voice data associated with a playback time of the moving image, so that the playback times of the moving image coincide, thereby generating synthesized voice data;
moving image transmission means for transmitting moving image data to the terminal; and
audio transmission means for transmitting the synthesized voice data generated by the voice synthesis means to the terminal together with the moving image data transmitted by the moving image transmission means.

A computer program that causes a computer used as the terminal of a moving image collaboration system, in which a server that distributes a moving image and a terminal that plays back the moving image are connected via a network, to operate as:
moving image reception means for receiving moving image data from the server;
moving image playback means for playing back the moving image data received by the moving image reception means;
audio reception means for receiving from the server, together with the moving image data, synthesized voice data generated by synthesizing a plurality of voices, using a plurality of items of recorded voice data transmitted from a plurality of terminals, so that the playback times of the moving image coincide;
audio playback means for playing back the voice of the synthesized voice data received by the audio reception means; and
recorded-voice transmission means for transmitting to the server recorded voice data, which is data of a voice recorded at the terminal itself and associated with a playback time of the moving image.