JP2013058221A

JP2013058221A - Conference analysis system

Info

Publication number: JP2013058221A
Application number: JP2012230381A
Authority: JP
Inventors: Norihiko Moriwaki; 紀彦森脇; Nobuo Sato; 信夫佐藤; Tsuneyuki Imaki; 常之今木; Toshihiko Kashiyama; 俊彦樫山; Itaru Nishizawa; 格西澤; Masashi Egi; 正史恵木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-10-18
Filing date: 2012-10-18
Publication date: 2013-03-28
Anticipated expiration: 2027-04-12
Also published as: JP5433760B2

Abstract

PROBLEM TO BE SOLVED: To provide a conference visualization system that promotes more active discussion by collecting the voices of a plurality of conference participants and displaying the ever-changing state of the discussion in real time. SOLUTION: A voice processing server 40 processes voice data collected from a plurality of voice collection units corresponding to a plurality of conference participants and extracts utterance information. The utterance information is input sequentially to an aggregation processing server 200. A stream data processing unit of the aggregation processing server 200 applies query processing to the utterance information to generate activity data, such as each participant's cumulative number of utterances in the conference. A display processing unit 203 visualizes the participants' discussion state on the basis of the activity data, using, for example, the size of circles and the thickness of lines.

Description

本発明は、複数のメンバが集まる会議等において、音声データの収集および解析を行な
うことによって、リアルタイムにメンバ間のインタラクション状況を表示するための会議
可視化技術に関する。 The present invention relates to a conference visualization technique for displaying an interaction state between members in real time by collecting and analyzing audio data in a conference where a plurality of members gather.

知識労働者の生産性、創造性を向上させる手法が注目を集めている。新規のアイデアや
知識（ナレッジ）を創出するためには、異分野の専門家が集まり、議論を重ねることが重
要である。その中でも、個人の持つ知恵を組織の財産として共有・管理していくための方
法としてナレッジマネジメントと呼ばれる方法論が注目されている。ナレッジマネジメン
トは、組織文化・風土の改革までを含めた考え方であり、情報技術による知識共有の支援
ツールとしてナレッジマネジメント支援ツールと呼ばれるソフトウェアが開発・販売され
ている。現在販売されているナレッジマネジメント支援ツールの多くはオフィスで生産さ
れた文書を効率的に管理する機能が中心である。また、オフィス内の知識の多くがメンバ
間のコミュニケーションの中に存在することに注目したものがある。特許文献１には、組
織のメンバ間でなされる対話の状況を蓄積する技術が開示されている。更に、電子的なコ
ミュニケーションの場を提供することで知識の表出化を促進するツールが開発されている
。特許文献２には、電子的なインタラクションという観点において、電子メールの送受信
数カウントの比較結果によってメンバ間の影響を表示する技術が開示されている。 Techniques for improving the productivity and creativity of knowledge workers are attracting attention. To create new ideas and knowledge, it is important for experts from different fields to gather and hold repeated discussions. In this context, a methodology called knowledge management is attracting attention as a way of sharing and managing the wisdom of individuals as an asset of the organization. Knowledge management is a concept that extends to reforming organizational culture and climate, and software called knowledge management support tools has been developed and sold to support knowledge sharing through information technology. Most knowledge management support tools currently on sale center on efficiently managing documents produced in the office. Other approaches focus on the fact that much of the knowledge in an office resides in communication between members. Patent Document 1 discloses a technique for accumulating the state of dialogue between members of an organization. In addition, tools have been developed that promote the externalization of knowledge by providing a place for electronic communication. Patent Document 2 discloses, from the viewpoint of electronic interaction, a technique for displaying the influence between members based on a comparison of the numbers of e-mails sent and received.

Patent Document 1: 特開２００５−２０２０３５号公報 (Japanese Patent Application Laid-Open No. 2005-202035)
Patent Document 2: 特開２００４−０４６６８０号公報 (JP 2004-046680 A)

新規のアイデアや知識（ナレッジ）を創出するためには、異分野の専門家が集まり、議
論を重ねることが重要であり、有限の時間を有効に使った実りのある議論のプロセスが重
要である。従来のナレッジマネジメントツールは、議論の過程に着目したものではなく、
その結果に対しての情報共有に主眼をおいている。特許文献１では、参加者もしくは参加
者以外のものが蓄積された対話状況を再現することが目的であり、対話のプロセス自体に
注目したものではない。また、特許文献２では、メンバ間の影響度合いを計算しているが
、電子メールの送受信数という単純な数値に基づいており、議論のプロセスにまで踏み込
んだものではない。しかも、電子メールによるインタラクションは、一般的に深い議論を
行なうには、適しておらず、例え、高精細なテレビ会議システムなど、電子的なインタラ
クション技術が成熟したとしても、フェイス・トゥ・フェイスでの議論を完全に置換する
ものにはなり得ない。オフィスでのナレッジ創出には電子的なメディアを介さないフェイ
ス・トゥ・フェイスでの会話や会議が必須となっている。 To create new ideas and knowledge, it is important for experts from different fields to gather and hold repeated discussions, and a fruitful discussion process that makes effective use of limited time matters. Conventional knowledge management tools do not focus on the process of discussion; they focus on sharing information about its results. Patent Document 1 aims to let participants, or persons other than the participants, reproduce an accumulated dialogue state, and does not focus on the dialogue process itself. Patent Document 2 calculates the degree of influence between members, but it relies on a simple figure, the number of e-mails sent and received, and does not go into the discussion process. Moreover, e-mail interaction is generally unsuited to in-depth discussion, and even if electronic interaction technologies such as high-definition video conferencing systems mature, they cannot completely replace face-to-face discussion. Creating knowledge in the office requires face-to-face conversations and meetings that do not go through electronic media.

本発明は、複数のメンバが集まる会議等において、アイデアやナレッジの創出を促進・
誘発するための情報処理システムに関するものである。会議中の音声を取得して、発言者
（発話者）および、その発言回数、対話シーケンス、会議の活性度を計算して、刻々と変
わる会議の状況をリアルタイムに表示することで、参加者自身にフィードバックがかかり
、より積極的な議論を誘発する会議可視化システムの提供を目的とする。 The present invention relates to an information processing system for promoting and inducing the creation of ideas and knowledge in meetings and similar settings where a plurality of members gather. Its object is to provide a conference visualization system that acquires the voices in a conference, calculates the speakers, their numbers of utterances, the dialogue sequence, and the activity level of the conference, and displays the ever-changing state of the conference in real time, thereby giving feedback to the participants themselves and inducing more active discussion.

上記目的を達成するため、本発明においては、会議における複数の会議参加者間の対話
状況を可視化して表示する会議可視化システムであって、会議参加者に対応した複数の音
声収集部と、音声収集部から収集した音声データを処理し、発話情報を抽出する音声処理
部と、音声処理部で抽出された発話情報が順次入力され、この発話情報に対してクエリ処
理を施すことにより会議参加者の会議におけるアクティビティデータを生成するストリー
ム処理部と、このアクティビティデータに基づき、前記会議参加者の対話状況を可視化し
て表示させる表示処理部とを有する
会議可視化システムを提供する。
本発明においては、音声データに所定の処理を行ない、発言者およびその発言回数、対
話回数を特定し、発言回数を円の大きさで、対話回数を線の太さで、リアルタイムに表示
する。さらに、キーストローク情報から得られた議論内容、発言者毎の発言回数累積、活
性度を同時に表示する。 To achieve the above object, the present invention provides a conference visualization system that visualizes and displays the dialogue state among a plurality of conference participants, comprising: a plurality of voice collection units corresponding to the conference participants; a voice processing unit that processes the voice data collected from the voice collection units and extracts utterance information; a stream processing unit that receives the utterance information extracted by the voice processing unit sequentially and applies query processing to it to generate activity data of the participants in the conference; and a display processing unit that visualizes and displays the dialogue state of the participants based on the activity data.
In the present invention, predetermined processing is performed on the voice data to identify each speaker, the number of utterances, and the number of dialogues; the number of utterances is displayed as the size of a circle and the number of dialogues as the thickness of a line, in real time. In addition, the discussion content obtained from keystroke information, the cumulative number of utterances per speaker, and the activity level are displayed at the same time.

本発明によれば、議論状況をリアルタイムに把握しながら、議論を行なうことにより、
発言量が足りないメンバに対しては発言を促すようなフィードバックがかかる。もしくは
、会議の調停者が、議論状況をリアルタイムに把握しつつ、より多く参加者からのアイデ
アを出してもらうようなコントロールを行なうことで、議論の活性化および有効なナレッ
ジ創出が期待できる。 According to the present invention, because members hold their discussion while grasping its state in real time, a member who has not spoken enough receives feedback that prompts him or her to speak. Alternatively, the mediator of the conference can, while grasping the state of the discussion in real time, steer it so that more participants contribute ideas, which can be expected to invigorate the discussion and create useful knowledge.

第一の実施例の会議可視化システムの構成図。 Block diagram of the conference visualization system of the first embodiment.
第一の実施例の会議可視化システムのシーケンス図。 Sequence diagram of the conference visualization system of the first embodiment.
第一の実施例の会議可視化システムの使用例を示す図。 Diagram showing a usage example of the conference visualization system of the first embodiment.
第一の実施例の参加者登録画面のイメージ図。 Illustration of the participant registration screen of the first embodiment.
実施例の一般的なストリームデータ処理の構成図。 Block diagram of general stream data processing in the embodiments.
実施例の入力ストリームのスキーマ登録例を説明するための図。 Diagram for explaining an example of schema registration for an input stream in the embodiments.
実施例の音源選択処理の実現形態を説明するための図。 Diagram for explaining an implementation of the sound source selection process in the embodiments.
実施例のスムージング処理の実現形態を説明するための図。 Diagram for explaining an implementation of the smoothing process in the embodiments.
実施例のアクティビティデータ生成処理の実現形態を説明するための図。 Diagram for explaining an implementation of the activity data generation process in the embodiments.
実施例のアクティビティデータ生成処理の実現形態を説明するための図。 Diagram for explaining an implementation of the activity data generation process in the embodiments.
第二の実施例の無線センサノードのブロック図。 Block diagram of the wireless sensor node of the second embodiment.
第二の実施例の名札型センサノードの使用形態を説明するための図。 Diagram for explaining how the name-tag sensor node of the second embodiment is used.
実施例のアクティビティデータ生成処理の実現形態を説明するための図。 Diagram for explaining an implementation of the activity data generation process in the embodiments.
会議可視化システムの処理シーケンスの他の実施例を示す図。 Diagram showing another example of the processing sequence of the conference visualization system.
ストリームデータ処理による、会議可視化データ処理の実現例を詳細に説明するための図。 Diagram for explaining in detail an example of conference visualization data processing by stream data processing.
会議可視化システムの各実施例における会議の活性化度表示の他の表示例を示す図。 Diagram showing another display example of the conference activation level in the embodiments of the conference visualization system.
会議可視化システムの各実施例における会議の活性化度表示の他の表示例を示す図。 Diagram showing another display example of the conference activation level in the embodiments of the conference visualization system.

以下、本発明の一実施形態を添付図面に基づいて説明する。 Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings.

図３に第一の実施例の会議可視化システムを利用した会議シーンの一例を示す。４人の
メンバ（メンバＡ、メンバＢ、メンバＣ、メンバＤ）が会議を行なっている。会議卓に設
置されたマイク（マイクＡ、マイクＢ、マイクＣ、マイクＤ）より各メンバの発話がセン
シングされて、これらの発話データは音声処理サーバ４０を経由したのち、集計処理サー
バ２００で所定の処理が行なわれ、最終的に、この会議の状況がモニタ画面３００にリア
ルタイムに表示されている。参加メンバが可視化された会議状況から直接フィードバック
を受けることで、各メンバが発言のモチベーションを高めたり、司会者がより多くのアイ
デアが集まるような会議進行を行なう、といった効果が期待される。なお、ここで音声処
理サーバ４０や集計処理サーバ２００などのサーバは、通常のコンピュータシステムと同
義であり、例えば、集計処理サーバ２００は、処理部（ＣＰＵ）、記憶部（半導体メモリ
や磁気記憶装置）、キーボードやマウスなどの入力部、ネットワークと接続される通信部
などの入出力インタフェース部、更に必要ならＣＤやＤＶＤなどのメディアの読取書込み
部などが内部バスで接続されている構成を有する。この音声処理サーバ４０と集計処理サ
ーバ２００は、一個のサーバ（コンピュータシステム）で構成して良いことはいうまでも
ない。 FIG. 3 shows an example of a conference scene using the conference visualization system of the first embodiment. Four members (member A, member B, member C, member D) are holding a meeting. The utterances of each member are sensed by microphones (microphone A, microphone B, microphone C, microphone D) installed on the conference table; the utterance data pass through the voice processing server 40, predetermined processing is then performed in the aggregation processing server 200, and finally the state of the conference is displayed on the monitor screen 300 in real time. Because the participating members receive direct feedback from the visualized conference state, each member can be expected to become more motivated to speak, and the moderator can be expected to run the meeting so that more ideas are gathered. Here, servers such as the voice processing server 40 and the aggregation processing server 200 are synonymous with ordinary computer systems. For example, the aggregation processing server 200 has a configuration in which a processing unit (CPU), a storage unit (semiconductor memory or magnetic storage device), input/output interface units such as an input unit (keyboard, mouse) and a communication unit connected to a network, and, if necessary, a read/write unit for media such as CDs and DVDs are connected by an internal bus. Needless to say, the voice processing server 40 and the aggregation processing server 200 may be implemented as a single server (computer system).

図１に第一の実施例の会議可視化システムの全体図を示す。会議可視化システムは、活
動状況のセンシング、センシングデータの集計・解析処理、および、結果の表示、という
大きく分けて３つの機能より構成される。以下、これらの順番に従ってシステムの詳細を
説明する。会議卓３０には、メンバの着座位置に応じて音声収集部であるセンサ（マイク
）２０が設置されており、メンバが会議にて発言を行なった場合には、これらセンサ２０
にて発言のセンシングを行なう。また、会議卓３０には、パーソナルコンピュータ（ＰＣ
）１０が設置されている。このＰＣ１０は、キーストローク情報出力部として機能し、会
議の記録係が会議録を記述する際のキーストロークデータを出力する。このキーストロー
クデータは、集計処理サーバ２００の入出力インタフェース部を介して、サーバ２００内
に入力される。 FIG. 1 shows an overall view of the conference visualization system of the first embodiment. The system consists broadly of three functions: sensing the activity state, aggregating and analyzing the sensing data, and displaying the results. The details of the system are described below in this order. On the conference table 30, sensors (microphones) 20 serving as voice collection units are installed according to the seating positions of the members, and when a member speaks in the conference, these sensors 20 sense the utterance. A personal computer (PC) 10 is also installed on the conference table 30. The PC 10 functions as a keystroke information output unit and outputs the keystroke data produced when the conference recorder writes the minutes. This keystroke data is input to the aggregation processing server 200 via its input/output interface unit.

図１の例においては、４つのセンサ（センサ２０−０〜２０−３）が設置されており、
それぞれ、メンバＡ〜メンバＤの発話音声を取得する。センサ２０から取得された音声デ
ータは音声処理サーバ４０に転送される。音声処理サーバ４０においては、その内部に設
置されたサウンドボード４１にて音声データのサンプリング処理が行なわれ、その後、音
声処理部４２にて、音の特徴量データ（具体的には、音声エネルギーの大きさ等）が抽出
される。通常この音声処理部４２は、音声処理サーバ４０内の図示されていない処理部（
ＣＰＵ）におけるプログラム処理として構成される。そして、音声処理サーバ４０にて生
成された特徴量データは、その入出力インタフェース部を介して、メンバの発話情報とし
て集計処理サーバ２００の入出力インタフェース部に転送される。転送される音声特徴量
データ５２は、時刻５２Ｔ、センサＩＤ（識別子）５２Ｓ、および、エネルギー５２Ｅを
含んでいる。また、発言者発言内容出力部であるＰＣ１０から取得されたキーストローク
データ５１も、集計処理サーバ２００に転送され、これは、時刻５１Ｔ、発言者５１Ｎ、
および、発言内容５１Ｗを含んでいる。 In the example of FIG. 1, four sensors (sensors 20-0 to 20-3) are installed and acquire the speech of members A to D, respectively. The voice data acquired by the sensors 20 is transferred to the voice processing server 40. In the voice processing server 40, a sound board 41 installed inside it samples the voice data, and a voice processing unit 42 then extracts sound feature data (specifically, the magnitude of the sound energy and the like). The voice processing unit 42 is normally implemented as program processing in a processing unit (CPU), not shown, in the voice processing server 40. The feature data generated by the voice processing server 40 is transferred, via its input/output interface unit, to the input/output interface unit of the aggregation processing server 200 as the utterance information of the members. The transferred voice feature data 52 includes a time 52T, a sensor ID (identifier) 52S, and an energy 52E. The keystroke data 51 acquired from the PC 10, which serves as the speaker/utterance content output unit, is also transferred to the aggregation processing server 200; it includes a time 51T, a speaker 51N, and utterance content 51W.
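
As an illustration only, the two records transferred to the aggregation processing server 200 could be modeled as follows. The class names, field names, and the `seat_map` lookup (standing in for the participant registration) are assumptions made for this sketch; the source specifies only the fields 52T/52S/52E and 51T/51N/51W.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class VoiceFeature:
    """Voice feature data 52 (field names are hypothetical)."""
    time: float      # 52T: time of the measurement
    sensor_id: int   # 52S: ID of the sensor (microphone)
    energy: float    # 52E: sound energy extracted by the voice processing unit


@dataclass(frozen=True)
class Keystroke:
    """Keystroke data 51 (field names are hypothetical)."""
    time: float      # 51T: time of the entry
    speaker: str     # 51N: speaker recorded by the minute-taker
    content: str     # 51W: utterance content typed by the minute-taker


def speaker_for(feature: VoiceFeature, seat_map: Dict[int, str]) -> Optional[str]:
    """Resolve a sensor ID to a registered participant name, if any."""
    return seat_map.get(feature.sensor_id)
```

With a registration such as `{0: "A", 1: "B", 2: "C", 3: "D"}`, a feature record from sensor 2 resolves to member C.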

これらのセンシングデータは、集計処理サーバ２００内のストリームデータ処理部１０
０にて、会議の状況を可視化するためのデータである、アクティビティデータＡＤに変換
される。ストリームデータ処理１００では、それぞれのデータソースに対応したＷｉｎｄ
ｏｗ１１０を持っており、一定時間メモリに蓄えられている時系列のデータセットに対し
て、所定の数値演算処理を行なう。この演算処理は、リアルタイムクエリ処理１２０と呼
ばれ、具体的なクエリの設定や、参加者とデータのＩＤとの対応付けは、それぞれ、クエ
リ登録インタフェース２０２、参加者登録インタフェース２０１を通して行なわれる。な
お、上述のストリームデータ処理部１００、参加者登録インタフェース２０１、クエリ登
録インタフェース２０２は、先に説明した集計処理サーバ２００の図示されない処理部（
ＣＰＵ）で実行されるプログラムとして構成される。 These sensing data are converted by the stream data processing unit 100 in the aggregation processing server 200 into activity data AD, the data used to visualize the state of the conference. The stream data processing unit 100 has a window 110 for each data source and performs predetermined numerical operations on the time-series data set held in memory for a fixed period. This operation is called real-time query processing 120; the concrete query settings and the association between participants and data IDs are made through the query registration interface 202 and the participant registration interface 201, respectively. The stream data processing unit 100, the participant registration interface 201, and the query registration interface 202 are implemented as programs executed by the processing unit (CPU), not shown, of the aggregation processing server 200 described above.
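
The idea of a window that keeps a time-series data set in memory for a fixed period can be sketched as below. This is a minimal illustration, not the patent's implementation: the class name `TimeWindow`, its methods, and the choice to evict items that are exactly `span` old are all assumptions.

```python
from collections import deque


class TimeWindow:
    """Keep only the items whose timestamp lies within the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) pairs in arrival order

    def push(self, timestamp, value):
        """Add a new item and evict everything that has aged out of the window."""
        self.items.append((timestamp, value))
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def values(self):
        """Return the values currently held in the window, oldest first."""
        return [v for _, v in self.items]
```

A query such as a sum or count would then be re-evaluated over `values()` each time a new item arrives, which is one simple way to realize a continuously running computation over recent data.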

通常、ストリームデータ処理部１００で生成されたアクティビティデータＡＤは、集計
処理サーバ２００中の図示されない記憶部中のテーブルなどに記憶され、順次、表示処理
部２０３の処理対象なる。本実施例では、具体的な、アクティビティデータＡＤとして、
４つのデータが生成される。 Normally, the activity data AD generated by the stream data processing unit 100 is stored, for example, in a table in a storage unit (not shown) of the aggregation processing server 200 and is processed sequentially by the display processing unit 203. In this embodiment, four kinds of data are generated as the concrete activity data AD.

１つ目は、議論活性化度５４であり、これは、時刻５４Ｔと、その時刻での議論の活性
化度５４Ａより構成される複数のリストである。議論活性化度５４Ａは、その議論に関し
ての発言量総和やメンバ参加数等をパラメータにして、計算される。例えば、単位時間当
たりの、発言総回数と発言を行なった参加者総数によって決定される。同図１では、一分
当たりの議論活性化度５４を例示している。２つ目のアクティビティデータは、発言内容
データ５５であり、これは、時刻５５Ｔと、その時刻に対応する発言者５５Ｓと発言内容
５５Ｃ、および、重要性５５Ｆより構成されている。実際には、ＰＣ１０からのキースト
ロークデータ５１に含まれる、時刻５１Ｔ、発言者５１Ｎ、および、発言内容５１Ｗが、
それぞれ、時刻５５Ｔ、発言者５５Ｓ、発言内容５５Ｃにマッピングされる。３つ目のア
クティビティデータは、発言回数データ５６であり、これは、時刻５６Ｔと、その時刻に
対応する、発言者５６Ｎと、発言者５６Ｎに対応する発言累積（回数）５６Ｃより構成さ
れている。４つ目のアクティビティデータは、発言シーケンスデータ５７であり、これは
、時刻５７Ｔと、その時刻に対応する、発言者の発話の順序関係である。具体的には、そ
の時刻にて、発言者（前）５７Ｂの発話の直後に、発言者（後）５７Ａが発話を行なった
回数５７Ｎを、あるウィンドウ時間内で求めたものである。 The first is the discussion activation level 54, a list of entries each consisting of a time 54T and the discussion activation level 54A at that time. The discussion activation level 54A is calculated using parameters such as the total amount of speech and the number of participating members in the discussion. For example, it is determined by the total number of utterances per unit time and the total number of participants who spoke. FIG. 1 illustrates the discussion activation level 54 per minute. The second is the utterance content data 55, which consists of a time 55T, the speaker 55S and utterance content 55C corresponding to that time, and an importance 55F. In practice, the time 51T, speaker 51N, and utterance content 51W contained in the keystroke data 51 from the PC 10 are mapped to the time 55T, speaker 55S, and utterance content 55C, respectively. The third is the utterance count data 56, which consists of a time 56T, the speaker 56N corresponding to that time, and the cumulative number of utterances 56C of the speaker 56N. The fourth is the utterance sequence data 57, which gives, for a time 57T, the order relationship of the speakers' utterances at that time. Specifically, it is the number of times 57N, counted within a certain window time, that the speaker (after) 57A spoke immediately after an utterance by the speaker (before) 57B.
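
The following sketch shows one way the utterance count data 56, the utterance sequence data 57, and the discussion activation level 54 could be derived from a list of `(time, speaker)` utterance events. The function names are hypothetical, and the activation formula (total utterances multiplied by the number of distinct speakers in the unit time) is an assumption: the source says only that the level is determined by those two quantities.

```python
from collections import Counter


def utterance_counts(utterances):
    """Cumulative number of utterances per speaker (cf. data 56)."""
    return Counter(speaker for _, speaker in utterances)


def sequence_counts(utterances, t_now, window):
    """Count how often `after` spoke immediately after `before`
    within the last `window` time units (cf. data 57)."""
    recent = [s for t, s in sorted(utterances) if t_now - window < t <= t_now]
    pairs = Counter()
    for before, after in zip(recent, recent[1:]):
        if before != after:  # a speaker following themselves is not an exchange
            pairs[(before, after)] += 1
    return pairs


def activation(utterances, t_start, unit=60.0):
    """Assumed activation level for one unit of time (cf. data 54):
    total utterances multiplied by the number of distinct speakers."""
    in_unit = [s for t, s in utterances if t_start <= t < t_start + unit]
    return len(in_unit) * len(set(in_unit))
```

For example, with utterances by A, D, A, and B in the first minute, `sequence_counts` records one A-to-D, one D-to-A, and one A-to-B transition, and the assumed activation level for that minute is 4 × 3 = 12.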

さて、ストリームデータ処理部１００で生成されたアクティビティデータＡＤに基づき
、表示処理部２０３にて描画処理が行なわれる。即ち、アクティビティデータＡＤは、次
段の表示処理部２０３にて、描画処理の素材データとして使用される。この表示処理部２
０３も集計処理サーバ２００の処理部（ＣＰＵ）で実行される描画処理プログラムとして
提供される。例えば、Ｗｅｂベースでの表示を行なう場合には、表示処理部２０３でＨＴ
ＭＬ（ＨｙｐｅｒＴｅｘｔＭａｋｅｕｐＬａｎｇｕａｇｅ）画像の生成処理等が行
なわれる。表示処理部２０３で生成された画像は、入出力インタフェース部を介して、モ
ニタに出力され、モニタ画面３００に示される画面構成で表示される。会議の様子は、モ
ニタ画面３００にて、活性度・発言表示３１０、発言累積３２０、および、発言シーケン
ス３３０の３つの要素として表示される。 The display processing unit 203 performs drawing processing based on the activity data AD generated by the stream data processing unit 100. That is, the activity data AD is used as the material data for drawing in the display processing unit 203 at the next stage. The display processing unit 203 is also provided as a drawing program executed by the processing unit (CPU) of the aggregation processing server 200. For example, for Web-based display, the display processing unit 203 generates HTML (HyperText Markup Language) images and the like. The image generated by the display processing unit 203 is output to the monitor via the input/output interface unit and displayed in the screen layout shown as the monitor screen 300. On the monitor screen 300, the state of the conference is displayed as three elements: an activity/utterance display 310, a cumulative utterance display 320, and an utterance sequence 330.

以下、素材データであるアクティビティデータを用いて表示される３つの要素について
説明する。活性度・発言表示３１０では、時間軸に沿って、リアルタイムにその会議の活
性度３１１と発言３１３が表示される。活性度３１１は、アクティビティデータＡＤの議
論活性化度５４の表示を行なったものであり、発言３１３はアクティビティデータＡＤ発
言内容データ５５を表示したものである。また、会議の統計データなどに基づいて、活性
度の指標３１２を表示することも可能である。発言累積３２０は、アクティビティデータ
ＡＤの発言回数データ５６に基づいて、会議開始からの参加者毎の発言回数を累積として
表示したものである。最後に、発言シーケンス３３０は、アクティビティデータＡＤの発
言回数データ５６と発言シーケンスデータ５７を使用して、参加者間の発話のやり取りを
可視化したものである。 The three elements displayed using the activity data as material data are described below. The activity/utterance display 310 shows the activity level 311 of the conference and the utterances 313 in real time along the time axis. The activity level 311 displays the discussion activation level 54 of the activity data AD, and the utterances 313 display the utterance content data 55 of the activity data AD. An activity index 312 can also be displayed, based on statistical data of the conference or the like. The cumulative utterance display 320 shows the cumulative number of utterances of each participant since the start of the conference, based on the utterance count data 56 of the activity data AD. Finally, the utterance sequence 330 visualizes the exchange of utterances between participants using the utterance count data 56 and the utterance sequence data 57 of the activity data AD.

具体的には、この発言シーケンス３３０で図示されている参加者毎の円の大きさ（３３
１Ａ、３３１Ｂ、３３１Ｃ、および、３３１Ｄ）は、過去から現在までの一定期間（例え
ば５分間）においての発言回数を円の大きさとして表しており、円と円との間のリンクの
太さは、参加者間での会話が多いか少ないか（会話のインタラクションの量）を可視化し
たものである。例えば、ＡとＢとの間のリンク３３２は細く、ＡとＤとの間のリンク３３
３は太く描かれており、ＡとＤとのインタラクションが多いことが示されている。本例で
は、Ａの発言の後にＤが発言した場合と、Ｄの発言の後にＡが発言した場合とは区別され
てはいないが、発言シーケンスデータ５７を使用することによりこれらを区別するような
表示方法も可能である。素材データ各々を用いて、これら活性度・発言表示３１０、発言
累積３２０、および発言シーケンス３３０の各要素を適宜表示することは、通常の図形描
画処理プログラムを、集計処理サーバ２００の図示されない処理部（ＣＰＵ）で実行する
ことにより実現できることは言うまでもない。 Specifically, the size of each participant's circle (331A, 331B, 331C, and 331D) in the utterance sequence 330 represents the number of utterances over a fixed period from the past to the present (for example, 5 minutes), and the thickness of the link between two circles visualizes whether there is much or little conversation between those participants (the amount of conversational interaction). For example, the link 332 between A and B is drawn thin and the link 333 between A and D is drawn thick, indicating that there is much interaction between A and D. In this example, the case where D speaks after A is not distinguished from the case where A speaks after D, but a display that distinguishes them is also possible by using the utterance sequence data 57. Needless to say, displaying the activity/utterance display 310, the cumulative utterance display 320, and the utterance sequence 330 from the respective material data can be realized by running an ordinary graphics drawing program on the processing unit (CPU), not shown, of the aggregation processing server 200.
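
A possible mapping from counts to drawing parameters is sketched below. The linear scaling, the constants, and the function names are assumptions chosen for illustration; the source states only that circle size reflects the number of utterances and link thickness reflects the amount of interaction, without distinguishing direction in this example.

```python
from collections import Counter


def undirected_links(pair_counts):
    """Merge (before, after) counts so that A->D and D->A share one link."""
    links = Counter()
    for (before, after), n in pair_counts.items():
        links[tuple(sorted((before, after)))] += n
    return links


def circle_radius(n_utterances, base=10.0, step=2.0):
    """Radius grows linearly with the number of utterances in the period."""
    return base + step * n_utterances


def link_width(n_exchanges, step=1.5, max_width=12.0):
    """Line width grows with the number of exchanges, capped for readability."""
    return min(step * n_exchanges, max_width)
```

Keeping the directed counts and drawing two arrows per pair instead of one merged line would give the direction-aware display that the text mentions as an alternative.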

図２は、図１で示した全体図における代表的な機能モジュールでの処理シーケンスを示
したものである。まず音声収集部としてのセンサ（マイク）２０では音声データが取得さ
れる（２０Ａ）。次に、サウンドボード４１にて、音声のサンプリング処理が行なわれる
（４１Ａ）。次に音声処理部４２にて、発話情報としての特徴量の抽出（具体的にはエネ
ルギーへの変換）が行なわれる（４２Ａ）。エネルギーは、例えば数ミリ秒の音波形の絶
対値の２乗を全範囲に渡って積分したものである。なお後段にてより確度の高い音声処理
を行なうために、ここで、音声／非音声の識別を行なうことも可能である（４２Ｂ）。音
声／非音声の識別方法として、時間におけるエネルギーの変化度合いによる識別があげら
れる。音声には音波形エネルギーの強弱とその変化パターンがあり、それらを用いること
で音声と非音声の識別を行なう。上述の通り、特徴量抽出４２Ａ、更には音声／非音声判
別４２Ｂは、図示されない処理部（ＣＰＵ）のプログラム処理として実行される。 FIG. 2 shows a processing sequence in a representative functional module in the overall diagram shown in FIG. First, audio data is acquired by the sensor (microphone) 20 as an audio collection unit (20A). Next, sound sampling processing is performed on the sound board 41 (41A). Next, the voice processing unit 42 extracts a feature amount as speech information (specifically, conversion into energy) (42A). For example, the energy is obtained by integrating the square of the absolute value of a sound waveform of several milliseconds over the entire range. In order to perform voice processing with higher accuracy at a later stage, it is also possible to identify voice / non-voice here (42B). As a speech / non-speech discrimination method, discrimination based on the degree of change in energy over time can be given. The voice has the intensity of the sound wave energy and its change pattern, and the voice and the non-voice are distinguished by using them. As described above, the feature amount extraction 42A and the voice / non-voice discrimination 42B are executed as a program process of a processing unit (CPU) (not shown).
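
The energy feature and an energy-variation test for speech can be sketched as follows. The frame energy is computed as the sum of squared samples over a short frame, a discrete counterpart of the integral described above. The speech test, its thresholds, and all function names are assumptions; the source says only that the degree of change of energy over time can be used for the discrimination.

```python
def frame_energy(samples):
    """Sum of squared sample values over one short frame."""
    return sum(x * x for x in samples)


def frame_energies(samples, frame_len):
    """Split a signal into non-overlapping frames and compute each frame's energy."""
    return [frame_energy(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def looks_like_speech(energies, min_energy=0.01, min_variation=0.5):
    """Assumed heuristic: speech is loud enough on average and its
    energy fluctuates noticeably from frame to frame."""
    if not energies:
        return False
    mean = sum(energies) / len(energies)
    if mean < min_energy:
        return False
    variation = (max(energies) - min(energies)) / mean
    return variation >= min_variation
```

A steady hum would have nearly constant frame energies and fail the variation test, whereas the alternation of syllables and pauses in speech produces large swings.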

次に、ストリームデータ処理部１００にて、音源の選択（１００Ａ）、スムージング処
理（１００Ｂ）、アクティビティデータ生成（１００Ｃ）が行なわれる。最後に、表示処
理部２０３にて、アクティビティデータＡＤに基づいた、画面データ生成（２０３Ａ）が
行なわれる。なお、これらの具体的構成は他の実施例とも共通する部分が多いので後述す
る。 Next, the stream data processing unit 100 performs sound source selection (100A), smoothing processing (100B), and activity data generation (100C). Finally, the display processing unit 203 generates screen data (203A) based on the activity data AD. Since these specific configurations have many parts in common with other embodiments, they will be described later.
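
Because the concrete configurations are deferred to a later part of the source, the sketch below is only one plausible reading of sound source selection (100A) and smoothing (100B), not the method the patent describes. It assumes that selection picks, at each time, the sensor with the highest energy above a threshold, and that smoothing merges active timestamps of one speaker into segments while bridging short pauses. All names and parameters are hypothetical.

```python
def select_source(features, threshold):
    """features: iterable of (time, sensor_id, energy).
    Returns {time: sensor_id} for the loudest sensor at each time, if loud enough."""
    best = {}
    for t, sensor_id, energy in features:
        if energy >= threshold and (t not in best or energy > best[t][1]):
            best[t] = (sensor_id, energy)
    return {t: sensor_id for t, (sensor_id, _) in best.items()}


def smooth(active_times, step, max_gap):
    """Merge active timestamps into (start, end) segments.
    Each timestamp covers `step` time units; gaps of at most `max_gap` are bridged."""
    segments = []
    for t in sorted(active_times):
        if segments and t - segments[-1][1] <= max_gap:
            segments[-1][1] = t + step   # extend the current segment
        else:
            segments.append([t, t + step])  # start a new segment
    return [tuple(s) for s in segments]
```

Under these assumptions, each resulting segment could be counted as one utterance when generating the activity data.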

図４は参加者の登録画面６０を示したものである。会議卓３０のそれぞれの座席に座る
メンバとマイク（２０）とを対応させるために、画面の座席位置（６１Ａ〜６１Ｆ）の空
欄に参加者の名前を入力して登録を行なう（６２）。図４では、座席位置６１Ａ、６１Ｂ
、６１Ｃ、６１Ｄに、それぞれ、参加者の名前Ａ、Ｂ、Ｃ、Ｄを登録している例を示して
いる。なお、この登録画面６０は、上述したＰＣの画面や、各自の座席位置に設置した手
書き文字入力タブレットの入力画面等を用いれば良い。これらの登録作業は、これらの手
段によって入力された名前データに基づき、集計処理サーバ２００の参加者登録インタフ
ェース２０１を使用して行なわれる。 FIG. 4 shows the participant registration screen 60. To associate the members sitting in the respective seats at the conference table 30 with the microphones (20), the names of the participants are entered in the blank fields for the seat positions (61A to 61F) on the screen and registered (62). FIG. 4 shows an example in which the participant names A, B, C, and D are registered at seat positions 61A, 61B, 61C, and 61D, respectively. The registration screen 60 may be presented on the PC screen described above, on the input screen of a handwriting input tablet installed at each seat position, or the like. Registration is carried out through the participant registration interface 201 of the aggregation processing server 200, based on the name data entered by these means.

以上説明した第一の実施例の会議可視化システムにより、発言者および、その発言回数
、対話シーケンス、会議の活性度を計算して、刻々と変わる会議の状況をリアルタイムに
表示することが可能となるため、参加者にフィードバックがかかって、より積極的で活性
度の高い議論を誘発することができる。 With the conference visualization system of the first embodiment described above, the speakers, their numbers of utterances, the dialogue sequence, and the activity level of the conference can be calculated and the ever-changing state of the conference displayed in real time, so that feedback reaches the participants and a more active, livelier discussion can be induced.

第一の実施例では、マイク２０から取得した音声データをベースに会議を可視化する方
法を示した。第二の実施例においては、会議の参加メンバに無線センサノードと呼ばれる
デバイスを与えることで、音声以外の情報も加味してより詳細に会議の状況を可視化する
会議可視化システムを提供する。 In the first embodiment, a method for visualizing a conference based on audio data acquired from the microphone 20 has been shown. In the second embodiment, a conference visualization system is provided in which a device called a wireless sensor node is given to a member who participates in a conference to visualize the conference status in more detail in consideration of information other than voice.

まず、無線センサノードの構成について図１１を用いて説明する。図１１は、無線セン
サノード７０の構成の一例を示すブロック図である。無線センサノード７０は、メンバ自
身の動きの測定（加速度を使用）、音声の測定（マイクロホンを使用）、着席位置の測定
（赤外線の送受信を使用）を行なうセンサ７４と、センサ７４を制御するコントローラ７
３と、無線基地局７６と通信を行なう無線処理部７２と、これらの各ブロックに電力を供
給する電源７１、無線データの送受信を行なうアンテナ７５より構成される。センサ７４
には具体的には、加速度センサ７４１、マイクロホン７４２、赤外線送受信器７４３が搭
載されている。 First, the configuration of the wireless sensor node is described with reference to FIG. 11. FIG. 11 is a block diagram showing an example of the configuration of the wireless sensor node 70. The wireless sensor node 70 comprises a sensor 74 that measures the member's own movement (using acceleration), voice (using a microphone), and seating position (using infrared transmission and reception); a controller 73 that controls the sensor 74; a wireless processing unit 72 that communicates with a wireless base station 76; a power supply 71 that supplies power to each of these blocks; and an antenna 75 that transmits and receives wireless data. Specifically, the sensor 74 carries an acceleration sensor 741, a microphone 742, and an infrared transceiver 743.

コントローラ７３は、予め設定された周期、もしくは不定期にセンサ７４の測定データ
を読み込み、この測定データに予め設定したセンサノードのＩＤを加えて無線処理部７２
に転送する。測定データにはセンシングを行った時間情報をタイムスタンプとして与える
場合もある。無線処理部７２は、コントローラ７３から送られたデータを基地局７６（図
１２に示す）に送信する。電源７１は、電池を使用する場合や、太陽電池や振動発電など
の自律発電機構を具備する構成としても良い。 The controller 73 reads the measurement data of the sensor 74 at a preset period or at irregular intervals, adds the preset ID of the sensor node to the measurement data, and forwards it to the wireless processing unit 72. The time at which sensing was performed may also be attached to the measurement data as a time stamp. The wireless processing unit 72 transmits the data sent from the controller 73 to the base station 76 (shown in FIG. 12). The power supply 71 may use a battery, or may be configured with a self-powering mechanism such as a solar cell or vibration power generation.

図１２に示すように、この無線センサノード７０を名札型に加工した名札型センサノー
ド７０Ａをユーザが装着することにより、ユーザの状態（動き等）に関するセンシングデ
ータを、リアルタイムに無線基地局７６を経由して、集計処理サーバ２００に送信するこ
とが可能となる。さらに、図１２に示すように、会議卓の各座席位置に設置された赤外線
送信器７７からのＩＤ情報を、名札型センサノード７０Ａが赤外線送受信器７４３にて定
期的に検出することで、着席位置の情報を自律的に集計処理サーバ２００に送信すること
も可能となる。このように、本実施例においては、名札型センサノード７０が、ユーザの
着席位置を、自動的に集計処理サーバ２００に送付すれば、登録画面６０を使用した参加
者登録処理（図４）を自動化することが可能となる。 As shown in FIG. 12, when a user wears a name-tag sensor node 70A, which is the wireless sensor node 70 built into the form of a name tag, sensing data on the user's state (movement and the like) can be sent in real time to the aggregation processing server 200 via the wireless base station 76. Furthermore, as shown in FIG. 12, the name-tag sensor node 70A periodically detects, with its infrared transceiver 743, the ID information from the infrared transmitter 77 installed at each seat position of the conference table, so that seating position information can also be sent autonomously to the aggregation processing server 200. Thus, in this embodiment, if the name-tag sensor node 70A automatically sends the user's seating position to the aggregation processing server 200, the participant registration process using the registration screen 60 (FIG. 4) can be automated.

さて次に、図５以下の図面を用いて、上述した会議可視化システムを実現するストリー
ムデータ処理部１００について詳述する。上述の各実施例におけるアクティビティデータ
生成にはストリームデータ処理を用いる。このストリームデータ処理と呼ばれる技術自身
は公知の技術であり、Ｂ．Ｂａｂｃｏｃｋ、Ｓ．Ｂａｂｕ、Ｍ．Ｄａｔａｒ、Ｒ．Ｍｏｔ
ｗａｎｉａｎｄＪ．Ｗｉｄｏｍ、“Ｍｏｄｅｌｓａｎｄｉｓｓｕｅｓｉｎｄ
ａｔａｓｔｒｅａｍｓｙｓｔｅｍｓ”、ＩｎＰｒｏｃ．ｏｆＰＯＤＳ２００
２、ｐｐ．１−１６．（２００２）、Ａ．Ａｒａｓｕ、Ｓ．ＢａｂｕａｎｄＪ．Ｗ
ｉｄｏｍ、“ＣＱＬ：ＡＬａｎｇｕａｇｅｆｏｒＣｏｎｔｉｎｕｏｕｓＱｕｅ
ｒｉｅｓｏｖｅｒＳｔｒｅａｍｓａｎｄＲｅｌａｔｉｏｎｓ”、ＩｎＰｒｏｃ
．ｏｆＤＢＰＬ２００３、ｐｐ．１−１９（２００３）、などの文献に開示さ
れている。 Next, the stream data processing unit 100 that realizes the conference visualization system described above is explained in detail with reference to FIG. 5 and the subsequent figures. Stream data processing is used to generate the activity data in each of the embodiments described above. The technique called stream data processing is itself known and is disclosed in documents such as B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, “Models and issues in data stream systems”, In Proc. of PODS 2002, pp. 1-16 (2002), and A. Arasu, S. Babu and J. Widom, “CQL: A Language for Continuous Queries over Streams and Relations”, In Proc. of DBPL 2003, pp. 1-19 (2003).

図５は図１のストリームデータ処理部１００の機能動作を説明するための図である。ス
トリームデータ処理は、絶え間なく到来するデータの流れを対象に、フィルタリング処理
や集計処理などを、継続的に実行する技術である。個々のデータにはタイムスタンプが付
与されており、データはタイムスタンプの昇順に並んで流れる。以下では、このようなデ
ータの流れをストリームと呼び、個々のデータをストリームタプル、あるいは単にタプル
と呼ぶ。ある一つのストリーム上を流れるタプルは、単一のデータ型に従う。このデータ
型をスキーマと呼ぶ。スキーマとは任意個のカラムの組合せであり、各カラムは一つの基
本型（整数型、実数型、文字列型など）と、一つの名前（カラム名）の組合せである。 FIG. 5 is a diagram for explaining the functional operation of the stream data processing unit 100 of FIG. 1. Stream data processing is a technique for continuously executing filtering, aggregation, and similar processing on a flow of data that arrives without interruption. Each data item carries a time stamp, and the data flows in ascending order of time stamp. Hereinafter, such a flow of data is called a stream, and each data item is called a stream tuple, or simply a tuple. The tuples flowing on a given stream conform to a single data type, which is called a schema. A schema is a combination of an arbitrary number of columns, and each column is a combination of one basic type (integer, real number, character string, or the like) and one name (the column name).

ストリームデータ処理は、スキーマが定義されたストリーム上のタプルを対象に、リレ
ーショナルデータベースの計算モデルである関係代数に準じて、射影、選択、結合、集計
、和集合、差集合などの演算を実施する。但し、関係代数はデータの集合に対して定義さ
れるので、絶え間なくデータ列が続く（即ち、無限に集合の要素が増え続ける）ストリー
ムに対して関係代数を継続的に処理するには、処理対象となるタプルの集合を常に限定し
ながら実行する必要がある。 Stream data processing applies operations such as projection, selection, join, aggregation, union, and set difference to the tuples on a stream for which a schema is defined, in accordance with relational algebra, the computational model of relational databases. However, relational algebra is defined over sets of data. To apply it continuously to a stream, in which the data sequence never ends (that is, the elements of the set keep increasing without bound), the set of tuples to be processed must be restricted at all times during execution.

このために、ストリームデータ処理では、ある時刻において処理対象となるタプル集合
を限定する、ウィンドウ演算が定義されている。このように、ストリーム上のタプルは、
関係代数で処理される前に、まずウィンドウ演算によって、処理対象となる期間を定義さ
れる。以下では、この期間をタプルの生存期間と呼び、生存期間を定義されたタプルの集
合をリレーションと呼ぶ。そして、このリレーションに対して関係代数が実施される。 For this purpose, stream data processing defines a window operation that limits the set of tuples to be processed at a given time. Before tuples on a stream are processed by relational algebra, the window operation first defines the period during which each tuple is subject to processing. Hereinafter, this period is called the lifetime of the tuple, and a set of tuples whose lifetimes have been defined is called a relation. Relational algebra is then applied to this relation.

ウィンドウ演算の例を、５０１〜５０３を用いて説明する。５０１はストリームを、５
０２および５０３は、ストリーム５０１に対してウィンドウ演算を施した結果である、リ
レーションを示している。ウィンドウ演算は、生存期間の定義の仕方によって、時間ウィ
ンドウと個数ウィンドウに分かれる。時間ウィンドウは、各タプルの生存期間を定数時間
に定める。一方、個数ウィンドウは、同時に生存するタプルの個数を定数個に制限する。
リレーション５０２および５０３は、ストリーム５０１を時間ウィンドウ（５２１）と個
数ウィンドウ（５２２）で処理した結果を、それぞれ示している。 An example of the window operation is described using 501 to 503. 501 denotes a stream, and 502 and 503 denote relations obtained by applying window operations to the stream 501. Window operations are divided into time windows and count windows (number windows) according to how the lifetime is defined. A time window sets the lifetime of each tuple to a constant duration. A count window, on the other hand, limits the number of tuples that are alive at the same time to a constant number. Relations 502 and 503 show the results of processing the stream 501 with a time window (521) and a count window (522), respectively.

ストリームの図における各黒丸はストリームタプルを表す。ストリーム５０１には、１
時２分３秒、４秒、７秒、８秒、１０秒、および１１秒に流れてくる、６つのストリーム
タプルが存在する。一方、リレーションの図における、黒丸を起点、白丸を終点とする各
線分は、タプルの生存期間を表す。なお、丁度終点の時刻は生存期間に含まれない。リレ
ーション５０２は、ストリーム５０１を、生存期間３秒の時間ウィンドウで処理した結果
である。例として、１時２分３秒のタプルの生存期間は、１時２分３秒から１時２分６秒
までとなる。但し１時２分６秒丁度は生存期間に含まれない。リレーション５０３は、ス
トリーム５０１を、同時生存数３個の個数ウィンドウで処理した結果である。例として、
１時２分３秒のタプルの生存期間は、１時２分３秒から、その３個後に流れてくるタプル
のタイムスタンプ１時２分８秒までとなる。但し１時２分８秒丁度は生存期間に含まれな
い。 Each black circle in the stream diagram represents a stream tuple. The stream 501 contains six stream tuples, arriving at 1:02:03, 1:02:04, 1:02:07, 1:02:08, 1:02:10, and 1:02:11. In the relation diagrams, each line segment that starts at a black circle and ends at a white circle represents the lifetime of a tuple. The exact time of the end point is not included in the lifetime. Relation 502 is the result of processing the stream 501 with a time window whose lifetime is 3 seconds. For example, the lifetime of the tuple at 1:02:03 runs from 1:02:03 to 1:02:06, where exactly 1:02:06 is not included. Relation 503 is the result of processing the stream 501 with a count window that keeps three tuples alive at the same time. For example, the lifetime of the tuple at 1:02:03 runs from 1:02:03 to 1:02:08, the time stamp of the tuple that arrives three tuples later, where exactly 1:02:08 is not included.
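The two window operations can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes time stamps expressed as integer seconds past 1:02:00, and the names `time_window` and `row_window` are hypothetical. Lifetimes are half-open intervals, so the end time itself is excluded, as stated above.

```python
from typing import List, Optional, Tuple

# Seconds past 1:02:00 for the six tuples on stream 501.
STREAM_501 = [3, 4, 7, 8, 10, 11]

# Half-open lifetime [start, end); end=None means the end is not yet known.
Lifetime = Tuple[int, Optional[int]]


def time_window(timestamps: List[int], range_sec: int) -> List[Lifetime]:
    """Time window: every tuple lives for a fixed duration."""
    return [(t, t + range_sec) for t in timestamps]


def row_window(timestamps: List[int], rows: int) -> List[Lifetime]:
    """Count window: a tuple lives until the tuple `rows` positions later arrives."""
    lifetimes = []
    for i, t in enumerate(timestamps):
        j = i + rows
        end = timestamps[j] if j < len(timestamps) else None
        lifetimes.append((t, end))
    return lifetimes


relation_502 = time_window(STREAM_501, 3)
relation_503 = row_window(STREAM_501, 3)
print(relation_502[0])  # (3, 6): alive from 1:02:03 up to, not including, 1:02:06
print(relation_503[0])  # (3, 8): alive from 1:02:03 up to, not including, 1:02:08
```

The last three tuples of the count window have no known end in this sketch because the tuples that would displace them have not yet arrived.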

リレーション上の関係代数は、入力のリレーションに対する演算結果として、次のよう
な性質を持つ結果リレーションを出力する。まず、入力リレーションにおいて、ある時刻
に生存するタプルの集合に対し、従来の関係代数を実施した結果を、該時刻における結果
タプル集合と呼ぶ。このとき、任意の時刻において、該時刻における結果タプル集合が、
結果リレーションにおいて該時刻に生存するタプルの集合と一致する。 Relational algebra on relations outputs, as the result of an operation on input relations, a result relation with the following property. First, the result of applying conventional relational algebra to the set of tuples alive at a given time in the input relations is called the result tuple set at that time. Then, at any time, the result tuple set at that time coincides with the set of tuples alive at that time in the result relation.

リレーション上の関係代数の例を、５０４〜５０８を用いて説明する。この例は、リレ
ーション５０４とリレーション５０５の間の差集合演算を示し、リレーション５０６、５
０７、５０８は、その結果を示している。例えば、入力リレーション５０４と５０５にお
いて、１時２分８秒に生存するタプル集合は、それぞれ２個のタプルと１個のタプルから
成る。従って、１時２分８秒の結果タプル集合（即ち、両タプル集合の差集合）は、２−
１＝１個のタプルから成るタプル集合である。このような関係が、１時２分７秒から１時
２分９秒までの期間で成立する（但し、１時２分９秒丁度は含まず）。従って、結果リレ
ーションにおいて、この期間に生存するタプルは１個となる。結果リレーションの例とし
て、５０６、５０７、５０８は、全てこの性質を持つ。このように、一般に、リレーショ
ン上の関係代数の結果は、一意には定まらない。但し、ストリームデータ処理においては
、その何れも、リレーション上の関係代数の対象として等価である。 An example of relational algebra on relations is described using 504 to 508. The example shows a set difference operation between relation 504 and relation 505, and relations 506, 507, and 508 show its result. For example, in the input relations 504 and 505, the tuple sets alive at 1:02:08 consist of two tuples and one tuple, respectively. The result tuple set at 1:02:08 (that is, the difference between the two tuple sets) therefore consists of 2 - 1 = 1 tuple. This relationship holds over the period from 1:02:07 to 1:02:09 (not including exactly 1:02:09). Accordingly, exactly one tuple is alive in the result relation during this period. Relations 506, 507, and 508, given as examples of the result relation, all have this property. Thus, in general, the result of relational algebra on relations is not uniquely determined. In stream data processing, however, all of these results are equivalent as operands of relational algebra on relations.

以上のように、リレーション上の関係代数の結果は一意には定まらないため、そのまま
アプリケーションに渡すことは好ましくない。これに対し、ストリームデータ処理では、
リレーションをアプリケーションに渡す前に、再びストリームに変換する演算が用意され
ている。これを、ストリーム化演算と呼ぶ。ストリーム化演算は、等価な結果リレーショ
ンの全てを同一のストリームに変換する。 As described above, the result of relational algebra on relations is not uniquely determined, so passing it to an application as is is not desirable. For this reason, stream data processing provides an operation that converts a relation back into a stream before it is passed to the application. This is called a streaming operation. A streaming operation converts all equivalent result relations into the same stream.

ストリーム化演算によってリレーションから変換されたストリームを、さらにウィンド
ウ演算でリレーションに変換することも可能である。このように、ストリームデータ処理
の中では、リレーション化とストリーム化を任意に組合せることが可能である。 A stream obtained from a relation by a streaming operation can in turn be converted into a relation by a window operation. In this way, conversion to relations and conversion to streams can be combined arbitrarily within stream data processing.

ストリーム化演算は、ＩＳｔｒｅａｍ、ＤＳｔｒｅａｍ、ＲＳｔｒｅａｍの３種類に分
かれる。ＩＳｔｒｅａｍは、リレーションにおいて、ある時刻に生存するタプル集合に、
タプルの増加があった場合に、その増加分のタプルを、該時刻をタイムスタンプとするス
トリームタプルとして出力する。ＤＳｔｒｅａｍは、リレーションにおいて、ある時刻に
生存するタプル集合に、タプルの減少があった場合に、その減少分のタプルを、該時刻を
タイムスタンプとするストリームタプルとして出力する。ＲＳｔｒｅａｍは、一定時間間
隔で、リレーションにおいてその時点で生存するタプル集合を、ストリームタプルとして
出力する。 Streaming operations are of three types: IStream, DStream, and RStream. When the set of tuples alive in a relation at a given time gains tuples, IStream outputs the added tuples as stream tuples whose time stamp is that time. When the set of tuples alive in a relation at a given time loses tuples, DStream outputs the removed tuples as stream tuples whose time stamp is that time. RStream outputs, at fixed time intervals, the set of tuples alive in the relation at that moment as stream tuples.

ストリーム化演算の例を、５０９〜５１１を用いて説明する。ストリーム５０９は、リ
レーション５０６〜５０８を、ＩＳｔｒｅａｍ（５２３）でストリーム化した結果である
。例として、リレーション５０６では、１時２分３秒にタプルが０個から１個に、１時２
分５秒に１個から２個に増える。このため、ストリーム５０９には１時２分３秒と１時２
分５秒に、それぞれ増分１個のストリームタプルが出力される。この結果は、リレーショ
ン５０７に対して処理しても変らない。例えば、リレーション５０７においては、１時２
分９秒に一つのタプルの生存期間が始まっているが、同時に、別のタプル（１時２分３秒
から生存期間が始まるタプル）の生存期間が終わる。このとき、後者のタプルの生存期間
に１時２分９秒丁度は含まれないため、１時２分９秒に生存するタプルは、丁度１個であ
る。従って、１時２分９秒にはタプルの増減は無いことになり、リレーション５０６に対
する結果と同じく、１時２分９秒のストリームタプルは出力されない。ＤＳｔｒｅａｍ（
５２４）とＲＳｔｒｅａｍ（５２５）についても同様に、リレーション５０６、５０７、
５０８の何れを対象としても、ストリーム化した結果は、それぞれストリーム５１０およ
びストリーム５１１になる（但し、ＲＳｔｒｅａｍのストリーム化間隔は１秒）。このよ
うに、一意には定まらない結果リレーションを、ストリーム化演算によって、一意のスト
リームに変換することが可能である。なお、以降の図では、生存期間終了の白丸を省略す
る。 An example of the streaming operations is described using 509 to 511. The stream 509 is the result of streaming the relations 506 to 508 with IStream (523). For example, in the relation 506, the number of tuples increases from 0 to 1 at 1:02:03 and from 1 to 2 at 1:02:05. One stream tuple, corresponding to each increment, is therefore output to the stream 509 at 1:02:03 and at 1:02:05. The result is the same when the relation 507 is processed. For example, in the relation 507, the lifetime of one tuple begins at 1:02:09, but at the same time the lifetime of another tuple (the one whose lifetime begins at 1:02:03) ends. Because the lifetime of the latter tuple does not include exactly 1:02:09, exactly one tuple is alive at 1:02:09. The number of tuples therefore neither increases nor decreases at 1:02:09, and, as with the relation 506, no stream tuple is output at 1:02:09. Likewise for DStream (524) and RStream (525), streaming any of the relations 506, 507, and 508 yields the stream 510 and the stream 511, respectively (the streaming interval of RStream being 1 second). In this way, result relations that are not uniquely determined can be converted into a unique stream by a streaming operation. In the subsequent figures, the white circle marking the end of a lifetime is omitted.
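The following Python sketch illustrates why equivalent relations yield the same stream. It is an assumption-laden illustration, not the embodiment's code: a relation is modeled as a list of hypothetical `(value, start, end)` rows with half-open lifetimes in integer seconds, and the two sample relations `rel_a` and `rel_b` are invented for the example (they are not the relations 506 to 508 of FIG. 5). The functions compare the multiset of tuples alive at each change point with the multiset alive just before it.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int]  # (value, start, end), alive on the half-open span [start, end)


def live_at(rel: List[Row], t: int) -> Counter:
    """Multiset of values alive exactly at time t."""
    return Counter(v for v, s, e in rel if s <= t < e)


def live_just_before(rel: List[Row], t: int) -> Counter:
    """Multiset of values alive immediately before time t."""
    return Counter(v for v, s, e in rel if s < t <= e)


def change_points(rel: List[Row]) -> List[int]:
    return sorted({s for _, s, _ in rel} | {e for _, _, e in rel})


def istream(rel: List[Row]) -> List[Tuple[int, str]]:
    """Emit the net increase of the live multiset at each change point."""
    out = []
    for t in change_points(rel):
        added = live_at(rel, t) - live_just_before(rel, t)  # keeps positive counts only
        out.extend((t, v) for v in sorted(added.elements()))
    return out


def dstream(rel: List[Row]) -> List[Tuple[int, str]]:
    """Emit the net decrease of the live multiset at each change point."""
    out = []
    for t in change_points(rel):
        removed = live_just_before(rel, t) - live_at(rel, t)
        out.extend((t, v) for v in sorted(removed.elements()))
    return out


def rstream(rel: List[Row], times: Iterable[int]) -> List[Tuple[int, str]]:
    """Emit every live tuple at each sampling time."""
    return [(t, v) for t in times for v in sorted(live_at(rel, t).elements())]


# Two hypothetical relations with the same live multiset at every instant.
rel_a = [("x", 3, 12), ("x", 5, 7)]
rel_b = [("x", 3, 9), ("x", 5, 7), ("x", 9, 12)]

print(istream(rel_a) == istream(rel_b))  # True
print(dstream(rel_a) == dstream(rel_b))  # True
```

In `rel_b`, one lifetime ends at 9 exactly as another begins, so the live count stays at one and neither `istream` nor `dstream` emits anything at that time, mirroring the 1:02:09 case described above.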

ストリームデータ処理では、データ処理の内容をＣＱＬ（ＣｏｎｔｉｎｕｏｕｓＱｕ
ｅｒｙＬａｎｇｕａｇｅ）という宣言型言語で定義する。ＣＱＬの文法は、リレーショ
ナルデータベースにおいて標準的に利用される、関係代数に基づくクエリ言語ＳＱＬに、
ウィンドウ演算、およびストリーム化演算の記法を追加した形式をとる。ＣＱＬ文法の詳
細な定義は、ｈｔｔｐ：／／ｉｎｆｏｌａｂ．ｓｔａｎｆｏｒｄ．ｅｄｕ／ｓｔｒｅａｍ
／ｃｏｄｅ／ｃｑｌ−ｓｐｅｃ．ｔｘｔに開示されている。ここでは、その概要を説明す
る。次の４行は、ＣＱＬ文法に従うクエリの一例である。 In stream data processing, the content of the data processing is defined in a declarative language called CQL (Continuous Query Language). The CQL grammar adds notation for window operations and streaming operations to SQL, the relational-algebra-based query language that is the standard in relational databases. A detailed definition of the CQL grammar is disclosed at http://infolab.stanford.edu/stream/code/cql-spec.txt. An outline is given here. The following lines are an example of a query that follows the CQL grammar.

ＲＥＧＩＳＴＥＲＱＵＥＲＹｑＡＳ
ＩＳＴＲＥＡＭ（
ＳＥＬＥＣＴｃ１
ＦＲＯＭｓｔ［ＲＯＷＳ３］
ＷＨＥＲＥｃ２＝５）
ＦＲＯＭ句の“ｓｔ”は、ストリームを表す識別子（以下、ストリーム識別子、あるい
はストリーム名と呼ぶ）である。ストリーム名に続く“［“と”］”に囲まれた部分は、
ウィンドウ演算を表す記法である。例中の記述“ｓｔ［ＲＯＷＳ３］”は、ストリーム
ｓｔを、同時生存数３個の個数ウィンドウによって、リレーションに変換することを示し
ている。従って、この記述全体では、リレーションを出力する表現となる。なお、時間ウ
ィンドウは“［ＲＡＮＧＥ３ｓｅｃ］”のように、“ＲＡＮＧＥ”以降に生存期間を
示す記法となる。この他の記法として、“［ＮＯＷ］”と、“［ＵＮＢＯＵＮＤＥＤ］”
があり、それぞれ、非常に短い（但し、０ではない）生存期間と、永続を意味する。
REGISTER QUERY q AS
ISTREAM (
SELECT c1
FROM st [ROWS 3]
WHERE c2 = 5)
“st” in the FROM clause is an identifier representing a stream (hereinafter called a stream identifier or a stream name). The part enclosed in “[” and “]” following the stream name is the notation for a window operation. The description “st [ROWS 3]” in the example indicates that the stream st is converted into a relation by a count window that keeps three tuples alive at the same time. This description as a whole is therefore an expression that outputs a relation. A time window is written with the lifetime after “RANGE”, as in “[RANGE 3 sec]”. Other notations are “[NOW]” and “[UNBOUNDED]”, which denote a very short (but nonzero) lifetime and a permanent lifetime, respectively.

ＦＲＯＭ句のリレーションを対象に、関係代数が実施される。例中の記述“ＷＨＥＲＥ
ｃ２＝５”は、カラムｃ２が５であるタプルを選択することを示している。また、例中
の記述“ＳＥＬＥＣＴｃ１”は、選択されたタプルのｃ１カラムのみを残して、結果リ
レーションとすることを示している。つまり、これらの記述の意味はＳＱＬと全く同じで
ある。 Relational algebra is applied to the relation in the FROM clause. The description “WHERE c2 = 5” in the example indicates that tuples whose column c2 is 5 are selected. The description “SELECT c1” in the example indicates that only the c1 column of the selected tuples is kept to form the result relation. In other words, these descriptions mean exactly the same as in SQL.

さらに、ＳＥＬＥＣＴ句からＷＨＥＲＥ句までの、リレーションを生成する表現全体を
、“（“と”）”で囲い、その前にストリーム化指定（例中の記述“ＩＳＴＲＥＡＭ”）
を置く記法は、該リレーションのストリーム化演算を示している。ストリーム化指定は、
他に“ＤＳＴＲＥＡＭ”と“ＲＳＴＲＥＡＭ”があり、“ＲＳＴＲＥＡＭ”では、“［“
、”］”で囲って、ストリーム化間隔を指定する。 Furthermore, enclosing the entire relation-producing expression, from the SELECT clause to the WHERE clause, in “(” and “)” and placing a streaming specifier before it (“ISTREAM” in the example) denotes a streaming operation on that relation. The other streaming specifiers are “DSTREAM” and “RSTREAM”. For “RSTREAM”, the streaming interval is specified by enclosing it in “[” and “]”.

この例のクエリは、以下のように分解して定義することも可能である。 The query in this example can also be decomposed and defined as follows.

ＲＥＧＩＳＴＥＲＱＵＥＲＹｓＡＳ
ｓｔ［ＲＯＷＳ３］
ＲＥＧＩＳＴＥＲＱＵＥＲＹｒＡＳ
ＳＥＬＥＣＴｃ１
ＦＲＯＭｓ
ＷＨＥＲＥｃ２＝５
ＲＥＧＩＳＴＥＲＱＵＥＲＹｑＡＳ
ＩＳＴＲＥＡＭ（ｒ）
ここで、ウィンドウ演算の前に置けるのはストリームを生成する表現、ＦＲＯＭ句に登
場できるのはリレーションを生成する表現、ストリーム化演算の引数はリレーションを生
成する表現に、それぞれ限定される。
REGISTER QUERY s AS
st [ROWS 3]
REGISTER QUERY r AS
SELECT c1
FROM s
WHERE c2 = 5
REGISTER QUERY q AS
ISTREAM (r)
Here, only an expression that produces a stream can be placed before a window operation, only an expression that produces a relation can appear in the FROM clause, and only an expression that produces a relation can be the argument of a streaming operation.
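The three-stage decomposition (window s, relational algebra r, streaming q) can be imitated in plain Python. The sketch below is only an approximation of the semantics under stated assumptions: tuples are hypothetical `(timestamp, c1, c2)` triples, at most one tuple arrives per time stamp, and `run_query` is an invented name rather than part of any CQL engine.

```python
from collections import Counter, deque
from typing import Iterable, List, Tuple

Tup = Tuple[int, str, int]  # (timestamp, c1, c2); the layout is an assumption


def run_query(stream: Iterable[Tup], rows: int = 3, wanted_c2: int = 5) -> List[Tuple[int, str]]:
    """Imitate ISTREAM (SELECT c1 FROM st [ROWS 3] WHERE c2 = 5)."""
    window = deque(maxlen=rows)  # s: st [ROWS 3], the last `rows` tuples stay alive
    previous = Counter()         # contents of relation r just before the current arrival
    output = []
    for ts, c1, c2 in stream:
        window.append((c1, c2))
        # r: SELECT c1 FROM s WHERE c2 = 5
        current = Counter(a for a, b in window if b == wanted_c2)
        # q: ISTREAM (r), emitting only the net increase of r at this time stamp
        for value in sorted((current - previous).elements()):
            output.append((ts, value))
        previous = current
    return output


sample = [(1, "a", 5), (2, "b", 1), (3, "c", 5), (4, "d", 5), (5, "e", 2)]
print(run_query(sample))  # [(1, 'a'), (3, 'c'), (4, 'd')]
```

Tuples with c2 other than 5 never enter r, and a tuple leaving the window produces no output because ISTREAM reports increases only.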

図５中のストリームデータ処理部１００は、以上のようなストリームデータ処理を実現
するためのソフトウェア構成を示す。ストリームデータ処理部１００は、ＣＱＬで定義さ
れたクエリが、クエリ登録インタフェース２０２に与えられると、クエリ解析部１２２で
クエリを構文解析し、クエリ生成部１２１によって、木構造の実行形式（以下、実行木と
呼ぶ）に展開する。該実行木は、各種演算を行なう演算子（ウィンドウ演算子１１０、関
係代数演算子１１１、ストリーム化演算子１１２）をノードとし、オペレータ間を繋ぐタ
プルキュー（ストリームキュー１１３、リレーションキュー１１４）をエッジとして構成
される。ストリームデータ処理部１００は、該実行木上の各演算子の処理を、適当な順番
で実行することで、処理を進める。 The stream data processing unit 100 in FIG. 5 shows a software configuration for realizing the stream data processing described above. When a query defined in CQL is given to the query registration interface 202, the stream data processing unit 100 parses the query in the query analysis unit 122, and the query generation unit 121 expands it into a tree-structured executable form (hereinafter called an execution tree). The execution tree has as its nodes the operators that perform the various operations (window operators 110, relational algebra operators 111, and streaming operators 112), and as its edges the tuple queues that connect the operators (stream queues 113 and relation queues 114). The stream data processing unit 100 proceeds by executing the processing of each operator on the execution tree in a suitable order.

上述したストリームデータ処理技術に対応し、各実施例において、音声処理サーバ４０
から送られる発話情報であるストリーム５２、参加者登録インタフェース２０１を介して
登録されるストリーム５３、５８などの、ストリームデータ処理１００の外部から送られ
るストリームタプルは、まず、ストリームキュー１１３に入る。これらタプルは、ウィン
ドウ演算子１１０によって生存期間を定義され、リレーションキュー１１４に入る。リレ
ーションキュー１１４上のタプルは、関係代数演算子１１１によって、リレーションキュ
ー１１４を介してパイプライン的に処理される。リレーションキュー１１４上のタプルは
、ストリーム化演算子１１２によってストリーム化され、ストリームキュー１１３に入る
。ストリームキュー１１３上のタプルは、ストリームデータ処理部１００の外部へ送られ
るか、ウィンドウ演算子１１０で処理される。ウィンドウ演算子１１０からストリーム化
演算子１１２までのパスには、リレーションキュー１１４で接続された任意個の関係代数
演算子１１１が置かれる。一方、ストリーム化演算子１１２からウィンドウ演算子１１０
へは、一つのストリームキュー１１３で直接つながる。 In correspondence with the stream data processing technique described above, in each embodiment, stream tuples sent from outside the stream data processing unit 100, such as the stream 52 of utterance information sent from the audio processing server 40 and the streams 53 and 58 registered via the participant registration interface 201, first enter a stream queue 113. A window operator 110 defines the lifetimes of these tuples, and they enter a relation queue 114. Tuples on a relation queue 114 are processed by the relational algebra operators 111 in a pipelined manner via the relation queues 114. Tuples on a relation queue 114 are converted into a stream by a streaming operator 112 and enter a stream queue 113. Tuples on a stream queue 113 are either sent outside the stream data processing unit 100 or processed by a window operator 110. On the path from a window operator 110 to a streaming operator 112, an arbitrary number of relational algebra operators 111 connected by relation queues 114 are placed. A streaming operator 112, on the other hand, is connected directly to a window operator 110 by a single stream queue 113.

次に、図１５を用いて、実施例の会議可視化システムにおけるストリームデータ処理部
１００による会議可視化データ処理の実現方法を具体的に開示する。 Next, a method by which the stream data processing unit 100 realizes conference visualization data processing in the conference visualization system of the embodiment is disclosed concretely with reference to FIG. 15.

１５００〜１５２１は、ストリーム、またはリレーションの、識別名、およびスキーマ
を表す。上側の太枠四角が識別名を、下側の四角の並びがスキーマを構成するカラム名を
示している。７１０、７２０、７３０、８１０、８２０、８３０、８４０、８５０、９１
０、９２０、９３０、９４０、１０００、１０１０、１０２０、１３１０、１３２０、１
３３０の角丸四角は、データ処理の基本処理単位を示している。基本処理単位のそれぞれ
を、ＣＱＬ文法に従うクエリで実現する。クエリ定義、および動作の説明は、図７〜１０
、および図１３を用いて後述する。発話情報である音声特徴量データストリーム１５００
は、音声処理サーバ４０から、音量補正値ストリーム１５０１、および参加者ストリーム
１５０２は、参加者登録インタフェース２０１から、身振り強度ストリーム１５０３、お
よびうなずきストリーム１５０４は、名札型センサノード７０から、発言ログストリーム
１５０５は、ＰＣ（キーストロークセンシング）１０から、それぞれ送られてくる。これ
らを、音源選択１００Ａ、スムージング処理１００Ｂ、およびアクティビティデータ生成
１００Ｃの、各プロセスで順に処理して、出力となるストリーム１５１７〜１５２１を生
成する。１５０６〜１５１６は、中間データとなるストリーム、またはリレーションであ
る。 1500 to 1521 denote the identifiers and schemas of streams or relations. The upper, thick-framed box shows the identifier, and the row of boxes below it shows the column names that make up the schema. The rounded boxes 710, 720, 730, 810, 820, 830, 840, 850, 910, 920, 930, 940, 1000, 1010, 1020, 1310, 1320, and 1330 denote the basic processing units of the data processing. Each basic processing unit is realized by a query that follows the CQL grammar. The query definitions and their operation are described later with reference to FIGS. 7 to 10 and FIG. 13. The voice feature data stream 1500, which is the utterance information, is sent from the audio processing server 40; the volume correction value stream 1501 and the participant stream 1502 are sent from the participant registration interface 201; the gesture strength stream 1503 and the nod stream 1504 are sent from the name-tag-type sensor node 70; and the statement log stream 1505 is sent from the PC (keystroke sensing) 10. These are processed in order by the sound source selection 100A, the smoothing process 100B, and the activity data generation 100C to generate the output streams 1517 to 1521. 1506 to 1516 are streams or relations that serve as intermediate data.

音源選択１００Ａの処理は、基本処理単位７１０、７２０、７３０から構成される。各
処理の実現形態については、図７を用いて後述する。スムージング処理１００Ｂは、基本
処理単位８１０、８２０、８３０、８４０、８５０から構成される。各処理の実現形態に
ついては、図８を用いて後述する。アクティビティデータ生成１００Ｃの処理は、基本処
理単位９１０、９２０、９３０、９４０、１０００、１０１０、１０２０、１３１０、１
３２０、１３３０から構成される。基本処理単位９１０〜９４０は、モニタ画面３００の
３２０に可視化される発言数１５１７、３３０に可視化される発言時間１５１８、および
会話数１５１９を生成する。これら基本処理単位については、図９を用いて後述する。基
本処理単位１０００〜１０２０は、モニタ画面３００の３１１に可視化される活性度１５
２０を生成する。これら基本処理単位については、図１０を用いて後述する。基本処理単
位１３１０〜１３３０は、モニタ画面３００の３１３に可視化される発言ログ１５２１を
生成する。これら基本処理単位については、図１３を用いて後述する。 The sound source selection 100A consists of the basic processing units 710, 720, and 730. How each is realized is described later with reference to FIG. 7. The smoothing process 100B consists of the basic processing units 810, 820, 830, 840, and 850. How each is realized is described later with reference to FIG. 8. The activity data generation 100C consists of the basic processing units 910, 920, 930, 940, 1000, 1010, 1020, 1310, 1320, and 1330. The basic processing units 910 to 940 generate the number of utterances 1517 visualized in 320 of the monitor screen 300, the utterance time 1518 visualized in 330, and the number of conversations 1519. These basic processing units are described later with reference to FIG. 9. The basic processing units 1000 to 1020 generate the activity level 1520 visualized in 311 of the monitor screen 300. These basic processing units are described later with reference to FIG. 10. The basic processing units 1310 to 1330 generate the statement log 1521 visualized in 313 of the monitor screen 300. These basic processing units are described later with reference to FIG. 13.

次に、図６を用いて、入力ストリームのスキーマ登録について開示する。 Next, schema registration for the input streams is disclosed with reference to FIG. 6.

コマンド６００を、例えば、集積解析処理サーバ２００の入力部からなどからクエリ登
録インタフェース２０２を介して、ストリームデータ処理部１００に投入することで、入
力ストリーム１５００〜１５０５を受け付ける６本のストリームキュー１１３が生成され
る。ＲＥＧＩＳＴＥＲＳＴＲＥＡＭの直後はストリーム名を、括弧内はスキーマを示
している。スキーマの、“，”に区切られた個々の記述は、カラムの名称と型の組合せを
示している。 When the command 600 is submitted to the stream data processing unit 100 via the query registration interface 202, for example from the input unit of the aggregation processing server 200, six stream queues 113 that accept the input streams 1500 to 1505 are generated. The name immediately after REGISTER STREAM is the stream name, and the parenthesized part is the schema. Each comma-separated entry in the schema is a combination of a column name and a type.

６０１は、音声特徴量データストリーム１５００（ｖｏｉｃｅ）に入るストリームタプ
ルの例を示している。本例では、１０ミリ秒毎に、４つのマイクから、センサＩＤ（ｉｄ
カラム）と音量（ｅｎｅｒｇｙカラム）を組み合わせたストリームタプルが生成される様
子を示している。 Reference numeral 601 shows an example of the stream tuples that enter the voice feature data stream 1500 (voice). In this example, every 10 milliseconds each of four microphones produces a stream tuple that combines a sensor ID (id column) and a volume (energy column).

次に、図７を用いて、音源選択処理１００Ａの基本処理単位７１０、７２０、７３０の
実現方法を開示する。 Next, a method of realizing the basic processing units 710, 720, and 730 of the sound source selection process 100A is disclosed with reference to FIG. 7.

コマンド７００を、クエリ登録インタフェース２０２を介して、ストリームデータ処理
部１００に投入することで、基本処理単位７１０、７２０、７３０を実現する実行木が生
成される。コマンド７００は、３つのクエリ登録書式７１０、７２０、７３０に分けられ
、それぞれ、基本処理単位７１０、７２０、７３０の処理内容を定義する（以下同様に、
基本処理単位と、その処理内容を定義するクエリの登録書式を、同義として扱い、同一の
番号で示す。また、クエリ登録書式を、単にクエリと呼ぶ）。 When the command 700 is submitted to the stream data processing unit 100 via the query registration interface 202, an execution tree that realizes the basic processing units 710, 720, and 730 is generated. The command 700 is divided into three query registration statements 710, 720, and 730, which define the processing of the basic processing units 710, 720, and 730, respectively (hereinafter, likewise, a basic processing unit and the query registration statement that defines its processing are treated as synonymous and given the same number, and a query registration statement is simply called a query).

クエリ７１０は、１０ミリ秒ごとの各時刻において、最大の音量を記録するマイク２０
を選択する。まず好適には、各マイクの音量に、定数の補正値を加算する。会議卓に取り
付けられた各マイクの感度は、会議卓の形状、材質、壁に対する位置関係、マイク自体の
品質、など様々な要因により、バラつきを持つため、該加算処理により、マイクの感度を
均等化する。マイク毎に異なる補正値は、音量補正値ストリーム１５０１（ｏｆｆｓｅｔ
）として参加者登録インタフェース２０１より登録される。図１のストリーム５８は、音
量補正値ストリームの例である（センサＩＤカラム５８Ｓ、および補正値カラム５８Ｖが
、それぞれ音量補正値ストリーム１５０１のｉｄカラム、およびｖａｌｕｅカラムを示す
）。音声データストリーム１５００と、音量補正値ストリーム１５０１とを、ｉｄカラム
に関する結合演算により結合し、ストリーム１５００の音量カラム（ｅｎｅｒｇｙ）の値
に、ストリーム１５０１の補正値カラム（ｖａｌｕｅ）の値を加算し、この値を改めてｅ
ｎｅｒｇｙカラムとする。該ｅｎｅｒｇｙカラムと、ｉｄカラムとを組み合わせたタプル
から成る、ストリームを、ｖｏｉｃｅ＿ｒとする。ストリーム６０１とストリーム５８に
対する、このクエリの結果をストリーム６０１Ｒに示す。 The query 710 selects, at each time point at 10-millisecond intervals, the microphone 20 that records the maximum volume. First, preferably, a constant correction value is added to the volume of each microphone. The sensitivity of the microphones attached to the conference table varies with many factors, such as the shape and material of the table, the position relative to the walls, and the quality of the microphones themselves, so this addition equalizes the microphone sensitivities. The correction value, which differs for each microphone, is registered from the participant registration interface 201 as the volume correction value stream 1501 (offset). The stream 58 in FIG. 1 is an example of the volume correction value stream (the sensor ID column 58S and the correction value column 58V correspond to the id column and the value column of the volume correction value stream 1501, respectively). The audio data stream 1500 and the volume correction value stream 1501 are joined by a join operation on the id column, the value of the correction value column (value) of the stream 1501 is added to the value of the volume column (energy) of the stream 1500, and the sum becomes the new energy column. The stream consisting of tuples that combine this energy column with the id column is named voice_r. The result of this query for the stream 601 and the stream 58 is shown as the stream 601R.

該ストリームｖｏｉｃｅ＿ｒから、集計演算“ＭＡＸ（ｅｎｅｒｇｙ）”によって最大
音量を算出し、その値と同じ音量のタプルを、ｅｎｅｒｇｙカラムに関する結合演算によ
り抽出する。ストリーム６０１Ｒに対するこのクエリの結果（ｖｏｉｃｅ＿ｍａｘ＿ｓｅ
ｔ）を、リレーション７１１に示す（クエリ７１０ではＮＯＷウィンドウを用いており、
リレーション７１１の各タプルの生存期間は非常に短いため、点で図示する。以下、ＮＯ
Ｗウィンドウによって定義されるタプルの生存期間は点で示す。なお、本クエリに関して
は、ＮＯＷウィンドウの代わりに、１０ミリ秒未満の時間ウィンドウを用いても構わない
）。 From the stream voice_r, the maximum volume is computed by the aggregate operation “MAX (energy)”, and the tuples whose volume equals that value are extracted by a join operation on the energy column. The result of this query for the stream 601R (voice_max_set) is shown as the relation 711. (The query 710 uses the NOW window, so the lifetime of each tuple in the relation 711 is very short and is drawn as a point. Hereinafter, the lifetime of a tuple defined by the NOW window is drawn as a point. For this query, a time window shorter than 10 milliseconds may be used instead of the NOW window.)

同時刻に最大音量を記録するマイクが２つ以上存在する場合もある。これに対し、クエ
リ７２０は、クエリ７１０の結果から、センサＩＤが最小のマイクのデータのみを選択す
ることで、マイクを一つに絞り込む。まず、集計演算“ＭＩＮ（ｉｄ）”によって最小Ｉ
Ｄを算出し、その値と同じＩＤのタプルを、ｉｄカラムに関する結合演算により抽出する
。リレーション７１１に対するこのクエリの結果（ｖｏｉｃｅ＿ｍａｘ）を、リレーショ
ン７２１に示す。 Two or more microphones may record the maximum volume at the same time. To handle this, the query 720 narrows the microphones down to one by selecting, from the result of the query 710, only the data of the microphone with the smallest sensor ID. First, the minimum ID is computed by the aggregate operation “MIN (id)”, and the tuple whose ID equals that value is extracted by a join operation on the id column. The result of this query for the relation 711 (voice_max) is shown as the relation 721.

クエリ７３０は、クエリ７２０の結果から、閾値を超えるデータのみを音源として残す
。また、センサＩＤを参加者データ５３と付き合わせて、参加者名に変換する。まず、ｅ
ｎｅｒｇｙカラムに関して範囲選択（＞１．０）をかけ、ｉｄカラムに関する結合演算と
ｎａｍｅカラムの射影演算で、音源となる発話者名のストリームを生成する。リレーショ
ン７２１に対するこのクエリの結果（ｖｏｉｃｅ＿ｏｖｅｒ＿ｔｈｒｅｓｈｏｌｄ）を、
ストリーム７３１に示す。以上で、音源選択１００Ａの処理が完了する。 The query 730 keeps, from the result of the query 720, only the data that exceeds a threshold as a sound source. It also matches the sensor ID against the participant data 53 and converts it into a participant name. First, a range selection (> 1.0) is applied to the energy column, and then a join operation on the id column and a projection onto the name column generate a stream of the names of the speakers acting as sound sources. The result of this query for the relation 721 (voice_over_threshold) is shown as the stream 731. This completes the sound source selection 100A.
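The combined effect of the queries 710, 720, and 730 on the tuples of one 10-millisecond instant can be sketched in Python as below. This is an illustrative approximation, not the CQL of FIG. 7: the function name `select_source`, the dictionary-based lookup tables, and the sample values are all assumptions, and a microphone without a registered correction value is treated here as having an offset of 0.0, whereas a true join would drop it.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLD = 1.0  # the range selection "> 1.0" applied by the query 730


def select_source(
    frame: List[Tuple[int, float]],   # (sensor id, energy) tuples sharing one time stamp
    offsets: Dict[int, float],        # volume correction value per sensor id
    names: Dict[int, str],            # participant name per sensor id
) -> Optional[str]:
    """Return the speaker name for one instant, or None if nobody qualifies."""
    if not frame:
        return None
    # Query 710, first step: add the per-microphone correction value.
    corrected = [(sid, energy + offsets.get(sid, 0.0)) for sid, energy in frame]
    # Query 710, second step: keep the tuples whose energy equals MAX(energy).
    peak = max(energy for _, energy in corrected)
    loudest = [(sid, energy) for sid, energy in corrected if energy == peak]
    # Query 720: on a tie, keep the microphone with MIN(id).
    sid, energy = min(loudest)
    # Query 730: drop data at or below the threshold, then map id to name.
    if energy <= THRESHOLD:
        return None
    return names.get(sid)


names = {1: "A", 2: "B", 3: "C", 4: "D"}
offsets = {1: 0.0, 2: 0.25, 3: 0.0, 4: 0.0}
frame = [(1, 0.5), (2, 1.25), (3, 1.5), (4, 0.25)]
print(select_source(frame, offsets, names))  # B
```

In the sample frame, microphones 2 and 3 tie at 1.5 after correction, so the smaller sensor ID wins and the name "B" is produced.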

次に、図８を用いて、スムージング処理１００Ｂの基本処理単位８１０、８２０、８３
０、８４０、８５０の実現方法を開示する。 Next, a method of realizing the basic processing units 810, 820, 830, 840, and 850 of the smoothing process 100B is disclosed with reference to FIG. 8.

コマンド８００を、クエリ登録インタフェース２０２を介して、ストリームデータ処理
部１００に投入することで、基本処理単位８１０、８２０、８３０、８４０、８５０を実
現する実行木が生成される。 An execution tree that realizes the basic processing units 810, 820, 830, 840, and 850 is generated by inputting the command 800 to the stream data processing unit 100 via the query registration interface 202.

クエリ８１０は、クエリ７３０で得られた音源データにおける、同一発言者の連続する
音源断片について、間欠部分を補完し、平滑化された発言期間を抽出する。まず、ウィン
ドウ演算“［ＲＡＮＧＥ２０ｍｓｅｃ］”によって、ストリーム７３１上の各タプル
に２０ミリ秒の生存期間を与え、“ＤＩＳＴＩＮＣＴ”（重複排除演算）によって、同一
発言者のタプル重複を排除する。ストリーム７３１に対するこのクエリの結果（ｖｏｉｃ
ｅ＿ｆｒａｇｍｅｎｔ）を、リレーション８１１に示す。リレーション８１２は、該結果
に至る中間状態であり、ストリーム７３１上の、ｎａｍｅカラムの値が“Ｂ”であるタプ
ルについて、ウィンドウ演算で生存期間を定義した結果である。ストリーム７３１上では
、９時２分５．０３秒、５．０５秒、および５．０７秒において、ｎａｍｅカラムＢのタ
プルが抜けているが、リレーション８１２では、２０ミリ秒の生存期間によって補完され
る。一方、９時２分５．０８秒と５．０９秒のようにデータが連続する箇所では、生存期
間の重複が発生するが、ＤＩＳＴＩＮＣＴによって排除される。その結果、ｎａｍｅカラ
ムＢのタプルは、生存期間が９時２分５．０２秒から５．１１秒までの、一本のタプル８
１３に平滑化される。ｎａｍｅカラムＡ、Ｄのタプルのように、散発的に現れるタプルに
ついては、タプル８１４、８１５、８１６のように、２０ミリ秒の生存期間が定義された
タプルが散在する結果となる。 The query 810 fills the gaps between consecutive sound source fragments of the same speaker in the sound source data obtained by the query 730, and extracts smoothed speech periods. First, the window operation “[RANGE 20 msec]” gives each tuple on the stream 731 a lifetime of 20 milliseconds, and “DISTINCT” (duplicate elimination) removes duplicate tuples of the same speaker. The result of this query for the stream 731 (voice_fragment) is shown as the relation 811. The relation 812 is an intermediate state on the way to that result: it is the result of defining lifetimes by the window operation for the tuples on the stream 731 whose name column is “B”. On the stream 731, the tuples with name B are missing at 9:02:05.03, 9:02:05.05, and 9:02:05.07, but in the relation 812 these gaps are filled by the 20-millisecond lifetimes. Where the data is consecutive, as at 9:02:05.08 and 9:02:05.09, the lifetimes overlap, but the overlap is removed by DISTINCT. As a result, the tuples with name B are smoothed into a single tuple 813 whose lifetime runs from 9:02:05.02 to 9:02:05.11. For tuples that appear sporadically, such as those with names A and D, the result is scattered tuples with a 20-millisecond lifetime, such as the tuples 814, 815, and 816.
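The effect of “[RANGE 20 msec]” followed by DISTINCT amounts to merging, per speaker, the 20-millisecond lifetimes that touch or overlap. A minimal Python sketch follows; it assumes time stamps in integer milliseconds past 9:02:00 and uses the hypothetical function name `smooth`. The sample time stamps for speaker B follow the example above, while the time stamp for speaker A is invented.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start_ms, end_ms)


def smooth(fragments: List[Tuple[int, str]], range_ms: int = 20) -> Dict[str, List[Interval]]:
    """Give each (timestamp, name) fragment a lifetime and merge touching lifetimes."""
    merged: Dict[str, List[Interval]] = {}
    for ts, name in sorted(fragments):
        start, end = ts, ts + range_ms
        spans = merged.setdefault(name, [])
        if spans and start <= spans[-1][1]:
            # The new lifetime touches or overlaps the previous one: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return merged


fragments = [
    (5020, "B"), (5040, "B"), (5060, "B"), (5080, "B"), (5090, "B"),
    (5000, "A"),  # a sporadic fragment (hypothetical time stamp)
]
print(smooth(fragments))  # {'A': [(5000, 5020)], 'B': [(5020, 5110)]}
```

Lifetimes that merely touch are merged because, with half-open intervals, the speaker is alive at every instant across the boundary, which is what DISTINCT preserves.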

Query 820 removes, as noise, momentary utterances (periods) of very short duration from the result of query 810. First, for each tuple of relation 811, the streaming operation “ISTREAM” and the window operation “[RANGE 50 msec]” generate a copy with a lifetime of 50 milliseconds from the tuple's start time (a tuple whose name column value is the same as the original's). Subtracting these copies from relation 811 with the set-difference operation “EXCEPT” removes every tuple whose lifetime is 50 milliseconds or less. The result of this query (speech) for relation 811 is shown in relation 821. Relation 822 is an intermediate state leading to that result: it holds the 50-millisecond copy created for each tuple of relation 811. When the difference between relations 811 and 822 is taken, tuples 814, 815, and 816 are completely erased by tuples 824, 825, and 826. For tuple 813, on the other hand, the lifetime of tuple 823 is subtracted, leaving tuple 827 with a lifetime from 9:02:5.07 to 9:02:5.11. In this way, all tuples with a lifetime of 50 milliseconds or less are removed, and only tuples with longer lifetimes remain as actual utterance data.
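The subtraction performed by query 820 can be pictured with a minimal Python sketch. It is an illustration only, not the query text: the function name `remove_short_utterances`, the `(name, start_ms, end_ms)` tuple layout, and the sample tuples for A and D are assumptions, and it assumes that the tuples of one speaker no longer overlap after query 810.

```python
from typing import List, Tuple

# Hypothetical layout: (name, start_ms, end_ms), with the end exclusive.
Interval = Tuple[str, int, int]

def remove_short_utterances(relation: List[Interval], min_ms: int = 50) -> List[Interval]:
    """Subtract a min_ms-long copy anchored at each tuple's start (the EXCEPT step)."""
    result = []
    for name, start, end in relation:
        copy_end = start + min_ms      # end of the 50 ms copy
        if end > copy_end:             # part of the original outlives the copy
            result.append((name, copy_end, end))
    return result

relation_811 = [
    ("B", 5020, 5110),  # like tuple 813: 9:02:5.02 to 9:02:5.11
    ("A", 5000, 5020),  # assumed sporadic 20 ms tuple
    ("D", 5060, 5080),  # assumed sporadic 20 ms tuple
]
print(remove_short_utterances(relation_811))  # [('B', 5070, 5110)]
```

The surviving B tuple starts 50 milliseconds after the original start, which corresponds to tuple 827.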

Queries 830, 840, and 850 use the streaming operations IStream, DStream, and RStream, respectively, to generate, from the result of query 820, stream tuples timestamped with the start time of an utterance, the end time of an utterance, and the times during an utterance. The results of these queries (start_speech, stop_speech, and on_speech) for relation 821 are shown in streams 831, 841, and 851, respectively. This completes the smoothing process 100B.
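As a rough sketch of what the three streaming operations produce, the following Python function derives start, stop, and in-progress tuples from a list of utterance intervals. The function name `to_streams` and the tuple layout are hypothetical, and the 10-millisecond tick is an assumption taken from the rate of 100 on_speech tuples per second mentioned for query 920.

```python
def to_streams(relation, tick_ms=10):
    """relation: list of (name, start_ms, end_ms), end exclusive (assumed layout)."""
    # IStream: one tuple when an utterance tuple appears in the relation.
    start_speech = [(name, start) for name, start, _end in relation]
    # DStream: one tuple when an utterance tuple disappears from the relation.
    stop_speech = [(name, end) for name, _start, end in relation]
    # RStream: one tuple per tick while the utterance tuple is alive.
    on_speech = [(name, t)
                 for name, start, end in relation
                 for t in range(start, end, tick_ms)]
    return start_speech, stop_speech, on_speech

starts, stops, ongoing = to_streams([("B", 5070, 5110)])
print(starts)   # [('B', 5070)]
print(stops)    # [('B', 5110)]
print(ongoing)  # [('B', 5070), ('B', 5080), ('B', 5090), ('B', 5100)]
```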

Next, with reference to FIG. 9, a method of implementing the basic processing units 910, 920, 930, and 940 in the activity data generation 100C is disclosed. Submitting command 900 to the stream data processing 100 via the query registration interface 202 generates an execution tree that implements the basic processing units 910, 920, 930, and 940.

Query 910 counts the cumulative number of utterances during the conference from the result of query 830. First, the window operation “[ROWS 1]” generates a relation in which the value of the name column switches each time an utterance start tuple occurs. However, when utterance start tuples of the same speaker occur consecutively, the relation does not switch. Streaming this relation with the streaming operation “ISTREAM” extracts the utterance start times at which the speaker changed. This stream is then made persistent with the window operation “[UNBOUNDED]”, grouped by the name column, and counted with the aggregate operation “COUNT” to calculate the cumulative number of utterances for each speaker.

The result of this query (speech_count) for the speech relation 901 is shown in relation 911. Stream 912 is the result (start_speech) of query 830 for relation 901. Relation 913 is the result of processing stream 912 with the [ROWS 1] window operation. Stream 914 is the result of streaming relation 913 with IStream. Here, stream tuple 917 is generated for the start time of tuple 915, but because tuples 915 and 916 both belong to the same speaker “B” and the end point of tuple 915 coincides with the start point of tuple 916 (9:18:15), no tuple is generated at 9:18:15. Relation 911 is the result of grouping stream 914 by name, making it persistent, and counting. Because the count is taken over a persistent relation, the number of utterances accumulates each time a tuple occurs on stream 914.
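The counting rule of query 910, in which consecutive utterance starts by the same speaker count only once, could be sketched in Python as follows. The function name and the `(name, timestamp)` layout are assumptions made for illustration.

```python
from collections import Counter

def cumulative_utterance_counts(start_speech):
    """start_speech: list of (name, timestamp) in time order (assumed layout)."""
    counts = Counter()
    current = None                 # the single name kept alive by [ROWS 1]
    for name, _timestamp in start_speech:
        if name != current:        # ISTREAM emits only when the relation changes
            counts[name] += 1      # [UNBOUNDED] + GROUP BY name + COUNT
            current = name
    return dict(counts)

stream = [("A", 1), ("B", 2), ("B", 3), ("C", 4), ("B", 5)]
print(cumulative_utterance_counts(stream))  # {'A': 1, 'B': 2, 'C': 1}
```

The second consecutive start by B does not add to the count, which mirrors the missing tuple at 9:18:15 on stream 914.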

Query 920 calculates, from the result of query 850, the speaking time of each speaker over the past 5 minutes. First, the window operation “[RANGE 5 min]” defines a lifetime of 5 minutes for each tuple of the on_speech stream; the tuples are then grouped by the name column and counted with the aggregate operation “COUNT”. This is equivalent to counting the tuples that existed on the on_speech stream during the past 5 minutes. Because on_speech stream tuples are generated at a rate of 100 per second, the SELECT clause divides this count by 100 to obtain the speaking time in seconds.
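The arithmetic of query 920 can be checked with a small Python sketch. The function name, the `(name, timestamp_ms)` layout, and the exact treatment of the window boundary are assumptions; only the 5-minute window and the division by 100 come from the description.

```python
def speaking_seconds(on_speech, now_ms, window_ms=5 * 60 * 1000, rate_per_sec=100):
    """on_speech: list of (name, timestamp_ms), 100 tuples per second while speaking."""
    counts = {}
    for name, ts in on_speech:
        if now_ms - window_ms < ts <= now_ms:   # tuple still alive in the window
            counts[name] = counts.get(name, 0) + 1
    # Dividing the tuple count by the tuple rate gives seconds of speech.
    return {name: n / rate_per_sec for name, n in counts.items()}

ticks = [("A", t) for t in range(0, 2500, 10)] + [("B", t) for t in range(0, 1000, 10)]
print(speaking_seconds(ticks, now_ms=3000))  # {'A': 2.5, 'B': 1.0}
```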

Query 930 extracts, from the results of queries 830 and 840, cases in which another speaker starts speaking within 3 seconds after an utterance ends, and treats each as a conversation between the two. First, the window operations “[RANGE 3 sec]” and “[NOW]” define lifetimes for the tuples of the stop_speech stream and the start_speech stream, respectively. A join on the name column (with the condition that the names do not match) then extracts the combinations in which a start_speech tuple occurs within 3 seconds of a stop_speech tuple. In the output, stop_speech.name is projected onto the pre column and start_speech.name onto the post column. The result of this query (speech_sequence) for the speech relation 901 is shown in stream 931. Stream 932 is the result (stop_speech) of query 840 for relation 901, and relation 933 is the intermediate state in which a 3-second lifetime is defined for each tuple of stream 932. Converting stream 912 into a relation with the NOW window gives a figure identical to stream 912. Stream 931 is obtained by joining that relation with relation 933 and streaming the result with IStream.
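A procedural equivalent of the join in query 930 might look like the Python sketch below. The function name, the tuple layouts, and the half-open window `[stop, stop + 3 s)` are assumptions made for illustration.

```python
def speech_sequences(stop_speech, start_speech, gap_ms=3000):
    """Both inputs are lists of (name, timestamp_ms) (assumed layout).

    Returns (pre, post, timestamp_ms) for each start that follows a different
    speaker's stop within gap_ms.
    """
    pairs = []
    for post, start_ts in sorted(start_speech, key=lambda t: t[1]):
        for pre, stop_ts in stop_speech:
            # The stop tuple is alive for gap_ms; the start tuple only at its own instant.
            alive = stop_ts <= start_ts < stop_ts + gap_ms
            if pre != post and alive:          # join condition: names differ
                pairs.append((pre, post, start_ts))
    return pairs

stops = [("A", 1000), ("B", 6000)]
starts = [("B", 2500), ("A", 10000), ("A", 1500)]
print(speech_sequences(stops, starts))  # [('A', 'B', 2500)]
```

A resumes 0.5 seconds after A's own stop and again 4 seconds after B's stop, so neither start is reported as a conversation.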

Query 940 counts, from the result of query 930, the cumulative number of conversations during the conference for each pair of speakers. First, the stream is made persistent with the window operation “[UNBOUNDED]”, grouped by each combination of the pre and post columns with “Group by pre, post”, and counted with the aggregate operation “COUNT”. Because the count is taken over a persistent relation, the number of conversations accumulates each time a tuple occurs on stream 931.

Next, with reference to FIG. 10, a method of implementing the basic processing units 1000, 1010, and 1020 in the activity data generation 100C is disclosed. Submitting queries 1000, 1010, and 1020 to the stream data processing unit 100 via the query registration interface 202 generates execution trees that implement the basic processing units 1000, 1010, and 1020, respectively. All three queries calculate the degree of excitement of the conference, but each defines the degree of excitement differently.

Query 1000 calculates the degree of excitement as the volume values of all microphones on stream 1500 (voice) accumulated over the past 30 seconds. The query computes the sum of the energy column values of the tuples on stream 1500 over the past 30 seconds using the window operation “[RANGE 30 sec]” and the aggregate operation “SUM(energy)”. The streaming operation “RSTREAM [3 sec]” sets the output interval to 3 seconds (the same applies to queries 1010 and 1020 below). Query 1000 thus uses the total speech energy of the conference attendees as the index of excitement.
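The sliding sum of query 1000 can be illustrated with the following Python sketch. The function name, the `(timestamp_ms, mic, energy)` layout, and the sample values are assumptions; the 30-second window and the 3-second output interval come from the description.

```python
def energy_excitement(voice, end_ms, window_ms=30_000, step_ms=3_000):
    """voice: list of (timestamp_ms, mic, energy) for all microphones (assumed layout)."""
    outputs = []
    for now in range(step_ms, end_ms + 1, step_ms):      # one output every 3 seconds
        # SUM(energy) over the tuples still inside the 30-second window.
        total = sum(e for ts, _mic, e in voice if now - window_ms < ts <= now)
        outputs.append((now, total))
    return outputs

voice = [(1000, "mic1", 2.0), (2000, "mic2", 3.0), (31000, "mic1", 1.0)]
result = energy_excitement(voice, end_ms=33_000)
print(result[0])   # (3000, 5.0)
print(result[-1])  # (33000, 1.0)
```

By 33 seconds the first two tuples have left the window, so only the most recent energy value contributes.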

Query 1010 calculates the degree of excitement as the product of the number of speakers and the number of conversations in the past 30 seconds. This degree of excitement is one specific example of the discussion activation degree 54 described earlier, which is calculated from the product of the total number of utterances per unit time and the total number of speakers. Query 1011 counts the tuples of stream 1514 (speech_sequence) in the past 30 seconds; the resulting relation is named recent_sequences_count. Query 1012 counts the tuples of stream 1511 (start_speech) in the past 30 seconds; the resulting relation is named recent_speakers_count. Query 1013 calculates the product of the two. In both recent_sequences_count and recent_speakers_count, exactly one tuple, consisting only of a cnt column holding a natural-number value, is alive at any time. The product of the two is therefore also a relation in which exactly one tuple is alive at any time.

However, if this product is computed simply as “recent_sequences_count.cnt * recent_speakers_count.cnt”, the number of conversations is 0 while one speaker talks for a long time, so the result is also 0. To avoid this, “(recent_sequences_count.cnt + 1/(1 + recent_sequences_count.cnt))” is used in place of “recent_sequences_count.cnt”. Because the part following the “+”, namely “1/(1 + recent_sequences_count.cnt)”, is an integer quotient, it contributes +1 when recent_sequences_count.cnt is 0 and +0 when it is greater than 0. As a result, the degree of excitement is 0 during silent periods in which nobody speaks, 1 while one speaker talks for a long time, and the product of the number of speakers and the number of conversations when there are two or more speakers. Query 1010 thus uses the number of attendees taking part in the discussion and the frequency of exchanges of opinion as the index of excitement.
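The effect of the integer quotient can be verified with a short Python sketch. The function name and argument names are hypothetical; Python's `//` operator stands in for the integer division of the query language.

```python
def excitement_1010(sequences_cnt: int, speakers_cnt: int) -> int:
    """Product of conversation count and speaker count with the zero-conversation fix."""
    # 1 // (1 + n) is 1 when n == 0 and 0 for any n > 0.
    adjusted_sequences = sequences_cnt + 1 // (1 + sequences_cnt)
    return adjusted_sequences * speakers_cnt

print(excitement_1010(0, 0))  # 0: silence
print(excitement_1010(0, 1))  # 1: one speaker, no exchange
print(excitement_1010(3, 2))  # 6: two speakers, three exchanges
```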

Query 1020 calculates the degree of excitement as the intensity of the speakers' gestures. Query 1021 joins, on the name column, the relation obtained by processing stream 1503 (motion), which represents instantaneous gesture intensity, with the NOW window and relation 1510 (speech), which represents the speakers' utterance periods, thereby extracting the gesture intensity of the attendees who are speaking. Query 1022 accumulates the speakers' gesture intensity over the past 30 seconds. Query 1020 thus assumes that the intensity of a speaker's gestures reflects how heated the discussion is, and uses it as the index of excitement.
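A simplified Python sketch of queries 1021 and 1022 follows. The function name, the tuple layouts, and the sample values are assumptions; the sketch only shows that motion samples count toward the index when their owner is speaking at that instant.

```python
def gesture_excitement(motion, speech, now_ms, window_ms=30_000):
    """motion: (name, timestamp_ms, intensity); speech: (name, start_ms, end_ms)."""
    total = 0.0
    for name, ts, intensity in motion:
        if not (now_ms - window_ms < ts <= now_ms):
            continue                       # outside the 30-second window
        # Join on name: keep the sample only if that attendee is speaking at ts.
        speaking = any(n == name and start <= ts < end for n, start, end in speech)
        if speaking:
            total += intensity
    return total

speech = [("B", 0, 5000)]
motion = [("B", 1000, 2.0), ("C", 1000, 9.0), ("B", 6000, 4.0), ("B", 4000, 1.5)]
print(gesture_excitement(motion, speech, now_ms=10_000))  # 3.5
```

The sample from C and the sample taken after B stopped speaking are ignored.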

The definitions of the degree of excitement given here are examples. Because quantifying the excitement of a conference involves human subjectivity and has no established definition, a suitable definition must be sought through repeated trials. Coding the calculation logic in a procedural language such as C, C#, or Java (registered trademark) each time a new definition is tried would require an enormous development effort. In particular, logic that calculates an index based on the order of utterances, as in query 1010, makes the code complicated and hard to debug. By contrast, using stream data processing as in this embodiment, which has been described with the discussion activation degree and other examples, allows the definition to be written as a concise declarative query and greatly reduces this effort.

Next, with reference to FIG. 13, a method of implementing the basic processing units 1310, 1320, and 1330 in the activity data generation 100C is disclosed.

Submitting command 1300 to the stream data processing 100 via the query registration interface 202 generates an execution tree that implements the basic processing units 1310, 1320, and 1330.

An utterance that wins agreement from many attendees is regarded as an important utterance in the conference. To extract such utterances, query 1310 uses relation 1510 (speech) and stream 1504 (nod), which represents the nodding state, to extract the states in which a speaker's opinion is agreed with (that is, nodded at) by many attendees. The nodding state can be detected by applying pattern recognition to the acceleration values measured by the acceleration sensor 741 of the name tag type sensor node 70. In this embodiment, it is assumed that, at 1-second intervals, a tuple whose name column holds the attendee's name is generated if that attendee is nodding at that time. First, the window operation “[RANGE 1 sec]” defines a lifetime of 1 second for each tuple on stream 1504, which yields a relation representing the nodding periods of each attendee (example: relation 1302).

Joining this relation with relation 1510, which represents the utterance periods (example: relation 1301), on the name column (with the condition that the names do not match) yields a relation whose tuple lifetimes are the periods in which attendees other than the speaker are nodding (example: relation 1312). From this relation, the HAVING clause extracts the periods in which two or more tuples are alive (that is, two or more attendees are listening while nodding). A projection operation then outputs tuples consisting of the speaker's name (the speech.name column) and a flag column holding the constant string 'yes' (example: relation 1313). Streaming this result with IStream gives the result of query 1310 (example: stream 1311). Stream 1311 shows a tuple being generated at the moment when the utterance of speaker B is nodded at by two other attendees, C and D.
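The detection of an agreed-with utterance in query 1310 could be approximated by the Python sketch below. The function name, the tuple layouts, and the use of distinct attendee names in place of a raw tuple count are assumptions; under the stated 1-second nod interval the two counts coincide.

```python
def flag_important(speech, nod, min_nodders=2, nod_ms=1000):
    """speech: (name, start_ms, end_ms); nod: (name, timestamp_ms).

    Returns (speaker, 'yes', timestamp_ms) each time the number of other
    attendees nodding rises to min_nodders during an utterance.
    """
    def nodders(speaker, t):
        # Attendees other than the speaker whose 1-second nod tuple is alive at t.
        return {n for n, ts in nod if n != speaker and ts <= t < ts + nod_ms}

    events = []
    for speaker, start, end in speech:
        for t in sorted({ts for _n, ts in nod if start <= ts < end}):
            now = len(nodders(speaker, t))
            before = len(nodders(speaker, t - 1)) if t > start else 0
            if now >= min_nodders and before < min_nodders:   # IStream: newly true
                events.append((speaker, "yes", t))
    return events

speech = [("B", 0, 10_000)]
nod = [("C", 2000), ("D", 2000), ("C", 3000), ("D", 3000), ("A", 7000), ("B", 7000)]
print(flag_important(speech, nod))  # [('B', 'yes', 2000)]
```

The continued nodding at 3 seconds produces no second event, and the speaker's own nod at 7 seconds is excluded by the name condition.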

While query 1310 extracts the occurrence of important utterances, the content of the utterances is input from the PC 10 as stream 1505 (statement). Because the utterance content is extracted from the keystrokes of the minutes taker, it is input several tens of seconds after the occurrence of the important utterance, which is detected automatically by voice analysis and acceleration analysis. To handle this delay, queries 1320 and 1330 set the important-utterance flag on the first utterance content entered for a speaker after an important utterance by that speaker has been detected.

Query 1320 acts as a toggle switch that holds, for each speaker, a flag representing utterance importance. The result relation of this query, acceptance_toggle, indicates for each speaker whether the utterance content next input from stream 1505 (statement) will be treated as an important utterance (example: relation 1321). The name column holds the speaker's name, and the flag column indicates importance with 'yes' or 'no'. Query 1330 joins, on the name column, the relation obtained by applying the NOW window to stream 1505 with the result relation of query 1320, and outputs the utterance content with the importance indicator attached (example: stream 1331).

In query 1320, when utterance content is input from stream 1505, a tuple is first generated that clears the importance flag for that speaker to 'no'. The timestamp of this tuple is delayed slightly from the timestamp of the utterance content tuple from which it is derived. This processing is defined by the expression “DSTREAM(statement [RANGE 1 msec])”. For example, when stream tuple 1304 on statement stream 1303 is input, stream tuple 1324, whose timestamp is shifted by 1 msec, is generated on the intermediate-state stream 1322. The stream of such 'no' tuples is merged with the result of query 1310 by the union operation “UNION ALL”. For example, merging stream 1322 with stream 1311 gives stream 1323. This stream is converted into a relation with the window operation “[PARTITION BY name ROWS 1]”. This window operation converts each group, partitioned by the value of the name column, into a relation with a count window in which one tuple is alive at a time. As a result, exactly one flag, either 'yes' or 'no', is set for each speaker. For example, converting stream 1323 into a relation gives relation 1321. The timestamp of the 'no' tuple is shifted slightly so that, in query 1330, the 'no' tuple does not join with the very statement tuple from which it was derived. This completes the processing of the activity data generation 100C.
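The toggle behaviour of queries 1320 and 1330, including the reason for the 1-millisecond delay, can be traced with a small Python sketch. The function name, the tuple layouts, the sample statements, and the default flag of 'no' before any event has arrived are assumptions made for illustration.

```python
def tag_statements(statements, acceptance, delay_ms=1):
    """statements: (name, timestamp_ms, text); acceptance: (name, timestamp_ms) 'yes' events."""
    # UNION ALL of the 'yes' events and the delayed 'no' resets.
    events = [(ts, name, "yes") for name, ts in acceptance]
    events += [(ts + delay_ms, name, "no") for name, ts, _text in statements]
    events.sort()

    tagged = []
    for name, ts, text in sorted(statements, key=lambda s: s[1]):
        flag = "no"                        # assumed default when no event exists yet
        for ets, ename, eflag in events:   # PARTITION BY name ROWS 1: latest event wins
            if ename == name and ets <= ts:
                flag = eflag
        tagged.append((name, ts, text, flag))
    return tagged

acceptance = [("B", 2000)]
statements = [("B", 30_000, "proposal"), ("B", 60_000, "follow-up"), ("C", 40_000, "question")]
for row in tag_statements(statements, acceptance):
    print(row)
# ('B', 30000, 'proposal', 'yes')
# ('C', 40000, 'question', 'no')
# ('B', 60000, 'follow-up', 'no')
```

Because the reset for B's first statement takes effect at 30001 ms, that statement still sees the 'yes' flag, while B's next statement sees 'no'.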

Next, the screen images produced by the display processing unit 203, that is, by the drawing program executed on the processing unit (CPU) of the totalization processing server 200, on the basis of the activity data obtained by the activity data generation 100C are described with reference to FIGS. 16 and 17.

FIG. 16 is a screen image in which activity data 1520, based on the movements of the speakers, is reflected in the activity/utterance display 310A as the motion activity level 311M. This screen makes it possible to visualize activity in the conference in terms of not only voice but also the members' behavior.

FIG. 17 is a screen image in which activity data 1521, which indicates the importance of utterances based on nodding, is reflected in the activity/utterance display 310B as the important-utterance indicator 311a. Displaying a member's utterance 313 linked with the important-utterance indicator 311a makes it possible to visualize which utterances the participating members found convincing. This screen thus makes it possible to visualize the state of the conference in terms of not only voice but also the members' degree of agreement.

FIG. 14 shows another embodiment of the processing sequence in the functional modules shown in FIG. 2. In the processing sequence of this embodiment, after the voice processing unit 42 acquires the feature data, the voice processing server 40 executes voice/non-voice discrimination processing, smoothing processing, and sound source selection processing. Preferably, these processes are also executed as program processing on a processing unit (CPU), not shown, of the voice processing server 40.

In FIG. 14, as in FIG. 2, the sensors (microphones) 20 acquire voice data (20A). Next, the sound board 41 samples the voice (41A). The voice processing unit 42 then extracts the feature quantity, that is, converts the samples into energy (42A). The energy is the square of the absolute value of the sound waveform over a span of several milliseconds, integrated over that whole span.
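For sampled audio, the integral of the squared waveform is commonly approximated by a sum of squared samples per frame. The Python sketch below shows this under assumptions: the function name and frame length are hypothetical, the constant sampling-interval factor is omitted, and an incomplete trailing frame is dropped.

```python
def frame_energies(samples, frame_len):
    """Return the sum of squared samples for each complete frame of frame_len samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

print(frame_energies([1, -2, 3, 0, 0, 1, 5], frame_len=3))  # [14, 1]
```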

In this embodiment, the voice processing 42 of the voice processing server 40 discriminates between voice and non-voice (42B) on the basis of the feature data obtained by the feature extraction (42A). One discrimination method uses the degree of change in energy over several seconds. Voice has characteristic levels of waveform energy and characteristic patterns of change in that energy, and these are used to distinguish voice from non-voice.

If the voice/non-voice discrimination results obtained in units of several seconds are used as they are, it is difficult to determine the span of a single utterance, which is a meaningful unit lasting several tens of seconds. Smoothing processing (42C) is therefore introduced to determine the span of each utterance, and this span is used for sound source selection.

The processing described above is performed by the voice processing 42 for each sensor (microphone) 20, so it is ultimately necessary to determine which sensor (microphone) 20 the voice was input from. In this embodiment, the voice processing 42 therefore performs sound source selection 42D after the smoothing processing (42C) and selects, from among the sensors (microphones) 20, the one that was actually spoken into. At the sensor (microphone) 20 nearest the speaker, the span judged to be voice is longer than at the other sensors (microphones) 20. In this embodiment, therefore, the sensor (microphone) 20 with the longest span in the results of the smoothing processing 42C is taken as the output of the sound source selection 42D. Next, the stream data processing unit 100 performs activity data generation (100C), and finally the display processing unit 203 performs screen data generation (203A) based on the activity data AD, as described earlier.
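The selection rule of 42D, choosing the microphone whose smoothed voice span is longest, can be written as a short Python sketch. The function name, the dictionary layout, and the microphone labels are assumptions, and ties are resolved arbitrarily by dictionary order.

```python
def select_source(segments_by_mic):
    """segments_by_mic: {mic: [(start_ms, end_ms), ...]} after smoothing (assumed layout)."""
    def longest_span(segments):
        return max((end - start for start, end in segments), default=0)
    # The microphone nearest the speaker is expected to show the longest voiced span.
    return max(segments_by_mic, key=lambda mic: longest_span(segments_by_mic[mic]))

smoothed = {"mic1": [(0, 1200)], "mic2": [(100, 900)], "mic3": []}
print(select_source(smoothed))  # mic1
```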

10 ... PC, 20 ... sensor (microphone), 30 ... conference table, 40 ... voice processing server, 100 ... stream data processing unit, 200 ... totalization processing server, 300 ... monitor screen, 310 ... conference activity level and utterance content display, 320 ... cumulative utterance display, 330 ... utterance sequence display.

Claims

A conference analysis system that analyzes the state of dialogue among a plurality of participants, comprising:
a name tag type sensor node that, when worn by a participant, detects and records the magnitude of the participant's body movement with an acceleration sensor;
a keystroke information input unit for recording dialogue content by manual input; and
a data stream processing unit that associates the body movement data with the dialogue content,
wherein the data stream processing unit determines, for each time period, the importance of the dialogue content from the body movement data.

The conference analysis system according to claim 1,
wherein the data stream processing unit estimates the presence or absence of nodding by the participants from the body movement data, and determines the importance of the dialogue content from the estimated amount of nodding.

The conference analysis system according to claim 1, further comprising
a voice data acquisition unit that acquires a plurality of pieces of voice data corresponding to the participants,
wherein the data stream processing unit determines, for each time period, the importance of the voice data from the body movement data.

The conference analysis system according to claim 3,
wherein the data stream processing unit estimates the presence or absence of nodding by the participants from the body movement data, and determines the importance of the voice data and the dialogue content from the estimated amount of nodding.