JP6390607B2 - Program and remote conference method

Info

Publication number: JP6390607B2
Application number: JP2015250701A
Authority: JP
Inventors: 真利杉浦
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-12-22
Filing date: 2015-12-22
Publication date: 2018-09-19
Anticipated expiration: 2035-12-22
Also published as: JP2017118281A

Description

The present invention relates to a program that causes a computer controlling a first terminal device to execute predetermined processing in a remote conference held via a network between the first terminal device and a second terminal device, to a program that causes a computer controlling a server device to execute predetermined processing, and to a remote conference method.

Techniques related to remote conferences via a network have been proposed. For example, Patent Document 1 discloses a technique related to a communication conference system. The communication conference system provides the following functions. Spoken content is converted into character data, and the speaker is identified. The character data is displayed on a monitor screen, after a desired time delay.

Patent Document 1: JP 2008-160667 A

During a remote conference held among a plurality of terminal devices, a user may miss what the user of another terminal device said. With the technique disclosed in Patent Document 1, a user's speech is converted into character data and displayed as text, so the content of a missed remark can be recognized. To do so, however, the user must read the text. In a remote conference, the user may also need to check, for example, captured images corresponding to video data captured by each terminal device and/or material images shared among the terminal devices. In such a case, reading the text to catch up on the missed content while also checking the captured images and/or material images is likely to be difficult. In addition, converting speech into character data requires speech recognition processing. Consequently, in a remote conference that uses video data, the processing load on the terminal device may become high.

An object of the present invention is to provide a remote conference program and a remote conference method that allow the user of a terminal device to recognize, while the remote conference is in progress, the content of another user's remark that the user missed.

One aspect of the present invention is a program that causes a computer controlling a first terminal device connected to a network to execute: a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, and that corresponds at least to a first sound collected by the second terminal device at a first timing; a first transmission process of transmitting, to the server device, a first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing; a second acquisition process of acquiring the second sound data transmitted from the server device; a first mixing process of generating first synthesized data by synthesizing the first sound data and the second sound data; and a reproduction process of reproducing the first synthesized data and outputting the reproduced first synthesized sound.
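The first mixing process can be illustrated with a minimal sketch. The description does not specify a sample format or the mixing arithmetic, so the following assumes 16-bit signed PCM samples and simple additive mixing with clipping. The function name `mix_pcm16` and both assumptions are illustrative only and are not taken from the embodiment.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(first_sound: array, second_sound: array) -> array:
    """Mix live (first) and replayed (second) 16-bit PCM samples.

    Assumption: additive mixing with clipping. The shorter input is
    padded with silence so that neither sound is truncated.
    """
    length = max(len(first_sound), len(second_sound))
    mixed = array("h")
    for i in range(length):
        a = first_sound[i] if i < len(first_sound) else 0
        b = second_sound[i] if i < len(second_sound) else 0
        # Clip the sum so it stays within the 16-bit sample range.
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed


# Hypothetical sample values standing in for the first and second sound data.
live = array("h", [1000, 30000])
replay = array("h", [2000, 30000, -5])
first_synthesized = mix_pcm16(live, replay)
```

Clipping is only one possible way to handle overflow; attenuating each input before summing would be an equally plausible choice.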

According to this program, a first synthesized sound containing the first sound at the first timing and the second sound at the second timing, both collected by the second terminal device, can be output. The user of the first terminal device can thus listen to the second sound while still hearing the first sound.

Another aspect of the present invention is a program that causes a computer controlling a server device connected to a network to execute, in a remote conference via the network among a first terminal device, a second terminal device, and a third terminal device each connected to the network: a fourth acquisition process of acquiring fourth sound data that is transmitted from the second terminal device and corresponds to a fourth sound collected by the second terminal device; a first storage process of storing the acquired fourth sound data in a storage unit; a fifth acquisition process of acquiring fifth sound data that is transmitted from the third terminal device and corresponds to a fifth sound collected by the third terminal device; a second mixing process of generating second synthesized data by synthesizing the fourth sound data corresponding to the fourth sound at a first timing and the fifth sound data corresponding to the fifth sound at the first timing; a third transmission process of transmitting the second synthesized data to the first terminal device; a sixth acquisition process of acquiring a second replay request that is transmitted from the first terminal device and requests transmission of the fourth sound data corresponding to the fourth sound at a second timing earlier than the first timing; and a fourth transmission process of transmitting, when the second replay request is acquired, the fourth sound data corresponding to the fourth sound at the second timing to the first terminal device.
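The first storage process and the response to the second replay request can be sketched as follows. The description does not state how stored sound data is indexed, so this sketch assumes each received chunk carries a timestamp and that a replay request names a time range. The class `SoundStore`, its method names, and the chunk contents are hypothetical.

```python
from collections import defaultdict


class SoundStore:
    """Keeps received sound chunks per terminal so earlier audio can be resent."""

    def __init__(self):
        # terminal ID -> list of (timestamp, chunk) in arrival order
        self._chunks = defaultdict(list)

    def store(self, terminal_id: str, timestamp: float, chunk: bytes) -> None:
        """Store one chunk of sound data received from a terminal."""
        self._chunks[terminal_id].append((timestamp, chunk))

    def replay(self, terminal_id: str, start: float, end: float) -> list:
        """Return chunks with start <= timestamp < end for one terminal."""
        stored = self._chunks.get(terminal_id, [])
        return [chunk for ts, chunk in stored if start <= ts < end]


# Hypothetical chunks from the second terminal device (terminal ID "USER42").
store = SoundStore()
for ts, chunk in [(0.0, b"a"), (1.0, b"b"), (2.0, b"c")]:
    store.store("USER42", ts, chunk)

# A replay request for the sound collected before the current timing (t = 2.0).
earlier = store.replay("USER42", 0.0, 2.0)
```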

According to this program, second synthesized data, made from the fourth sound data corresponding to the fourth sound collected by the second terminal device at the first timing and the fifth sound data corresponding to the fifth sound collected by the third terminal device at the first timing, can be transmitted to the first terminal device, and, in response to the second replay request, the fourth sound data corresponding to the fourth sound at the second timing can be transmitted to the first terminal device. The user of the first terminal device can thus listen to the fourth sound at the second timing while still hearing the fourth and fifth sounds at the first timing.

Still another aspect of the present invention is a remote conference method executed in a remote conference system that includes a first terminal device, a second terminal device, a third terminal device, and a server device connected to a network, the method including: a step of acquiring, in a remote conference via the network among the first, second, and third terminal devices, sixth sound data that is transmitted from the second terminal device and corresponds to a sixth sound collected by the second terminal device; a step of storing the acquired sixth sound data; a step of acquiring, in the remote conference, seventh sound data that is transmitted from the third terminal device and corresponds to a seventh sound collected by the third terminal device; a step of generating third synthesized data by synthesizing the sixth sound data corresponding to the sixth sound at a first timing and the seventh sound data corresponding to the seventh sound at the first timing; a step of reproducing the third synthesized data and outputting the reproduced second synthesized sound; a step of acquiring a replay request requesting reproduction of the sixth sound at a second timing earlier than the first timing; a step of generating, when the replay request is acquired, fourth synthesized data by synthesizing the third synthesized data and the sixth sound data corresponding to the sixth sound at the second timing; and a step of reproducing the fourth synthesized data and outputting the reproduced third synthesized sound.

According to this remote conference method, a third synthesized sound can be output that contains the sixth sound collected by the second terminal device at the first timing, the seventh sound collected by the third terminal device at the first timing, and the sixth sound collected by the second terminal device at the second timing. The user can thus listen to the sixth sound at the second timing while still hearing the sixth and seventh sounds at the first timing.

According to the present invention, it is possible to obtain a remote conference program and a remote conference method that allow the user of a terminal device to recognize, while the remote conference is in progress, the content of another user's remark that the user missed.

FIG. 1 is a diagram showing an example of a remote conference system.
FIG. 2 is a diagram showing an example of a remote conference screen.
FIG. 3 is a flowchart of sound processing.
FIG. 4 is a flowchart of conversion processing.
FIG. 5 is a flowchart of mixing processing.
FIG. 6 is a flowchart of relay processing.

Embodiments for carrying out the present invention will be described with reference to the drawings. The present invention is not limited to the configurations described below, and various configurations can be adopted within the same technical idea. For example, some of the configurations shown below may be omitted or replaced with other configurations, and other configurations may be added.

<Remote conference system>
An outline of the remote conference system 10 will be described with reference to FIG. 1. As shown in FIG. 1, the remote conference system 10 includes a server device 20 and terminal devices. FIG. 1 shows three terminal devices. The number of terminal devices varies with the circumstances of the remote conference being held. The embodiment describes, as an example, a remote conference among three terminal devices. The three terminal devices shown in FIG. 1 are referred to as terminal devices 41, 42, and 43. The terminal devices 41, 42, and 43 are each operated by a user attending the same remote conference. The server device 20 and the terminal devices 41, 42, and 43 are connected to a network 90. The network 90 is, for example, the Internet or a local area network (LAN). In the remote conference system 10, a remote conference among the terminal devices 41, 42, and 43 is held with a session established between the server device 20 and each of the terminal devices 41, 42, and 43.

The server device 20 relays data exchanged among the terminal devices 41, 42, and 43. Examples of such data are video data and sound data. The video data corresponds to the captured images 61 (see FIG. 2, described later) captured by each of the terminal devices 41, 42, and 43. The sound data corresponds to the sound collected by each of the terminal devices 41, 42, and 43. The server device 20 receives the video data transmitted from each of the terminal devices 41, 42, and 43, and transmits each piece of received video data to the terminal devices other than its sender. For example, the server device 20 transmits video data sent from the terminal device 42 to the terminal devices 41 and 43, but not to the terminal device 42. Alternatively, the server device 20 may transmit all the video data sent from the terminal devices 41, 42, and 43 to all of the terminal devices 41, 42, and 43.
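The choice of destinations for relayed video data can be sketched as below. The function `relay_targets` and its `echo_to_sender` flag are hypothetical names; the flag stands in for the alternative in which the server device 20 sends all video data to all terminal devices.

```python
def relay_targets(connected, sender, echo_to_sender=False):
    """Return the terminal IDs that should receive video data from `sender`.

    By default the sender is excluded; with echo_to_sender=True every
    connected terminal, including the sender, is a destination.
    """
    return [tid for tid in connected if echo_to_sender or tid != sender]


connected = ["USER41", "USER42", "USER43"]
default_targets = relay_targets(connected, "USER42")
all_targets = relay_targets(connected, "USER42", echo_to_sender=True)
```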

The server device 20 receives the sound data transmitted from each of the terminal devices 41, 42, and 43, and synthesizes the plurality of pieces of received sound data. In the embodiment, sound data generated by the server device 20 by synthesizing a plurality of pieces of sound data is called "synthesized data". For example, when sending the sound data from the terminal devices 42 and 43 to the terminal device 41, the server device 20 synthesizes the sound data from the terminal devices 42 and 43 to generate synthesized data containing both, and then transmits the generated synthesized data to the terminal device 41. In other words, the server device 20 synthesizes the pieces of sound data from the terminal devices 41, 42, and 43 other than the sound data from the destination terminal device, and transmits the resulting synthesized data to that destination. This prevents a situation in which a user, while hearing his or her own voice directly, also hears the same voice output from his or her own device. As in known remote conference systems, the video data and synthesized data are transmitted from the server device 20 to the terminal devices 41, 42, and 43 by streaming.
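The destination-specific synthesis described above, in which the destination's own sound data is left out, can be sketched as follows. The description does not give the mixing arithmetic, so additive mixing of 16-bit sample values with clipping is assumed. The function name `mix_for_destination` and the sample values are hypothetical.

```python
def mix_for_destination(frames, destination):
    """Mix one frame from every terminal except `destination`.

    `frames` maps a terminal ID to a list of 16-bit sample values.
    Assumption: additive mixing with clipping to the 16-bit range.
    """
    sources = [s for tid, s in frames.items() if tid != destination]
    length = max((len(s) for s in sources), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in sources if i < len(s))
        mixed.append(max(-32768, min(32767, total)))
    return mixed


# Hypothetical frames received from the three terminal devices.
frames = {
    "USER41": [100, 100],
    "USER42": [10, 20],
    "USER43": [1, 2],
}
# Synthesized data for terminal device 41 contains only USER42 and USER43.
for_user41 = mix_for_destination(frames, "USER41")
```

Because the mix differs per destination, a server following this approach produces one synthesized stream per connected terminal device.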

In addition, the server device 20 relays material data. The material data corresponds to a material image 62 (see FIG. 2, described later) used in the remote conference. In the remote conference system 10, as in known remote conference systems, the remote conference can proceed with the material image 62 corresponding to the same material data shared among the terminal devices 41, 42, and 43. The server device 20 transmits material data sent from any one of the terminal devices 41, 42, and 43 to the other two terminal devices. Each of the terminal devices 41, 42, and 43 displays the material image 62 corresponding to the material data.

A remote conference executed via the server device 20 is identified by a conference ID. That is, the conference ID distinguishes the remote conference among the terminal devices 41, 42, and 43 from other remote conferences executed via the server device 20. Within the remote conference identified by the conference ID, the terminal devices 41, 42, and 43 are identified by terminal IDs. In the embodiment, the terminal IDs are as follows: the terminal ID of the terminal device 41 is "USER41", that of the terminal device 42 is "USER42", and that of the terminal device 43 is "USER43".

In the remote conference system 10, a given user reserves a remote conference in advance. The reservation is registered on a predetermined site provided by the server device 20. At the time of reservation, the attendees can, for example, be notified that the remote conference will be held. This notification is made, for example, by e-mail: the body of the e-mail contains the date and time of the remote conference and the URL of the server device 20. The URL of the server device 20 includes, for example, information on the conference ID that identifies the reserved remote conference.
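The description says only that the URL includes information on the conference ID; it does not say how. As one possible reading, the sketch below assumes the conference ID is carried in a query parameter. The host name `server.example`, the path, the parameter name `conference_id`, and the value `CONF001` are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def conference_id_from_url(url):
    """Extract the conference ID from a join URL, or None if absent.

    Assumption: the ID is carried in a `conference_id` query parameter.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("conference_id")
    return values[0] if values else None


conference_id = conference_id_from_url(
    "https://server.example/join?conference_id=CONF001"
)
```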

<Server device>
The server device 20 will be described with reference to FIG. 1. The server device 20 includes a CPU 21, a storage device 22, a RAM 23, and a communication unit 24, all of which are connected to a bus 25. The CPU 21 executes arithmetic processing. The storage device 22 is a computer-readable storage medium. Examples of the storage device 22 include a hard disk and/or a flash memory. The storage device 22 may also include a ROM. Various programs are stored in the storage device 22, for example an OS (operating system) and various applications. The applications stored in the storage device 22 include a server program. The server program is a remote conference program that causes a given server device to operate as the server device 20 in the remote conference system 10. The server program includes a program for the relay processing described later (see FIG. 6).

The server program is installed in the storage device 22 in advance. This installation is performed, for example, by having a reading unit (not shown) of the server device 20 read the server program from a computer-readable storage medium such as a semiconductor memory. If the server device 20 includes, for example, an optical drive (not shown), the installation may be performed by having the optical drive read the server program from an optical medium. Alternatively, the installation may be performed by having the communication unit 24 receive, as a transmission signal, the server program stored in a computer-readable storage medium, such as a hard disk, of a server device that is connected to the network 90 and is separate from the server device 20. Which of these forms is used to install the server program in the storage device 22 is decided as appropriate in view of the circumstances. The computer-readable storage medium may be a non-transitory storage medium, which excludes transitory media (for example, a transmission signal). A non-transitory storage medium may be any storage medium capable of storing information, regardless of how long the information is stored.

The RAM 23 serves as a storage area used when the CPU 21 executes various programs. During processing, predetermined data and information used in the processing are stored in predetermined areas of the RAM 23. For example, a connection list is stored in the RAM 23. The connection list may instead be stored in the storage device 22. The connection list registers, for example, the terminal IDs of the terminal devices 41, 42, and 43 with which sessions are established in the remote conference identified by a conference ID. In the server device 20, the CPU 21 controls the server device 20 by, for example, executing the OS and the server program, including the relay processing program, stored in the storage device 22. Various kinds of processing are thereby executed in the server device 20.
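One way to picture the connection list is as a mapping from a conference ID to the terminal IDs with an established session. The description does not define its layout, so the class `ConnectionList`, its method names, and the conference ID `CONF001` below are hypothetical.

```python
class ConnectionList:
    """Tracks which terminal IDs have a session in each conference."""

    def __init__(self):
        # conference ID -> set of terminal IDs with an established session
        self._by_conference = {}

    def register(self, conference_id, terminal_id):
        """Record that a session with the terminal has been established."""
        self._by_conference.setdefault(conference_id, set()).add(terminal_id)

    def unregister(self, conference_id, terminal_id):
        """Remove a terminal, for example when its session ends."""
        self._by_conference.get(conference_id, set()).discard(terminal_id)

    def terminals(self, conference_id):
        """Return the registered terminal IDs in sorted order."""
        return sorted(self._by_conference.get(conference_id, set()))


connections = ConnectionList()
for terminal_id in ("USER41", "USER42", "USER43"):
    connections.register("CONF001", terminal_id)
connections.unregister("CONF001", "USER42")
remaining = connections.terminals("CONF001")
```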

The communication unit 24 connects the server device 20 to the network 90 and performs data communication via the network 90. Through the communication unit 24, the server device 20 exchanges various kinds of data with each of the terminal devices 41, 42, and 43. The communication unit 24 is, for example, an interface circuit conforming to the Ethernet (registered trademark) standard. The communication unit 24 connects to the network 90 by a wired connection, although a wireless connection may be used instead.

The server device 20 differs from a known server device in that a server program including the relay processing program is stored in the storage device 22. In terms of hardware, however, the server device 20 is the same as a known server device. The server device 20 therefore also has the components that a known server device has.

<Terminal device>
The terminal devices 41, 42, and 43 are terminal devices of the same or a similar type that have a function of communicating via the network 90. For example, the terminal devices 41, 42, and 43 are personal computers, smartphones, or tablet terminals. In the embodiment, personal computers are shown as the terminal devices 41, 42, and 43 (see FIG. 1). The terminal devices 41, 42, and 43 will be described with reference to FIGS. 1 and 2, taking the terminal device 41 as an example. In this description, the remote conference screen 60 shown in FIG. 2 is the remote conference screen displayed on the terminal device 41. The terminal devices 42 and 43 have the same configuration as the terminal device 41 described below, and display a remote conference screen of the same form as the remote conference screen 60 shown in FIG. 2.

The terminal device 41 includes a CPU 47, a storage device 48, a RAM 49, a display unit 50, an operation unit 51, a speaker 52, a communication unit 53, and a connection interface 54, all of which are connected to a bus 55. In the embodiment, the connection interface 54 is written as "connection I/F 54". A camera 56 and a microphone 57 are connected to the connection I/F 54.

The CPU 47 executes arithmetic processing. The storage device 48 is a computer-readable storage medium. Examples of the storage device 48 include a hard disk and/or a flash memory. The storage device 48 may also include a ROM. Various programs are stored in the storage device 48, for example an OS and various applications. The applications stored in the storage device 48 include a client program. The client program is a remote conference program that causes a given terminal device to operate as the terminal device 41, 42, or 43 in the remote conference system 10. The client program includes programs for the sound processing (see FIG. 3), conversion processing (see FIG. 4), and mixing processing (see FIG. 5) described later.

The client program may be installed in the storage device 48 in advance. This installation is performed, for example, by having a reading unit (not shown) of the terminal device 41 read the client program from a computer-readable storage medium such as a semiconductor memory. If the terminal device 41 includes, for example, an optical drive (not shown), the installation may be performed by having the optical drive read the program from an optical medium. Alternatively, the installation may be performed by having the communication unit 53 receive, as a transmission signal, the client program stored in a computer-readable storage medium, such as a hard disk, of a server device (the server device 20 or a server device not shown) connected to the terminal device 41 via the network 90. The client program may also be transmitted from the server device 20 to the terminal device 41 as a transmission signal via the network 90 when the terminal device 41 accesses the server device 20 to connect to the remote conference, and may be installed in the storage device 48 at that time. In this case, the client program may be stored in the RAM 49. Which of these forms is used to install the client program in the storage device 48 or the RAM 49 is decided as appropriate in view of the circumstances. The computer-readable storage medium may be a non-transitory storage medium, which excludes transitory media (for example, a transmission signal). A non-transitory storage medium may be any storage medium capable of storing information, regardless of how long the information is stored.

The RAM 49 serves as a storage area used when the CPU 47 executes various programs. During processing, predetermined data and information used in the processing are stored in predetermined areas of the RAM 49. In the terminal device 41, the CPU 47 controls the terminal device 41 by, for example, executing the OS and the client program, including the sound processing, conversion processing, and mixing processing, stored in the storage device 48. Various kinds of processing are thereby executed in the terminal device 41.

The display unit 50 is, for example, a liquid crystal display. The display unit 50 displays various kinds of information, including the remote conference screen 60. As shown in FIG. 2, the remote conference screen 60 includes captured images 61, a material image 62, operation information 63, notification information 64, and an end button 65.

The captured image 61 is an image corresponding to video data captured by one of the terminal devices 41, 42, and 43. FIG. 2 illustrates the captured image 61 of the terminal device 42 and the captured image 61 of the terminal device 43. The remote conference screen 60 can also display the captured image 61 of the own device (terminal device 41). In the embodiment, the server device 20 does not transmit video data from the terminal device 41 back to the terminal device 41. The terminal device 41 therefore displays its own captured image 61 directly from the video data that it captured. If, however, the server device 20 also transmits the video data from the terminal device 41 to the terminal device 41, the terminal device 41 displays its own captured image 61 according to the video data transmitted from the server device 20, in the same way as the captured images 61 of the terminal devices 42 and 43.

遠隔会議画面６０では、撮影画像６１の表示又は非表示を設定することができる。撮影画像６１の表示又は非表示は、端末装置４１，４２，４３の各撮影画像６１に対して個別に設定することができる。図２に示す遠隔会議画面６０に基づけば、端末装置４１の撮影画像６１は、非表示に設定され、端末装置４２，４３の撮影画像６１は、表示に設定されている。遠隔会議画面６０では、端末装置４２，４３の撮影画像６１に対する表示の設定に従い、２個の撮影画像６１が表示されている。 On the remote conference screen 60, display or non-display of the captured image 61 can be set. The display or non-display of the captured image 61 can be set individually for each captured image 61 of the terminal devices 41, 42, and 43. Based on the remote conference screen 60 shown in FIG. 2, the captured image 61 of the terminal device 41 is set to non-display, and the captured images 61 of the terminal devices 42 and 43 are set to display. On the remote conference screen 60, two captured images 61 are displayed in accordance with display settings for the captured images 61 of the terminal devices 42 and 43.

遠隔会議画面６０では、表示に設定された撮影画像６１と共にその撮影画像６１に対応する端末ＩＤが表示される。従って、図２では、端末装置４２の撮影画像６１と共に端末ＩＤ「ＵＳＥＲ４２」が表示され、端末装置４３の撮影画像６１と共に端末ＩＤ「ＵＳＥＲ４３」が表示されている。遠隔会議画面６０において、端末装置４２，４３の各ユーザをそれぞれ含む撮影画像６１及び／又は撮影画像６１と共に表示される端末ＩＤは、端末装置４２，４３に対応する相手先情報となる。即ち、端末装置４１のユーザは、撮影画像６１及び端末ＩＤの一方又は両方に基づき、遠隔会議に出席している他のユーザを認識することができる。 On the remote conference screen 60, the terminal ID corresponding to the captured image 61 is displayed together with the captured image 61 set for display. Accordingly, in FIG. 2, the terminal ID “USER42” is displayed together with the captured image 61 of the terminal device 42, and the terminal ID “USER43” is displayed together with the captured image 61 of the terminal device 43. On the remote conference screen 60, the captured image 61 including each user of the terminal devices 42 and 43 and / or the terminal ID displayed together with the captured image 61 is the partner information corresponding to the terminal devices 42 and 43. That is, the user of the terminal device 41 can recognize another user attending the remote conference based on one or both of the captured image 61 and the terminal ID.

The operation information 63 is, for example, an operation button corresponding to transmission of a replay request. A replay request is an instruction requesting the server device 20 to transmit sound data that was sent to the server device 20 by the terminal device identified by the captured image 61 and the terminal ID associated with the operation information 63. The requested sound data corresponds to sound acquired by one of the terminal devices 41, 42, and 43 before a reference timing. The reference timing is the timing at which the sound corresponding to the sound data contained in the currently received composite data, that is, the sound data combined when that composite data was generated, was acquired. The operation information 63 is displayed in association with a captured image 61 and a terminal ID. For example, the operation information 63 displayed together with the captured image 61 of the terminal device 42, whose terminal ID is “USER42”, corresponds to transmission of a replay request asking the server device 20 to transmit sound data corresponding to sound acquired by the terminal device 42 before the reference timing. In this case, the reference timing is the timing at which the terminal device 42 acquired the sound corresponding to its sound data contained in the currently received composite data. Hereinafter, sound data transmitted from the server device 20 in response to a replay request is called “replay data” when it must be distinguished from other sound data (including composite data and the recombined data described later).

The notification information 64 indicates that replay information has been acquired. For the remote conference screen 60 displayed on the terminal device 41, the replay information indicates that replay data corresponding to sound previously acquired by the own device is being transmitted from the server device 20 to one of the terminal devices 42 and 43. The replay data of the terminal device 41 is transmitted from the server device 20 when one of the terminal devices 42 and 43 accepts an input on the operation information 63 associated with the captured image 61 and the terminal ID of the terminal device 41. The replay information includes the terminal ID of the terminal device that is the destination of the replay data (the source of the replay request). In the embodiment, the notification information 64 reads “Replaying”. The notification information 64 is displayed in association with a captured image 61 and a terminal ID. For example, suppose that the operation information 63 associated with the captured image 61 and the terminal ID of the terminal device 41 is pressed on the terminal device 42, and the replay data of the terminal device 41 is being transmitted from the server device 20 to the terminal device 42. In this case, on the remote conference screen 60 displayed on the terminal device 41, the notification information 64 is displayed together with the captured image 61 of the terminal device 42, whose terminal ID is “USER42”, as shown in FIG. 2. If the captured image 61 of the terminal device on which the operation information 63 was pressed is hidden, then upon acquisition of the replay information, the captured image 61 and/or the terminal ID of that terminal device and the notification information 64 are displayed in association with each other in a predetermined area of the remote conference screen 60.

終了ボタン６５は、遠隔会議の終了に対応する操作ボタンである。端末装置４１のユーザは、遠隔会議を終了する場合、終了ボタン６５を押下する。この場合、遠隔会議は終了し、サーバ装置２０との間で確立されたセッションは、切断される。 The end button 65 is an operation button corresponding to the end of the remote conference. The user of the terminal device 41 presses the end button 65 when ending the remote conference. In this case, the remote conference is terminated, and the session established with the server device 20 is disconnected.

操作部５１は、端末装置４１に対する各種の指示等の入力を受け付ける。操作部５１は、キーボード及びマウスを含む。詳細は省略するが、キーボード及びマウスへの各操作に対応する入力情報を生成する処理は、公知のパーソナルコンピュータで採用されている技術であり、端末装置４１でも採用される。スピーカ５２は、音を出力する音出力部である。スピーカ５２での出力対象は、例えば、上述したサーバ装置２０から送信される合成データに対応する合成音である。通信部５３は、端末装置４１をネットワーク９０に接続し、ネットワーク９０を介したデータ通信を行う。端末装置４１では、通信部５３を介してサーバ装置２０との間で各種のデータが送受信される。通信部５３は、例えば、イーサネット（登録商標）規格に適合するインターフェース回路である。通信部５３によるネットワーク９０への接続は、無線接続又は有線接続の何れであってもよい。 The operation unit 51 receives inputs such as various instructions for the terminal device 41. The operation unit 51 includes a keyboard and a mouse. Although details are omitted, the process of generating input information corresponding to each operation of the keyboard and mouse is a technique adopted in a known personal computer and is also adopted in the terminal device 41. The speaker 52 is a sound output unit that outputs sound. The output target of the speaker 52 is, for example, synthesized sound corresponding to the synthesized data transmitted from the server device 20 described above. The communication unit 53 connects the terminal device 41 to the network 90 and performs data communication via the network 90. In the terminal device 41, various data are transmitted to and received from the server device 20 via the communication unit 53. The communication unit 53 is, for example, an interface circuit that conforms to the Ethernet (registered trademark) standard. The connection to the network 90 by the communication unit 53 may be either a wireless connection or a wired connection.

接続Ｉ／Ｆ５４は、端末装置４１に所定の装置を接続するインターフェースである。接続Ｉ／Ｆ５４は、例えば、ＵＳＢ（Universal Serial Bus）ポートを含むインターフェースである。接続Ｉ／Ｆ５４は、無線通信モジュールを含むものであってもよい。無線通信モジュールとしては、Ｂｌｕｅｔｏｏｔｈ（登録商標）に対応した通信モジュールが例示される。接続Ｉ／Ｆ５４に接続されたカメラ５６は、外界像を撮影する。端末装置４１は、カメラを内蔵する構成であってもよい。接続Ｉ／Ｆ５４に接続されたマイク５７は、外界音を集音する集音部である。例えば、マイク５７は、端末装置４１のユーザが発した音声を集音する。 The connection I / F 54 is an interface that connects a predetermined device to the terminal device 41. The connection I / F 54 is an interface including a USB (Universal Serial Bus) port, for example. The connection I / F 54 may include a wireless communication module. As the wireless communication module, a communication module compatible with Bluetooth (registered trademark) is exemplified. The camera 56 connected to the connection I / F 54 takes an external image. The terminal device 41 may be configured to incorporate a camera. The microphone 57 connected to the connection I / F 54 is a sound collection unit that collects external sound. For example, the microphone 57 collects sound uttered by the user of the terminal device 41.

端末装置４１は、音処理と変換処理とミキシング処理を含むクライアントプログラムが記憶装置４８又はＲＡＭ４９に記憶されている点が、公知の端末装置と相違する。但し、端末装置４１は、ハードウェア的には、公知の端末装置と同じである。従って、端末装置４１は、更に、公知の端末装置が備える構成を備える。 The terminal device 41 is different from a known terminal device in that a client program including sound processing, conversion processing, and mixing processing is stored in the storage device 48 or the RAM 49. However, the terminal device 41 is the same as a known terminal device in hardware. Therefore, the terminal device 41 further includes a configuration included in a known terminal device.

<Processing executed by the remote conference system>
The following describes the sound processing, conversion processing, and mixing processing executed by the terminal device 41, and the relay processing executed by the server device 20, when the terminal devices 41, 42, and 43 hold a remote conference in the remote conference system 10. On the terminal device 41, an operation targeting the URL of the server device 20 is input via the operation unit 51. When this operation is input, the CPU 47 starts the client program. Once the program starts, the terminal device 41 accesses the server device 20 and establishes a session with it. The procedure executed by the terminal device 41 to establish a session with the server device 20 is the same as in a known remote conference system, so its description is omitted.

サーバ装置２０との間でセッションが確立されることで、端末装置４１は、会議ＩＤによって識別される遠隔会議に参加した状態となる。即ち、端末装置４１のユーザは、会議ＩＤによって識別される遠隔会議が開催されるバーチャルな会議室に入室した状態となる。端末装置４１では、サーバ装置２０との間でセッションが確立されると、カメラ５６で撮影された映像データとマイク５７で集音された音に対応する音データが、通信部５３からサーバ装置２０に送信される。映像データと音データには、送信元を示す端末ＩＤがそれぞれ含められる。例えば、端末装置４１から送信される映像データと音データには、端末ＩＤ「ＵＳＥＲ４１」が含められる。 As a session is established with the server device 20, the terminal device 41 enters a state of participating in the remote conference identified by the conference ID. That is, the user of the terminal device 41 enters a virtual conference room where the remote conference identified by the conference ID is held. In the terminal device 41, when a session is established with the server device 20, sound data corresponding to the video data captured by the camera 56 and the sound collected by the microphone 57 is transmitted from the communication unit 53 to the server device 20. Sent to. The video data and the sound data each include a terminal ID indicating the transmission source. For example, the video data and the sound data transmitted from the terminal device 41 include the terminal ID “USER41”.

サーバ装置２０は、端末装置４１，４２，４３との間でセッションが確立された場合、映像データを中継すると共に、所定の２台の端末装置からの音データを合成し、他の１台の端末装置に合成データを送信する。なお、端末装置４１，４２とセッションが確立され、端末装置４３とはセッションが確立されていない場合、サーバ装置２０から端末装置４１に送信される合成データは、実質的には、端末装置４２で集音された音に対応する音データであり、サーバ装置２０から端末装置４２に送信される合成データは、実質的には、端末装置４１で集音された音に対応する音データである。即ち、前述した場合、サーバ装置２０は、端末装置４２からの音データを、そのまま端末装置４１に送信し、端末装置４１からの音データを、そのまま端末装置４２に送信する。 When a session is established with the terminal devices 41, 42, and 43, the server device 20 relays the video data and synthesizes sound data from two predetermined terminal devices, The composite data is transmitted to the terminal device. Note that when the session is established with the terminal devices 41 and 42 and the session is not established with the terminal device 43, the combined data transmitted from the server device 20 to the terminal device 41 is substantially the terminal device 42. The sound data corresponding to the collected sound and the synthesized data transmitted from the server device 20 to the terminal device 42 are substantially sound data corresponding to the sound collected by the terminal device 41. That is, in the case described above, the server device 20 transmits the sound data from the terminal device 42 to the terminal device 41 as it is, and transmits the sound data from the terminal device 41 to the terminal device 42 as it is.
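The relay behavior described above, in which each terminal receives the combined sound of the other terminals, can be sketched as follows. This is a minimal illustration, assuming the decoded sound data are equal-rate lists of integer samples and that combining means sample-wise addition; the function name `mix_for_destination` and the dictionary layout are hypothetical and do not appear in the embodiment.

```python
def mix_for_destination(decoded, destination):
    """Sum the decoded samples of every terminal except the destination.

    `decoded` maps a terminal ID to its decoded samples (assumed layout).
    """
    sources = [samples for terminal_id, samples in decoded.items()
               if terminal_id != destination]
    if not sources:
        return []
    # Only mix the span that every source covers.
    length = min(len(samples) for samples in sources)
    return [sum(samples[i] for samples in sources) for i in range(length)]


decoded = {
    "USER41": [1, 2, 3],
    "USER42": [10, 20, 30],
    "USER43": [100, 200, 300],
}
# One composite per destination; a terminal never hears its own sound.
composites = {dest: mix_for_destination(decoded, dest) for dest in decoded}
```

With only two sessions established, the single remaining source is passed through unchanged, which matches the two-terminal case noted in the text.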

When a session is established with the server device 20, the terminal device 41 displays the remote conference screen 60 on the display unit 50. The communication unit 53 of the terminal device 41 receives, via the server device 20, the video data and the composite data originating from the terminal devices with which sessions are established (one or both of the terminal devices 42 and 43). The remote conference screen 60 displays the captured image 61 corresponding to the received video data. The speaker 52 outputs the synthesized sound corresponding to the received composite data. On the remote conference screen 60, each captured image 61 is displayed with its corresponding terminal ID, and operation information 63 is displayed for each captured image 61 and terminal ID. For example, the CPU 47 of the terminal device 41 outputs to the display unit 50 a display command for the captured image 61, the terminal ID, and the operation information 63. As a result, the display unit 50 displays a remote conference screen 60 such as that shown in FIG. 2. However, the material image 62 is not displayed if no material data has been acquired.

To end the remote conference, the user operates the operation unit 51 of the terminal device 41 and presses the end button 65. The terminal device 41 then executes a session disconnection procedure with the server device 20. Disconnecting the session ends the remote conference on the terminal device 41. The procedure executed by the terminal device 41 to disconnect the session with the server device 20 is the same as in a known remote conference system, so its description is omitted.

以下では、サーバ装置２０と端末装置４２，４３との間でも、セッションが確立されているとする。端末装置４２，４３でも、端末装置４１と同様、以下に説明する、音処理と変換処理とミキシング処理が実行される。従って、端末装置４２，４３でも、端末装置４１と同様、遠隔会議画面６０が表示され、合成データに対応する合成音と後述する再合成データに対応する再合成音が出力される。映像データを対象とする処理に関する説明は、公知の遠隔会議システムと同様であるため、省略する。 In the following, it is assumed that a session is also established between the server device 20 and the terminal devices 42 and 43. Also in the terminal devices 42 and 43, the sound processing, the conversion processing, and the mixing processing described below are executed similarly to the terminal device 41. Therefore, similarly to the terminal device 41, the terminal devices 42 and 43 display the remote conference screen 60 and output a synthesized sound corresponding to the synthesized data and a re-synthesized sound corresponding to re-synthesized data described later. The description regarding the processing for the video data is the same as that of a known remote conference system, and is therefore omitted.

<Sound processing>
The sound processing executed by the terminal device 41 is described with reference to FIG. 3. The sound processing starts when the session with the server device 20 is established. The CPU 47 determines whether a replay operation has been accepted (S11). A replay operation is an operation of pressing, via the operation unit 51, the operation information 63 on the remote conference screen 60, and is accepted by the operation unit 51. The CPU 47 determines that a replay operation has been accepted when it acquires the command that the operation unit 51 outputs upon accepting the operation. Together with the replay operation, the CPU 47 acquires the terminal ID associated with the pressed operation information 63. If no replay operation has been accepted (S11: No), the CPU 47 advances the process to S15.

If a replay operation has been accepted (S11: Yes), the CPU 47 causes a replay request, a request source terminal ID, and a request destination terminal ID to be transmitted (S13). The request source terminal ID is the terminal ID of the terminal device that sends the replay request. The request destination terminal ID is the terminal ID acquired together with the replay operation in S11. Both IDs may, for example, be included in the replay request. The replay request and both IDs are addressed to the server device 20. The CPU 47 outputs to the communication unit 53 a command to transmit the replay request, the request source terminal ID, and the request destination terminal ID. In accordance with this command, the communication unit 53 transmits them to the server device 20. The CPU 47 then advances the process to S15.
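As an illustration of S13, the sketch below bundles the request source terminal ID and the request destination terminal ID into a single replay request message. The JSON encoding, the field names, and the `build_replay_request` helper are assumptions made for this example; the embodiment does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReplayRequest:
    """Hypothetical payload carrying both terminal IDs named in S13."""
    requester_id: str  # request source terminal ID (sender of the request)
    target_id: str     # request destination terminal ID (whose sound is replayed)


def build_replay_request(own_terminal_id, pressed_terminal_id):
    """Serialize a replay request for transmission to the server device."""
    request = ReplayRequest(requester_id=own_terminal_id,
                            target_id=pressed_terminal_id)
    return json.dumps({"type": "replay_request", **asdict(request)})


# Terminal USER41 pressed the replay button shown with USER42's image.
message = build_replay_request("USER41", "USER42")
```

Embedding both IDs in the request corresponds to the option, mentioned in the text, of including them in the replay request itself.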

In S15, the CPU 47 acquires composite data. The composite data is transmitted from the server device 20 and received by the communication unit 53, and the CPU 47 acquires it via the communication unit 53. The acquired composite data is stored in area A of the RAM 49. Next, the CPU 47 determines whether replay data has been acquired (S17). Replay data is transmitted from the server device 20 in S79 of FIG. 6, described later. If no replay data has been acquired (S17: No), the CPU 47 advances the process to S23.

Although omitted from the description above, in the remote conference system 10 each of the terminal devices 41, 42, and 43 generates sound data by encoding waveform data obtained by A/D conversion of the collected sound, and transmits the sound data to the server device 20. Examples of the encoding scheme include G.722.1, Speex, and Opus. The server device 20 decodes the sound data from the terminal devices 41, 42, and 43 and combines the decoded sound data to generate composite data (see S73 of FIG. 6, described later). The server device 20 then encodes the composite data and transmits it to the predetermined terminal device as described above. The CPU 47 decodes the encoded composite data acquired in S15 and stores the decoded composite data in area A of the RAM 49.

リプレイデータが取得された場合（Ｓ１７：Ｙｅｓ）、ＣＰＵ４７は、変換処理を実行する（Ｓ１９）。リプレイデータは、通信部５３で受信される。ＣＰＵ４７は、通信部５３を介してリプレイデータを取得する。取得されたリプレイデータは、ＲＡＭ４９の領域Ｂに記憶される。サーバ装置２０からのリプレイデータは、合成データと同様、エンコードされた状態である。そのため、ＣＰＵ４７は、エンコードされたリプレイデータをデコードし、デコードされたリプレイデータを、ＲＡＭ４９の領域Ｂに記憶させる。変換処理は、取得されたリプレイデータの再生に影響する状態を、第一状態から第二状態へと変換する処理である。変換処理に関する説明は、後述する。その後、ＣＰＵ４７は、ミキシング処理を実行する（Ｓ２１）。詳細は後述するが、ミキシング処理では、合成データ（Ｓ１５参照）とリプレイデータ（Ｓ１７：Ｙｅｓ参照）が合成され、新たな合成データが生成される（後述する図５のＳ５５参照）。実施形態では、ミキシング処理で新たに生成される合成データを、「再合成データ」という。再合成データは、ＲＡＭ４９の領域Ｃに記憶される。ミキシング処理に関するこの他の説明は、後述する。Ｓ２１を実行した後、ＣＰＵ４７は、処理をＳ２３に移行する。 When the replay data is acquired (S17: Yes), the CPU 47 executes a conversion process (S19). The replay data is received by the communication unit 53. The CPU 47 acquires replay data via the communication unit 53. The acquired replay data is stored in the area B of the RAM 49. The replay data from the server device 20 is in an encoded state like the composite data. Therefore, the CPU 47 decodes the encoded replay data, and stores the decoded replay data in the area B of the RAM 49. The conversion process is a process of converting a state that affects the reproduction of the acquired replay data from the first state to the second state. The description regarding the conversion process will be described later. Thereafter, the CPU 47 executes a mixing process (S21). Although details will be described later, in the mixing process, the combined data (see S15) and the replay data (see S17: Yes) are combined to generate new combined data (see S55 in FIG. 5 described later). In the embodiment, the composite data newly generated by the mixing process is referred to as “recombined data”. The recombined data is stored in area C of the RAM 49. Other description regarding the mixing processing will be described later. After executing S21, the CPU 47 shifts the process to S23.

In S23, the CPU 47 executes playback processing. The playback target is either the composite data stored in area A of the RAM 49 (see S15) or the recombined data stored in area C of the RAM 49 (see S21). If the determination in S17 was negative (S17: No), the CPU 47 plays back the composite data stored in area A and outputs to the speaker 52 a command to output the resulting synthesized sound. If the determination in S17 was affirmative (S17: Yes), the CPU 47 plays back the recombined data stored in area C and outputs to the speaker 52 a command to output the resulting re-synthesized sound. The speaker 52 accordingly outputs the synthesized sound or the re-synthesized sound.

After executing S23, the CPU 47 determines whether replay information has been acquired (S25). Replay information is transmitted from the server device 20 in S81 of FIG. 6, described later, and received by the communication unit 53; the CPU 47 acquires it via the communication unit 53. If replay information has been acquired (S25: Yes), the CPU 47 causes the notification information 64 to be displayed (S27). The CPU 47 identifies the terminal ID included in the acquired replay information and outputs a display command for the notification information 64 to the display unit 50. The display command specifies a predetermined position within the area of the captured image 61 that is displayed with the terminal ID matching the identified terminal ID. In accordance with the display command, the display unit 50 displays the notification information 64 superimposed on that captured image 61 (see FIG. 2). If no replay information has been acquired (S25: No), or after executing S27, the CPU 47 returns the process to S11 and executes S11 and the subsequent steps again. The sound processing continues until the session with the server device 20 is disconnected.
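One pass through steps S11 to S27 can be sketched as a single function. This is only an outline under stated assumptions: `FakeTerminalIO` is a hypothetical stand-in for the operation unit 51, communication unit 53, speaker 52, and display unit 50, and `convert` and `mix` are placeholders for the conversion processing (S19) and mixing processing (S21). Decoding and the RAM areas A to D are left out.

```python
class FakeTerminalIO:
    """Hypothetical stand-in for the terminal's input, network, and output units."""

    def __init__(self, pressed=None, composite=(), replay=None, info=None):
        self.pressed = pressed            # terminal ID of a pressed button, or None
        self.composite = list(composite)  # decoded composite data
        self.replay = replay              # decoded replay data, or None
        self.info = info                  # terminal ID in replay information, or None
        self.sent, self.played, self.notices = [], [], []

    def poll_replay_operation(self):
        return self.pressed

    def send_replay_request(self, target_id):
        self.sent.append(target_id)

    def receive_composite(self):
        return self.composite

    def receive_replay_data(self):
        return self.replay

    def receive_replay_info(self):
        return self.info

    def play(self, samples):
        self.played.append(samples)

    def show_notice(self, terminal_id):
        self.notices.append(terminal_id)


def sound_processing_step(io, convert, mix):
    """Run one pass of S11 to S27 and return the data that was played."""
    target = io.poll_replay_operation()           # S11
    if target is not None:
        io.send_replay_request(target)            # S13
    composite = io.receive_composite()            # S15
    replay = io.receive_replay_data()             # S17
    if replay is not None:
        output = mix(composite, convert(replay))  # S19, then S21
    else:
        output = composite
    io.play(output)                               # S23
    info = io.receive_replay_info()               # S25
    if info is not None:
        io.show_notice(info)                      # S27
    return output
```

In the embodiment this pass repeats from S11 until the session with the server device 20 is disconnected; the loop itself is omitted here.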

<Conversion processing>
The conversion processing executed in S19 of FIG. 3 is described with reference to FIG. 4. The conversion processing operates on decoded replay data. The CPU 47 stores the replay data held in area B of the RAM 49 (see S17: Yes in FIG. 3) in area D of the RAM 49 (S31). After being stored in area D, the replay data is erased from area B. Next, the CPU 47 executes speed conversion processing (S33) on the replay data stored in area D. The speed conversion processing changes the tempo of the replay data. Here, changing the tempo corresponds to changing the playback speed, meaning that the time needed to play back the converted replay data differs from the time needed before conversion. For example, to increase the playback speed, the replay data is converted into a state thinned out along the time axis. A known conversion technique called time stretching can be used, for example, as the speed conversion processing, so further description is omitted.
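To make the idea of thinning data out along the time axis concrete, the sketch below shortens or lengthens a sample sequence by linear-interpolation resampling. This is a deliberately simplified substitute, not the time-stretch processing the text refers to: plain resampling also shifts the pitch, whereas practical time-stretch algorithms are designed to preserve it. The function name `change_speed` and the `rate` parameter are hypothetical.

```python
def change_speed(samples, rate):
    """Resample so that playback takes 1/rate of the original time.

    rate > 1 shortens the data (faster playback); rate < 1 lengthens it.
    Simplified illustration only: pitch changes along with duration.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / rate))
    out = []
    for n in range(out_len):
        pos = n * rate                  # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        # Linear interpolation between neighbouring input samples.
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# Doubling the speed keeps every second sample of the input.
faster = change_speed([0, 1, 2, 3, 4, 5, 6, 7], 2.0)
```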

Next, the CPU 47 executes amplitude conversion processing (S35) on the speed-converted replay data stored in area D of the RAM 49. The amplitude conversion processing amplifies the volume (dB) of the speed-converted replay data. A known process for amplifying the amplitude of the replay data can be used, so further description is omitted. As a result of the amplitude conversion processing, area D of the RAM 49 holds replay data whose volume (dB) is amplified relative to the replay data before this processing. This replay data remains decoded. After executing S35, the CPU 47 ends the conversion processing.
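A volume change expressed in dB maps to a linear amplitude factor of 10^(gain/20). The sketch below applies such a gain to decoded samples; the 16-bit signed sample range, the clipping behavior, and the name `amplify` are assumptions for illustration, since the embodiment only states that a known amplification process is used.

```python
def amplify(samples, gain_db, limit=32767):
    """Scale samples by a gain given in dB, clipping to a 16-bit range.

    The 16-bit range is an assumption; the text does not fix a sample format.
    """
    factor = 10 ** (gain_db / 20)  # dB to linear amplitude factor
    return [max(-limit - 1, min(limit, round(s * factor))) for s in samples]


# A +20 dB gain multiplies the amplitude by ten.
louder = amplify([1000, -1000], 20.0)
```

Clipping keeps amplified samples inside the representable range at the cost of distortion on loud passages.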

<Mixing processing>
The mixing processing executed in S21 of FIG. 3 is described with reference to FIG. 5. The mixing processing operates on decoded composite data and decoded replay data. The CPU 47 stores the composite data held in area A of the RAM 49 (see S15 of FIG. 3) in area C of the RAM 49 (S41). After being stored in area C, the composite data is erased from area A. The CPU 47 sets the sound channel of the composite data in the recombined data to a first channel (S43). For example, if the recombined data is stereo data, the first channel is one of the left and right sound channels (for example, the right channel).

Next, the CPU 47 executes first attenuation processing (S45) on the composite data stored in area C of the RAM 49. The first attenuation processing attenuates the volume (dB) of the composite data. Its attenuation ratio is a first ratio, described later. Like the amplitude conversion processing described above, the first attenuation processing can use a known process, so further description is omitted.

Next, the CPU 47 stores the replay data that has undergone the conversion process and is held in area D of the RAM 49 (see S35 in FIG. 4) in area C of the RAM 49 (S47). Once stored in area C, this replay data is erased from area D. The CPU 47 sets the sound channel of the replay data in the recombined data to a second channel (S49). For example, when the recombined data is stereo data, the second channel is the other of the left and right sound channels (for example, the left channel). Subsequently, the CPU 47 executes a second attenuation process (S51). The second attenuation process is executed on the replay data stored in area C of the RAM 49 and attenuates the volume (dB) of the replay data. The attenuation ratio in the second attenuation process is a second ratio.

The first ratio in the first attenuation process (see S45) and the second ratio in the second attenuation process satisfy "first ratio > second ratio". The two ratios may also satisfy "first ratio + second ratio = 100%". For example, the first ratio is 75% and the second ratio is 25%. The first and second attenuation processes differ in the data they process but are technically the same process. As with the amplitude conversion process described above, a known process can be used for the second attenuation process, so further description is omitted.

After executing S51, the CPU 47 generates the recombined data (S53). The CPU 47 combines the composite data after the first attenuation process and the replay data after the second attenuation process, both stored in area C of the RAM 49. At this time, the composite data is assigned to the first channel set in S43, and the replay data is assigned to the second channel set in S49. After S53, the composite data and the replay data are erased from area C of the RAM 49, and the recombined data is stored there. A known process for combining a plurality of sound data can be used to generate the recombined data, so further description is omitted. After executing S53, the CPU 47 ends the mixing process.
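The mixing process of S41 to S53 can be pictured with the following Python sketch, offered as one possible reading rather than as the method of the embodiment. It assumes that an attenuation ratio is the fraction of linear amplitude removed (so 75% leaves 25% of the amplitude), that samples are floats, and that the shorter signal is padded with silence. The names `attenuate` and `mix_to_stereo` are hypothetical.

```python
def attenuate(samples, ratio):
    """Remove `ratio` of the amplitude (assumed linear): 0.75 keeps 25%."""
    return [s * (1.0 - ratio) for s in samples]


def mix_to_stereo(composite, replay, first_ratio=0.75, second_ratio=0.25):
    """Return (left, right) frames: replay on the left, composite on the right."""
    if not first_ratio > second_ratio:
        raise ValueError("first ratio must be greater than second ratio")
    live = attenuate(composite, first_ratio)   # S45: first attenuation
    past = attenuate(replay, second_ratio)     # S51: second attenuation
    # Pad the shorter signal with silence so both channels have equal length.
    n = max(len(live), len(past))
    live += [0.0] * (n - len(live))
    past += [0.0] * (n - len(past))
    # S53: second channel (left) carries replay, first channel (right) carries composite.
    return list(zip(past, live))
```

With the example ratios of 75% and 25%, the replay sound ends up three times louder than the live composite sound, which matches the intent that the replayed remark stands out in its own channel.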

<Relay process>
The relay process executed by the server device 20 will be described with reference to FIG. 6. The relay process starts when a session is established with any one of the terminal devices 41, 42, and 43. In the embodiment, as described above, sessions are assumed to be established with all of the terminal devices 41, 42, and 43. The CPU 21 acquires sound data transmitted from one of the terminal devices 41, 42, and 43 (S61). The sound data is received by the communication unit 24, and the CPU 21 acquires it via the communication unit 24. As described above, the CPU 21 decodes the acquired sound data and stores it, in the decoded state, in a predetermined area of the RAM 23.

The CPU 21 executes S61 and S63 to S71, described later, once for each of the terminal devices 41, 42, and 43 with which a session is established. That is, when sessions are established between the server device 20 and the terminal devices 41, 42, and 43, the CPU 21 executes S61 to S71 three times. For example, in S61 of the first cycle, the CPU 21 acquires sound data from the terminal device 41 and then executes S63 to S71 in order. After S71 of the first cycle, the CPU 21 returns to S61, acquires sound data from the terminal device 42 in S61 of the second cycle, and executes S63 to S71 in order. After S71 of the second cycle, the CPU 21 again returns to S61, acquires sound data from the terminal device 43 in S61 of the third cycle, and executes S63 to S71 in order. After executing S61 to S71 as many times as there are terminal devices with established sessions, the CPU 21 proceeds to S73.

Accordingly, in the relay process, an area for storing sound data is reserved in the RAM 23 for each terminal ID registered in the connection list. In the embodiment, the area of the RAM 23 that stores sound data from the terminal device 41 is called "area E", the area that stores sound data from the terminal device 42 is called "area F", and the area that stores sound data from the terminal device 43 is called "area G". Area E is associated with the terminal ID "USER41", area F with the terminal ID "USER42", and area G with the terminal ID "USER43". For example, when the acquired sound data contains the terminal ID "USER42", it is decoded and stored in area F of the RAM 23.

After executing S61, the CPU 21 determines whether the sound data acquired in S61 is silent (S63). Whether sound data is silent is determined, for example, as follows. The determination is made on the decoded sound data (waveform data). The silent state corresponds to a state in which the amplitude value of the acquired sound data stays below a reference value continuously for a reference time. In other words, the non-silent (sounded) state corresponds to a state in which the amplitude value does not stay at or below the reference value continuously for the reference time. The reference time is determined in advance in consideration of, for example, the interval between individual sounds while a user is speaking. The reference value is determined in advance in consideration of, for example, the volume at which a user can be judged to be speaking. Whether an amplitude equal to the reference value is treated as silent or as sounded is decided as appropriate in view of various conditions. The reference time and the reference value are registered in the program for the relay process. The amplitude values analyzed are those of the waveforms of all frequencies. Alternatively, only the amplitude values of waveforms in a specific frequency band may be analyzed. An example of the specific band is the frequency band of the human voice. In that case, sound data corresponding to collected sound other than voice can be determined to be silent.
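The determination in S63 can be sketched as follows. This is a minimal illustration, assuming that the reference time has already been converted to a number of samples and that the data counts as silent as soon as one sufficiently long run of quiet samples is found. The function name `is_silent` and its parameters are hypothetical, and an amplitude equal to the reference value is treated here as sounded, which the description leaves open.

```python
def is_silent(samples, reference_value, reference_samples):
    """Return True if |amplitude| stays below reference_value for at least
    reference_samples consecutive samples.

    Assumption: reference_samples = reference time x sampling rate.
    """
    run = 0  # length of the current run of quiet samples
    for s in samples:
        if abs(s) < reference_value:
            run += 1
            if run >= reference_samples:
                return True
        else:
            run = 0  # a loud sample interrupts the run
    return False
```

Restricting the check to the voice band, as the description suggests, would amount to band-pass filtering the samples before passing them to this function.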

When the sound data is not silent (S63: No), the CPU 21 stores the sound data in a predetermined area of the storage device 22 (S65). In the embodiment, the area of the storage device 22 that stores sound data from the terminal device 41 is called "area H", the area that stores sound data from the terminal device 42 is called "area I", and the area that stores sound data from the terminal device 43 is called "area J". Area H is associated with the terminal ID "USER41", area I with the terminal ID "USER42", and area J with the terminal ID "USER43". For example, when the acquired sound data contains the terminal ID "USER42", it is stored in area I of the storage device 22. If sound data has already been stored, the newly stored sound data may be joined to it; in that case, it is appended at the end of the stored sound data. The sound data stored in areas H, I, and J of the storage device 22 is transmitted as replay data in S79, described later. In the embodiment, the sound data stored in areas H, I, and J before it is transmitted to the terminal device of the request source terminal ID is also called "replay data". The replay data in S65 may instead be stored in the RAM 23.

When the sound data is silent (S63: Yes), the CPU 21 erases the sound data stored in the predetermined area of the storage device 22 (S67). The sound data to be erased is the sound data corresponding to the terminal ID contained in the acquired sound data. For example, when the acquired sound data contains the terminal ID "USER43", the sound data stored in area J of the storage device 22 is erased.
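The store-or-erase behavior of S65 and S67 can be modeled with a small per-terminal buffer, as in the following Python sketch. The class name `ReplayStore` and its methods are hypothetical, and a dictionary keyed by terminal ID stands in for areas H, I, and J of the storage device 22.

```python
class ReplayStore:
    """Per-terminal replay buffers (a stand-in for areas H, I, and J)."""

    def __init__(self):
        self._areas = {}  # terminal ID -> list of stored samples

    def on_sound_data(self, terminal_id, samples, silent):
        if silent:
            # S67: silence discards what has been stored for this terminal.
            self._areas.pop(terminal_id, None)
        else:
            # S65: append the new data at the end of the stored data.
            self._areas.setdefault(terminal_id, []).extend(samples)

    def replay_data(self, terminal_id):
        """Return a copy of the replay data currently stored for a terminal."""
        return list(self._areas.get(terminal_id, []))
```

Because silence clears the buffer, whatever is stored for a terminal at any moment is the remark that began after the most recent silent state.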

After executing S65 or S67, the CPU 21 determines whether a replay request has been acquired (S69). The replay request is transmitted in S13 of FIG. 3, described above, and received by the communication unit 24. The communication unit 24 also receives the request destination terminal ID and the request source terminal ID at this time. The CPU 21 acquires the replay request, the request destination terminal ID, and the request source terminal ID via the communication unit 24. Subsequently, in accordance with the replay request, the CPU 21 registers the acquired request destination terminal ID in the connection list (S71). The CPU 21 registers the request destination terminal ID in association with the terminal ID in the connection list that matches the request source terminal ID acquired with the replay request.

After executing S61 to S71 as many times as there are terminal devices 41, 42, and 43 with established sessions (three times), the CPU 21 proceeds to S73. In S73, the CPU 21 generates composite data. The composite data is generated by combining two of the sound data stored in areas E, F, and G of the RAM 23. For example, the CPU 21 combines the sound data from the terminal devices 42 and 43 stored in areas F and G of the RAM 23 to generate the composite data to be transmitted to the terminal device 41. The generated composite data is stored in area K of the RAM 23. The composite data stored in area K is associated with the terminal ID of the destination terminal device (the terminal ID "USER41" in the example above). After S73, the sound data stored in areas E, F, and G of the RAM 23 is erased from those areas. In S73, sound data determined to be silent in S63 (S63: Yes) may be excluded from the data to be combined. In that case, when all of the sound data acquired in S61, which is repeated three times for the terminal devices 41, 42, and 43, is silent, S73 and S75, described later, are skipped. A known process for combining a plurality of sound data can be used to generate the composite data, so further description is omitted.
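The generation of composite data in S73 can be illustrated by the following Python sketch, in which each destination terminal receives the sum of the sound data from all other terminals. The description leaves the combining method to known techniques, so the sample-wise addition without clipping, the function name `build_composites`, and the dictionary standing in for areas E, F, G, and K are assumptions.

```python
def build_composites(sound_by_terminal):
    """For each destination terminal, combine the sound data of all other terminals.

    sound_by_terminal maps a terminal ID to its decoded samples (areas E, F, G).
    The result maps each destination terminal ID to its composite data (area K).
    """
    composites = {}
    for dest in sound_by_terminal:
        others = [s for tid, s in sound_by_terminal.items() if tid != dest]
        n = max((len(s) for s in others), default=0)
        # Sample-wise sum; shorter inputs simply stop contributing.
        composites[dest] = [
            sum(s[i] for s in others if i < len(s)) for i in range(n)
        ]
    return composites
```

For three terminals, the composite data for "USER41" is thus built from the sound data of "USER42" and "USER43" only, so a user never hears their own voice returned.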

Subsequently, the CPU 21 causes the composite data to be transmitted (S75). The composite data is transmitted to the terminal devices 41, 42, and 43 with which sessions are established. The CPU 21 outputs to the communication unit 24 a command to transmit the composite data associated with the terminal ID "USER41" to the terminal device 41. The CPU 21 outputs to the communication unit 24 a command to transmit the composite data associated with the terminal ID "USER42" to the terminal device 42. The CPU 21 outputs to the communication unit 24 a command to transmit the composite data associated with the terminal ID "USER43" to the terminal device 43. In accordance with these transmission commands, the composite data is transmitted from the communication unit 24 to the terminal devices 41, 42, and 43 in order. The composite data is encoded for transmission. After S75, the composite data is erased from area K of the RAM 23.

Next, the CPU 21 determines whether a request destination terminal ID is stored in the connection list (S77). The CPU 21 accesses the connection table. When a request destination terminal ID is registered (S77: Yes), the CPU 21 causes replay data to be transmitted (S79). From the connection table, the CPU 21 identifies the terminal IDs "USER41", "USER42", and "USER43" of the terminal devices 41, 42, and 43 with established sessions and the request destination terminal IDs associated with these terminal IDs. The replay data is transmitted to the terminal device whose terminal ID is associated with a request destination terminal ID in the connection table. Of areas H, I, and J of the storage device 22, the CPU 21 reads the replay data stored in the area associated with the terminal ID that matches the request destination terminal ID. Subsequently, the CPU 21 outputs to the communication unit 24 a command to transmit the read replay data. In accordance with the transmission command, the replay data is transmitted from the communication unit 24 to the terminal device set as the destination. The replay data is encoded for transmission.

Subsequently, the CPU 21 causes replay information to be transmitted (S81). The replay information contains the terminal ID with which the request destination terminal ID identified in S79 is associated (the request source terminal ID). The replay information is transmitted to the terminal device of the request destination terminal ID identified in S79. The CPU 21 outputs a command to transmit the replay information to the communication unit 24. In accordance with the transmission command, the replay information is transmitted from the communication unit 24 to the terminal device set as the destination. When no request destination terminal ID is registered (S77: No), or after executing S81, the CPU 21 returns to S61 and executes the processing from S61 onward again. The relay process continues until the sessions with all of the terminal devices 41, 42, and 43 are disconnected.
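The routing performed in S77 to S81 can be summarized by the following Python sketch. It is an illustration under stated assumptions: the connection list is modeled as a dictionary from each terminal ID to the request destination terminal ID registered for it (or `None`), outgoing messages are returned as tuples rather than actually sent, and the name `handle_replay` is hypothetical. Whether and when a registered request destination terminal ID is cleared is not covered by this passage, so the sketch leaves the connection list untouched.

```python
def handle_replay(connection_list, replay_areas):
    """Build the messages sent in S79 and S81.

    connection_list: terminal ID -> request destination terminal ID or None.
    replay_areas:    terminal ID -> stored replay data (areas H, I, J).
    Returns a list of (recipient terminal ID, message kind, payload) tuples.
    """
    messages = []
    for source_id, target_id in connection_list.items():
        if target_id is None:
            continue  # S77: No, nothing registered for this terminal
        # S79: the requesting terminal receives the target's replay data.
        messages.append((source_id, "replay_data", replay_areas.get(target_id, [])))
        # S81: the target terminal is told who is replaying its sound.
        messages.append((target_id, "replay_info", source_id))
    return messages
```

For example, if "USER41" has requested a replay of "USER42", the replay data goes to "USER41" and the replay information naming "USER41" goes to "USER42".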

<Effects of the embodiment>
According to the embodiment, the following effects can be obtained. In this description, the remote conference among the terminal devices 41, 42, and 43 is described from the standpoint of the terminal device 41.

(1) In a remote conference among the terminal devices 41, 42, and 43 conducted in the remote conference system 10, the server device 20 acquires the sound data transmitted in sequence from the terminal devices 41, 42, and 43 (see S61 in FIG. 6). The sound data acquired in sequence from the terminal device 41 is repeatedly stored in area H of the storage device 22, the sound data acquired in sequence from the terminal device 42 is repeatedly stored in area I, and the sound data acquired in sequence from the terminal device 43 is repeatedly stored in area J (see S65 in FIG. 6). The server device 20 generates, for example, composite data by combining the sound data from the terminal devices 42 and 43, and transmits this composite data to the terminal device 41 (see S73 and S75 in FIG. 6).

On the terminal device 41, the remote conference screen 60 displays the operation information 63 in association with the captured image 61 of the terminal device 42 and the terminal ID "USER42", and displays the operation information 63 in association with the captured image 61 of the terminal device 43 and the terminal ID "USER43" (see FIG. 2). When the terminal device 41 accepts a replay operation, made by pressing the operation information 63 via the operation unit 51, it transmits a replay request, the request source terminal ID, and the request destination terminal ID to the server device 20 (see S11: Yes and S13 in FIG. 3). The server device 20 transmits to the terminal device 41 the replay data corresponding to the request destination terminal ID acquired with the replay request (see S79 in FIG. 6). The terminal device 41 acquires the replay data and generates recombined data by combining the composite data and the replay data (see S17: Yes in FIG. 3 and S53 in FIG. 5). The terminal device 41 reproduces the recombined data, and the speaker 52 outputs a recombined sound that contains the composite sound corresponding to the composite data and the sound corresponding to the replay data (see S23 in FIG. 3).

Therefore, in the remote conference system 10, the terminal device 41 can, for example, simultaneously output the latest sounds currently being acquired by the terminal devices 42 and 43 and a sound acquired in the past by one of the terminal devices 42 and 43. The user of the terminal device 41 can thus follow what the users of the terminal devices 42 and 43 are currently saying while also catching up on a past remark, by the user of either terminal device 42 or 43, that the user of the terminal device 41 missed. Given that the terminal device 41 outputs a sound acquired in the past by one of the terminal devices 42 and 43, the replay request in the embodiment can also be described as follows. That is, the replay request is an instruction requesting reproduction of a past sound acquired by one of the terminal devices 42 and 43 before the time at which the latest sound was acquired.

(2) When the terminal device 41 acquires replay data, it executes the conversion process (see S17: Yes and S19 in FIG. 3, and FIG. 4). In the conversion process, the speed conversion process converts the replay data so that its reproduction speed is increased (see S33 in FIG. 4). Therefore, in the reproduction process performed on the recombined data (see S23 in FIG. 3), the reproduction time of the sound corresponding to the replay data can be shortened. The user of the terminal device 41 can quickly catch up on the missed past remark by the user of either terminal device 42 or 43.

(3) When the terminal device 41 acquires replay data, it executes the mixing process (see S17: Yes and S21 in FIG. 3, and FIG. 5). In the mixing process, the composite data is attenuated by the first ratio and the replay data is attenuated by the second ratio (see S45 and S51 in FIG. 5). The second ratio is smaller than the first ratio. The composite data attenuated by the first ratio and the replay data attenuated by the second ratio are combined to generate the recombined data (see S53 in FIG. 5). At this time, the composite data is assigned to the first channel and the replay data is assigned to the second channel (see S43 and S49 in FIG. 5). Therefore, in the reproduction process performed on the recombined data (see S23 in FIG. 3), the sound corresponding to the replay data can be output from the second channel at a higher volume than the composite sound corresponding to the composite data, which is output from the first channel. The user of the terminal device 41 can easily make out the missed past remark by the user of either terminal device 42 or 43.

(4) For example, when the terminal device 42 transmits a replay request, the request source terminal ID "USER42", and the request destination terminal ID "USER41", and the server device 20 acquires them, the server device 20 transmits replay information to the terminal device 41 (see S81 in FIG. 6). On acquiring the replay information, the terminal device 41 displays the notification information 64 (see S27 in FIG. 3). The notification information 64 is displayed at a predetermined position in the area of the captured image 61 that is displayed together with the terminal ID "USER42" contained in the acquired replay information (see FIG. 2). Therefore, the user of the terminal device 41 can be notified that his or her own remark is being listened to again on the terminal device 42.

(5) The server device 20 determines whether the acquired sound data is silent. When the sound data is not silent, it is stored in whichever of areas H, I, and J of the storage device 22 corresponds to the terminal ID contained in the sound data (see S63: No and S65 in FIG. 6). In contrast, when the acquired sound data is silent, the sound data stored up to that point is erased (see S63: Yes and S67 in FIG. 6). Therefore, the sound data transmitted in response to a replay request can be limited to sound data that is not silent. The stored sound data can thus cover the period from one silent state to the next.

<Modifications>
The embodiment can also be modified as follows. Some of the configurations in the modifications below can be combined as appropriate. Only the points that differ from the above are described below; description of the points in common is omitted as appropriate.

(1) In the above, the server device 20 generates composite data and transmits the generated composite data to the terminal devices 41, 42, and 43 as appropriate (see S73 and S75 in FIG. 6). The server device 20 may instead transmit the acquired sound data, without combining it, to the terminal devices other than the one that transmitted it. For example, the server device 20 transmits the sound data from the terminal devices 42 and 43 to the terminal device 41 without combining it. In this case, S63 to S79 are omitted from the relay process of FIG. 6.

The terminal devices 41, 42, and 43 then execute the following processing. The terminal device 41 is used here as an example. The terminal device 41 receives the two sound data, from the terminal devices 42 and 43, transmitted from the server device 20. The CPU 47 acquires these two sound data via the communication unit 53. The CPU 47 decodes each acquired sound data and executes, on each decoded sound data, processing corresponding to S63 to S67 in FIG. 6. When a replay operation is accepted (see S11: Yes in FIG. 3), the CPU 47 acquires, from a predetermined area of the storage device 48 of its own device, the sound data of the terminal ID with which the operation information 63 is associated (see S17: Yes in FIG. 3), and executes the processing from S19 onward as described above. The CPU 47 transmits to the server device 20 replay information that contains the terminal ID of its own device (called the "request source terminal ID", following the above) and the terminal ID with which the operation information 63 is associated (called the "request destination terminal ID", following the above).

In the server device 20, the CPU 21 acquires this replay information via the communication unit 24. Subsequently, the CPU 21 causes the replay information to be transmitted. The replay information is transmitted to the terminal device of the request destination terminal ID. That terminal device executes S27 in FIG. 3 in accordance with the request source terminal ID contained in the replay information.

(2) Although not described above, in the remote conference system 10, as in known remote conference systems, a predetermined role is set for each of the terminal devices 41, 42, and 43. Examples of roles are organizer, presenter, and participant. Of these three roles, the organizer is the highest, and the presenter and the participant rank below the organizer. The presenter ranks above the participant. In S63 of FIG. 6, for example, the reference time and/or the reference value of the amplitude used as the determination condition may be changed as appropriate according to the role. For example, a longer reference time may be set for a higher role. Alternatively, sound data may be stored for a fixed time regardless of whether it is silent. In this case as well, the fixed time may be changed according to the role. For example, sound data may be stored for a longer time for a higher role.

(3) In the above, each time sound data is acquired, it is determined whether the acquired sound data is silent, and when a single piece of acquired sound data is in the silent state, the sound data stored up to that point is erased (see S61, S63: Yes, and S67 in FIG. 6). The determination in S63 may instead be affirmative (see S63: Yes in FIG. 6) when the sound data acquired from the same terminal device has been in the silent state, for example, for a predetermined number of consecutive times or for a predetermined continuous time. In this case, the number of times silent sound data has been consecutively acquired, or the duration of the silent sound data, is counted for each terminal device. When the sound data transmitted and received between the server device 20 and the terminal devices 41, 42, and 43 is data of a fixed duration (for example, 20 msec), the duration of the silent state can be measured by counting the number of times silent sound data has been consecutively acquired. The reference time used as the determination condition for the silent state is set longer than in the case described above.
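The counting scheme in this modification can be sketched as follows, assuming 20 msec frames as in the example above. The names `SilenceTracker`, `is_silent`, and `amplitude_threshold` are hypothetical. The sketch also assumes that silent frames received before the count reaches the reference are still stored, which the text does not specify.

```python
FRAME_MSEC = 20  # frame duration given as an example in the text


def is_silent(frame: list[int], amplitude_threshold: int) -> bool:
    """Treat a frame as silent when every sample is at or below the threshold."""
    return all(abs(sample) <= amplitude_threshold for sample in frame)


class SilenceTracker:
    """Per-terminal store that is erased after enough consecutive silent frames."""

    def __init__(self, reference_time_msec: int, amplitude_threshold: int) -> None:
        # Consecutive silent frames needed to cover the reference time (ceiling division).
        self.frames_needed = -(-reference_time_msec // FRAME_MSEC)
        self.amplitude_threshold = amplitude_threshold
        self.silent_run: dict[str, int] = {}
        self.stored: dict[str, list[list[int]]] = {}

    def on_frame(self, terminal_id: str, frame: list[int]) -> None:
        if is_silent(frame, self.amplitude_threshold):
            run = self.silent_run.get(terminal_id, 0) + 1
            self.silent_run[terminal_id] = run
            if run >= self.frames_needed:
                # Corresponds to S63: Yes followed by S67 (erase stored sound data).
                self.stored[terminal_id] = []
                return
        else:
            # Any non-silent frame resets the consecutive count.
            self.silent_run[terminal_id] = 0
        self.stored.setdefault(terminal_id, []).append(frame)
```

With a reference time of 60 msec, three consecutive silent frames from the same terminal trigger the erase, while one or two do not.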

(4) In the above, in the conversion process (see FIG. 4), the speed conversion process is executed in S33. The speed conversion process may be omitted. The speed conversion process may be a process that slows the playback speed of the replay data (a process that slows the tempo). In the conversion process, the amplitude conversion process is executed in S35. In the mixing process (see FIG. 5), the second attenuation process is executed in S51. Either the amplitude conversion process or the second attenuation process may be omitted. For example, the amplitude conversion process may be omitted.
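As a rough illustration of how speed conversion, attenuation, and mixing might fit together, the following sketch thins the replay data, attenuates both streams by different ratios, and sums them. The function names and the numeric ratios are assumptions. Sample-level decimation is used only for simplicity: it also raises pitch, and this section does not state at what granularity the embodiment thins the data.

```python
def thin(samples: list[float], keep_every: int = 2) -> list[float]:
    """Speed up playback by keeping one sample out of every `keep_every`."""
    return samples[::keep_every]


def attenuate(samples: list[float], ratio: float) -> list[float]:
    """Reduce amplitude by `ratio` (0.0 leaves the data unchanged, 1.0 mutes it)."""
    return [s * (1.0 - ratio) for s in samples]


def mix(live: list[float], replay: list[float]) -> list[float]:
    """Sum two streams sample by sample, padding the shorter one with silence."""
    n = max(len(live), len(replay))
    live = live + [0.0] * (n - len(live))
    replay = replay + [0.0] * (n - len(replay))
    return [a + b for a, b in zip(live, replay)]


def mix_with_replay(live: list[float], replay: list[float]) -> list[float]:
    # Hypothetical ratios: the live sound is attenuated more than the replay
    # so that the replayed speech remains the easier one to hear.
    first_ratio, second_ratio = 0.5, 0.2
    return mix(attenuate(live, first_ratio), attenuate(thin(replay), second_ratio))
```

Omitting the speed conversion, as this modification allows, amounts to calling `mix` without `thin`.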

10 Remote conference system
20 Server device
21 CPU
22 Storage device
23 RAM
24 Communication unit
25 Bus
41, 42, 43 Terminal device
47 CPU
48 Storage device
49 RAM
50 Display unit
51 Operation unit
52 Speaker
53 Communication unit
54 Connection interface (connection I/F)
55 Bus
56 Camera
57 Microphone
60 Remote conference screen
61 Captured image
62 Material image
63 Operation information
64 Notification information
65 End button
90 Network

Claims

A program that causes a computer controlling a first terminal device connected to a network to execute:
a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, the first sound data corresponding at least to a first sound collected by the second terminal device at a first timing;
a first transmission process of causing a first replay request to be transmitted to the server device, the first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing;
a second acquisition process of acquiring the second sound data transmitted from the server device;
a first mixing process of generating first synthesized data in which the first sound data and the second sound data are synthesized;
a reproduction process of reproducing the first synthesized data and causing the reproduced first synthesized sound to be output; and
a conversion process of converting a state that affects reproduction of the second sound data from a first state to a second state,
wherein the first mixing process is a process of synthesizing the first sound data and the second sound data converted to the second state.

The program according to claim 1, wherein the conversion process is a process of thinning out the second sound data in the first state into the second state, in which the reproduction speed of the second sound data is faster than in the first state.

A program that causes a computer controlling a first terminal device connected to a network to execute:
a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, the first sound data corresponding at least to a first sound collected by the second terminal device at a first timing;
a first transmission process of causing a first replay request to be transmitted to the server device, the first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing;
a second acquisition process of acquiring the second sound data transmitted from the server device;
a first mixing process of generating first synthesized data in which the first sound data and the second sound data are synthesized;
a reproduction process of reproducing the first synthesized data and causing the reproduced first synthesized sound to be output;
a first attenuation process of attenuating the first sound data by a first ratio; and
a second attenuation process of attenuating the second sound data by a second ratio smaller than the first ratio,
wherein the first mixing process is a process of synthesizing the attenuated first sound data and the attenuated second sound data.

A program that causes a computer controlling a first terminal device connected to a network to execute:
a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, the first sound data corresponding at least to a first sound collected by the second terminal device at a first timing;
a first transmission process of causing a first replay request to be transmitted to the server device, the first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing;
a second acquisition process of acquiring the second sound data transmitted from the server device;
a first mixing process of generating first synthesized data in which the first sound data and the second sound data are synthesized;
a reproduction process of reproducing the first synthesized data and causing the reproduced first synthesized sound to be output; and
a first display process of causing counterpart information corresponding to the second terminal device and operation information corresponding to transmission of the first replay request to be displayed in association with each other,
wherein the first transmission process is a process of causing, when an input to the operation information is accepted while the counterpart information and the operation information are displayed, the first replay request and terminal information identifying the second terminal device in the remote conference to be transmitted to the server device.

A program that causes a computer controlling a first terminal device connected to a network to execute:
a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, the first sound data corresponding at least to a first sound collected by the second terminal device at a first timing;
a first transmission process of causing a first replay request to be transmitted to the server device, the first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing;
a second acquisition process of acquiring the second sound data transmitted from the server device;
a first mixing process of generating first synthesized data in which the first sound data and the second sound data are synthesized;
a reproduction process of reproducing the first synthesized data and causing the reproduced first synthesized sound to be output;
a second transmission process of causing third sound data corresponding to a third sound collected by the first terminal device to be transmitted to the server device;
a third acquisition process of acquiring replay information transmitted from the server device, the replay information indicating that the third sound data corresponding to the third sound collected by the first terminal device at a third timing earlier than the first timing has been transmitted from the server device to the second terminal device; and
a second display process of causing, when the replay information is acquired, counterpart information corresponding to the second terminal device and notification information indicating acquisition of the replay information to be displayed in association with each other.

The program according to any one of claims 1 to 5, wherein the first synthesized data is composed of a plurality of sound channels, and
the first mixing process is a process of synthesizing the first sound data into a first channel of the plurality of sound channels and synthesizing the second sound data into a second channel of the plurality of sound channels.

A program that causes a computer controlling a first terminal device connected to a network to execute:
a first acquisition process of acquiring first sound data that is transmitted from a server device executing a remote conference via the network between the first terminal device and a second terminal device connected to the network, the first sound data corresponding at least to a first sound collected by the second terminal device at a first timing;
a first transmission process of causing a first replay request to be transmitted to the server device, the first replay request requesting transmission of second sound data corresponding to a second sound collected by the second terminal device at a second timing earlier than the first timing;
a second acquisition process of acquiring the second sound data transmitted from the server device;
a first mixing process of generating first synthesized data in which the first sound data and the second sound data are synthesized;
a reproduction process of reproducing the first synthesized data and causing the reproduced first synthesized sound to be output;
a conversion process of converting a state that affects reproduction of the second sound data from a first state to a second state;
a first attenuation process of attenuating the first sound data by a first ratio;
a second attenuation process of attenuating the second sound data by a second ratio smaller than the first ratio;
a first display process of causing counterpart information corresponding to the second terminal device and operation information corresponding to transmission of the first replay request to be displayed in association with each other;
a second transmission process of causing third sound data corresponding to a third sound collected by the first terminal device to be transmitted to the server device;
a third acquisition process of acquiring replay information transmitted from the server device, the replay information indicating that the third sound data corresponding to the third sound collected by the first terminal device at a third timing earlier than the first timing has been transmitted from the server device to the second terminal device; and
a second display process of causing, when the replay information is acquired, the counterpart information corresponding to the second terminal device and notification information indicating acquisition of the replay information to be displayed in association with each other,
wherein the first mixing process is a process of synthesizing the first sound data and the second sound data converted to the second state,
the conversion process is a process of thinning out the second sound data in the first state into the second state, in which the reproduction speed of the second sound data is faster than in the first state,
the first synthesized data is composed of a plurality of sound channels,
the first mixing process is further a process of synthesizing the first sound data into a first channel of the plurality of sound channels and synthesizing the second sound data into a second channel of the plurality of sound channels, and a process of synthesizing the attenuated first sound data and the attenuated second sound data, and
the first transmission process is a process of causing, when an input to the operation information is accepted while the counterpart information and the operation information are displayed, the first replay request and terminal information identifying the second terminal device in the remote conference to be transmitted to the server device.

A program that causes a computer controlling a server device connected to a network to execute:
a fourth acquisition process of acquiring, in a remote conference via the network among a first terminal device connected to the network, a second terminal device connected to the network, and a third terminal device connected to the network, fourth sound data that is transmitted from the second terminal device and corresponds to a fourth sound collected by the second terminal device;
a first storage process of causing the acquired fourth sound data to be stored;
a fifth acquisition process of acquiring, in the remote conference, fifth sound data that is transmitted from the third terminal device and corresponds to a fifth sound collected by the third terminal device;
a second mixing process of generating second synthesized data in which the fourth sound data corresponding to the fourth sound at a first timing and the fifth sound data corresponding to the fifth sound at the first timing are synthesized;
a third transmission process of causing the second synthesized data to be transmitted to the first terminal device;
a sixth acquisition process of acquiring a second replay request that is transmitted from the first terminal device and requests transmission of the fourth sound data corresponding to the fourth sound at a second timing earlier than the first timing; and
a fourth transmission process of causing, when the second replay request is acquired, the fourth sound data corresponding to the fourth sound at the second timing to be transmitted to the first terminal device.

The program according to claim 8, wherein the first storage process is a process of causing the acquired fourth sound data to be stored for a first period.

The program according to claim 9, wherein the first storage process is a process of causing the acquired fourth sound data to be stored for the first period, the first period being a period during which the amplitude value of the fourth sound data does not remain at or below, or below, a reference value continuously for a reference time.

A remote conference method executed in a remote conference system that includes a first terminal device, a second terminal device, a third terminal device, and a server device, each connected to a network, the method comprising:
acquiring, in a remote conference via the network among the first terminal device, the second terminal device, and the third terminal device, sixth sound data that is transmitted from the second terminal device and corresponds to a sixth sound collected by the second terminal device;
storing the acquired sixth sound data;
acquiring, in the remote conference, seventh sound data that is transmitted from the third terminal device and corresponds to a seventh sound collected by the third terminal device;
generating third synthesized data in which the sixth sound data corresponding to the sixth sound at a first timing and the seventh sound data corresponding to the seventh sound at the first timing are synthesized;
reproducing the third synthesized data and outputting the reproduced second synthesized sound;
acquiring a replay request that requests reproduction of the sixth sound at a second timing earlier than the first timing;
generating, when the replay request is acquired, fourth synthesized data in which the third synthesized data and the sixth sound data corresponding to the sixth sound at the second timing are synthesized; and
reproducing the fourth synthesized data and outputting the reproduced third synthesized sound.