JP2006203548A

JP2006203548A - Voice signal processor for processing voice signals of a plurality of speakers, and program

Info

Publication number: JP2006203548A
Application number: JP2005013039A
Authority: JP
Inventors: Toshiaki Ishibashi; 利晃石橋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-01-20
Filing date: 2005-01-20
Publication date: 2006-08-03

Abstract

PROBLEM TO BE SOLVED: To provide, in a system that combines and outputs the voices of a plurality of speakers, a means of outputting only those voice portions that the listener considers necessary to hear.
SOLUTION: A voice signal processing server 13 receives the voice signals of attendees 19 from the respective terminal apparatuses 11 and stores them in data buffers 1393 so that they can be distinguished from one another. In accordance with designation data transmitted in advance from each terminal apparatus 11 according to the wishes of its attendee 19, an extracting part 133 extracts, from the voice signals stored in the data buffers 1393, the signals representing the voices of the speakers that attendee wishes to hear. The voice signals extracted by the extracting part 133 are mixed by a mixing part 134 and then transmitted from the voice signal processing server 13 to the terminal apparatuses 11. Consequently, each attendee 19 hears only the voices of the desired speakers.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an audio signal processing technique that enables a plurality of speakers to speak simultaneously through acoustic equipment in a setting such as a meeting.

There is a known technology that allows persons located at multiple sites to take part in an audio conference using acoustic equipment connected to a communication line. Audio signals representing the voice of each speaker, picked up by the acoustic equipment at each site, are transmitted to a central device, mixed there, and then transmitted back to each piece of acoustic equipment. As a result, the acoustic equipment at each site reproduces a mix of the voices of the speakers at all sites, and an audio conference is established. Patent Document 1, for example, discloses such prior art.
Patent Document 1: Japanese Patent Laid-Open No. 11-127499

With the prior art, conference participants hear the voices of all speakers. Each participant therefore has to pick out, from all of those voices, the ones he or she considers necessary to hear. The same problem arises when a recording of a past conference is played back: the listener has to listen to all of the conference audio.

In view of the above, an object of the present invention is to provide, in a system that combines the voices of a plurality of speakers and reproduces them, a means for reproducing only those voice portions that the listener considers necessary to hear.

To achieve the above object, the present invention provides an audio signal processing apparatus comprising: audio signal input means for receiving audio signals output from a plurality of terminal devices, together with attribute data indicating attributes of those audio signals; audio signal storage means for storing the audio signals and attribute data received by the audio signal input means in association with each other; attribute designation data input means for receiving attribute designation data that designates an arbitrary attribute; extraction means for extracting, from the audio signals stored in the audio signal storage means, the audio signals stored in association with attribute data indicating the attribute designated by the attribute designation data; and output means for outputting the audio signals extracted by the extraction means.
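The store-then-extract arrangement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `SignalStore`, `receive` and `extract` are hypothetical, attribute data is modeled as a plain string, and an audio signal is modeled as a list of float samples.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class StoredSignal:
    attribute: str          # attribute data received with the signal
    samples: List[float]    # audio signal (hypothetical: list of float samples)


@dataclass
class SignalStore:
    """Hypothetical model of the audio signal storage means."""
    records: List[StoredSignal] = field(default_factory=list)

    def receive(self, attribute: str, samples: List[float]) -> None:
        # Audio signal input means: store the signal together with its attribute data.
        self.records.append(StoredSignal(attribute, list(samples)))

    def extract(self, designated: Set[str]) -> List[StoredSignal]:
        # Extraction means: keep only signals whose attribute was designated.
        return [r for r in self.records if r.attribute in designated]


store = SignalStore()
store.receive("A", [0.1, 0.2])
store.receive("B", [0.5, 0.5])
store.receive("A", [0.3])
picked = store.extract({"A"})  # both signals tagged "A", in arrival order
```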

With an audio signal processing apparatus so configured, only the stored audio signals having the attributes the listener wants are extracted and output. As a result, audio can be reproduced without the speech the listener does not need.

The present invention also provides an audio signal processing apparatus comprising: attribute designation data input means for receiving attribute designation data that designates an arbitrary attribute; audio signal input means for receiving audio signals output from a plurality of terminal devices, together with attribute data indicating attributes of those audio signals; extraction means for extracting, from the audio signals received by the audio signal input means, the audio signals received together with attribute data indicating the attribute designated by the attribute designation data; and output means for outputting the audio signals extracted by the extraction means.

With an audio signal processing apparatus so configured, only those audio signals input in real time that have the attributes the listener wants are extracted and output. As a result, audio can be reproduced without the speech the listener does not need.
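In the real-time variant, nothing has to be stored first: each incoming signal can be checked against the designated attributes as it arrives. A minimal sketch, assuming each arriving unit is a hypothetical `(attribute, samples)` tuple and that the function name `extract_live` is purely illustrative:

```python
from typing import Iterable, Iterator, List, Set, Tuple

# A frame is (attribute data, audio samples); both representations are assumptions.
Frame = Tuple[str, List[float]]


def extract_live(frames: Iterable[Frame], designated: Set[str]) -> Iterator[Frame]:
    """Yield, as each frame arrives, only frames whose attribute is designated.

    Nothing is retained: frames with other attributes are simply dropped.
    """
    for attribute, samples in frames:
        if attribute in designated:
            yield attribute, samples


incoming = [("A", [0.1]), ("B", [0.4]), ("C", [0.2]), ("A", [0.3])]
passed = list(extract_live(incoming, {"A", "C"}))
```

Because it is a generator, the same function works unchanged on an unbounded network stream.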

In a preferred aspect, the apparatus may further comprise mixing means for mixing a plurality of audio signals when the extraction means extracts more than one, and the output means may be configured to output the extracted audio signal as-is when the extraction means extracts only one audio signal, and to output the audio signal mixed by the mixing means when the extraction means extracts a plurality of audio signals.

With an audio signal processing apparatus so configured, the amount of signal output is smaller than when the audio signals representing the voices of multiple speakers are output individually.
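The pass-through-or-mix rule can be sketched as below. The source does not specify the mixing method; this sketch assumes the common simple approach of summing time-aligned samples, padding shorter signals with silence and clipping to an assumed full-scale range of [-1.0, 1.0]. The function name `mix_or_pass` is hypothetical.

```python
from itertools import zip_longest
from typing import List


def mix_or_pass(signals: List[List[float]]) -> List[float]:
    """Return the single extracted signal unchanged, or a mix of several."""
    if not signals:
        return []
    if len(signals) == 1:
        # Only one signal extracted: output it as-is, no mixing needed.
        return list(signals[0])
    mixed = []
    # Align samples by index; shorter signals are padded with silence (0.0).
    for frame in zip_longest(*signals, fillvalue=0.0):
        total = sum(frame)
        # Clip to the assumed full-scale range to avoid overflow.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

However many speakers are selected, the listener receives a single stream whose length equals that of the longest input, which is the reduction in output signal amount described above.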

In a preferred aspect, the attribute designation data input means may receive the attribute designation data from one terminal device, and the output means may output the audio signal to that same terminal device.

With an audio signal processing apparatus so configured, the audio signal extracted according to the wishes of a terminal device's user is output to that user's own terminal device, so each listener can be provided with an audio signal tailored to him or her.
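Per-listener output means the server keeps one designation per terminal and builds a separate output for each. A minimal sketch, assuming the latest audio from each source terminal is available as a dict and that mixing is a plain sample-wise sum; the function name `route` and the terminal IDs are hypothetical:

```python
from typing import Dict, List, Set


def route(frames: Dict[str, List[float]],
          designations: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """Build one output per listening terminal from that terminal's own designation.

    frames: latest audio per source terminal ID.
    designations: listener terminal ID -> set of source terminal IDs it wants.
    """
    out: Dict[str, List[float]] = {}
    for listener, wanted in designations.items():
        # Sources that were designated but sent nothing are skipped.
        picked = [frames[src] for src in sorted(wanted) if src in frames]
        length = max((len(p) for p in picked), default=0)
        # Plain sample-wise sum; missing samples count as silence.
        out[listener] = [sum(p[i] for p in picked if i < len(p)) for i in range(length)]
    return out
```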

In a preferred aspect, the attribute data may be the identifier of the terminal device that output the audio signal, and the attribute designation data may be data designating the identifiers of one or more terminal devices.

With an audio signal processing apparatus so configured, the audio signals the listener wants to hear are specified by terminal device identifiers.

In a preferred aspect, the plurality of terminal devices may be connected to the audio signal input means via a network, and the identifier may be the address assigned to the terminal device on that network.

With an audio signal processing apparatus so configured, the audio signals the listener wants to hear are specified by network address.

In a preferred aspect, the attribute data may be time data indicating the time at which the audio signal was generated, the attribute designation data may be time-slot designation data designating an arbitrary time slot, and the extraction means may extract the audio signals associated with time data indicating a time that falls within the time slot designated by the time-slot designation data.

With an audio signal processing apparatus so configured, only the audio signals representing speech uttered during the time slot designated by the listener are extracted, so the listener no longer has to sift through audio signals from irrelevant time slots.
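Time-slot extraction reduces to an interval test on the time data. A minimal sketch, assuming each stored record is a hypothetical `(datetime, samples)` pair and that the slot is half-open, `[start, end)`, so adjacent slots do not overlap (the source does not say how slot boundaries are treated):

```python
from datetime import datetime
from typing import List, Tuple

# (time data, audio samples); the record layout is an assumption.
Record = Tuple[datetime, List[float]]


def extract_by_time(records: List[Record], start: datetime, end: datetime) -> List[Record]:
    """Keep records whose time data falls in the half-open slot [start, end)."""
    return [r for r in records if start <= r[0] < end]
```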

In a preferred aspect, the audio signal processing apparatus may further comprise correspondence data storage means for storing correspondence data indicating the correspondence between attribute data of one type and attribute data of another type, and the extraction means may extract audio signals using, in place of the attribute data of the one type received by the audio signal input means, the attribute data of the other type that corresponds to it according to the correspondence data.

With an audio signal processing apparatus so configured, the conditions for extracting the desired speech can be specified using a type of attribute different from the attribute data directly associated with the audio signals.

In a preferred aspect, the correspondence data may indicate the correspondence between the identifier of a terminal device and speaker data indicating attributes of the speaker using that terminal device; the audio signal input means may receive, together with each audio signal, the identifier of the terminal device that output it, as attribute data; the attribute designation data input means may receive, as attribute designation data, data designating a speaker attribute; and the extraction means may identify, according to the correspondence data, the identifiers of the terminal devices corresponding to speaker data indicating the attribute designated by the attribute designation data, and extract the audio signals received together with the identified identifiers.

With an audio signal processing apparatus so configured, even when audio signals are identifiable only by terminal device identifier, the listener can designate the speech to be extracted by the attributes of its speaker.
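The indirection through correspondence data can be sketched as a two-step lookup: translate the designated speaker attribute into a set of terminal identifiers, then filter the signals on those identifiers. The table contents, the terminal IDs `T1` to `T3`, and the function names are hypothetical illustrations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical correspondence data: terminal identifier -> speaker data.
CORRESPONDENCE: Dict[str, Dict[str, str]] = {
    "T1": {"role": "moderator", "affiliation": "Sales"},
    "T2": {"role": "presenter", "affiliation": "R&D"},
    "T3": {"role": "presenter", "affiliation": "Sales"},
}


def terminals_for(key: str, value: str) -> Set[str]:
    """Terminal identifiers whose speaker data has key == value."""
    return {tid for tid, speaker in CORRESPONDENCE.items() if speaker.get(key) == value}


def extract_by_speaker(signals: List[Tuple[str, List[float]]],
                       key: str, value: str) -> List[Tuple[str, List[float]]]:
    # Translate the speaker attribute into terminal IDs, then filter on those IDs.
    wanted = terminals_for(key, value)
    return [s for s in signals if s[0] in wanted]
```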

The present invention also provides a program that causes a computer to execute the processing performed by the audio signal processing apparatus described above.

[1. First Embodiment]
[1.1. Configuration of audio conference system]
FIG. 1 is a block diagram showing the configuration of an audio conference system 1 according to the first embodiment of the present invention. The audio conference system 1 enables conference participants located at different places to hold a conference by voice. The audio conference system 1 comprises a network 10 that interconnects a plurality of communication devices, a plurality of terminal devices 11 each connected to the network 10, a headset 12 connected to each terminal device 11, and an audio signal processing server 13 connected to the network 10.

Each terminal device 11 and headset 12 is used by one of the conference participants 19. The number of participants who can take part in a conference using the audio conference system 1, that is, the number of terminal devices 11 and headsets 12, can be changed arbitrarily, and the set of participants may also change while the conference is in progress.

As shown in FIG. 1, when different participants 19, and the terminal devices 11 and headsets 12 they use, need to be distinguished from one another, the suffix "-n" is appended, as in participant 19-n, terminal device 11-n, and headset 12-n, where "n" is an arbitrary natural number. When no such distinction is needed, they are referred to simply as participant 19, terminal device 11, and headset 12.

The network 10 comprises one or more relay devices interconnected by wire or wirelessly, and relays data between different communication devices. The network 10 may be an open network with unrestricted users, such as the Internet, or it may be an intranet, a LAN (Local Area Network) using a communication protocol other than the Internet Protocol, or the like.

The terminal device 11 transmits an audio signal representing the voice of its participant 19 to the audio signal processing server 13, and receives from the audio signal processing server 13 a synthesized audio signal in which the audio signals representing the voices of the other participants 19 have been combined. It may be, for example, a general-purpose personal computer, a PDA (Personal Digital Assistant), or a dedicated terminal. The terminal device 11 comprises: a timing unit 111, which includes an oscillator that emits a clock signal at fixed intervals and generates time data indicating the current time by counting the emitted clock signals; a designation data transmission unit 112, which transmits to the network 10 designation data specifying the attributes of the audio signals that the terminal device 11 should receive from the audio signal processing server 13; an audio signal processing unit 113, which converts the analog audio signal input from the headset 12 into a digital audio signal, associates it with the time data at that moment and passes it to the audio signal transmission unit 114, and which also converts the digital audio signal stored in the storage unit 116 into an analog audio signal and outputs it to the headset 12; an audio signal transmission unit 114, which receives the audio signal and time data from the audio signal processing unit 113 and sends them to the network 10; an audio signal reception unit 115, which receives audio signals from the network 10 and writes them to the storage unit 116; and a storage unit 116, which stores the control program of the terminal device 11 and the like and serves as a work area for the other components. In addition, each terminal device 11 is assigned in advance a unique terminal ID, which is stored in the storage unit 116.
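The way the timing unit 111 derives time data by counting clock signals, and how a digitized frame is tagged with it, can be sketched as follows. This is a simplified model under assumptions: the start time `origin` is taken as known, the oscillator period is an illustrative value, and the names `TimingUnit` and `tag_frame` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class TimingUnit:
    """Hypothetical model of timing unit 111: derives time data from a tick count."""
    origin: datetime   # wall-clock time when counting started (assumed known)
    period: timedelta  # oscillator period (assumed value)
    ticks: int = 0

    def on_clock(self) -> None:
        # Called once per clock signal emitted by the oscillator.
        self.ticks += 1

    def now(self) -> datetime:
        # Current time = start time + number of ticks counted x tick period.
        return self.origin + self.ticks * self.period


def tag_frame(clock: TimingUnit, samples: List[float]) -> Dict[str, object]:
    # Associate a digitized frame with the time data current at that moment.
    return {"time": clock.now(), "samples": list(samples)}
```

For example, with a 10 ms period, 150 ticks after 09:00:00 the unit reports 09:00:01.5.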

The headset 12 comprises a microphone, which generates an analog audio signal representing the voice of the participant 19 and outputs it to the terminal device 11, and headphones, which convert the analog audio signal input from the terminal device 11 into sound.

The audio signal processing server 13 receives audio signals from each of the terminal devices 11, mixes or groups the received audio signals, and then transmits the result to each of the terminal devices 11. The audio signal processing server 13 comprises: a designation data reception unit 131, which receives designation data from the network 10 and writes it to the storage unit 139; an audio signal reception unit 132, which receives audio signals and time data from the network 10 and writes them to the storage unit 139; an extraction unit 133, which extracts, from the audio signals stored in the storage unit 139, those that satisfy predetermined conditions; a mixing unit 134, which mixes a plurality of audio signals extracted by the extraction unit 133 to generate a synthesized audio signal; an audio signal transmission unit 135, which transmits to the network 10 either the audio signal extracted by the extraction unit 133 or the synthesized audio signal generated by the mixing unit 134; and a storage unit 139, which stores the control program of the audio signal processing server 13 and the like and serves as a work area for the other components.

The storage unit 139 stores conference data 1391 and participant data 1392 in advance. The conference data 1391 indicates the participants and other details of conferences held in the past using the audio conference system 1 and of conferences scheduled for the future. FIG. 2 illustrates the contents of the conference data 1391. The conference data 1391 contains a plurality of records, arranged in order of date and time slot, each consisting of the fields conference ID, date, time slot, agenda, participant ID, and role. Each record corresponds to one conference. The participant ID field of a record may contain multiple participant IDs, depending on the number of participants. The role field of each record contains data indicating the role in the conference of each participant identified by a participant ID, for example "moderator", "presenter", "interpreter", "guest", or "general participant". For example, if it is currently the morning of December 1, 2004, the conference identified by conference ID "20040001" (hereinafter "conference 20040001") is the record of a conference that has already been held, whereas conference "20040315", for example, indicates the participants of a conference scheduled to be held in the future.

The participant data 1392 indicates the names and other details of persons who may take part in conferences using the audio conference system 1. FIG. 3 illustrates the contents of the participant data 1392; for illustration, it shows only the records for participants in conference "20040315". The participant data 1392 contains a plurality of records, each consisting of the fields participant ID, name, affiliation, title, and password. The participant IDs in the participant data 1392 are the same IDs used in the conference data 1391. In the participant data 1392, each record corresponds to a different person, so a participant ID never appears in more than one record. In the conference data 1391, by contrast, the same participant ID may appear in different records, because the same person can take part in different conferences. The conference data 1391 and the participant data 1392 are created, for example, by the administrator of the audio conference system 1 and updated as necessary.
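The two tables and the uniqueness rule just described can be modeled as below. This is a sketch only: apart from participant ID "0425", its name "Sasaki Koji", and the two conference IDs taken from the text, every value (dates, agendas, roles, affiliation, title, password) is invented for illustration, and the class and function names are hypothetical. Keying the participant data by participant ID enforces that an ID appears only once there, while the same ID may recur across conference records.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    name: str
    affiliation: str
    title: str
    password: str


@dataclass
class Conference:
    conference_id: str
    date: str
    time_slot: str
    agenda: str
    roles: Dict[str, str]  # participant ID -> role in this conference


# Participant data 1392: keyed by participant ID, so an ID cannot appear twice.
participants: Dict[str, Participant] = {
    "0425": Participant("Sasaki Koji", "Dept. A", "Manager", "pw-0425"),
}

# Conference data 1391 (hypothetical values), kept in date/time-slot order.
conferences: List[Conference] = [
    Conference("20040001", "2004-01-05", "10:00-11:00", "Kickoff", {"0425": "moderator"}),
    Conference("20040315", "2004-12-10", "14:00-15:00", "Review",
               {"0425": "presenter", "0025": "guest"}),
]


def conferences_of(participant_id: str) -> List[str]:
    """Conference IDs whose participant list contains the given participant ID."""
    return [c.conference_id for c in conferences if participant_id in c.roles]
```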

Every participant 19 has been notified in advance, by the administrator of the audio conference system 1 or the like, of his or her own participant ID and password as registered in the participant data 1392. In addition, the participant 19 corresponding to each participant ID in a record of the conference data 1391 has been notified in advance, by the administrator or the like, of the conference ID, date, time slot, agenda, and role contained in that record.

[1.2. Operation of the audio conference system]
Next, the operation of the audio conference system 1 when a plurality of participants 19 hold a conference with it is described, taking conference "20040315" as an example. FIG. 4 shows, for conference "20040315", the correspondence among the participants 19, the terminal devices 11, the connections established between the terminal devices 11 and the audio signal processing server 13, and the data buffers of the audio signal processing server 13. As shown in the conference data 1391 (see FIG. 2), six participants 19, to whom participant IDs "0425", "0025", "3747", "0074", "0362", and "9125" are assigned, take part in conference "20040315". Hereinafter, these six participants are referred to, in that order, as participant 19-1, participant 19-2, ..., participant 19-6.

First, each of the participants 19-1 to 19-6 operates the corresponding terminal device 11-1 to 11-6 to establish a communication connection between that terminal device 11 and the audio signal processing server 13. As an example, assume that terminal IDs "0041", "0301", "0278", "0075", "0123", and "0084" are assigned to terminal devices 11-1 to 11-6, respectively. The audio signal processing server 13 establishes a communication connection with each of the terminal devices 11-1 to 11-6 and assigns a connection ID to each established connection. The method of establishing a communication connection is the same as in the prior art, so its description is omitted. Because each terminal device 11 can establish a communication connection with the audio signal processing server 13 at any time, a participant 19 can join or leave the conference at any time. In the following example, the audio signal processing server 13 assigns connection IDs "0004", "0015", "0034", "0021", "0023", and "0009" to the communication connections established with terminal devices 11-1 to 11-6, respectively.

When the audio signal processing server 13 establishes a communication connection with a terminal device 11 as described above, it allocates in the storage unit 139 a data buffer 1393 for temporarily storing the audio signals received over that connection. Hereinafter, the data buffers that the audio signal processing server 13 allocates for the communication connections established with terminal devices 11-1 to 11-6 are referred to as data buffers 1393-1 to 1393-6. Each of the data buffers 1393-1 to 1393-6 is associated with the connection ID of the corresponding communication connection, and this connection ID identifies the connection over which the audio signals stored in each data buffer 1393 were received. Likewise, when the communication connection is established, the terminal device 11 allocates in its storage unit 116 a data buffer 1161 for temporarily storing audio signals received from the audio signal processing server 13.

In addition, when establishing a communication connection with a terminal device 11, the audio signal processing server 13 obtains the terminal ID from that terminal device 11 and creates data indicating the correspondence between the connection ID assigned to the connection and the terminal ID. FIG. 5 shows an example of this connection-ID-to-terminal-ID correspondence data. Using it, the audio signal processing server 13 can identify, from a connection ID, the terminal device 11 at the other end of the communication connection identified by that connection ID.
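The per-connection bookkeeping described in the last two paragraphs (one data buffer per connection ID, plus the connection-ID-to-terminal-ID mapping of FIG. 5) can be sketched as below. Assumptions: the class name `ConnectionTable` is hypothetical, connection IDs are issued sequentially as four-digit strings (the IDs in the text need not be sequential), and each buffer is an in-memory queue of sample lists.

```python
from collections import deque
from itertools import count
from typing import Deque, Dict, List


class ConnectionTable:
    """Hypothetical per-connection bookkeeping on the server side."""

    def __init__(self) -> None:
        self._next_id = count(1)
        # One data buffer (1393) per connection ID, holding received audio frames.
        self.buffers: Dict[str, Deque[List[float]]] = {}
        # Connection ID -> terminal ID (the FIG. 5 correspondence).
        self.terminal_of: Dict[str, str] = {}

    def accept(self, terminal_id: str) -> str:
        # Assign a connection ID (sequential here; the text's IDs need not be).
        conn_id = f"{next(self._next_id):04d}"
        self.buffers[conn_id] = deque()
        self.terminal_of[conn_id] = terminal_id
        return conn_id

    def receive(self, conn_id: str, samples: List[float]) -> None:
        # The buffer's key identifies which connection the audio arrived on.
        self.buffers[conn_id].append(list(samples))

    def close(self, conn_id: str) -> None:
        # A participant leaving the conference releases its buffer and mapping.
        self.buffers.pop(conn_id, None)
        self.terminal_of.pop(conn_id, None)
```

Because connections can be opened and closed at any time, participants may join or leave mid-conference without disturbing the other buffers.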

Once the communication connection between the terminal device 11 and the audio signal processing server 13 has been established as described above, the participant 19 operates the terminal device 11 to send his or her participant ID and password to the audio signal processing server 13. The audio signal processing server 13 authenticates the participant 19 by checking whether the received combination of participant ID and password matches a combination of participant ID and password in the participant data 1392 (see FIG. 3). If the participant 19 is authenticated, the participant 19 then operates the terminal device 11 to send the conference ID "20040315" of the conference he or she wishes to join to the audio signal processing server 13. The audio signal processing server 13 refers to the conference data 1391 (see FIG. 2) and checks whether the participant ID received earlier is contained in the participant ID field of the record corresponding to the received conference ID "20040315", thereby confirming that the participant 19 is eligible to join conference "20040315".
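The two-stage admission check (authentication against the participant data, then eligibility against the conference data) can be sketched as follows. The function names and the simplified table shapes (participant ID to password; conference ID to list of participant IDs) are assumptions. Using `hmac.compare_digest` for the password comparison is this sketch's choice for constant-time comparison, not something the source specifies; it requires ASCII strings or bytes.

```python
import hmac
from typing import Dict, List


def authenticate(pid: str, password: str, passwords: Dict[str, str]) -> bool:
    """Check the (participant ID, password) pair against the participant data."""
    stored = passwords.get(pid)
    # compare_digest avoids leaking timing information; ASCII strings assumed.
    return stored is not None and hmac.compare_digest(stored, password)


def eligible(pid: str, conference_id: str, members: Dict[str, List[str]]) -> bool:
    """Check that the participant ID is listed for the requested conference."""
    return pid in members.get(conference_id, [])


def admit(pid: str, password: str, conference_id: str,
          passwords: Dict[str, str], members: Dict[str, List[str]]) -> bool:
    # Authentication first, then the conference eligibility check.
    return authenticate(pid, password, passwords) and eligible(pid, conference_id, members)
```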

When the above check confirms that the participant 19 is eligible to join conference "20040315", the audio signal processing server 13 generates correspondence data 1394 indicating various attributes, such as the corresponding connection ID, for each participant 19 in conference "20040315", and stores it in the storage unit 139. FIG. 6 illustrates the contents of the correspondence data 1394 for conference "20040315" as it stands after the audio signal processing server 13 has confirmed, as described above, the eligibility of all participants in that conference. The correspondence data 1394 contains a plurality of records, each consisting of the fields connection ID, terminal ID, participant ID, name, affiliation, title, and role. Each record of the correspondence data 1394 corresponds to one participant in the conference.

Upon confirming that a participant 19 is eligible to participate in the conference, the audio signal processing server 13 identifies the terminal ID corresponding to the connection ID of the communication connection established between the terminal device 11 used by that participant 19 and the audio signal processing server 13, based on the previously generated data indicating the correspondence between connection IDs and terminal IDs (see FIG. 5). The audio signal processing server 13 also searches the participant data 1392 (see FIG. 3) for a record, using the participant ID received from the terminal device 11 as the search key, and identifies the name, affiliation, and title contained in the retrieved record. Further, the audio signal processing server 13 searches the conference data 1391 (see FIG. 2) for a record, using the conference ID received from the terminal device 11 as the search key, and identifies, among the entries in the role field of the retrieved record, the one corresponding to the participant ID received from the terminal device 11. The audio signal processing server 13 associates the data identified in this way and stores them as a record of the correspondence data 1394.
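Assembling one record of the correspondence data 1394 amounts to three lookups joined on the connection ID, participant ID, and conference ID. The sketch below assumes simple dictionaries for the FIG. 5 mapping, the participant data 1392, and the role field of the conference data 1391; these shapes, and any sample values used with them, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CorrespondenceRecord:
    """One row of correspondence data 1394 (fields as listed for FIG. 6)."""
    connection_id: str
    terminal_id: str
    participant_id: str
    name: str
    affiliation: str
    title: str
    role: str


def build_record(connection_id, participant_id, conference_id,
                 connection_to_terminal, participant_data, conference_roles):
    """Join the three sources into a single record (dictionary shapes are assumptions)."""
    terminal_id = connection_to_terminal[connection_id]          # FIG. 5 mapping
    name, affiliation, title = participant_data[participant_id]  # participant data 1392
    role = conference_roles[conference_id][participant_id]       # role field of conference data 1391
    return CorrespondenceRecord(connection_id, terminal_id, participant_id,
                                name, affiliation, title, role)
```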

Each field in the correspondence data 1394 generated as described above represents a different type of attribute of a participant 19. That is, the connection ID indicates the communication connection used by the participant 19, the terminal ID indicates the terminal device 11 used by the participant 19, and the name, affiliation, title, and role each indicate the attribute of the participant 19 given by the field name. Furthermore, as described later, each record of the correspondence data 1394 is associated, via the participant ID, with the audio signal representing the voice of the corresponding participant 19. As a result, the correspondence data 1394 also serves as attribute data for each audio signal. For example, the name "Sasaki Koji" is data indicating the name of the speaker of the audio signal associated with participant ID "0425".

In accordance with the conference data 1391, when 13:00 on December 1, 2004, the scheduled date and time of conference "20040315", arrives, the audio signal processing server 13 starts sending and receiving audio signals to and from each of the terminal devices 11-1 to 11-6 in accordance with the correspondence data 1394 generated for conference "20040315". As a result, the participants 19-1 to 19-6 can use the headsets 12 to speak to the other conference participants and to hear the other participants' remarks. The operation of the audio conference system 1 for this purpose is described below.

The voice uttered by a participant 19 is converted into an analog audio signal by the microphone of the headset 12 and input to the audio signal processing unit 113 of the terminal device 11. The audio signal processing unit 113 converts the analog audio signal received from the headset 12 into a digital audio signal and then generates data packets containing the converted audio signal.

FIG. 7 schematically illustrates how an audio signal is carried in a plurality of data packets. The audio signal processing unit 113 divides the audio signal, starting from its beginning, into audio signal blocks of a predetermined data length. In front of each audio signal block it adds a block number indicating the block's position in the sequence of blocks. In front of the block number it further adds, in sequence, a connection ID, a source ID, and a destination ID. The source ID is the terminal ID of the terminal device 11, and the destination ID is the ID of the audio signal processing server 13 on the network 10. The audio signal processing unit 113 also appends, after the audio signal block, the time data received at that moment from the timekeeping unit 111. Finally, it adds HOD (Head of Data) and EOD (End of Data) before and after the audio signal block with these data attached, as delimiters marking one unit of data. Each sequence of data generated in this way, beginning with HOD and ending with EOD, is a data packet.
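The packet layout of FIG. 7 can be approximated with Python's `struct` module. This sketch rests on several assumptions not stated in the text: the delimiter byte patterns, the field widths, the field order inside the header, integer IDs, and time data encoded as a float. Because the header and trailer have fixed sizes, the receiver recovers the (possibly shorter) final block by position rather than by searching for delimiters.

```python
import struct

# Hypothetical delimiter patterns and field layout (not specified in the text).
HOD = b"\x02HOD"
EOD = b"EOD\x03"
HEADER = ">HHHI"   # connection ID, source ID, destination ID, block number (10 bytes)
TRAILER = ">d"     # time data as seconds, 8-byte float


def packetize(samples: bytes, block_len: int, connection_id: int,
              source_id: int, dest_id: int, now) -> list:
    """Split the signal into fixed-length blocks and wrap each one as HOD ... EOD."""
    packets = []
    for block_no, start in enumerate(range(0, len(samples), block_len)):
        block = samples[start:start + block_len]
        header = struct.pack(HEADER, connection_id, source_id, dest_id, block_no)
        trailer = struct.pack(TRAILER, now())  # time data from the timekeeping unit
        packets.append(HOD + header + block + trailer + EOD)
    return packets


def parse_packet(packet: bytes):
    """Inverse of packetize for one packet: return header fields, block, and time data."""
    if not (packet.startswith(HOD) and packet.endswith(EOD)):
        raise ValueError("missing HOD/EOD delimiter")
    body = packet[len(HOD):-len(EOD)]
    hsize, tsize = struct.calcsize(HEADER), struct.calcsize(TRAILER)
    conn, src, dst, block_no = struct.unpack(HEADER, body[:hsize])
    (t,) = struct.unpack(TRAILER, body[-tsize:])
    return conn, src, dst, block_no, body[hsize:-tsize], t
```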

Strictly speaking, the time data contained in a data packet indicates the time at which the packet was generated. However, if the interval between the utterance by the participant 19 and the generation of the data packet is sufficiently short, the time data can be taken to indicate the time at which the voice corresponding to the audio signal block was uttered by the participant 19, or the time at which the audio signal block was generated. The time data can therefore be regarded as a kind of attribute data indicating an attribute of the audio signal block.

The audio signal processing unit 113 passes the data packets generated as described above to the audio signal transmission unit 114 one after another, and the audio signal transmission unit 114 sends the received data packets out to the network 10 in sequence. Each relay device in the network 10 stores a routing table indicating the communication paths by which the communication device identified on the network 10 by a destination ID can be reached. Based on the destination ID contained in a data packet sent from the terminal device 11, the relay device consults the routing table and forwards the data packet to the adjacent relay device on a path that reaches the communication device identified by that destination ID. As a result, the data packet is delivered to the audio signal processing server 13. Methods for updating the routing table and the like are the same as in the prior art, so their description is omitted.

A data packet delivered to the audio signal processing server 13 via the network 10 as described above is received by the audio signal receiving unit 132 of the audio signal processing server 13. Based on the connection ID contained in the received data packet, the audio signal receiving unit 132 stores the audio signal block contained in the packet in the data buffer 1393 corresponding to that connection ID, in the area corresponding to the block number contained in the packet. Because the data packets sent from a terminal device 11 may travel over different paths through the network 10, they are not necessarily received by the audio signal processing server 13 in the order in which they were sent. However, since the audio signal receiving unit 132 stores the audio signal blocks in the data buffer 1393 in the order given by their block numbers, the sequence of audio signals stored in the data buffer 1393 reproduces the audio signal as it was before being divided into data packets at the terminal device 11. If some data packets fail to reach the audio signal processing server 13 for some reason, the audio signal processing server 13 performs processing such as interpolating the audio signal of the missing packets from the preceding and following audio signals. This processing is the same as in the prior art, so its description is omitted.
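Reordering by block number, plus filling a lost block from its neighbours, can be sketched as follows. The text leaves the interpolation method to the prior art, so the linear interpolation used here is an assumption chosen for illustration; the per-connection dictionary stands in for the data buffers 1393.

```python
from collections import defaultdict

# Stand-in for data buffers 1393: connection ID -> {block number: block samples}.
data_buffers = defaultdict(dict)


def on_packet(connection_id: str, block_no: int, samples) -> None:
    """Store a block by connection ID and block number; arrival order does not matter."""
    data_buffers[connection_id][block_no] = list(samples)


def reassemble(blocks: dict, block_len: int) -> list:
    """Concatenate blocks in block-number order, linearly interpolating missing blocks."""
    if not blocks:
        return []
    out = []
    for n in range(max(blocks) + 1):
        out.extend(blocks.get(n, [None] * block_len))
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1
        right = out[j]                       # last block is always present here
        left = out[i - 1] if i > 0 else right
        gap = j - i + 1
        for k in range(i, j):                # straight line from left to right
            out[k] = left + (right - left) * (k - i + 1) / gap
        i = j
    return out
```

For example, blocks `{1: [3, 4], 0: [1, 2]}` arriving out of order reassemble to `[1, 2, 3, 4]`.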

FIG. 8 schematically illustrates how the audio signals transmitted from the terminal devices 11-1 to 11-6 are stored in the data buffers 1393-1 to 1393-6, respectively, of the audio signal processing server 13. As already described, the data buffers 1393-1 to 1393-6 are allocated to store the audio signals of the respective participants of conference "20040315", and the connection ID corresponding to each participant identifies which participant's audio signal a given data buffer stores. For example, in the data buffer 1393-1 corresponding to connection ID "0004", the audio signal blocks received from the terminal device 11-1 over the communication connection identified by connection ID "0004" are assembled according to their block numbers, restoring the audio signal representing the voice of participant 19-1. Together with the restored audio signal, the data buffer 1393 stores the time data that accompanied each audio signal block in its data packet, associated with the position of that block within the audio signal. Therefore, from the data stored in the data buffer 1393, the time at which the voice represented by each portion of the audio signal was uttered can be identified.

Each participant 19 can operate the terminal device 11 to send designation data to the audio signal processing server 13, thereby instructing the audio signal processing server 13 to transmit to that participant's terminal device 11 only the audio signals having particular attributes. In the default state, that is, while the participant 19 has not sent designation data from the terminal device 11 to the audio signal processing server 13, the audio signal processing server 13 successively mixes all the audio signals stored in the data buffers 1393-1 to 1393-6 and transmits the result to the terminal device 11. Once the participant 19 has sent designation data from the terminal device 11 to the audio signal processing server 13, on the other hand, the audio signal processing server 13 extracts, from among the audio signals stored in the data buffers 1393-1 to 1393-6, those that satisfy the condition indicated by the designation data, mixes only the extracted audio signals, and transmits the result to the terminal device 11. This processing by the audio signal processing server 13 is performed separately for each of the terminal devices 11-1 to 11-6. The specific operation is therefore described below using, as an example, the case where the audio signal processing server 13 transmits an audio signal to the terminal device 11-1.

First, when the audio signal processing server 13 has not received designation data from the terminal device 11-1, the extraction unit 133 reads each newly written sample of the audio signal from each of the data buffers 1393-1 to 1393-6 at the sampling rate of the audio signals, and passes the read samples to the mixing unit 134. The mixing unit 134 adds the samples received from the extraction unit 133 to generate a single sample, and passes the generated samples in turn to the audio signal transmission unit 135. Taken together, the samples successively passed from the mixing unit 134 to the audio signal transmission unit 135 form a synthesized audio signal representing the mixed voices of participants 19-1 to 19-6.
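The default mix is a per-sample sum across all buffers. The sketch below works on already-filled lists rather than reading new samples in real time, and it performs a plain sum with no clipping or gain control, which the text does not mention. The dictionary keyed by connection ID is an assumed representation of the data buffers 1393.

```python
def mix_all(buffers: dict):
    """Yield one mixed sample per sample period by summing time-aligned samples.

    buffers: connection ID -> list of integer samples (a stand-in for data buffers 1393).
    A buffer that has no sample at a given position simply contributes nothing there.
    """
    length = max((len(b) for b in buffers.values()), default=0)
    for n in range(length):
        yield sum(b[n] for b in buffers.values() if n < len(b))
```

For example, buffers `[1, 2, 3]`, `[10, 20]`, and `[100]` mix to `[111, 22, 3]`.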

On receiving the synthesized audio signal from the mixing unit 134, the audio signal transmission unit 135 generates data packets from it and sends the generated data packets out to the network 10 in sequence. The audio signal transmission unit 135 generates data packets in the same way that data packets are generated in the terminal device 11 (see FIG. 7). In this case, however, the source ID contained in each data packet is the ID of the audio signal processing server 13, and the destination ID is the terminal ID of the terminal device 11-1. Also, since the terminal device 11 does not use time data in this embodiment, the data packets sent from the audio signal processing server 13 may omit the time data.

Data packets sent from the audio signal transmission unit 135 to the network 10 are delivered to the terminal device 11-1 based on the destination ID they contain. When the audio signal receiving unit 115 of the terminal device 11-1 receives a data packet, it writes the audio signal block contained in the packet into the area of the data buffer 1161 corresponding to the block number. Meanwhile, the audio signal processing unit 113 converts the audio signal written in the data buffer 1161 into an analog audio signal and outputs it to the headphones of the headset 12. The headphones of the headset 12 convert the audio signal input from the audio signal processing unit 113 into sound. As a result, participant 19-1 can hear the remarks of all participants, including his or her own, and can hold discussions with the other participants of conference "20040315".

Next, the operation of the audio conference system 1 when designation data is sent from the terminal device 11-1 to the audio signal processing server 13 is described. The designation data transmission unit 112 of the terminal device 11-1 generates designation data 1395-1 in response to operations by participant 19-1 and sends it to the audio signal processing server 13. The designation data takes a form such as [title = Department Manager or role = Guest or (affiliation = Overseas Department and title = General Staff)]. Based on the attributes shown in the correspondence data 1394 (see FIG. 6), this designation data specifies which participants' 19 audio signals the audio signal processing server 13 should transmit to the terminal device 11-1.

The designation data sent from the terminal device 11-1 is received by the designation data receiving unit 131 of the audio signal processing server 13 and temporarily stored in the storage unit 139 together with the connection ID "0004" of the terminal device 11-1. Hereinafter, the designation data stored in the storage unit 139 is referred to as designation data 1395-1 to 1395-6 according to which of the terminal devices 11-1 to 11-6 sent it (see FIG. 1). For example, designation data 1395-1 is associated with connection ID "0004". When designation data 1395-1 is stored in the storage unit 139, the extraction unit 133 extracts from the correspondence data 1394 the records that satisfy the condition of designation data 1395-1 and retrieves the connection IDs of those records. For example, if designation data 1395-1 is [title = Department Manager or role = Guest or (affiliation = Overseas Department and title = General Staff)], the extraction unit 133 extracts connection ID "0004" of the record whose title is "Department Manager", connection ID "0009" of the record whose role is "Guest", and connection ID "0023" of the record whose affiliation is "Overseas Department" and whose title is "General Staff".
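Matching designation data against the correspondence data 1394 is a small boolean-expression evaluation. The text shows the condition only in bracketed notation, so the nested-tuple encoding below, and the sample records (apart from the connection IDs, titles, roles, and affiliations named above), are hypothetical.

```python
# Hypothetical tree encoding of designation data:
# ("eq", field, value), ("and", c1, c2, ...), ("or", c1, c2, ...).
DESIGNATION_1395_1 = (
    "or",
    ("eq", "title", "Department Manager"),
    ("eq", "role", "Guest"),
    ("and", ("eq", "affiliation", "Overseas Department"), ("eq", "title", "General Staff")),
)

# Hypothetical excerpt of correspondence data 1394 (only the fields used here).
SAMPLE_CORRESPONDENCE = [
    {"connection_id": "0004", "title": "Department Manager", "role": "Member", "affiliation": "Sales"},
    {"connection_id": "0009", "title": "Director", "role": "Guest", "affiliation": "Partner Co."},
    {"connection_id": "0023", "title": "General Staff", "role": "Member", "affiliation": "Overseas Department"},
    {"connection_id": "0011", "title": "General Staff", "role": "Member", "affiliation": "Sales"},
]


def matches(cond, record: dict) -> bool:
    """Evaluate one condition tree against one correspondence-data record."""
    op = cond[0]
    if op == "eq":
        _, field, value = cond
        return record.get(field) == value
    if op == "and":
        return all(matches(c, record) for c in cond[1:])
    if op == "or":
        return any(matches(c, record) for c in cond[1:])
    raise ValueError(f"unknown operator: {op!r}")


def extract_connection_ids(cond, correspondence: list) -> list:
    """Return the connection IDs of the records that satisfy the condition."""
    return [r["connection_id"] for r in correspondence if matches(cond, r)]
```

With this data, `extract_connection_ids(DESIGNATION_1395_1, SAMPLE_CORRESPONDENCE)` yields `["0004", "0009", "0023"]`; record "0011" is excluded because its affiliation is not the Overseas Department.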

If the connection IDs extracted as described above do not include the connection ID "0004" stored in association with designation data 1395-1, the extraction unit 133 adds connection ID "0004" to the extracted connection IDs. In the example above, however, connection ID "0004" has already been extracted, so no connection ID is added. Because the connection ID of the terminal device 11 that sent the designation data is added to the extracted connection IDs in this way, the audio signal transmitted from that terminal device 11 is mixed, in the processing by the extraction unit 133 and the mixing unit 134 described below, into the audio signal transmitted back to that terminal device 11. The participant 19 can thus hear the conference audio including his or her own remarks.

The extraction unit 133 successively reads samples of the audio signals from the data buffers 1393 corresponding to the connection IDs extracted as described above and passes them to the mixing unit 134. In this case, samples are read from the data buffers 1393-1, 1393-5, and 1393-6, corresponding to connection IDs "0004", "0023", and "0009", and passed to the mixing unit 134.

As the mixing unit 134 successively receives the samples read from the data buffers 1393-1, 1393-5, and 1393-6, it adds them to generate a single sample and passes the generated samples in turn to the audio signal transmission unit 135. Taken together, the samples successively passed from the mixing unit 134 to the audio signal transmission unit 135 form a synthesized audio signal representing the mixed voices of participants 19-1, 19-5, and 19-6.
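The selective mix differs from the default mix only in which buffers are summed: the extracted connection IDs, plus the requester's own connection ID if it is missing. This is a sketch under the same assumptions as a plain per-sample sum (dictionary of finished sample lists, no clipping); the function name is hypothetical.

```python
def mix_selected(buffers: dict, extracted_ids: list, requester_id: str) -> list:
    """Sum only the buffers of the extracted connection IDs, always including the requester.

    buffers: connection ID -> list of integer samples (stand-in for data buffers 1393).
    """
    ids = list(extracted_ids)
    if requester_id not in ids:
        ids.append(requester_id)  # the requesting participant always hears his or her own voice
    length = max((len(buffers[c]) for c in ids), default=0)
    return [sum(buffers[c][n] for c in ids if n < len(buffers[c])) for n in range(length)]
```

For example, with extracted IDs `["0009", "0023"]` and requester `"0004"`, buffer "0011" is left out of the mix while "0004" is added back in.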

On receiving the synthesized audio signal from the mixing unit 134, the audio signal transmission unit 135 transmits it to the terminal device 11-1 identified by connection ID "0004". The synthesized audio signal is received by the terminal device 11-1 and converted into sound by the headset 12-1. As a result, participant 19-1 hears conference audio containing only the voices of the participants 19 he or she specified in the designation data.

Each of the terminal devices 11-1 to 11-6 can send different designation data to the audio signal processing server 13, and the extraction unit 133 extracts audio signals for each of the terminal devices 11-1 to 11-6 under the condition indicated by that terminal's designation data. The audio signals extracted by the extraction unit 133 for each of the terminal devices 11-1 to 11-6 are mixed separately by the mixing unit 134 and transmitted from the audio signal transmission unit 135 to the corresponding terminal device 11. Accordingly, each of the participants 19-1 to 19-6 hears only the remarks of the participants he or she wishes to hear. In addition, the designation data receiving unit 131 overwrites the designation data 1395-1 to 1395-6 already stored in the storage unit 139 with designation data newly received from the respective terminal devices 11-1 to 11-6. The participants 19-1 to 19-6 can therefore change, at any time, which speakers are included in the sound output from their headsets 12 by resending designation data to the audio signal processing server 13.

When conference "20040315" ends, the audio signal processing server 13 generates an audio recording file containing the audio signals stored in the data buffers 1393-1 to 1393-6 (see FIG. 8) and stores it in the storage unit 139. As a result, the storage unit 139 accumulates, as an audio recording file group 1396, a plurality of audio recording files, each recording the audio of a past conference.

FIG. 9 schematically illustrates the contents of an audio recording file in the audio recording file group 1396. The audio recording file stores the audio signals held in the respective data buffers 1393, arranged in time series so that the times indicated by their associated time data line up with one another. Each audio signal stored in the audio recording file is associated with the participant ID, name, affiliation, title, and role fields from the corresponding record of the correspondence data 1394 (see FIG. 6). In addition, the conference ID, date, time slot, and agenda fields from the corresponding record of the conference data 1391 (see FIG. 2) are added to the audio recording file as management data. The audio signal processing server 13 stores the audio recording file configured in this way in the storage unit 139, using, for example, the conference ID as its file name. Hereinafter, an audio recording file whose file name is, for example, "20040315" is referred to as audio recording file "20040315".
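Placing each participant's signal on a common timeline, as in FIG. 9, can be sketched by converting each signal's start time into a sample offset and padding with silence. The sketch assumes that each track's time data can be reduced to the time of its first sample (in seconds) and that all tracks share one sampling rate; both are simplifications not stated in the text.

```python
def align_tracks(tracks: dict, rate: int):
    """Pad time-stamped tracks so that equal indices correspond to equal times.

    tracks: participant ID -> (start_time_seconds, list of samples)  (assumed shape)
    rate:   samples per second, assumed common to all tracks
    Returns (t0, aligned) where t0 is the earliest start time.
    """
    t0 = min(start for start, _ in tracks.values())
    offsets = {pid: round((start - t0) * rate) for pid, (start, _) in tracks.items()}
    total = max(offsets[pid] + len(s) for pid, (_, s) in tracks.items())
    aligned = {}
    for pid, (_, samples) in tracks.items():
        off = offsets[pid]
        # Leading and trailing zeros represent silence before and after this speaker's signal.
        aligned[pid] = [0] * off + list(samples) + [0] * (total - off - len(samples))
    return t0, aligned
```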

Anyone authorized to use the audio conference system 1, that is, anyone whose data is registered in the participant data 1392 (see FIG. 3), can listen to the audio of past conferences using the audio recording files stored in the audio signal processing server 13. The operation of the audio conference system 1 when participant 19-x listens to the audio of conference "20040315" is described below. Participant 19-x need not have been a participant in conference "20040315".

Participant 19-x operates the terminal device 11-x to establish a communication connection between the terminal device 11-x and the audio signal processing server 13 and to complete authentication and related processing, and then sends request data requesting a list of past conferences from the terminal device 11-x to the audio signal processing server 13. In response, the audio signal processing server 13 uses the conference data 1391 (see FIG. 2) to generate screen data showing a conference list and sends it to the terminal device 11-x. As a result, a screen containing the conference list is displayed on the display unit of the terminal device 11-x. FIG. 10 illustrates the screen displayed on the display unit of the terminal device 11-x. When participant 19-x selects the data row of a conference on this screen and presses the "OK" button, the terminal device 11-x sends the conference ID contained in the selected data row to the audio signal processing server 13. Here, it is assumed that participant 19-x has selected conference "20040315" and that conference ID "20040315" has been sent to the audio signal processing server 13.

When the audio signal processing server 13 receives conference ID "20040315" from the terminal device 11-x, it reads audio recording file "20040315" from the audio recording file group 1396. Using the data in audio recording file "20040315" that indicates the conference's time slot, together with data such as the participant names associated with the respective audio signals, the audio signal processing server 13 generates screen data for a screen on which participants can be selected and a time range specified. The audio signal processing server 13 sends the generated screen data to the terminal device 11-x. As a result, a screen containing the participant list of the previously selected conference is displayed on the display unit of the terminal device 11-x.

FIG. 11 illustrates the screen displayed on the display unit of the terminal device 11-x. On this screen, participant 19-x selects the participants whose voices he or she wants to hear, specifies the time range he or she wants to hear, and uses a selection button to choose whether to delete silent portions. When participant 19-x then presses the "OK" button, the terminal device 11-x generates designation data according to these operations and sends it to the audio signal processing server 13. In the following, it is assumed that the user has made the participant selection, time range specification, and silent-portion deletion choice illustrated in FIG. 11.

In this case, the designation data that the terminal device 11-x sends to the audio signal processing server 13 is [(participant ID = 0425 or participant ID = 0025 or participant ID = 0074 or participant ID = 0362) and (time data ≥ 13:20 and time data ≤ 14:00) and (level ≥ 10 or level ≤ −10)]. Here, "level" denotes the amplitude of the audio signal. The condition (level ≥ 10 or level ≤ −10) means that only portions of the audio signal with an amplitude of 10 or more, or −10 or less, are extracted, so that portions with an amplitude between −10 and 10 are treated as silent and cut.

When the designation data receiving unit 131 of the audio signal processing server 13 receives the designation data from the terminal device 11-x, it temporarily stores the received designation data in the storage unit 139. Hereinafter, the designation data stored in the storage unit 139 is referred to as designation data 1395. When the designation data 1395 has been written to the storage unit 139, the extraction unit 133 extracts, from the audio signals contained in the audio recording file "20040315" (see FIG. 9), the portions that satisfy the conditions of the designation data 1395. More specifically, it first selects the audio signals associated with the participant IDs "0425", "0025", "0074" and "0362", and then takes from the selected audio signals the portions associated with time data indicating times within the period 13:20 to 14:00.
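The two-step selection just described (first by participant ID, then by time period) can be sketched as below. This is a hypothetical illustration: it assumes a recording is held as a dictionary from participant ID to a list of `(seconds since midnight, sample)` pairs, which is not the file layout shown in FIG. 9.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: participant ID -> list of (seconds since midnight, sample).
Recording = Dict[str, List[Tuple[float, int]]]


def seconds(hours: int, minutes: int) -> float:
    """Convert a clock time such as 13:20 to seconds since midnight."""
    return hours * 3600.0 + minutes * 60.0


def select_portions(recording: Recording, participant_ids: Iterable[str],
                    start: float, end: float) -> Recording:
    """Step 1: keep only the designated participants' signals.
    Step 2: keep only samples whose time data lies in [start, end]."""
    selected = {pid: recording[pid] for pid in participant_ids if pid in recording}
    return {
        pid: [(t, v) for (t, v) in samples if start <= t <= end]
        for pid, samples in selected.items()
    }
```

For example, `select_portions(rec, ["0425", "0025", "0074", "0362"], seconds(13, 20), seconds(14, 0))` would return only those participants' samples between 13:20 and 14:00; designated IDs absent from the recording are simply skipped.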

Further, the extraction unit 133 reads the samples of the extracted audio signals sequentially from the beginning. Whenever it finds a portion in which, for a predetermined duration (for example, 5 seconds) or longer, the samples of all the signals corresponding to the same time have amplitudes greater than −10 and less than 10, it deletes that portion. The deletion condition requires the quiet period to last at least the predetermined duration so that short silent gaps within an ongoing utterance, such as pauses between words, are not cut.
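A minimal Python sketch of this minimum-duration silence removal follows. It assumes the extracted signals are already time-aligned lists of integer samples of equal length at a known sample rate; the function name and parameters are hypothetical. A time index counts as silent only when every signal is strictly between −threshold and +threshold there, and only runs of at least `min_silence_s` seconds are removed, from all signals at once.

```python
from typing import List, Optional


def remove_long_silences(signals: List[List[int]], sample_rate: int,
                         threshold: int = 10,
                         min_silence_s: float = 5.0) -> List[List[int]]:
    """Delete time ranges where all signals stay within (-threshold, threshold)
    for at least min_silence_s seconds; shorter pauses are kept."""
    if not signals:
        return []
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("signals must be time-aligned and equal length")
    min_len = int(min_silence_s * sample_rate)
    keep = [True] * n
    run_start: Optional[int] = None
    # Iterate one step past the end so a trailing silent run is closed too.
    for i in range(n + 1):
        silent = i < n and all(-threshold < s[i] < threshold for s in signals)
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            if i - run_start >= min_len:
                for j in range(run_start, i):
                    keep[j] = False
            run_start = None
    return [[v for v, k in zip(s, keep) if k] for s in signals]
```

With `sample_rate=1` and `min_silence_s=3`, a four-sample silent run is removed while a two-sample pause survives, which is the behavior the text motivates. A production system would more likely measure a smoothed envelope than raw sample values, since individual samples of speech also cross zero; that refinement is outside what the text describes.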

After extracting the audio signals as described above, the extraction unit 133 passes them to the mixing unit 134. The mixing unit 134 mixes the audio signals received from the extraction unit 133 and passes the resulting synthesized audio signal to the audio signal transmission unit 135. The audio signal transmission unit 135 transmits the synthesized audio signal received from the mixing unit 134 to the terminal device 11-x. As a result, the headphones of the headset 12-x connected to the terminal device 11-x reproduce only the voices of the participants, in the time period, and from the conference that the participant 19-x requested. In addition, portions of the audio signal in which no one spoke for the predetermined duration or longer are cut, so the overall playback time is shortened.
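The patent does not specify how the mixing unit 134 combines signals. A common approach, assumed here purely for illustration, is to sum time-aligned 16-bit PCM samples and hard-clip the result to the 16-bit range. The sketch also passes a single extracted signal through unchanged, in line with claim 3.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix(signals: Sequence[Sequence[int]]) -> List[int]:
    """Mix time-aligned 16-bit PCM signals by summing and hard-clipping.
    A single signal is passed through unchanged (compare claim 3)."""
    if not signals:
        return []
    if len(signals) == 1:
        return list(signals[0])
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("signals must be time-aligned and equal length")
    # zip(*signals) yields one tuple per time index across all signals.
    return [max(INT16_MIN, min(INT16_MAX, sum(frame))) for frame in zip(*signals)]
```

Hard clipping is the simplest choice. Scaling each input or applying a limiter would avoid distortion when several loud speakers overlap, at the cost of lowering individual voices.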

As described above, with the audio conference system 1, a conference participant can take part in the conference while hearing only the remarks of the participants he or she wants to hear. This is convenient, for example, for a participant who wants to follow only the remarks of key persons such as the president or a department manager.

Likewise, when listening to the recorded audio of a past conference with the audio conference system 1, a user can hear only the remarks of the participants he or she wants to hear. Thus, when the listener wants to hear only the key persons, does not need to hear an interpreter, or does not need to hear speakers of a foreign language, those voices can be left out and only the remaining parts heard, which is convenient for the listener. In that case, the time required to listen to the recording is also shortened.

As described above, because the audio conference system 1 lets a listener hear only the remarks of specific participants, the listener can isolate each participant's remarks and listen to them individually, which also makes each participant's opinion easier to understand.

In the above embodiment, the audio recording file was described as containing the audio signals of all the participants 19 in the conference. However, for example, the administrator of the audio conference system 1 may store in the storage unit 139 in advance designation data specifying the conditions for audio signals to be included in the audio recording file, so that only the audio signals extracted by the extraction unit 133 according to that designation data are included in the audio recording file. This reduces the size of the audio recording file.

In the above embodiment, the participants 19 were described as joining the conference using the headsets 12. However, instead of a headset 12, a participant 19 may use a hands-free acoustic device that includes a highly directional microphone and a speaker array capable of beam-steering the emitted sound.

In the above embodiment, the audio signals received by the audio signal processing server 13 from the terminal devices 11 were described as being stored once in the data buffer 1393 and then used for extraction by the extraction unit 133. However, when the audio signal processing server 13 can receive the samples of the audio signals in order, the audio signal receiving unit 132 may pass the received samples to the extraction unit 133 without storing them in the data buffer 1393. In that case, the extraction unit 133 sequentially passes to the mixing unit 134 only those received samples that satisfy the conditions of the designation data 1395.
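The unbuffered variant can be sketched with a generator that filters and mixes samples as they arrive. This is a hypothetical illustration: it assumes each sample arrives as a `(participant ID, time index, amplitude)` tuple and that arrival is in time order, which is exactly the precondition stated above; `itertools.groupby` only groups adjacent items, so it would misbehave on out-of-order input.

```python
from itertools import groupby
from typing import Collection, Iterable, Iterator, Tuple

# Hypothetical sample layout: (participant ID, time index, amplitude).
Sample = Tuple[str, int, int]


def stream_extract_and_mix(incoming: Iterable[Sample],
                           wanted_ids: Collection[str]) -> Iterator[Tuple[int, int]]:
    """Filter and mix samples as they arrive, without a data buffer.

    Assumes samples arrive in time order, so samples sharing a time index
    are adjacent; groupby() relies on that ordering.
    """
    for t, group in groupby(incoming, key=lambda s: s[1]):
        values = [amp for pid, _, amp in group if pid in wanted_ids]
        # Clip to the 16-bit range; an empty selection yields silence (0).
        yield t, max(-32768, min(32767, sum(values)))
```

Because the generator holds at most one time index's samples at a time, memory use stays constant regardless of conference length.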

In the above embodiment, all participants 19 in the conference were described as being able to speak. However, some participants 19 may be denied permission to speak and allowed only to listen.

The audio conference system 1 may also be configured to encrypt the audio signals contained in the data packets exchanged between the terminal devices 11 and the audio signal processing server 13. In that case, the audio signal transmission units 114 and 135 encrypt the audio signal to be transmitted, then divide it into data blocks and place the blocks in data packets. The audio signal receiving units 132 and 115 assemble the data blocks contained in the data packets and then decrypt the result to restore the audio signal. Encrypting the audio signals in this way prevents the content of the conference from leaking to third parties.
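The ordering described above (encrypt, then split on the sender; reassemble, then decrypt on the receiver) can be sketched as follows. The packet format `(sequence number, data block)` is a hypothetical assumption, and the cipher is passed in as a function. The `toy_xor` stand-in exists only so the sketch runs without third-party libraries; it is not secure, and a real system would use a vetted authenticated cipher such as AES-GCM.

```python
from typing import Callable, List, Tuple

Cipher = Callable[[bytes], bytes]
Packet = Tuple[int, bytes]  # (sequence number, data block): hypothetical format


def to_packets(signal: bytes, encrypt: Cipher, block_size: int) -> List[Packet]:
    """Sender side: encrypt the whole signal first, then split it into blocks."""
    ciphertext = encrypt(signal)
    return [
        (seq, ciphertext[offset:offset + block_size])
        for seq, offset in enumerate(range(0, len(ciphertext), block_size))
    ]


def from_packets(packets: List[Packet], decrypt: Cipher) -> bytes:
    """Receiver side: reassemble blocks in sequence order, then decrypt."""
    ordered = sorted(packets, key=lambda p: p[0])
    return decrypt(b"".join(block for _, block in ordered))


def toy_xor(data: bytes, key: bytes = b"k3y") -> bytes:
    """INSECURE stand-in cipher used only to exercise the pipeline.
    XOR with a repeating key is its own inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

Sorting by sequence number on reassembly means the receiver tolerates packets that arrive out of order.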

In the above embodiment, the time data indicating when each portion of an audio signal was generated was described as being generated by the terminal device 11 and associated with the audio signal. However, instead of the terminal device 11 generating and associating the time data, the audio signal processing server 13 may include a clock unit and, on receiving an audio signal from a terminal device 11, associate time data indicating the time of reception with the received audio signal. In that case, the time indicated by the time data equals the time at which the participant 19 spoke plus the time required for communication. If the communication time is negligibly short, the time at which the audio signal processing server 13 received the audio signal can be regarded as the time of the remark. Alternatively, when generating the time data, the audio signal processing server 13 may subtract an estimated communication time from the current time, thereby reducing the error between the time indicated by the time data and the actual time of the remark.
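The latency-corrected time data can be sketched as below. How the communication time is estimated is not specified in the text; this sketch assumes, hypothetically, that round-trip times to the terminal are measured and that the path is symmetric, so the one-way delay is taken as half the median round-trip time.

```python
import statistics
from datetime import datetime, timedelta
from typing import Sequence


def estimate_one_way_latency(rtt_ms: Sequence[float]) -> timedelta:
    """Assume a symmetric path: one-way delay is about half the median RTT."""
    return timedelta(milliseconds=statistics.median(rtt_ms) / 2)


def server_time_data(received_at: datetime,
                     estimated_latency: timedelta = timedelta(0)) -> datetime:
    """Time data assigned by the server: reception time minus estimated delay.
    With the default of zero, the reception time itself is used."""
    return received_at - estimated_latency
```

Using the median rather than the mean keeps a single delayed round trip from skewing the estimate.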

The audio signal processing server 13 and the terminal devices 11 may be implemented by dedicated hardware, or by having a general-purpose computer capable of audio input and output execute processing according to an application program. When the audio signal processing server 13 is implemented by a general-purpose computer, the extraction unit 133 and the mixing unit 134 are realized as functions of that computer: its CPU (Central Processing Unit) and a DSP (Digital Signal Processor) operating under the control of the CPU execute, in parallel, the processing defined by the modules of the application program. The audio signal receiving unit 132 and the audio signal transmission unit 135 of the audio signal processing server 13 are likewise realized as functions of the computer, by the input/output interface through which the computer exchanges data packets with the network 10, together with the CPU processing, defined by the modules of the application program, that generates and assembles data packets.

[1.3. Modification]
In the audio conference system 1 described above, the terminal devices 11 are located at sites distant from one another and realize a multipoint conference by exchanging data with the audio signal processing server 13 via the network 10. However, the embodiment of the present invention can also be modified so that each terminal device 11 is connected directly to the audio signal processing server 13 without going through the network 10. An audio conference system with this modification is convenient, for example, when several dozen people gathered in one place hold a conference.

FIG. 12 shows an audio conference system 101 in which the terminal devices 11 are connected directly to the audio signal processing server 13 without going through the network 10. In the audio conference system 101, the audio signal transmission unit 114 of the terminal device 11 does not generate data packets containing audio signal blocks; it outputs the audio signal generated by the audio signal processing unit 113 to the audio signal processing server 13 in its original form. The audio signal receiving unit 132 of the audio signal processing server 13 does not receive data packets; it receives the audio signal from each terminal device 11 in its original form. In the audio signal processing server 13, the audio signals input from the respective terminal devices 11 are distinguished from one another by the IDs of the multiple input terminals provided on the audio signal receiving unit 132. That is, the audio signal processing server 13 of the audio conference system 101 uses the input terminal ID in place of the connection ID.

Also, in the audio conference system 101, the audio signal transmission unit 135 of the audio signal processing server 13 outputs the audio signal generated by the mixing unit 134, in its original form, to each terminal device 11 connected to the audio signal processing server 13. The audio signal receiving unit 115 of the terminal device 11 passes the audio signal input from the audio signal processing server 13 unchanged to the audio signal processing unit 113.

In the audio conference system 101, neither the terminal devices 11 nor the audio signal processing server 13 need to divide audio signals into audio signal blocks for inclusion in data packets, or to assemble the audio signal blocks contained in data packets to restore the audio signals. The system is therefore simpler than the audio conference system 1, and its processing can also be faster. Furthermore, whereas the audio conference system 1 requires conversion between analog and digital audio signals in order to transmit audio signals over the network 10, the audio conference system 101 can perform all processing on analog audio signals. Alternatively, some processing may be performed on analog audio signals and the rest on digital audio signals.

[2. Second Embodiment]
In the audio conference system 1 described above, the extraction and mixing of audio signals are performed in the audio signal processing server 13. In the audio conference system 2 according to the second embodiment described below, the extraction and mixing of audio signals are performed in each terminal device 11 rather than in the audio signal processing server 13. Because extraction and mixing must be performed separately for each participant 19 in the conference, the audio conference system 2 can generally process faster than the audio conference system 1 when the processing capacity of the audio signal processing server 13 is limited.

FIG. 13 is a block diagram showing the configuration of the audio conference system 2. Since the configuration and functions of the audio conference system 2 largely overlap with those of the audio conference system 1, only the differences are described below. In FIG. 13, components identical to, or having functions corresponding to, those of the audio conference system 1 (see FIG. 1) are given the same reference numerals as in the audio conference system 1.

In the audio conference system 2, there is no need to send designation data specifying the extraction conditions from the terminal device 11 to the audio signal processing server 13. Hence the terminal device 11 has no designation data transmission unit 112, and the audio signal processing server 13 has no designation data receiving unit 131. Also, since the extraction of audio signals and the mixing of the extracted signals are performed in each terminal device 11, the audio signal processing server 13 has no extraction unit 133 or mixing unit 134; instead, the terminal device 11 includes the extraction unit 133 and the mixing unit 134. Furthermore, in the audio conference system 2, the audio recording files are stored in the terminal device 11.

In the audio conference system 2, after the terminal device 11 establishes a communication connection for the conference with the audio signal processing server 13, the audio signal processing server 13 transmits to the terminal device 11 the relevant record of the conference data 1391 (see FIG. 2) and the correspondence data 1394 (see FIG. 6). In addition, whenever the set of terminal devices 11 connected to the audio signal processing server 13 for the conference changes, the audio signal processing server 13 transmits correspondence data 1394 reflecting the change to the terminal devices 11. On receiving the correspondence data 1394, the terminal device 11 temporarily stores it in the storage unit 116.

During the conference, samples of the audio signals representing the voices of the participants 19 are transmitted from the terminal devices 11 to the audio signal processing server 13 and stored in the data buffer 1393. The audio signal transmission unit 135 then groups the newly stored samples into sets representing the multiple voices at the same time and transmits them sequentially to each terminal device 11. In other words, the audio signal sent from the audio signal processing server 13 to the terminal devices 11 consists of the individual participants' audio signals without mixing, and every terminal device 11 receives the same audio signal content.

As the terminal device 11 sequentially receives from the audio signal processing server 13 the samples representing the multiple audio signals, it stores each received sample in the corresponding one of the data buffers 1161-1 to 1161-n. That is, the audio signals of the participants 19-1 to 19-n in the conference are stored individually in the data buffers 1161-1 to 1161-n.
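The round trip described in the last two paragraphs (the server groups same-time samples into unmixed frames, and each terminal splits them back into per-participant buffers) can be sketched as follows. The data layouts, integer time indices, and function names are hypothetical assumptions for illustration; the patent does not define a frame format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: participant ID -> [(time index, sample), ...]
Buffers = Dict[str, List[Tuple[int, int]]]
Frame = Tuple[int, Dict[str, int]]  # (time index, {participant ID: sample})


def group_frames(buffers: Buffers) -> List[Frame]:
    """Server side: group samples with the same time index into one
    unmixed frame; every terminal receives the same frames."""
    frames: Dict[int, Dict[str, int]] = defaultdict(dict)
    for pid, samples in buffers.items():
        for t, v in samples:
            frames[t][pid] = v
    return sorted(frames.items(), key=lambda item: item[0])


def demultiplex(frames: List[Frame]) -> Buffers:
    """Terminal side: store each sample in the buffer of its participant,
    playing the role of data buffers 1161-1 to 1161-n."""
    buffers: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for t, frame in frames:
        for pid, v in frame.items():
            buffers[pid].append((t, v))
    return dict(buffers)
```

Since the frames carry every participant's samples unmixed, each terminal can apply its own designation data locally, which is what allows extraction and mixing to move off the server in this embodiment.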

For example, the participant 19-1 operates the terminal device 11-1 to designate the participants whose remarks he or she wants to hear. The terminal device 11-1 generates designation data according to the operations of the participant 19-1 and temporarily stores it in the storage unit 116 as designation data 1395. In accordance with the designation data 1395 and the correspondence data 1394 stored in the storage unit 116, the extraction unit 133 of the terminal device 11 extracts, from the audio signals stored in the data buffers 1161-1 to 1161-n, the audio signals satisfying the conditions specified by the participant 19-1, and passes them to the mixing unit 134. The mixing unit 134 mixes the audio signals received from the extraction unit 133 to generate a synthesized audio signal and passes it to the audio signal processing unit 113. As a result, as in the audio conference system 1, the headset 12-1 reproduces audio containing only the remarks of the participants chosen by the participant 19-1.

When the conference ends, the terminal device 11 creates an audio recording file (see FIG. 9) from the audio signals stored in the data buffers 1161-1 to 1161-n, the relevant data of the conference data 1391 previously received from the audio signal processing server 13, and the correspondence data 1394, and stores the file in the storage unit 116.

To listen to the recorded audio of a past conference, the participant 19 has the terminal device 11 display a list of the audio recording files in the audio recording file group 1396 and selects the conference whose recording he or she wants to hear. The participant 19 then selects, from among the participants of the selected conference, those whose remarks he or she wants to hear, and specifies the time period of the remarks and other conditions as needed. In response, the terminal device 11 generates designation data, temporarily stores it in the storage unit 116 as designation data 1395, and then starts the extraction processing by the extraction unit 133. That is, the extraction unit 133 of the terminal device 11 extracts, from the audio signals contained in the audio recording file of the conference designated by the participant 19, the audio signals satisfying the conditions indicated by the designation data 1395.

The audio signals extracted by the extraction unit 133 are passed to the mixing unit 134, mixed, and then output via the audio signal processing unit 113 to the headphones of the headset 12, where they are reproduced. In the audio conference system 2, no communication connection between the audio signal processing server 13 and the terminal device 11 needs to be established when the participant 19 wants to listen to the recorded audio of a past conference.

As described above, with the audio conference system 2, as with the audio conference system 1, a participant 19 can take part in a conference while hearing only the remarks of the desired speakers, or hear only the desired speakers' remarks within the recorded audio of a past conference.

In the second embodiment, as in the first, the audio signal processing server 13 and the terminal devices 11 may be implemented by dedicated hardware, or by having a general-purpose computer capable of audio input and output execute processing according to an application program. The audio conference system 2 can also be modified in the same ways as the audio conference system 1.

FIG. 1 is a block diagram showing the configuration of the audio conference system according to the first embodiment of the present invention.
FIG. 2 illustrates the contents of the conference data according to the first embodiment.
FIG. 3 illustrates the contents of the participant data according to the first embodiment.
FIG. 4 shows the correspondence among participant IDs, terminal IDs, connection IDs and data buffers according to the first embodiment.
FIG. 5 illustrates a correspondence table of connection IDs and terminal IDs according to the first embodiment.
FIG. 6 illustrates the contents of the correspondence data according to the first embodiment.
FIG. 7 schematically shows the relationship between audio signals and data packets according to the first embodiment.
FIG. 8 schematically shows how audio signals are stored in the data buffer according to the first embodiment.
FIG. 9 schematically shows the contents of an audio recording file according to the first embodiment.
FIG. 10 illustrates a screen displayed on the display unit of the terminal device according to the first embodiment.
FIG. 11 illustrates a screen displayed on the display unit of the terminal device according to the first embodiment.
FIG. 12 shows the configuration of the audio conference system according to a modification of the first embodiment.
FIG. 13 is a block diagram showing the configuration of the audio conference system according to the second embodiment of the present invention.

Explanation of Symbols

1, 2, 101 ... audio conference system; 10 ... network; 11 ... terminal device; 12 ... headset; 13 ... audio signal processing server; 19 ... participant; 111 ... clock unit; 112 ... designation data transmission unit; 113 ... audio signal processing unit; 114, 135 ... audio signal transmission unit; 115, 132 ... audio signal receiving unit; 116, 139 ... storage unit; 131 ... designation data receiving unit; 133 ... extraction unit; 134 ... mixing unit; 1161, 1393 ... data buffer; 1391 ... conference data; 1392 ... participant data; 1394 ... correspondence data; 1395 ... designation data; 1396 ... audio recording file group.

Claims

An audio signal processing apparatus comprising:
audio signal input means for receiving audio signals output from a plurality of terminal devices together with attribute data indicating attributes of the audio signals;
audio signal storage means for storing the audio signals and the attribute data received by the audio signal input means in association with each other;
attribute designation data input means for receiving attribute designation data designating an arbitrary attribute;
extraction means for extracting, from among the audio signals stored in the audio signal storage means, the audio signals stored in association with attribute data indicating the attribute designated by the attribute designation data; and
output means for outputting the audio signals extracted by the extraction means.

An audio signal processing apparatus comprising:
attribute designation data input means for receiving attribute designation data designating an arbitrary attribute;
audio signal input means for receiving audio signals output from a plurality of terminal devices together with attribute data indicating attributes of the audio signals;
extraction means for extracting, from among the audio signals received by the audio signal input means, the audio signals received together with attribute data indicating the attribute designated by the attribute designation data; and
output means for outputting the audio signals extracted by the extraction means.

The audio signal processing apparatus according to claim 1 or 2, further comprising mixing means for mixing a plurality of audio signals when the plurality of audio signals are extracted by the extraction means,
wherein the output means outputs the extracted audio signal as-is when only one audio signal is extracted by the extraction means, and outputs the audio signal mixed by the mixing means when a plurality of audio signals are extracted by the extraction means.
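The one-signal / many-signals branch of the output means can be sketched as below. Sample-wise summation is a common way to mix aligned PCM signals; treating shorter signals as silent past their end and clipping to [-1.0, 1.0] are illustrative assumptions, since the claim does not specify the mixing rule.

```python
def output_signal(extracted: list[list[float]]) -> list[float]:
    """Pass a single extracted signal through, or mix several sample by sample.

    Assumes all signals share one sample rate and are time-aligned.
    """
    if not extracted:
        return []
    if len(extracted) == 1:
        # Only one signal extracted: output it unchanged.
        return list(extracted[0])
    length = max(len(s) for s in extracted)
    mixed = []
    for i in range(length):
        # Signals that have already ended contribute silence.
        total = sum(s[i] for s in extracted if i < len(s))
        # Clip to the nominal full-scale range (assumed convention).
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```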

The audio signal processing apparatus according to claim 1 or 2, wherein
the attribute designation data input means receives the attribute designation data from one terminal device, and
the output means outputs the audio signal to that one terminal device.
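When each terminal sends its own designation data and receives its own output, the server effectively keeps one selection per listener. A minimal sketch, assuming a hypothetical layout in which `designations` maps each listening terminal's ID to the set of source terminal IDs it wants to hear, and `signals` maps source IDs to their current audio chunk:

```python
def route_per_terminal(designations: dict[str, set[str]],
                       signals: dict[str, list[float]]) -> dict[str, list[list[float]]]:
    # For each requesting terminal, collect only the sources it designated;
    # the result for key `listener` is what gets sent back to that terminal.
    return {
        listener: [signals[src] for src in sorted(wanted) if src in signals]
        for listener, wanted in designations.items()
    }
```

Each per-listener list can then be passed to a mixing step before transmission, so different participants hear different subsets of speakers.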

The audio signal processing apparatus according to claim 1 or 2, wherein
the attribute data is an identifier of the terminal device from which the audio signal is output, and
the attribute designation data is data designating the identifiers of one or more terminal devices.

The audio signal processing apparatus according to claim 5, wherein
the plurality of terminal devices are connected to the audio signal input means via a network, and
the identifier is an address assigned to the terminal device on the network.
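When the terminal identifier is a network address, extraction reduces to matching each packet's source address against the designated addresses. The sketch below assumes IP addresses (the claim only says "an address assigned on the network") and parses them with Python's standard `ipaddress` module so that equivalent spellings, such as an IPv6 address with or without zero compression, compare equal. The function name and the `(source, payload)` layout are hypothetical.

```python
import ipaddress


def extract_by_address(designated: list[str],
                       received: list[tuple[str, bytes]]) -> list[bytes]:
    # Normalize the designated addresses once.
    wanted = {ipaddress.ip_address(a) for a in designated}
    # Keep payloads whose (normalized) source address was designated.
    return [payload for src, payload in received
            if ipaddress.ip_address(src) in wanted]
```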

The audio signal processing apparatus according to claim 1 or 2, wherein
the attribute data is time data indicating the time at which the audio signal was generated,
the attribute designation data is time period designation data designating an arbitrary time period, and
the extraction means extracts the audio signal associated with time data indicating a time included in the time period designated by the time period designation data.
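Time-based extraction is a range filter over stored timestamps. A minimal sketch, assuming each stored entry is a hypothetical `(generation_time, samples)` pair and treating the designated period as half-open `[start, end)`; the claim only requires the time to be "included in" the period, so the boundary handling is an assumption.

```python
from datetime import datetime


def extract_by_time(start: datetime, end: datetime,
                    stored: list[tuple[datetime, list[float]]]) -> list[list[float]]:
    # Keep signals whose generation time falls inside [start, end).
    return [samples for t, samples in stored if start <= t < end]
```

This lets a participant replay, for example, only what was said during a particular ten-minute stretch of the meeting.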

The audio signal processing apparatus according to claim 1 or 2, further comprising correspondence data storage means for storing correspondence data indicating a correspondence relationship between attribute data indicating one type of attribute and attribute data indicating another type of attribute,
wherein the extraction means extracts the audio signal using, in place of the one type of attribute data received by the audio signal input means, the other type of attribute data that corresponds to that attribute data according to the correspondence data.

The audio signal processing apparatus according to claim 8, wherein
the correspondence data indicates a correspondence relationship between the identifier of a terminal device and speaker data indicating an attribute of the speaker who uses that terminal device,
the audio signal input means receives, together with each audio signal and as its attribute data, the identifier of the terminal device from which the audio signal is output,
the attribute designation data input means receives data designating a speaker attribute as the attribute designation data, and
the extraction means identifies, according to the correspondence data, the identifier of each terminal device corresponding to speaker data indicating the attribute designated by the attribute designation data, and extracts the audio signals received together with the identified identifiers.
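The two-step lookup described above (speaker attribute, then terminal identifiers, then audio signals) can be sketched as follows. Representing the correspondence data as a dictionary from terminal ID to a single speaker attribute string (for example a department name) is a hypothetical simplification.

```python
def extract_by_speaker_attribute(
    designated_attribute: str,
    correspondence: dict[str, str],
    received: list[tuple[str, list[float]]],
) -> list[list[float]]:
    # Step 1: resolve the designated speaker attribute to the set of
    # terminal identifiers whose users have that attribute.
    terminals = {tid for tid, attr in correspondence.items()
                 if attr == designated_attribute}
    # Step 2: extract signals received together with one of those identifiers.
    return [samples for tid, samples in received if tid in terminals]
```

The listener never needs to know which terminal each speaker uses; the server translates the speaker attribute into terminal identifiers on their behalf.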

A program causing a computer to execute:
a process of receiving audio signals output from a plurality of terminal devices, together with attribute data indicating attributes of the audio signals;
a process of storing the received audio signals and attribute data in association with each other;
a process of receiving attribute designation data designating an arbitrary attribute;
a process of extracting, from among the audio signals, the audio signal stored in association with attribute data indicating the attribute designated by the attribute designation data; and
a process of outputting the extracted audio signal.

A program causing a computer to execute:
a process of receiving attribute designation data designating an arbitrary attribute;
a process of receiving audio signals output from a plurality of terminal devices, together with attribute data indicating attributes of the audio signals;
a process of extracting, from among the received audio signals, the audio signal received together with attribute data indicating the attribute designated by the attribute designation data; and
a process of outputting the extracted audio signal.