JP2021139953A - Audio conference support device, program, audio conference support method, and audio conference support system

Info

Publication number: JP2021139953A
Application number: JP2020035136A
Authority: JP
Inventors: 真鳥越; Makoto Torigoe
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-03-02
Filing date: 2020-03-02
Publication date: 2021-09-16

Abstract

PROBLEM TO BE SOLVED: To reduce the effort of grouping for creating minutes. SOLUTION: An audio conference support device comprises: a reception part which receives signals from a plurality of audio conference terminals; an identification part which identifies whether or not the signal received by the reception part from each of the plurality of audio conference terminals is a predetermined signal; a group management part which, when the identification part identifies a plurality of signals as the predetermined signal within a predetermined time difference, forms a group consisting of the audio conference terminals that are the transmission sources of the identified signals; an audio recognition part which recognizes the audio signals received from the audio conference terminals; and a transmission part which transmits character strings indicating the recognition results of the audio signals received from two or more audio conference terminals belonging to the same group to the two or more audio conference terminals. SELECTED DRAWING: Figure 5

Description

The present invention relates to a voice conference support device, a program, a voice conference support method, and a voice conference support system.

In recent years, as the performance of PCs (Personal Computers) and networks has improved, e-mail and extension telephones, which were among the standard means of business communication, are being replaced by software used on individual PCs, such as communication tools and groupware, or by Web services. The means of conducting a voice conference is likewise shifting from dedicated devices to software or Web services.

Meanwhile, rapid advances in AI technology have greatly improved voice recognition, and technologies for automatically generating the minutes of a voice conference from the conference audio have appeared. For example, Patent Document 1 discloses a technique for constructing a voice conference system in which the voice conference terminals used by each of a plurality of attendees and a minutes creation device are connected via a network, and for generating minutes in real time.

Japanese Unexamined Patent Application Publication No. 2002-344636 (JP-A-2002-344636)

When a minutes creation device compiles the remarks of the attendees of the same conference into a single set of minutes, the device needs to determine which voice conference terminals belong to the attendees of that conference and group them. In particular, when the minutes creation device and the voice conference system are configured separately, the voice conference terminals of the attendees of the same conference must be grouped in both the minutes creation device and the voice conference system.

The present invention has been made in view of the above problem, and an object of the present invention is to provide a new and improved voice conference support device, program, voice conference support method, and voice conference support system capable of reducing the effort of grouping for creating minutes.

In order to solve the above problem, according to one aspect of the present invention, there is provided a voice conference support device including: a receiving unit that receives signals from a plurality of voice conference terminals; an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal; a group management unit that forms a group consisting of the voice conference terminals that are the sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference; a voice recognition unit that recognizes voice signals received from the voice conference terminals; and a transmission unit that transmits character strings indicating the recognition results of the voice signals received from two or more voice conference terminals belonging to the same group to the two or more voice conference terminals.
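As an illustration only, the grouping performed by the group management unit could be sketched as follows in Python. The specification does not prescribe an algorithm; the `Detection` record, the `form_groups` function, and the rule of measuring the time difference from the first detection of a group are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One identification of the predetermined signal (hypothetical record)."""
    terminal_id: str   # assumed identifier of the source terminal
    timestamp: float   # assumed time of identification, in seconds


def form_groups(detections, max_time_diff):
    """Group terminals whose predetermined signals were identified close in time.

    Detections are sorted by time. A detection joins the current group when it
    lies within max_time_diff of that group's first detection; otherwise it
    starts a new group. Groups with fewer than two terminals are discarded.
    """
    groups = []
    current = []
    start = None
    for d in sorted(detections, key=lambda d: d.timestamp):
        if current and d.timestamp - start <= max_time_diff:
            current.append(d)
        else:
            if current:
                groups.append(current)
            current = [d]
            start = d.timestamp
    if current:
        groups.append(current)

    result = []
    for g in groups:
        terminals = sorted({d.terminal_id for d in g})
        if len(terminals) >= 2:
            result.append(terminals)
    return result
```

With a one-second window, detections from terminals A and C at about the same moment would form one group, and later detections from B and D would form another; an isolated detection forms no group.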

The receiving unit may receive a first signal and a second signal from the voice conference terminal, where the first signal is a signal transmitted from the voice conference terminal to another voice conference terminal, and the second signal is a signal that the voice conference terminal has received from the other voice conference terminal.

The first signal may be a voice signal indicating a voice uttered by a user of the voice conference terminal, and the second signal may be a voice signal indicating a voice uttered by a user of the other voice conference terminal.

The predetermined signal may be a voice signal indicating a voice that includes a first keyword.

The identification unit may further identify the user who uttered the voice indicated by the signal received from each of the plurality of voice conference terminals, and the group management unit may form a group consisting of the voice conference terminals that are the sources of a plurality of signals that are identified by the identification unit as the predetermined signal within the predetermined time difference and for which the user who uttered the voice is identified as being the same.
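The additional same-speaker condition can be pictured with the following Python sketch. It is a hypothetical illustration, not the method of the specification: it assumes that the identification unit yields a speaker label for each detection, and the tuple layout and the function name `form_groups_by_speaker` are invented for the example.

```python
from collections import defaultdict


def form_groups_by_speaker(detections, max_time_diff):
    """Group terminals only when both the timing and the speaker match.

    detections: iterable of (terminal_id, timestamp, speaker_id) tuples,
    where speaker_id is an assumed output of speaker identification.
    """
    # Partition detections by the identified speaker first.
    by_speaker = defaultdict(list)
    for terminal_id, timestamp, speaker_id in detections:
        by_speaker[speaker_id].append((timestamp, terminal_id))

    groups = []
    for items in by_speaker.values():
        items.sort()
        current = [items[0]]
        for item in items[1:]:
            # Compare against the first detection of the current group.
            if item[0] - current[0][0] <= max_time_diff:
                current.append(item)
            else:
                groups.append(current)
                current = [item]
        groups.append(current)

    result = []
    for g in groups:
        terminals = sorted({terminal for _, terminal in g})
        if len(terminals) >= 2:
            result.append(terminals)
    return sorted(result)
```

The point of the speaker condition is visible when two conferences start almost simultaneously: timing alone would merge all four terminals into one group, whereas the speaker label keeps the two conferences apart.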

After the group is formed, the transmission unit may transmit, to the two or more voice conference terminals belonging to the group, a control signal instructing them to stop transmitting the second signal.

The identification unit may identify whether or not the voice signal received by the receiving unit from each of the plurality of voice conference terminals indicates a voice that includes a second keyword, and the group management unit may dissolve the group when the identification unit identifies that a voice signal received from any of the voice conference terminals belonging to the group indicates a voice that includes the second keyword.

When the group is dissolved, the transmission unit may transmit, to the two or more voice conference terminals that belonged to the group, a control signal instructing them to start transmitting the second signal.
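The life cycle described in the preceding paragraphs (stop the second signal once a group exists, release or dissolve the group on the second keyword, then restart the second signal) might be modeled as below. This is a minimal sketch under stated assumptions: the class `GroupManager`, the command strings, and the use of recognized text for keyword matching are hypothetical, and the `sent` list merely stands in for the transmission unit.

```python
class GroupManager:
    """Hypothetical model of group formation and release on the server side."""

    def __init__(self, end_keyword):
        self.end_keyword = end_keyword  # assumed "second keyword"
        self.groups = {}                # group_id -> set of terminal ids
        self.sent = []                  # (terminal_id, command) log
        self._next_id = 1

    def form_group(self, terminal_ids):
        """Register a group and tell its members to stop the second signal."""
        group_id = self._next_id
        self._next_id += 1
        self.groups[group_id] = set(terminal_ids)
        for terminal in sorted(terminal_ids):
            self.sent.append((terminal, "STOP_SECOND_SIGNAL"))
        return group_id

    def on_recognized_text(self, terminal_id, text):
        """Dissolve the sender's group if the text contains the end keyword."""
        for group_id, members in list(self.groups.items()):
            if terminal_id in members and self.end_keyword in text:
                del self.groups[group_id]
                for terminal in sorted(members):
                    self.sent.append((terminal, "START_SECOND_SIGNAL"))
                return True
        return False
```

Restarting the second signal after release lets the terminals take part in grouping again the next time a conference begins.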

The predetermined signal may be a signal that is recorded in advance in the voice conference terminal and read out by a user operation on the voice conference terminal.

The predetermined signal may be a voice signal or a signal composed of components in the inaudible range.

In order to solve the above problem, according to another aspect of the present invention, there is provided a program for causing a computer to function as: a receiving unit that receives signals from a plurality of voice conference terminals; an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal; a group management unit that forms a group consisting of the voice conference terminals that are the sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference; a voice recognition unit that recognizes voice signals received from the voice conference terminals; and a transmission unit that transmits character strings indicating the recognition results of the voice signals received from two or more voice conference terminals belonging to the same group to the two or more voice conference terminals.

In order to solve the above problem, according to another aspect of the present invention, there is provided a voice conference support method including: receiving signals from a plurality of voice conference terminals; identifying whether or not the signal received from each of the plurality of voice conference terminals is a predetermined signal; forming a group consisting of the voice conference terminals that are the sources of a plurality of signals identified as the predetermined signal within a predetermined time difference; recognizing voice signals received from the voice conference terminals; and transmitting character strings indicating the recognition results of the voice signals received from two or more voice conference terminals belonging to the same group to the two or more voice conference terminals.

In order to solve the above problem, according to another aspect of the present invention, there is provided a voice conference support system having a plurality of voice conference terminals and a voice conference support device, wherein each of the plurality of voice conference terminals transmits a signal to the voice conference support device, and the voice conference support device includes: a receiving unit that receives signals from the plurality of voice conference terminals; an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal; a group management unit that forms a group consisting of the voice conference terminals that are the sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference; a voice recognition unit that recognizes voice signals received from the voice conference terminals; and a transmission unit that transmits character strings indicating the recognition results of the voice signals received from two or more voice conference terminals belonging to the same group to the two or more voice conference terminals.

According to the present invention described above, it is possible to reduce the effort of grouping for creating minutes.

An explanatory diagram showing the configuration of a voice conference support system according to an embodiment of the present invention.
An explanatory diagram showing a voice conference system according to a comparative example.
An explanatory diagram showing the basic configuration of the voice conference support system according to an embodiment of the present invention.
An explanatory diagram showing the configuration of the voice conference terminal 20 according to an embodiment of the present invention.
An explanatory diagram showing the configuration of the minutes creation server 30 according to an embodiment of the present invention.
An explanatory diagram showing the operation of the voice conference terminal 20 at startup.
An explanatory diagram showing the operation in which the minutes creation server 30 groups a plurality of voice conference terminals 20 after a voice conference session is established.
An explanatory diagram showing the operation after grouping.
An explanatory diagram showing the operation at the end of a voice conference.
A block diagram showing the hardware configuration of the minutes creation server 30.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and duplicate description is omitted.

In this specification and the drawings, a plurality of components having substantially the same functional configuration may also be distinguished by appending different letters to the same reference numeral. For example, a plurality of configurations having substantially the same functional configuration or logical significance are distinguished as necessary, as in voice conference terminals 20A, 20B, and 20C. However, when there is no particular need to distinguish such components from one another, only the common reference numeral is used. For example, when there is no particular need to distinguish the voice conference terminals 20A, 20B, and 20C, each is simply referred to as a voice conference terminal 20.

<1. Overview of the voice conference support system>
One embodiment of the present invention relates to a voice conference support system that supports a voice conference in which a plurality of users participate from remote locations. First, an overview of the voice conference support system according to an embodiment of the present invention is given with reference to FIG. 1.

FIG. 1 is an explanatory diagram showing the configuration of a voice conference support system according to an embodiment of the present invention. As shown in FIG. 1, the voice conference support system according to an embodiment of the present invention has voice conference terminals 20A to 20C and a minutes creation server 30.

The voice conference terminals 20A to 20C and the minutes creation server 30 are connected via a network 12. The network 12 is a wired or wireless transmission path for information transmitted from devices connected to the network 12. For example, the network 12 may include public networks such as the Internet, a telephone network, and a satellite communication network, as well as various LANs (Local Area Networks) including Ethernet (registered trademark) and WANs (Wide Area Networks). The network 12 may also include a dedicated network such as an IP-VPN (Internet Protocol-Virtual Private Network).

(Voice conference terminal)
The voice conference terminal 20 transmits a voice signal indicating the voice uttered by its user to another voice conference terminal 20. The voice conference terminal 20 also receives, from the other voice conference terminal 20, a voice signal indicating the voice uttered by the user of that terminal, and outputs that user's voice based on the received voice signal.

For example, in the example shown in FIG. 1, the voice conference terminal 20A transmits a voice signal indicating the voice uttered by the user UA to the voice conference terminal 20B, and the voice conference terminal 20B outputs the voice uttered by the user UA based on that voice signal. Likewise, the voice conference terminal 20B transmits a voice signal indicating the voice uttered by the user UB to the voice conference terminal 20A, and the voice conference terminal 20A outputs the voice uttered by the user UB based on that voice signal. This configuration allows the user UA and the user UB to hold a voice conference.

The voice conference terminal 20 according to an embodiment of the present invention also has a function of displaying the minutes of the voice conference. Specifically, the voice conference terminal 20 also transmits the voice signal indicating the voice uttered by its user to the minutes creation server 30. The voice conference terminal 20 then receives from the minutes creation server 30 a character string that is the recognition result of the voice signal transmitted to the minutes creation server 30, and appends the character string to the minutes.

In addition to voice signals, the voice conference terminal 20 may exchange video signals with other voice conference terminals 20. Although FIG. 1 shows a notebook PC (Personal Computer) as an example of the voice conference terminal 20, the voice conference terminal 20 may be another information processing device such as a desktop PC, a smartphone, a mobile phone, or a PHS (Personal Handyphone System) handset.

(Minutes creation server)
The minutes creation server 30 is an example of a voice conference support device, and supports a voice conference by creating minutes that show the content of the voice conference. For example, when the user UA and the user UB hold a voice conference via the voice conference terminal 20A and the voice conference terminal 20B, the minutes creation server 30 receives voice signals from the voice conference terminal 20A and the voice conference terminal 20B and recognizes these voice signals. The minutes creation server 30 then creates, in real time, minutes in which character strings indicating the remarks of the user UA and character strings indicating the remarks of the user UB are arranged in chronological order, and transmits the character strings constituting the minutes to the voice conference terminal 20A and the voice conference terminal 20B.
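The chronological arrangement of recognized remarks can be illustrated with a short Python sketch. The tuple layout, the bracketed line format, and the function name `build_minutes` are assumptions made for illustration; the specification states only that the character strings are arranged in chronological order.

```python
def build_minutes(utterances):
    """Arrange recognized remarks in chronological order.

    utterances: iterable of (timestamp, speaker, text) tuples, where the
    timestamp is assumed to be the time at which the remark was uttered.
    Returns the minutes as a list of formatted lines.
    """
    ordered = sorted(utterances, key=lambda u: u[0])
    return [f"[{speaker}] {text}" for _, speaker, text in ordered]
```

Because recognition results from different terminals may arrive out of order, sorting by utterance time rather than arrival order keeps the remarks of the user UA and the user UB interleaved as they were spoken.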

(Background)
The background that led to the creation of an embodiment of the present invention is described here with reference to the voice conference system according to the comparative example shown in FIG. 2.

FIG. 2 is an explanatory diagram showing a voice conference system according to a comparative example. As shown in FIG. 2, the voice conference system according to the comparative example has terminals C1 to C6 and a server D connected to one another via the network 12. Because the server D recognizes the voice signals received from the plurality of terminals participating in a voice conference and creates a single set of minutes, the server D needs to identify which terminals are participating in the voice conference on the system.

For example, in the example shown in FIG. 2, terminals C1, C2, and C5 form a group G1 participating in one voice conference, terminals C4 and C6 form a group G2 participating in another voice conference, and terminal C3 is not participating in any voice conference. When a plurality of voice conferences are held at the same time in this way, there is a problem that as many conference systems as voice conferences are required. In addition, if a company or other organization has already introduced a voice conference system, realizing automatic generation of minutes incurs the cost of newly introducing a dedicated voice conference system that has a minutes creation function. In some cases, the cost of renewing the existing system is also incurred.

By contrast, a minutes creation system that can be used together with an existing voice conference system reduces the introduction cost. The voice conference support system according to an embodiment of the present invention is such a system. The basic configuration of the voice conference support system according to an embodiment of the present invention is described with reference to FIG. 3.

FIG. 3 is an explanatory diagram showing the basic configuration of the voice conference support system according to an embodiment of the present invention. Each voice conference terminal 20 has a user IF unit 220, a call application 230, and a minutes application 240.

The user IF unit 220 includes a voice input unit such as a microphone, a voice output unit such as a speaker, a display unit, and the like. The call application 230 is an application for voice conferencing, and the minutes application 240 is an application that coexists with the call application 230 in order to create minutes.

The minutes application 240 shares the microphone included in the user IF unit 220 with the call application 230, and can acquire from the microphone the voice signal produced while a voice conference is being held in the call application 230. The minutes application 240 can also transmit the voice signal to the minutes creation server 30, receive from the minutes creation server 30 the result of recognizing the voice signal, and display the recognition result on the user IF unit 220.

In the example shown in FIG. 3, a voice conference is being held between the voice conference terminal 20A and the voice conference terminal 20C, and another voice conference is being held between the voice conference terminal 20B and the voice conference terminal 20D. A plurality of voice conferences may in fact be held in parallel in this way, in which case the minutes creation server 30 needs to identify which voice conference terminals 20 belong to the same group.

For example, when each user performs an operation in the call application 230 to set up the voice conference group, the user could perform a corresponding operation in the minutes application 240 to set up the group on the minutes creation server 30, which would allow the minutes creation server 30 to group the voice conference terminals 20 appropriately. However, it is cumbersome for each user to perform a grouping operation for the minutes creation server 30 separately from the grouping operation for the voice conference.

Focusing on the above circumstances, the present inventor arrived at an embodiment of the present invention. According to an embodiment of the present invention, it is possible to reduce the effort of grouping for creating minutes. The configurations of the voice conference terminal 20 and the minutes creation server 30 according to an embodiment of the present invention, and the operation of the embodiment, are described in detail in turn below.

<2. Configuration of the voice conference terminal>
FIG. 4 is an explanatory diagram showing the configuration of the voice conference terminal 20 according to an embodiment of the present invention. As shown in FIG. 4, the voice conference terminal 20 according to an embodiment of the present invention has a user IF unit 220, a call application 230, and a minutes application 240. The user IF unit 220 has a first voice input unit 222, a display unit 224, an operation unit 226, and a voice output unit 228. The minutes application 240 has a second voice input unit 242, a terminal transmission unit 244, a terminal reception unit 246, and a control unit 248.

(First voice input unit)
The first voice input unit 222 is the component to which the voice uttered by the user of the voice conference terminal 20 is input. The first voice input unit 222 converts the voice uttered by the user of the voice conference terminal 20 into an electrical voice signal and outputs the voice signal to the call application 230 and the terminal transmission unit 244. The voice signal output by the first voice input unit 222 is an example of the first signal, and in this specification it may be referred to as the first voice signal.

(Display unit)
The display unit 224 displays various display screens. In particular, the display unit 224 according to an embodiment of the present invention displays the minutes generated by the minutes creation server 30 and received by the terminal reception unit 246.

(Operation unit)
The operation unit 226 is the component that the user of the voice conference terminal 20 operates to input information or instructions to the voice conference terminal 20. By operating the operation unit 226, the user of the voice conference terminal 20 can, for example, correct the minutes displayed on the display unit 224 or input an instruction to end the voice conference.

(Voice output unit)
The voice output unit 228 converts the voice signal received by the call application 230 from another voice conference terminal 20 into sound, that is, air vibration, and outputs it. The voice signal received by the call application 230 is an example of the second signal, and in this specification the voice signal received by the call application 230 and input to the user IF unit 220 may be referred to as the second voice signal.

(Call application)
The call application 230 transmits the first voice signal input from the first voice input unit 222 to the other voice conference terminals 20 participating in the same voice conference. The call application 230 also receives the second voice signal from the other voice conference terminals 20 participating in the same voice conference and outputs it to the voice output unit 228.

(Second voice input unit)
The second voice signal is input from the voice output unit 228 to the second voice input unit 242 by loopback. For example, the second voice signal may be input to the second voice input unit 242 through a cable connected to an external output terminal of the voice output unit 228, or through a loopback device function at the driver level of the OS of the voice conference terminal 20. The second voice input unit 242 outputs the input second voice signal to the terminal transmission unit 244.

(Terminal transmission unit)
The terminal transmission unit 244 transmits the first voice signal input from the first voice input unit 222 and the second voice signal input from the second voice input unit 242 to the minutes creation server 30 via the network 12. Consequently, the first voice signal that the call application 230 transmits to another voice conference terminal 20 is also transmitted to the minutes creation server 30. In addition, that first voice signal is output at the other voice conference terminal 20 and transmitted from there to the minutes creation server 30 as a second voice signal. The minutes creation server 30 therefore receives a first voice signal and a second voice signal representing the same voice from different voice conference terminals 20 at substantially the same time. The terminal transmission unit 244 stops transmitting the second voice signal in response to a transmission stop instruction for the second voice signal, described later, and resumes transmitting it in response to a transmission start instruction for the second voice signal.
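The gating behaviour of the terminal transmission unit 244 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the instruction strings `"stop_second"` and `"start_second"`, and the use of a list in place of the network 12 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TerminalTransmitter:
    """Toy model of terminal transmission unit 244 (all names are hypothetical)."""
    send_second: bool = True                    # second-signal upload starts enabled
    sent: list = field(default_factory=list)    # stands in for the network 12

    def handle_control(self, instruction: str) -> None:
        # Hypothetical instruction names for the stop/start control signals.
        if instruction == "stop_second":
            self.send_second = False
        elif instruction == "start_second":
            self.send_second = True

    def push(self, first_frame: bytes, second_frame: bytes) -> None:
        # The first voice signal (local microphone) is always uploaded.
        self.sent.append(("first", first_frame))
        # The second voice signal (loopback of remote voice) only while enabled.
        if self.send_second:
            self.sent.append(("second", second_frame))


tx = TerminalTransmitter()
tx.push(b"mic0", b"spk0")            # before grouping: both signals go out
tx.handle_control("stop_second")     # transmission stop instruction
tx.push(b"mic1", b"spk1")            # only the first signal goes out
tx.handle_control("start_second")    # transmission start instruction
tx.push(b"mic2", b"spk2")
print([kind for kind, _ in tx.sent])
```

Keeping the first voice signal unconditional mirrors the text: only the second voice signal is ever paused.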

(Terminal receiving unit)
The terminal receiving unit 246 receives the recognition result of a voice signal from the minutes creation server 30 and outputs the recognition result to the control unit 248. The recognition result includes a character string representing the content of the voice. It may also include information indicating the time at which the voice signal was acquired and identification information indicating the user who uttered the voice represented by the voice signal. In addition, the terminal receiving unit 246 receives control signals concerning the operation of the minutes application 240 from the minutes creation server 30.

(Control unit)
The control unit 248 controls the overall operation of the voice conference terminal 20. For example, based on the recognition results received by the terminal receiving unit 246, the control unit 248 generates the minutes by arranging the character strings representing the content of the voice in chronological order, and causes the display unit 224 to display the minutes.
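The chronological arrangement performed by the control unit 248 can be illustrated with a short sketch. The record layout (`timestamp`, `speaker`, `text`) and the output line format are assumptions for this example; the specification only says that a recognition result contains a character string and may contain a time and a user identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionResult:
    """Hypothetical shape of one recognition result received from the server."""
    timestamp: float   # seconds; when the voice signal was acquired
    speaker: str       # identification information of the user who spoke
    text: str          # recognized character string


def build_minutes(results):
    """Return minutes lines ordered by acquisition time."""
    # sorted() is stable, so results with equal timestamps keep arrival order.
    ordered = sorted(results, key=lambda r: r.timestamp)
    return [f"{r.timestamp:.1f}s {r.speaker}: {r.text}" for r in ordered]


received = [
    RecognitionResult(12.0, "B", "Agreed."),
    RecognitionResult(3.5, "A", "We will start the meeting."),
]
for line in build_minutes(received):
    print(line)
```

Sorting on the acquisition time rather than on arrival order means results delayed by the network still land in the right place.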

<3. Configuration of the minutes creation server>
The configuration of the voice conference terminal 20 according to the embodiment of the present invention has been described above. Next, the configuration of the minutes creation server 30 according to the embodiment of the present invention is described with reference to FIG. 5.

FIG. 5 is an explanatory diagram showing the configuration of the minutes creation server 30 according to the embodiment of the present invention. As shown in FIG. 5, the minutes creation server 30 includes a server receiving unit 310, a voice recognition unit 320, an identification unit 330, a group management unit 340, a server transmission unit 350, and a control unit 360.

(Server receiving unit)
The server receiving unit 310 is a receiving unit that receives signals from a plurality of voice conference terminals 20. For example, the server receiving unit 310 receives the first voice signal and the second voice signal from each of the plurality of voice conference terminals 20.

(Voice recognition unit)
The voice recognition unit 320 recognizes the first voice signal and the second voice signal received by the server receiving unit 310, and outputs the recognition result of the first voice signal and the recognition result of the second voice signal to the identification unit 330 and the server transmission unit 350.

(Identification unit)
The identification unit 330 identifies whether or not the signal received by the server receiving unit 310 from each of the plurality of voice conference terminals 20 is a predetermined signal. Specifically, the identification unit 330 holds specific keywords and identifies whether or not the voice signal received from each voice conference terminal 20 contains voice representing a keyword. In other words, the identification unit 330 identifies whether or not the character string obtained by the voice recognition unit 320 contains a keyword. The keywords may include a start keyword (first keyword) that signifies the start of the minutes, such as "We will start the meeting" (会議を始めます), and an end keyword (second keyword) that signifies the end of the minutes, such as "We will end the meeting" (会議を終わります).
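The keyword check can be sketched as a substring test on the recognized character string. The Japanese phrases are the examples given in the specification; the English phrases, the function name, and the case-insensitive matching are assumptions added for illustration.

```python
# Example keyword tables. The English entries are illustrative assumptions.
START_KEYWORDS = ("会議を始めます", "start the meeting")
END_KEYWORDS = ("会議を終わります", "end the meeting")


def classify(recognized_text: str):
    """Return "start", "end", or None for a recognized character string."""
    text = recognized_text.casefold()   # no effect on the Japanese phrases
    if any(keyword in text for keyword in START_KEYWORDS):
        return "start"
    if any(keyword in text for keyword in END_KEYWORDS):
        return "end"
    return None


print(classify("それでは会議を始めます"))
print(classify("OK, we will end the meeting here."))
print(classify("Next item on the agenda."))
```

A real recognizer may return spelling or spacing variants, so a deployed check would likely need looser matching than this exact substring test.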

(Group management unit)
The group management unit 340 estimates, among the plurality of voice conference terminals 20 connected to the minutes creation server 30, a combination of two or more voice conference terminals 20 participating in the same voice conference, and forms a group consisting of that combination. Specifically, the group management unit 340 forms a group consisting of the two or more voice conference terminals 20 that are the sources of a plurality of voice signals identified by the identification unit 330 as predetermined signals within a predetermined time difference.

To elaborate, when the user of a voice conference terminal 20 utters voice containing the start keyword, that voice conference terminal 20 transmits a voice signal representing the voice to the minutes creation server 30 as a first voice signal, and each other voice conference terminal 20 participating in the same voice conference transmits a voice signal representing the same voice to the minutes creation server 30 as a second voice signal. That is, voice signals containing the start keyword are expected to be received at substantially the same time from the two or more voice conference terminals 20 participating in the same voice conference.

Therefore, by forming a group consisting of the two or more voice conference terminals 20 that are the sources of a plurality of voice signals identified by the identification unit 330 as voice containing the start keyword within a predetermined time difference, the group management unit 340 can group the two or more voice conference terminals 20 participating in the same voice conference. The predetermined time difference is preferably set in consideration of network delay and processing delay within the voice conference terminals 20, and may be, for example, between 1 second and 5 seconds.

When the identification unit 330 identifies a voice signal received from any voice conference terminal 20 as voice containing the end keyword, the group management unit 340 dissolves the group of two or more voice conference terminals 20 to which that voice conference terminal 20 belongs.
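The formation and dissolution rules of the group management unit 340 can be sketched as below. This is one possible reading, under stated assumptions: detections are given as `(terminal_id, arrival_time)` pairs, the window is measured from the earliest detection in a cluster, and the 3-second default is simply a value inside the 1 to 5 second range mentioned in the text. The function names are hypothetical.

```python
def form_groups(detections, window_s=3.0):
    """Cluster start-keyword detections into groups.

    detections: iterable of (terminal_id, arrival_time_in_seconds).
    A cluster becomes a group only if it contains two or more terminals.
    """
    groups, current, start = [], [], None
    for terminal, t in sorted(detections, key=lambda d: d[1]):
        # Close the current cluster once a detection falls outside the window.
        if current and t - start > window_s:
            if len(current) >= 2:
                groups.append(frozenset(current))
            current = []
        if not current:
            start = t
        current.append(terminal)
    if len(current) >= 2:
        groups.append(frozenset(current))
    return groups


def dissolve(groups, terminal):
    """Return the groups that remain after `terminal` utters the end keyword."""
    return [group for group in groups if terminal not in group]


groups = form_groups([("20A", 10.0), ("20B", 10.8), ("20C", 25.0)])
print(groups)                   # 20A and 20B are grouped; 20C is left alone
print(dissolve(groups, "20B"))  # the end keyword from either member dissolves it
```

Anchoring the window at the first detection keeps a slow third participant in the same group as long as it arrives within the window, which a pairwise "form on second arrival" rule would miss.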

(Server transmission unit)
The server transmission unit 350 is a transmission unit that transmits the recognition results of voice signals received from two or more voice conference terminals 20 belonging to the same group to the two or more voice conference terminals 20 belonging to that group. In addition, when the group management unit 340 forms a group, the server transmission unit 350 transmits a transmission stop instruction, which is a control signal instructing the terminals to stop transmitting the second voice signal, to the two or more voice conference terminals 20 belonging to the group. The second voice signal is used to form the group, so it no longer needs to be transmitted to the minutes creation server 30 once the group has been formed. Transmission of the first voice signal to the minutes creation server 30, on the other hand, continues so that the minutes can be created.

When the group management unit 340 dissolves a group, the server transmission unit 350 transmits a transmission start instruction, which is a control signal instructing the terminals to start transmitting the second voice signal, to the two or more voice conference terminals 20 that belonged to the group, in preparation for the formation of the next group.

(Control unit)
The control unit 360 controls the overall operation of the minutes creation server 30. For example, the control unit 360 controls the transmission of recognition results, transmission stop instructions, and transmission start instructions from the server transmission unit 350.

<4. Operation of the voice conference support system>
The configurations of the voice conference terminal 20 and the minutes creation server 30 according to the embodiment of the present invention have been described above. Next, the operation of the voice conference support system according to the embodiment of the present invention is summarized with reference to FIGS. 6 to 9.

(Startup)
FIG. 6 is an explanatory diagram showing the operation of the voice conference terminals 20 at startup. First, when the voice conference terminal 20A starts the call application 230 and the minutes application 240 (S10), the minutes application 240 of the voice conference terminal 20A shares the voice input with the call application 230 (S11), and the terminal transmission unit 244 of the voice conference terminal 20A starts transmitting the first voice signal and the second voice signal (S12, S13).

Similarly, when the voice conference terminal 20B starts the call application 230 and the minutes application 240 (S14), the minutes application 240 of the voice conference terminal 20B shares the voice input with the call application 230 (S15), and the terminal transmission unit 244 of the voice conference terminal 20B starts transmitting the first voice signal and the second voice signal (S16, S17).

Likewise, when the voice conference terminal 20C starts the call application 230 and the minutes application 240 (S18), the minutes application 240 of the voice conference terminal 20C shares the voice input with the call application 230 (S19), and the terminal transmission unit 244 of the voice conference terminal 20C starts transmitting the first voice signal and the second voice signal (S20, S21). The order in which the applications are started on a terminal, and the order in which the voice conference terminals 20 start their applications, need not follow the order described above and are not particularly limited.

After that, the call application 230 of the voice conference terminal 20A requests the voice conference terminal 20B to hold a voice conference (S22), the call application 230 of the voice conference terminal 20B approves the request (S23), and a voice conference session is established. From this point on, a voice conference is held between the voice conference terminal 20A and the voice conference terminal 20B, and the voice conference terminal 20C does not participate in it.

(Grouping on the minutes creation server side)
FIG. 7 is an explanatory diagram showing how the minutes creation server 30 groups a plurality of voice conference terminals 20 after a voice conference session has been established. When the user of the voice conference terminal 20A speaks, as shown in FIG. 7, the call application 230 of the voice conference terminal 20A transmits a voice signal representing the voice to the voice conference terminal 20B (S31), and the voice output unit 228 of the voice conference terminal 20B outputs the voice signal (S32).

Here, the terminal transmission unit 244 of the voice conference terminal 20A transmits the voice signal sent to the voice conference terminal 20B to the minutes creation server 30 as a first voice signal (S33), and the minutes creation server 30 performs voice recognition on the first voice signal (S34). In addition, the terminal transmission unit 244 of the voice conference terminal 20B transmits the voice signal output by the voice output unit 228 to the minutes creation server 30 as a second voice signal (S35), and the minutes creation server 30 performs voice recognition on the second voice signal (S36).

When the first voice signal and the second voice signal recognized by the minutes creation server 30 represent voice containing a specific start keyword such as "We will start the meeting", and the server receiving unit 310 of the minutes creation server 30 received the first voice signal and the second voice signal within the predetermined time difference, the processing shown in frame F1 is performed.

Specifically, the group management unit 340 of the minutes creation server 30 groups the voice conference terminal 20A, which is the source of the first voice signal, and the voice conference terminal 20B, which is the source of the second voice signal (S37). The server transmission unit 350 of the minutes creation server 30 then transmits a transmission stop instruction for the second voice signal to the grouped voice conference terminals 20A and 20B (S38, S39).

The server transmission unit 350 of the minutes creation server 30 also transmits the recognition result of the first voice signal to the voice conference terminals 20A and 20B belonging to the same group (S40, S42), and the voice conference terminals 20A and 20B reflect the recognition result in the minutes (S41, S43). The minutes creation server 30 may group other combinations of voice conference terminals 20 (not shown) in the same way, so that a plurality of groups may exist at the same time.

(Progress of the voice conference)
FIG. 8 is an explanatory diagram showing the operation after grouping. When the user of the voice conference terminal 20A speaks, as shown in FIG. 8, the call application 230 of the voice conference terminal 20A transmits a voice signal representing the voice to the voice conference terminal 20B (S44), and the voice output unit 228 of the voice conference terminal 20B outputs the voice signal (S45). The terminal transmission unit 244 of the voice conference terminal 20A also transmits the voice signal sent to the voice conference terminal 20B to the minutes creation server 30 as a first voice signal (S46), and the minutes creation server 30 performs voice recognition on the first voice signal (S47).

The server transmission unit 350 of the minutes creation server 30 then transmits the recognition result of the first voice signal to the voice conference terminals 20A and 20B belonging to the same group (S48, S50), and the voice conference terminals 20A and 20B reflect the recognition result in the minutes (S49, S51).

Similarly, when the user of the voice conference terminal 20B speaks, as shown in FIG. 8, the call application 230 of the voice conference terminal 20B transmits a voice signal representing the voice to the voice conference terminal 20A (S52), and the voice output unit 228 of the voice conference terminal 20A outputs the voice signal (S53). The terminal transmission unit 244 of the voice conference terminal 20B also transmits the voice signal sent to the voice conference terminal 20A to the minutes creation server 30 as a first voice signal (S54), and the minutes creation server 30 performs voice recognition on the first voice signal (S55).

The server transmission unit 350 of the minutes creation server 30 then transmits the recognition result of the first voice signal to the voice conference terminals 20A and 20B belonging to the same group (S56, S58), and the voice conference terminals 20A and 20B reflect the recognition result in the minutes (S57, S59). The recognition result of the first voice signal is not transmitted to the voice conference terminal 20C, which does not belong to the group.
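The distribution rule in this step, where results go to every member of the sender's group and to no one else, can be written as a small lookup. The function name and the representation of groups as a list of frozensets are assumptions for illustration only.

```python
def recipients(sender, groups):
    """Return the terminals that should receive a result for `sender`'s voice.

    groups: list of frozensets of terminal identifiers (hypothetical layout).
    """
    for group in groups:
        if sender in group:
            return sorted(group)    # every member, including the sender
    return []                       # ungrouped sender: nothing is distributed


groups = [frozenset({"20A", "20B"})]
print(recipients("20A", groups))    # both group members
print(recipients("20C", groups))    # 20C is outside the group
```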

(End of the voice conference)
FIG. 9 is an explanatory diagram showing the operation at the end of the voice conference. When the user of the voice conference terminal 20A speaks, as shown in FIG. 9, the call application 230 of the voice conference terminal 20A transmits a voice signal representing the voice to the voice conference terminal 20B (S61), and the voice output unit 228 of the voice conference terminal 20B outputs the voice signal (S62). The terminal transmission unit 244 of the voice conference terminal 20A also transmits the voice signal sent to the voice conference terminal 20B to the minutes creation server 30 as a first voice signal (S63), and the minutes creation server 30 performs voice recognition on the first voice signal (S64).

The server transmission unit 350 of the minutes creation server 30 then transmits the recognition result of the first voice signal to the voice conference terminals 20A and 20B belonging to the same group (S65, S67), and the voice conference terminals 20A and 20B reflect the recognition result in the minutes (S66, S68).

Here, when the first voice signal recognized by the minutes creation server 30 represents voice containing a specific end keyword such as "We will end the meeting", the processing shown in frame F2 is performed.

Specifically, the group management unit 340 of the minutes creation server 30 dissolves the group to which the voice conference terminal 20A, the source of the first voice signal, belongs (S69), and the server transmission unit 350 of the minutes creation server 30 transmits a transmission start instruction for the second voice signal to the voice conference terminals 20A and 20B that belonged to the group (S70, S71). The control unit 360 of the minutes creation server 30 then stores the minutes in a storage medium (S72).

After that, the call applications 230 of the voice conference terminals 20A and 20B exchange a disconnection request and its approval to end the voice conference (S73, S74), and the voice conference session ends. Although the above describes an example in which the voice conference terminal 20A transmits the voice signal containing a specific keyword, the source of such a voice signal may be any voice conference terminal 20 participating in the same voice conference.

<5. Effects>
As described above, the embodiment of the present invention relates to a voice conference support system that can be used together with an existing voice conference system, in which the call application 230 and the minutes application 240 share the voice input at each voice conference terminal 20. While a voice conference session is established among a plurality of voice conference terminals 20, the user of one of those voice conference terminals 20 utters voice containing a specific keyword, which allows the minutes creation server 30 to easily group the voice conference terminals 20 participating in the same voice conference for the purpose of creating minutes. This eliminates the need for a separate voice conference system for each voice conference and reduces the introduction cost.

In addition, by holding as the start keyword a phrase that is naturally uttered at the start of a voice conference, such as "We will start the meeting", grouping for minutes creation can be achieved with virtually no effort on the user's part. Similarly, by holding as the end keyword a phrase that is naturally uttered at the end of a voice conference, such as "We will end the meeting", the group can also be dissolved with virtually no effort on the user's part.

Furthermore, after grouping, the minutes creation server 30 transmits a transmission stop instruction for the second voice signal to each voice conference terminal 20 belonging to the group, which reduces both the processing load on the voice conference terminals 20 and the traffic between the voice conference terminals 20 and the minutes creation server 30. After the group is dissolved, the minutes creation server 30 transmits a transmission start instruction for the second voice signal to each voice conference terminal 20 that belonged to the group, so that the system is ready for a new grouping.

<6. Modifications>
An embodiment of the present invention has been described above. Several modifications of that embodiment are described below. Each modification may be applied to the embodiment on its own or in combination with other modifications. Each modification may also be applied in place of, or in addition to, the configuration described in the embodiment.

For example, the above describes an example in which the group management unit 340 forms a group consisting of the two or more voice conference terminals 20 that are the sources of a plurality of voice signals identified by the identification unit 330 as voice containing the start keyword within a predetermined time difference. In this regard, the identification unit 330 may identify not only whether a voice signal represents voice containing the start keyword but also the user who uttered the voice in each voice signal. The group management unit 340 may then form a group consisting of the two or more voice conference terminals 20 that are the sources of voice signals on the condition that the voice signals represent voice containing the start keyword and also represent voice uttered by the same user.

With this configuration, even when several voice conferences happen to start at the same time and different users in different voice conferences utter the start keyword simultaneously, the voice conference terminals 20 of each voice conference can be grouped separately.
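A sketch of this modification follows. It assumes that each detection already carries a speaker identifier produced by some speaker identification step, which the specification does not detail; the tuple layout, the function name, and the 3-second window are likewise assumptions.

```python
from collections import defaultdict


def form_groups_by_speaker(detections, window_s=3.0):
    """Group start-keyword detections that share a speaker and a time window.

    detections: iterable of (terminal_id, arrival_time, speaker_id);
    speaker_id is a hypothetical output of speaker identification.
    """
    by_speaker = defaultdict(list)
    for terminal, t, speaker in detections:
        by_speaker[speaker].append((t, terminal))
    groups = []
    for hits in by_speaker.values():
        hits.sort()                          # earliest detection first
        first_time = hits[0][0]
        members = {term for t, term in hits if t - first_time <= window_s}
        if len(members) >= 2:
            groups.append(frozenset(members))
    return groups


# Two conferences start at almost the same moment.
detections = [
    ("20A", 10.0, "alice"), ("20B", 10.5, "alice"),
    ("20D", 10.2, "bob"),   ("20E", 10.9, "bob"),
]
print(form_groups_by_speaker(detections))
```

With time alone, all four terminals above would fall into one window; keying on the speaker separates them into two groups.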

The above describes an example in which a voice signal representing voice that contains the start keyword uttered by a user is used as the predetermined signal for grouping, but the predetermined signal is not limited to this example. For example, the voice conference terminal 20 may store a recorded voice signal, read it out in response to an operation on the operation unit 226, and transmit it as the predetermined signal. In this case, each user participating in the same voice conference may operate his or her voice conference terminal 20 so that each voice conference terminal 20 transmits the recorded voice signal to the minutes creation server 30. Alternatively, when one voice conference terminal 20 transmits the recorded voice signal to the other voice conference terminals 20, the other voice conference terminals 20 may transmit the recorded voice signal to the minutes creation server 30 as a second voice signal.

Furthermore, the predetermined signal may be a signal that is recorded in advance in the voice conference terminal 20 and consists of components in the inaudible range. In this case, the minutes creation server 30 desirably has a configuration for detecting components in the inaudible range.
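One way such a detection configuration could work is sketched below. The source does not specify a detection method; this sketch uses the Goertzel algorithm as a stand-in technique, and the 20 kHz marker frequency, 48 kHz sample rate, amplitude threshold, and function names are all assumptions chosen for illustration. The sketch also assumes that the transmission path preserves such high-frequency components.

```python
import math


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def contains_marker(samples, sample_rate_hz=48000, marker_hz=20000.0, threshold=0.1):
    """True if the estimated amplitude of the marker tone reaches the threshold.

    marker_hz and threshold are illustrative assumptions, not values from the source.
    """
    n = len(samples)
    if n == 0:
        return False
    power = max(goertzel_power(samples, sample_rate_hz, marker_hz), 0.0)
    # A sine of amplitude A centered on the bin gives |X[k]| = A * n / 2.
    amplitude = 2.0 * math.sqrt(power) / n
    return amplitude >= threshold
```

The Goertzel algorithm evaluates a single frequency bin without computing a full FFT, which suits a check for one known marker frequency.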

In addition, although the above describes an example in which one voice conference terminal 20 is used by one user, the number of users of one voice conference terminal 20 is not limited, and one voice conference terminal 20 may be used by a plurality of users.

<7. Hardware configuration>
The embodiments of the present invention have been described above. The information processing described above, such as voice recognition and grouping, is realized by cooperation between software and the hardware of the minutes creation server 30 described below.

FIG. 10 is a block diagram showing the hardware configuration of the minutes creation server 30. The minutes creation server 30 includes a CPU (Central Processing Unit) 301, a ROM (Read Only Memory) 302, a RAM (Random Access Memory) 303, and a host bus 304. The minutes creation server 30 also includes a bridge 305, an external bus 306, an interface 307, an input device 308, a display device 309, an audio output device 316, a storage device (HDD) 311, a drive 312, and a network interface 315.

The CPU 301 functions as an arithmetic processing device and a control device, and controls the overall operation of the minutes creation server 30 in accordance with various programs. The CPU 301 may be a microprocessor. The ROM 302 stores programs, calculation parameters, and the like used by the CPU 301. The RAM 303 temporarily stores programs used in execution by the CPU 301, parameters that change as appropriate during that execution, and the like. These components are connected to one another by the host bus 304, which is composed of a CPU bus or the like. Through cooperation between the CPU 301, the ROM 302, the RAM 303, and software, the functions of the voice recognition unit 320, the identification unit 330, the group management unit 340, the control unit 360, and the like described above can be realized.

The host bus 304 is connected via the bridge 305 to the external bus 306, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 304, the bridge 305, and the external bus 306 do not necessarily have to be configured separately, and their functions may be implemented on a single bus.

The input device 308 is composed of input means with which a user inputs information, such as a mouse, a keyboard, a touch panel, buttons, a microphone, a camera, sensors, switches, and levers, and of an input control circuit or the like that generates an input signal based on the user's input and outputs it to the CPU 301. By operating the input device 308, a user of the minutes creation server 30 can input various data to the minutes creation server 30 and instruct it to perform processing operations.

The display device 309 includes, for example, display devices such as a liquid crystal display (LCD) device, a projector device, an OLED (Organic Light Emitting Diode) device, and a lamp. The audio output device 316 includes audio output devices such as speakers and headphones.

The storage device 311 is a device for data storage configured as an example of the storage unit of the minutes creation server 30 according to the present embodiment. The storage device 311 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 311 is composed of, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a memory having an equivalent function. The storage device 311 drives the storage and stores programs executed by the CPU 301 and various data.

The drive 312 is a reader/writer for storage media and is built into or externally attached to the minutes creation server 30. The drive 312 reads information recorded on a mounted removable storage medium 34, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and outputs the information to the RAM 303 or the storage device 311. The drive 312 can also write information to the removable storage medium 34.

The network interface 315 is, for example, a communication interface composed of a communication device or the like for connecting to the network 12. The network interface 315 may be a wireless LAN (Local Area Network) compatible communication device or a wired communication device that communicates over a cable.

The hardware configuration of the minutes creation server 30 described above is also applicable to the voice conference terminal 20. In either configuration, the drive 312 and the removable storage medium 34 are not essential, and the input device 308, the display device 309, and the audio output device 316 may be modified or omitted depending on the application.

<8. Supplement>
Although preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical ideas described in the claims, and these are naturally understood to belong to the technical scope of the present invention as well.

For example, the steps in the processing of the voice conference support system described in this specification do not necessarily have to be processed in time series in the order shown in the flowcharts. The steps may be processed in an order different from that shown in the flowcharts, or may be processed in parallel.

It is also possible to create a computer program that causes hardware such as the CPU, ROM, and RAM built into the voice conference terminal 20 and the minutes creation server 30 to exhibit functions equivalent to those of the components of the voice conference terminal 20 and the minutes creation server 30 described above. A storage medium storing the computer program is also provided.
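As a rough illustration of what a server-side portion of such a computer program might look like, the following sketch tracks group membership, forwards recognized text to every terminal in the sender's group, and releases the group when an end keyword appears. It is a sketch under stated assumptions, not the actual program: the class name `GroupRegistry`, the message strings, the `outbox` list standing in for network transmission, and the end keyword `"end minutes"` are all hypothetical. Forwarding the utterance that contains the end keyword before releasing the group is likewise an assumption.

```python
class GroupRegistry:
    """Tracks group membership and queues messages for member terminals."""

    def __init__(self):
        self._group_of = {}  # terminal_id -> group_id
        self._members = {}   # group_id -> set of terminal_ids
        self._next_id = 1
        self.outbox = []     # (terminal_id, message) pairs; stands in for sends

    def form_group(self, terminal_ids):
        """Register a group and tell members to stop sending the second signal."""
        group_id = self._next_id
        self._next_id += 1
        members = set(terminal_ids)
        self._members[group_id] = members
        for t in sorted(members):
            self._group_of[t] = group_id
            self.outbox.append((t, "STOP_SECOND_SIGNAL"))
        return group_id

    def on_recognized(self, terminal_id, text, end_keyword="end minutes"):
        """Forward recognized text to the sender's group; release on end keyword."""
        group_id = self._group_of.get(terminal_id)
        if group_id is None:
            return  # terminal is not in any group, so nothing is forwarded
        for t in sorted(self._members[group_id]):
            self.outbox.append((t, f"TEXT {terminal_id}: {text}"))
        if end_keyword in text:
            self._release(group_id)

    def _release(self, group_id):
        """Dissolve the group and tell members to resume the second signal."""
        for t in sorted(self._members.pop(group_id)):
            del self._group_of[t]
            self.outbox.append((t, "START_SECOND_SIGNAL"))
```

Keeping the queue of outgoing messages separate from the membership tables makes the grouping logic easy to exercise without a network connection.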

20 Voice conference terminal
220 User IF unit
222 First voice input unit
224 Display unit
226 Operation unit
228 Voice output unit
230 Call application
240 Minutes application
242 Second voice input unit
244 Terminal transmission unit
246 Terminal reception unit
248 Control unit
30 Minutes creation server
310 Server reception unit
320 Voice recognition unit
330 Identification unit
340 Group management unit
350 Server transmission unit
360 Control unit

Claims

A voice conference support device comprising:
a receiving unit that receives signals from a plurality of voice conference terminals;
an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal;
a group management unit that forms a group consisting of the voice conference terminals that are the respective sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference;
a voice recognition unit that recognizes voice signals received from the voice conference terminals; and
a transmission unit that transmits, to two or more voice conference terminals belonging to the same group, a character string indicating the recognition result of the voice signals received from the two or more voice conference terminals.

The voice conference support device according to claim 1, wherein
the receiving unit receives a first signal and a second signal from the voice conference terminal,
the first signal is a signal transmitted from the voice conference terminal to another voice conference terminal, and
the second signal is a signal received by the voice conference terminal from the other voice conference terminal.

The voice conference support device according to claim 2, wherein
the first signal is a voice signal indicating voice uttered by a user of the voice conference terminal, and
the second signal is a voice signal indicating voice uttered by a user of the other voice conference terminal.

The voice conference support device according to claim 3, wherein the predetermined signal is a voice signal indicating voice that includes a first keyword.

The voice conference support device according to claim 4, wherein
the identification unit further identifies the user who uttered the voice indicated by the signal received from each of the plurality of voice conference terminals, and
the group management unit forms a group consisting of the voice conference terminals that are the respective sources of a plurality of signals that the identification unit has identified as the predetermined signal within the predetermined time difference and as having been uttered by the same user.

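The voice conference support device according to claim 2 or any one of claims 3 to 5, wherein, after the group is formed, the transmission unit transmits, to the two or more voice conference terminals belonging to the group, a control signal instructing them to stop transmitting the second signal.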

The voice conference support device according to any one of claims 3 to 6, wherein
the identification unit identifies whether or not the voice signal received by the receiving unit from each of the plurality of voice conference terminals indicates voice that includes a second keyword, and
the group management unit releases the group when the identification unit identifies that a voice signal received from any of the voice conference terminals belonging to the group indicates voice that includes the second keyword.

The voice conference support device according to claim 7, wherein, when the group is released, the transmission unit transmits, to the two or more voice conference terminals that belonged to the group, a control signal instructing them to start transmitting the second signal.

The voice conference support device according to claim 1, wherein the predetermined signal is a signal that is recorded in advance in the voice conference terminal and read out by a user's operation on the voice conference terminal.

The voice conference support device according to claim 9, wherein the predetermined signal is a voice signal or a signal consisting of components in the inaudible range.

A program for causing a computer to function as:
a receiving unit that receives signals from a plurality of voice conference terminals;
an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal;
a group management unit that forms a group consisting of the voice conference terminals that are the respective sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference;
a voice recognition unit that recognizes voice signals received from the voice conference terminals; and
a transmission unit that transmits, to two or more voice conference terminals belonging to the same group, a character string indicating the recognition result of the voice signals received from the two or more voice conference terminals.

A voice conference support method comprising:
receiving signals from a plurality of voice conference terminals;
identifying whether or not the signal received from each of the plurality of voice conference terminals is a predetermined signal;
forming a group consisting of the voice conference terminals that are the respective sources of a plurality of signals identified as the predetermined signal within a predetermined time difference;
recognizing voice signals received from the voice conference terminals; and
transmitting, to two or more voice conference terminals belonging to the same group, a character string indicating the recognition result of the voice signals received from the two or more voice conference terminals.

A voice conference support system having a plurality of voice conference terminals and a voice conference support device, wherein
each of the plurality of voice conference terminals transmits a signal to the voice conference support device, and
the voice conference support device comprises:
a receiving unit that receives signals from the plurality of voice conference terminals;
an identification unit that identifies whether or not the signal received by the receiving unit from each of the plurality of voice conference terminals is a predetermined signal;
a group management unit that forms a group consisting of the voice conference terminals that are the respective sources of a plurality of signals identified by the identification unit as the predetermined signal within a predetermined time difference;
a voice recognition unit that recognizes voice signals received from the voice conference terminals; and
a transmission unit that transmits, to two or more voice conference terminals belonging to the same group, a character string indicating the recognition result of the voice signals received from the two or more voice conference terminals.