JP2009086207A

JP2009086207A - Minute information generation system, minute information generation method, and minute information generation program

Info

Publication number: JP2009086207A
Application number: JP2007254717A
Authority: JP
Inventors: Seiji Hirano; 誠治平野; Yoshie Arai; 美江新井; Kazue Arai; 和重荒井
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2009-04-23

Abstract

PROBLEM TO BE SOLVED: To provide a minutes information generation system capable of efficiently generating accurate minutes information without effort.

SOLUTION: A minutes information generation system that generates minutes information using voice recognition means for converting voice information into text information comprises voice input means for converting voice input from a user into voice information. In the system, user information including information for identifying the user is stored in advance, and the voice input means used by the user is stored in association with that user information. The system detects the user information corresponding to the voice input means through which voice information is input, and generates minutes information that associates the detected user information with the text information obtained when the voice recognition means converts the voice information into text.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a minutes information generation system that generates minutes information using voice recognition means for converting voice information into text information.

Conventionally, electronic conference systems have been used that hold a conference by transmitting and receiving conference video and audio between remote locations over a network formed by computer terminals.
Meanwhile, in recent years, voice recognition technology for converting input voice information into text information has been developed (see, for example, Patent Document 1) and has come to be applied in various situations (see, for example, Patent Document 2).
Accordingly, an electronic conference system has been proposed that generates minutes information by using voice recognition technology to convert the voice information exchanged as conference content into text information and saving it (see, for example, Patent Document 3).
JP 2004-361769 A, JP 2007-206011 A, JP 2005-341015 A

However, although such a minutes information generation system can record what was said during a conference, it cannot record who said what.
In addition, when several conference participants speak at the same time, the overlapping voices are input as a single piece of voice information, which causes a voice recognition error, so no minutes are generated for the remarks made during that interval.

To perform such voice recognition accurately, it is common to create in advance information called a voice profile, which stores the characteristics of each individual user's voice, and, when voice information is input, to use the voice profile as a reference to calibrate the voice information before converting it into text information.
However, generating such a voice profile requires the user to speak fixed sentences or the like into a microphone while the input voice information is analyzed and its features are extracted. Performing this feature extraction at every conference is troublesome and burdensome for the user.

Moreover, voice recognition processing that converts voice information into text information requires complicated calculation and therefore generally places a high processing load on the computer. In real-time voice recognition, the calculation may fail to keep up with the progress of the conference and may be interrupted or end in an error.
Furthermore, it is desirable that such minutes information record conference information, such as the conference time and the conference participants, in addition to the remarks made in the conference.

The present invention has been made in view of these circumstances and provides a minutes information generation system that generates highly accurate minutes information efficiently and without effort.

To solve the above problems, the present invention is a minutes information generation system that generates minutes information using voice recognition means for converting voice information into text information, the system comprising: voice input means for converting voice input from a user into voice information; user information storage means for storing in advance user information including information for identifying the user; correspondence information storage means for storing the voice input means used by the user in association with the user information of that user stored in the user information storage means; correspondence information detection means for detecting, from the correspondence information storage means, the user information corresponding to the voice input means to which voice was input; and minutes information generation means for detecting, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and generating minutes information that associates the detected user information with the text information obtained when the voice recognition means converts the voice information into text.

In the present invention, the user information storage means is provided in a portable storage medium, and the system further comprises: user information reading means for reading the user information stored in the portable storage medium when the portable storage medium is connected by the user; and correspondence information control means for storing, in the correspondence information storage means, information that associates the user information read by the user information reading means with the voice input means.

In the present invention, the user information stored in the user information storage means includes voice feature information indicating the features of the voice uttered by the user indicated by the user information, and the voice recognition means converts the voice information into text information based on the voice feature information associated with the voice input means to which the voice information was input.

In the present invention, the correspondence information storage means stores information that associates one of a plurality of voice input means with one of a plurality of pieces of user information on a one-to-one basis.

The present invention further comprises voice information storage means for storing the voice information input to the voice input means, and the minutes information generation means generates the minutes information based on the voice information read from the voice information storage means.

The present invention further comprises conference information input means for receiving input of conference information, which is information indicating the conference participants, the conference location, and the conference time, and the conference information is added to the minutes information generated by the minutes information generation means.

The present invention is also a minutes information generation method for generating minutes information using voice recognition means for converting voice information into text information, the method comprising the steps of: user information storage means storing in advance user information including information for identifying a user; voice input means converting voice input from the user into voice information; correspondence information storage means storing the voice input means used by the user in association with the user information of that user stored in the user information storage means; correspondence information detection means detecting, from the correspondence information storage means, the user information corresponding to the voice input means to which voice was input; and minutes information generation means detecting, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and generating minutes information that associates the detected user information with the text information obtained when the voice recognition means converts the voice information into text.

The present invention is also a minutes information generation program that causes a computer, serving as a minutes information generation device that includes voice input means for converting voice input from a user into voice information and that generates minutes information using voice recognition means for converting voice information into text information, to execute the steps of: user information storage means storing in advance user information including information for identifying the user; correspondence information storage means storing the voice input means used by the user in association with the user information of that user stored in the user information storage means; correspondence information detection means detecting, from the correspondence information storage means, the user information corresponding to the voice input means to which voice was input; and minutes information generation means detecting, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and generating minutes information that associates the detected user information with the text information obtained when the voice recognition means converts the voice information into text.

As described above, according to the present invention, user information including user identification information is stored in advance, the user information is associated with the voice input means, and minutes information is generated that associates the text information obtained when the voice recognition means converts the voice information input to the voice input means with the user information corresponding to that voice input means. This makes it possible to generate minutes information in which information identifying the user is associated with the text information generated by the voice recognition means. In addition, because each user stores his or her own user information in the user information storage means in advance, the user need not enter that information at every conference, so detailed minutes information can be generated efficiently.

Furthermore, according to the present invention, the user information is stored in a portable storage medium, so the user can carry a portable storage medium in which his or her user information has been stored in advance. Having the minutes information generation system read the information from the portable storage medium saves the trouble of entering user information at every conference.

Furthermore, according to the present invention, the user's voice feature information is stored as part of the user information, so the voice recognition means can convert voice information into text information based on the voice feature information, enabling more accurate voice recognition.

Furthermore, according to the present invention, one of a plurality of voice input means is associated one-to-one with one of a plurality of pieces of user information. Therefore, even in a conference with several users, each user's remarks can be acquired individually, and even when remarks are made simultaneously, each can be converted into text information with higher accuracy.

Furthermore, according to the present invention, voice information storage means is provided for storing the voice information input to the voice input means, and minutes information can be generated based on the stored voice information. Therefore, even when it is difficult to perform high-load voice recognition processing in real time, the voice information can be stored and the minutes information generated afterwards.
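The store-first, transcribe-later arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the source: the names `RecordedSegment`, `record`, `generate_minutes_later`, and `fake_recognize`, and the use of a start timestamp to order the stored segments, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecordedSegment:
    """One stored piece of voice information (hypothetical structure)."""
    start_seconds: float  # assumed timestamp, used only to order segments
    mic_id: str           # which voice input means captured the audio
    audio: bytes          # raw voice information kept for later processing


def record(store: List[RecordedSegment], segment: RecordedSegment) -> None:
    """During the conference: only store the audio, run no recognition."""
    store.append(segment)


def generate_minutes_later(
    store: List[RecordedSegment],
    recognize: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """After the conference: run the expensive recognition on stored audio."""
    ordered = sorted(store, key=lambda s: s.start_seconds)
    return [(s.mic_id, recognize(s.audio)) for s in ordered]


def fake_recognize(audio: bytes) -> str:
    """Stand-in for a real recognition engine (assumption for illustration)."""
    return audio.decode("utf-8")


store: List[RecordedSegment] = []
record(store, RecordedSegment(12.0, "mic-2", b"I agree."))
record(store, RecordedSegment(3.5, "mic-1", b"Let us begin."))
minutes = generate_minutes_later(store, fake_recognize)
```

Because recognition is decoupled from capture, the recognizer can run at whatever speed the server allows without dropping remarks.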

Furthermore, according to the present invention, input of conference information indicating the conference participants, the conference location, and the conference time is received and added to the minutes information. Minutes information containing the items required of minutes can therefore be generated without later adding to or correcting it.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
<First Embodiment>
FIG. 1 is a block diagram showing the configuration of the minutes information generation system 10 according to this embodiment.
The minutes information generation system 10 according to this embodiment includes an IC (integrated circuit) card 100 that stores a user's personal information, a conference terminal 200 that is prepared and installed for each user attending the conference, and a minutes generation server 300 that generates minutes information based on information received from the conference terminal 200.

Although FIG. 1 shows a single IC card 100 for the purpose of describing this embodiment, each of a plurality of users may carry an IC card 100 that stores his or her own user information. Likewise, although FIG. 1 shows a single conference terminal 200, a plurality of conference terminals 200 are installed so that they correspond one-to-one with the IC cards 100. This embodiment assumes that a conference terminal 200 is installed at each seat of an electronic conference room. When a plurality of conference terminals 200 are used, they may be connected to a single minutes generation server 300 via a network.

The IC card 100 is a portable storage medium equipped with an IC chip capable of storing and processing information, and includes a user information storage unit 101. The user information storage unit 101 stores information such as a user number, a name, and a voice profile. The user number is identification information that uniquely identifies a user in this system, and is a numerical value or a combination of numerals and characters. The name is information indicating the user's name. In addition, the name of the department to which the user belongs within an organization may also be stored.

The voice profile is information representing the characteristics of the voice uttered by the user, in which characteristics such as the frequency, speed, accent, and intonation of the voice are quantified. The IC card 100 may be a contact type that reads and writes information in contact with the IC card reader 201, or a contactless type that reads and writes information by wireless communication. As long as it includes the user information storage unit 101 described above, the IC card 100 may be any portable medium such as an IC card, a SIM (Subscriber Identity Module), an SD (Secure Digital) memory, or an RFID (Radio Frequency IDentification) device. Such a portable medium is desirably equipped with a secure, tamper-resistant IC.
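As an illustration of the user information held on the card, the record might be modeled as below. This is only a sketch: the class and field names, the field types, and the sample values are assumptions introduced here, since the source lists the items (user number, name, optional department, voice profile) but does not specify a data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceProfile:
    """Quantified voice characteristics; units and scales are assumptions."""
    frequency: float
    speed: float
    accent: float
    intonation: float


@dataclass(frozen=True)
class UserInfo:
    """Contents of the user information storage unit 101 (hypothetical layout)."""
    user_number: str                  # unique ID: digits or digits plus letters
    name: str
    voice_profile: VoiceProfile
    department: Optional[str] = None  # optional item mentioned in the text


# Sample values are invented purely for illustration.
example_user = UserInfo(
    user_number="A0012",
    name="Example User",
    voice_profile=VoiceProfile(frequency=180.0, speed=5.5, accent=0.3, intonation=0.6),
)
```

Marking the records as frozen reflects that the terminal only reads the card in this flow; it is a design choice of the sketch, not a requirement stated in the source.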

The conference terminal 200 is an information terminal used by a user attending a conference held using this system. The conference terminal 200 includes an IC card reader 201 and a microphone 202. The IC card reader 201 connects to the IC card 100 described above and reads the information stored in it. In this embodiment, the conference terminal 200 is provided with an insertion slot for the IC card 100 as the IC card reader 201. The microphone 202 converts the voice uttered by the user into an electrical signal and generates voice information. The same number of conference terminals 200 as users (that is, as IC cards 100) are prepared, and each user uses one conference terminal 200. As long as it has at least these functions, the conference terminal 200 may be a device such as a PC or a simple device that includes only the IC card reader 201 and the microphone 202.

The minutes generation server 300 communicates with the conference terminal 200 via a network and generates minutes information based on the information received from the conference terminal 200. The network connecting the conference terminal 200 and the minutes generation server 300 need only be capable of information communication; it may be the Internet, a network within a LAN (Local Area Network), or another information communication network. The interface between the conference terminal 200 and the minutes generation server 300 may also be a serial communication interface such as USB or a wireless communication interface such as infrared. The minutes generation server 300 includes a correspondence information storage unit 301, a correspondence information control unit 302, a voice recognition unit 303, a minutes information generation unit 304, and a minutes information storage unit 305.

The correspondence information storage unit 301 stores information that associates the microphone 202 with the IC card reader 201. Specifically, it receives from the conference terminal 200 the user information stored in the user information storage unit 101 of the IC card 100 together with the identification information of the microphone 202 of that conference terminal 200, and stores the two in association with each other.
The correspondence information control unit 302 detects, from the correspondence information storage unit 301, the user information corresponding to the microphone 202 to which voice was input.
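A minimal sketch of such a microphone-to-user table is shown below, with the one-to-one constraint from the summary of the invention enforced at registration time. The class name `CorrespondenceStore`, its method names, and the string identifiers are hypothetical; the source does not describe how the storage unit is implemented.

```python
from typing import Dict, Optional


class CorrespondenceStore:
    """Hypothetical stand-in for the correspondence information storage unit 301."""

    def __init__(self) -> None:
        self._user_by_mic: Dict[str, str] = {}
        self._mic_by_user: Dict[str, str] = {}

    def register(self, mic_id: str, user_number: str) -> None:
        """Associate one microphone with one user, rejecting duplicates."""
        if mic_id in self._user_by_mic:
            raise ValueError(f"microphone {mic_id} is already assigned")
        if user_number in self._mic_by_user:
            raise ValueError(f"user {user_number} already has a microphone")
        self._user_by_mic[mic_id] = user_number
        self._mic_by_user[user_number] = mic_id

    def detect_user(self, mic_id: str) -> Optional[str]:
        """Return the user for the microphone that received voice, if any."""
        return self._user_by_mic.get(mic_id)


table = CorrespondenceStore()
table.register("mic-1", "A0012")
table.register("mic-2", "B0034")
speaker = table.detect_user("mic-2")
```

Keeping a reverse index makes the one-to-one check cheap in both directions; a single dictionary would only prevent a microphone from being assigned twice.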

The voice recognition unit 303 analyzes voice feature information such as the frequency, speed, accent, and intonation of the input voice information, and then performs voice recognition processing that generates text information corresponding to the voice information by referring to word dictionary information stored in advance. The voice uttered by a user varies from person to person and follows patterns to some extent. The voice recognition unit 303 therefore acquires the user's voice feature information in advance and compares this stored voice feature information with the input voice information to be analyzed, which allows more accurate voice recognition processing.

The minutes information generation unit 304 detects, via the correspondence information control unit 302, the user information corresponding to the microphone 202 to which voice was input, and generates minutes information that associates the user information with the text information obtained when the voice recognition unit 303 converts the voice information into text. The minutes information storage unit 305 stores the minutes information generated by the minutes information generation unit 304.

Next, an operation example of the minutes information generation system according to the present invention will be described.
FIG. 2 is a flowchart showing an operation example of the minutes information generation system 10 according to this embodiment.
Each user gathers in the conference room in which the conference terminals 200 are installed and inserts the IC card 100, in which his or her user information has been stored in advance, into the slot of the IC card reader 201. The minutes information generation system 10 then starts the acoustic model setting process (step S10). The acoustic model setting process determines in advance properties such as the frequency characteristics of the speech and phonemes to be analyzed. In general, an acoustic model is represented by, for example, a hidden Markov model whose output probabilities are Gaussian mixture distributions.
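For reference, the output probability of a hidden Markov model state with a Gaussian mixture distribution is conventionally written as below. The symbols are standard textbook notation introduced here for illustration and are not defined in the source: $b_j$ is the output probability of state $j$, $\mathbf{o}_t$ is the acoustic feature vector at time $t$, $M$ is the number of mixture components, and $c_{jm}$, $\boldsymbol{\mu}_{jm}$, and $\boldsymbol{\Sigma}_{jm}$ are the weight, mean, and covariance of component $m$.

```latex
b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,
  \mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1,\quad c_{jm} \ge 0
```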

FIG. 3 is a flowchart showing the acoustic model setting process in detail. The IC card reader 201 of the conference terminal 200 reads the voice profile from the user information stored in the user information storage unit 101 of the IC card 100 (step S11). The IC card reader 201 also reads the name information from that user information (step S12). The conference terminal 200 then transmits the user information read by the IC card reader 201 to the minutes generation server 300 (step S13). The minutes generation server 300 associates the received voice profile and name information with the identification information of the microphone 202 of the conference terminal 200 that transmitted them, and stores the result in the correspondence information storage unit 301.
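Steps S11 to S13 and the server-side storage can be sketched as three small functions. This is an illustrative sketch only: the dictionary-based card and message layouts, the function names, and the sample values are assumptions, and the actual card access and network transfer are not modeled.

```python
from typing import Any, Dict


def read_user_info(card: Dict[str, Any]) -> Dict[str, Any]:
    """Steps S11 and S12: read the voice profile and the name from the card."""
    return {"voice_profile": card["voice_profile"], "name": card["name"]}


def build_message(mic_id: str, user_info: Dict[str, Any]) -> Dict[str, Any]:
    """Step S13: the terminal sends the user info with its microphone's ID."""
    return {"mic_id": mic_id, **user_info}


def store_correspondence(
    table: Dict[str, Dict[str, Any]], message: Dict[str, Any]
) -> None:
    """Server side: key the received user info by microphone ID."""
    table[message["mic_id"]] = {
        "voice_profile": message["voice_profile"],
        "name": message["name"],
    }


# Hypothetical card contents, invented for illustration.
card = {
    "user_number": "A0012",
    "name": "Example User",
    "voice_profile": {"frequency": 180.0},
}
correspondence_table: Dict[str, Dict[str, Any]] = {}
store_correspondence(correspondence_table, build_message("mic-1", read_user_info(card)))
```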

Returning to FIG. 2, the minutes information generation system 10 performs the voice input setting confirmation process (step S20). FIG. 4 is a flowchart showing this process in detail. The voice input setting confirmation process is a calibration that the minutes information generation system 10 performs in advance for voice recognition. For example, the conference terminal 200 is provided with a display and shows the user a message prompting him or her to read a predetermined sentence aloud.

The user then speaks the predetermined sentence into the microphone 202 (step S21). The voice recognition unit 303 analyzes voice feature information, such as the frequency and intensity, of the voice information output by the microphone 202 and performs voice confirmation processing based on the correspondence information stored in the correspondence information storage unit 301 in association with that microphone 202 (step S22). It also performs confirmation processing that detects the words corresponding to the input sentence in the word dictionary information stored in advance (step S23). If the minutes information generation system 10 detects a predetermined abnormality in step S22 or step S23, it may refrain from performing the subsequent processing, or it may continue after performing, for example, correction of the acoustic model or correction of the predetermined word dictionary information.
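One way to picture the checks in steps S22 and S23 is the sketch below. The source does not say what counts as a "predetermined abnormality", so the relative tolerance test, the function name `confirm_voice_input`, the feature names, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def confirm_voice_input(
    measured: Dict[str, float],
    profile: Dict[str, float],
    recognized_words: List[str],
    dictionary: Set[str],
    tolerance: float = 0.2,
) -> List[str]:
    """Return detected anomalies; an empty list means the check passed."""
    anomalies: List[str] = []
    # Step S22 (assumed criterion): each measured feature must lie within a
    # relative tolerance of the value stored in the user's voice profile.
    for feature, expected in profile.items():
        actual = measured.get(feature)
        if actual is None or abs(actual - expected) > tolerance * abs(expected):
            anomalies.append(f"feature:{feature}")
    # Step S23: every word of the sentence read aloud should be found
    # in the word dictionary.
    for word in recognized_words:
        if word not in dictionary:
            anomalies.append(f"word:{word}")
    return anomalies


profile = {"frequency": 180.0, "speed": 5.0}
words = {"good", "morning"}
ok = confirm_voice_input({"frequency": 185.0, "speed": 5.2}, profile, ["good", "morning"], words)
bad = confirm_voice_input({"frequency": 260.0, "speed": 5.2}, profile, ["good", "mornin"], words)
```

A caller could stop when the returned list is non-empty or trigger a correction step and continue, matching the two options the text allows.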

When the conference organizer inputs a command to start the conference to the minutes information generation system 10, the system starts acquiring voice information (step S30). At this point, the minutes information generation unit 304 generates minutes information and stores it in the minutes information storage unit 305. Information such as the conference start time and the conference location is stored in the minutes information at this time.

When voice is input to the microphone 202 (step S40), the conference terminal 200 transmits the voice information output by the microphone 202 to the minutes generation server 300. When the minutes generation server 300 receives the voice information, the voice recognition unit 303 analyzes it (step S50).

Then, via the correspondence information control unit 302, the voice recognition unit 303 reads from the correspondence information storage unit 301 the voice profile contained in the user information corresponding to the input voice information, and corrects the voice information based on the read voice profile (step S60). For example, the distance from the user's speaking position to the microphone 202 may be estimated from the frequency characteristics of the input voice information, and the voice profile may be corrected based on the estimate.

For each word recognized from the voice information, the voice recognition unit 303 detects the best-matching word from the word dictionary information stored in advance (step S70).
Then, the voice recognition unit 303 combines the words detected in step S70 to generate text information, and appends it, together with text information indicating the user's name, to the minutes information stored in the minutes information storage unit 305 (step S80).

The minutes information generation system 10 repeats steps S40 through S80 until a command to end the meeting is input.
When instruction information to end the meeting is input by the meeting organizer or the like (step S90: YES), the minutes information generation system 10 stores the meeting end time in the minutes information held in the minutes information storage unit 305 (step S100) and ends the minutes generation process.
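The flow of steps S30 through S100 can be illustrated with a minimal Python sketch. It is not the implementation described in this specification: the names `Minutes`, `generate_minutes`, `mic_to_user`, and `recognize` are hypothetical, the recognizer (steps S50 to S70) is passed in as a stub, and the end of the input sequence stands in for the end-of-meeting instruction of step S90.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Minutes:
    """Minutes information: meeting metadata plus attributed utterances."""
    start_time: datetime
    place: str
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    end_time: Optional[datetime] = None


def generate_minutes(
    audio_events: Iterable[Tuple[str, bytes]],
    mic_to_user: Dict[str, str],
    recognize: Callable[[bytes], str],
    place: str,
    now: Callable[[], datetime] = datetime.now,
) -> Minutes:
    """Attribute each recognized utterance to the user registered for its microphone."""
    minutes = Minutes(start_time=now(), place=place)   # step S30: start, record metadata
    for mic_id, audio in audio_events:                 # step S40: voice arrives from a microphone
        speaker = mic_to_user[mic_id]                  # look up the user tied to that microphone
        text = recognize(audio)                        # steps S50 to S70 (assumed, stubbed)
        minutes.entries.append((speaker, text))        # step S80: append name and text
    minutes.end_time = now()                           # step S100: record the end time
    return minutes
```

Calling `generate_minutes` with events from two microphones and a mapping such as `{"mic-1": "Sato", "mic-2": "Suzuki"}` yields one entry per utterance, each labeled with the speaker registered for that microphone.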

Thus, according to the present invention, by using the IC card 100 on which the user's name, voice profile, and the like have been stored in advance, a user holding a meeting with the voice recognition conference system can obtain minutes information produced by highly accurate voice recognition without entering the voice profile information, name, and other information each time.

Furthermore, by installing as many microphones 202 as there are users and associating each microphone 202 with the user information of the user who uses it, the voices of multiple users can be kept from mixing even in a meeting with multiple participants, which enables highly accurate voice recognition.
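As a rough illustration of associating each microphone with exactly one user, the following Python sketch keeps a one-to-one mapping and rejects conflicting assignments. The class name `CorrespondenceStore` and its methods are hypothetical and chosen only for this example; the specification does not prescribe a data structure.

```python
from typing import Dict


class CorrespondenceStore:
    """Keeps a one-to-one mapping between microphone IDs and user IDs."""

    def __init__(self) -> None:
        self._mic_to_user: Dict[str, str] = {}
        self._user_to_mic: Dict[str, str] = {}

    def assign(self, mic_id: str, user_id: str) -> None:
        """Register a microphone for a user, refusing any duplicate on either side."""
        if mic_id in self._mic_to_user:
            raise ValueError(f"microphone {mic_id} is already assigned")
        if user_id in self._user_to_mic:
            raise ValueError(f"user {user_id} already has a microphone")
        self._mic_to_user[mic_id] = user_id
        self._user_to_mic[user_id] = mic_id

    def user_for(self, mic_id: str) -> str:
        """Return the user registered for the microphone that produced the audio."""
        return self._mic_to_user[mic_id]
```

Tracking the mapping in both directions makes the one-to-one constraint cheap to check at assignment time, so a lookup by microphone always resolves to a single speaker.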

In this way, even a user who joins the meeting from a remote location, such as overseas, can generate a voice profile in advance and store it on the IC card 100 as user information together with information such as the user's name. When joining the meeting, the user has the system read the information stored in the user information storage unit 101, so that the user's own user information and voice profile can be set up quickly at the start of the meeting.

In addition, if a voice input device such as a microphone and a device capable of reading user information such as a voice profile are installed at each individual's seat in the conference room, the voices of multiple users can be distinguished, and even content spoken simultaneously can be recognized seat by seat, which reduces recognition errors. Also, in meetings held in a conference room equipped with conference terminals, the accuracy of user identification and voice recognition improves, and the minutes are generated while the meeting is in progress, which reduces the time needed to produce them. In this way, a highly reliable minutes generation system can be provided.

In addition, voice recognition processing as described above requires complex computation, for example with hidden Markov models, and therefore generally places a high processing load on the computer. Real-time voice recognition may consequently fail to keep up with the voice input. The minutes generation server 300 may therefore store the voice information input from the microphone 202 as is, and have the voice recognition unit 303 perform voice recognition during a break or after the meeting ends, with the minutes information generation unit 304 then generating the minutes information.
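The deferred approach can be sketched in Python as a buffer that stores raw audio during the meeting and runs recognition only when asked, for instance during a break. The class `DeferredRecognizer` and the `recognize` callable are assumptions made for illustration; the actual recognizer is outside the scope of this sketch.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class DeferredRecognizer:
    """Buffers raw audio during the meeting and recognizes it later in one pass."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        self._recognize = recognize                      # assumed, expensive recognizer
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def capture(self, mic_id: str, audio: bytes) -> None:
        """Store the audio as is; no recognition work happens here."""
        self._pending.append((mic_id, audio))

    def flush(self) -> List[Tuple[str, str]]:
        """Recognize everything stored so far, in arrival order, and empty the buffer."""
        results: List[Tuple[str, str]] = []
        while self._pending:
            mic_id, audio = self._pending.popleft()
            results.append((mic_id, self._recognize(audio)))
        return results
```

Because `capture` only appends to a queue, it can keep pace with live input regardless of how slow recognition is; `flush` can be called at a break and again after the meeting without reprocessing earlier audio.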

In recent years, conference reservation systems that use IC cards or the like to reserve conference rooms within an organization have come into general use. With such an IC card, it is possible to refer to conference information entered by the user when reserving the conference room, such as the conference location, the conference time, and the participating users. The minutes information generation unit 304 may therefore read the conference information from such an IC card, add it to the minutes information storage unit 305, and store the minutes information. It is also possible to generate a list of conference attendees by treating the users whose user information is stored on the IC cards 100 inserted into the IC card readers 201 as the users who actually attended the conference.

In addition, a voice input means that captures the user's voice also picks up the surrounding noise. Therefore, separately from the voice input means provided at each individual's seat, one or more voice input means intended mainly to capture external noise may be installed. The sound captured by the noise-capturing voice input means is then subtracted from the sound captured by the user's voice input means and the result is filtered, so that the user's voice is collected more accurately.
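The subtraction step can be illustrated with a deliberately simple Python sketch that subtracts a scaled noise-reference signal from the user's signal sample by sample. The function name `subtract_noise` and the `gain` parameter are assumptions for this example, and it presumes the two signals are already time-aligned and equally sampled. A practical system would more likely need adaptive filtering or spectral subtraction, which the specification does not detail.

```python
from typing import List, Sequence


def subtract_noise(
    speech: Sequence[float],
    noise: Sequence[float],
    gain: float = 1.0,
) -> List[float]:
    """Subtract the noise-reference samples (scaled by gain) from the speech samples."""
    if len(speech) != len(noise):
        raise ValueError("signals must have the same length")
    return [s - gain * n for s, n in zip(speech, noise)]
```

The `gain` factor stands in for the fact that the noise microphone and the user's microphone will not pick up the noise at the same level; how that factor is estimated is left open here.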

In the present embodiment, multiple users each use a different conference terminal 200. However, even when a single conference terminal 200 is shared by multiple users, highly accurate voice recognition is possible by filtering with the voice profiles acquired in advance.
Also, in the present embodiment, functional units such as the correspondence information storage unit, the correspondence information control unit, and the voice recognition unit are provided in the minutes generation server 300, but such functional units may instead be provided in the conference terminal 200. The administrator or implementer of the system may design the optimal terminal configuration according to the performance and characteristics of the computer terminals prepared as the conference terminals 200, the network, and the computer terminal serving as the minutes generation server 300, or according to the number of people who use the system.

Also, in the present embodiment, the correspondence information storage unit 301 of the minutes generation server 300 stores all of the user information stored in the user information storage unit 101. Alternatively, for example, only the user number serving as identification information may be stored, with the other information read from the IC card 100 each time it is needed.
Further, although the present embodiment has been described on the premise of voice recognition in Japanese, the same effect can be obtained with the same configuration for voice information in English, French, and other languages.

Note that the minutes information may be generated by recording a program for realizing the functions of the processing units of the present invention on a computer-readable recording medium, and having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system equipped with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

A diagram showing the terminal configuration according to one embodiment of the present invention.
A flowchart showing the voice recognition processing according to one embodiment of the present invention.
A flowchart showing the acoustic model setting processing according to one embodiment of the present invention.
A flowchart showing the voice input setting confirmation processing according to one embodiment of the present invention.

Explanation of symbols

10 Minutes information generation system
100 IC card
101 User information storage unit
200 Conference terminal
201 IC card reader
202 Microphone
300 Minutes generation server
301 Correspondence information storage unit
302 Correspondence information control unit
303 Voice recognition unit
304 Minutes information generation unit
305 Minutes information storage unit

Claims

A minutes information generation system that generates minutes information using voice recognition means for converting voice information into text information, the system comprising:
voice input means for converting voice input by a user into voice information;
user information storage means for storing in advance user information including information that identifies the user;
correspondence information storage means for storing the voice input means used by the user in association with the user information of that user stored in the user information storage means;
correspondence information detection means for detecting, from the correspondence information storage means, the user information corresponding to the voice input means to which the voice was input; and
minutes information generation means for detecting, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and for generating minutes information that associates the detected user information with text information obtained by converting that voice information into text by the voice recognition means.

The minutes information generation system according to claim 1, wherein the user information storage means is held on a portable storage medium, the system further comprising:
user information reading means for reading the user information stored on the portable storage medium when the portable storage medium is connected by the user; and
correspondence information control means for causing the correspondence information storage means to store information that associates the user information read by the user information reading means with the voice input means.

The minutes information generation system according to claim 1 or claim 2, wherein
the user information stored in the user information storage means includes voice feature information indicating the characteristics of the voice uttered by the user indicated by that user information, and
the voice recognition means converts the voice information into text information based on the voice feature information associated with the voice input means to which the voice information was input.

The minutes information generation system according to any one of claims 1 to 3, wherein the correspondence information storage means stores information that associates, on a one-to-one basis, one of a plurality of the voice input means with one of a plurality of pieces of the user information.

The minutes information generation system according to any one of claims 1 to 4, further comprising voice information storage means for storing the voice information input to the voice input means, wherein
the minutes information generation means generates the minutes information based on the voice information read from the voice information storage means.

The minutes information generation system according to any one of claims 1 to 5, further comprising conference information input means for accepting input of conference information, which is information indicating the participating users, the conference location, and the conference time, wherein
the conference information is added to the minutes information generated by the minutes information generation means.

A minutes information generation method for generating minutes information using voice recognition means for converting voice information into text information, the method comprising:
a step in which user information storage means stores in advance user information including information that identifies a user;
a step in which voice input means converts voice input by the user into voice information;
a step in which correspondence information storage means stores the voice input means used by the user in association with the user information of that user stored in the user information storage means;
a step in which correspondence information detection means detects, from the correspondence information storage means, the user information corresponding to the voice input means to which the voice was input; and
a step in which minutes information generation means detects, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and generates minutes information that associates the detected user information with text information obtained by converting that voice information into text by the voice recognition means.

A minutes information generation program for causing a computer, serving as a minutes information generation device that includes voice input means for converting voice input by a user into voice information and that generates minutes information using voice recognition means for converting voice information into text information, to execute:
a step in which user information storage means stores in advance user information including information that identifies the user;
a step in which correspondence information storage means stores the voice input means used by the user in association with the user information of that user stored in the user information storage means;
a step in which correspondence information detection means detects, from the correspondence information storage means, the user information corresponding to the voice input means to which the voice was input; and
a step in which minutes information generation means detects, by the correspondence information detection means, the user information corresponding to the voice input means to which the voice information was input, and generates minutes information that associates the detected user information with text information obtained by converting that voice information into text by the voice recognition means.