JP2020003911A

JP2020003911A - Information processing apparatus, control method, and program

Info

Publication number: JP2020003911A
Application number: JP2018120667A
Authority: JP
Inventors: 下郡山　敬己; Itsuki Shimokooriyama; 敬己下郡山
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2018-06-26
Filing date: 2018-06-26
Publication date: 2020-01-09
Anticipated expiration: 2038-06-26
Also published as: JP7189416B2

Abstract

PROBLEM TO BE SOLVED: To provide a technology that supports specifying, for a text acquired through user input, the corresponding text from among texts already acquired. SOLUTION: An information processing apparatus includes acquisition means for acquiring a text input by a user, and storage means for storing the text acquired by the acquisition means. The information processing apparatus comprises: specifying means for specifying, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and display means for displaying the acquired text in association with the specified text. SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique that supports associating an acquired text with a related text.

Conventionally, there are techniques for displaying information input at a plurality of information processing apparatuses as character strings in chronological order.

For example, in a conference system, there is a technique in which a spoken utterance is converted into text by speech recognition and shown on display devices connected to information processing apparatuses other than the speaker's. Such a system is used, for example, to help deaf people take part in meetings; conversely, keyboard input is also supported so that deaf participants who find speaking difficult can contribute. As a result, spoken utterances and keyboard input by deaf participants may take place concurrently.

However, in a conference system based on speech recognition, for example, a deaf participant offers an opinion or asks a question only after understanding what another speaker has said. Unlike hearing participants, who hear the speaker in real time, the deaf participant's understanding may lag, for instance by the time speech recognition takes. In addition, keyboard input generally takes longer than speaking. As a result, the deaf participant's opinion or question is presented to the others later than the remark it responds to, and the correspondence becomes hard to follow.

Similarly, as social networking systems on the Internet have become widespread, it has become common for many users to converse on an input screen. Here too, the topic may shift in the short time a user takes to respond to another user, making it hard to tell which topic the response refers to.

To address this problem, Patent Literature 1, for example, provides a technique in which the user designates in advance the earlier comment to be commented on, so that related comments are displayed hierarchically and the relationships among comments are easy to understand.

Patent Literature 2 provides a technique in which each speaker's image is placed at a specific position on the screen and that speaker's input is displayed in chronological order beside the corresponding image, making the order of each speaker's remarks easy to follow and realizing a chat function with a greater sense of presence.

Patent Literature 1: JP 2002-163219 A
Patent Literature 2: JP 2002-288102 A

In Patent Literature 1, however, the user must specify which earlier comment (an explanation) the comment about to be entered (for example, a question) corresponds to. This is easy when the question is asked immediately after the explanation. In a meeting, however, a participant often listens to the subsequent explanation for a while and asks a question only when the desired information still has not been covered.

In that case, the user has to scroll back some distance to find the explanation in question. Alternatively, the user may provisionally attach the question to the latest explanation at the time of input, which leaves the person answering to trace back later and find the explanation it actually corresponds to.

In Patent Literature 2, although each individual user's remarks become easier to follow chronologically, the relationship among remarks by multiple users on the same topic does not necessarily become easier to follow. This is a problem particularly in meetings and the like.

In view of the above problems, an object of the present invention is to provide a technique that supports identifying, for a text acquired through user input, the corresponding text from among texts already acquired.

The present invention is an information processing apparatus including acquisition means for acquiring a text input by a user and storage means for storing the text acquired by the acquisition means, the apparatus comprising: specifying means for specifying, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and display means for displaying the acquired text in association with the specified text.

According to the present invention, it is possible to provide a technique that supports identifying, for a text acquired through user input, the corresponding text from among texts already acquired.
Although the present invention is described using the above-mentioned "speech recognition system that supports deaf participants in meetings" as an example, it is not limited to that system; it is applicable to any system in which multiple remarks are entered and viewed almost simultaneously and those remarks may be related to one another.

FIG. 1 is a diagram illustrating an example of a system configuration according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of a hardware configuration of the information processing apparatus according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a functional configuration according to the embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of utterances used to describe the embodiment of the present invention.
FIG. 5 is an example of a flowchart illustrating processing according to the embodiment of the present invention.
FIG. 6 is an example of a flowchart illustrating processing for identifying the position at which a character string input from the keyboard is to be inserted, according to the embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of a screen for inputting a character string from the keyboard according to the embodiment of the present invention.
FIG. 8 is a diagram for explaining the extraction of related utterance candidates from other utterance groups according to the embodiment of the present invention.
FIG. 9 is a diagram for explaining restrictions that apply when related utterances are extracted according to the embodiment of the present invention.
FIG. 10 is a diagram for explaining selection by the user when a plurality of related utterances are extracted according to the embodiment of the present invention.
FIG. 11 is a diagram illustrating an example of a screen showing the result of selecting a related utterance according to the embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram illustrating an example of a system configuration according to the embodiment of the present invention.

<System configuration example 1>
The system according to the embodiment of the present invention includes a speech recognition server 101 and information processing terminals 102 (102a for the speaker, 102b for the reader and keyboard input, and 102c for the proofreader). The user inputs speech through a microphone 104 connected to the information processing terminal 102a. The information processing terminal 102a sends the speech to the speech recognition server 101, which converts it into a character string and sends the string to the information processing terminals 102a to 102c, where it is displayed and presented to the users. In other words, each of the information processing terminals 102a to 102c may handle both speech input and character string output. Among the information processing terminals 102 used for output, a single terminal may serve as both the reader terminal 102b and the proofreader terminal 102c described later, or each may be a dedicated terminal. Output is performed on a display device connected to the information processing terminal 102, but a configuration using a projector or the like is also a system configuration according to the embodiment of the present invention. When a projector is used, there may be only one information processing terminal 102, for the speaker, and all readers may read the recognized character strings projected onto a screen by the projector connected to that terminal 102a. In that case, the speaker or another user acting as proofreader may correct recognition errors directly on the speaker's information processing terminal 102a.

Furthermore, the speech recognition server 101 may reside in the cloud, in which case users of this system may use the functions of the speech recognition server 101, described later, as a cloud service. A configuration that uses such services is also a system configuration according to the embodiment of the present invention.

<System configuration example 2>
The information processing terminals 102a to 102c described in configuration example 1 each handle both input and output, but they may instead be divided into input-only and output-only terminals.

<System configuration example 3>
The speech recognition server 101 and the information processing terminals 102a to 102c may be housed in the same enclosure. That is, speech recognition software may be installed on one of the information processing terminals 102a to 102c in FIG. 1 so that it also serves as the speech recognition server 101.

<System configuration example 4>
The speech recognition server 101 in system configuration examples 1 to 3 above is only an example; it may be, for example, an SNS server. In that case, the information processing terminal 102 is an SNS client terminal. The claims of the present invention cover any other conceivable system, that is, any system through which multiple users communicate.

FIG. 2 is a block diagram illustrating an example of a hardware configuration applicable to the speech recognition server 101 and the information processing terminals 102a to 102c according to the embodiment of the present invention.

As shown in FIG. 2, the speech recognition server 101 and the information processing terminals 102a to 102c each have a configuration in which a CPU (Central Processing Unit) 201, a RAM (Random Access Memory) 202, a ROM (Read Only Memory) 203, an input controller 205, a video controller 206, a memory controller 207, a communication I/F controller 208, and the like are connected via a system bus 204.
The CPU 201 performs overall control of the devices and controllers connected to the system bus 204.

The ROM 203 or the external memory 211 stores the BIOS (Basic Input/Output System) and OS (Operating System), which are control programs for the CPU 201, as well as the various programs, described later, needed to implement the functions executed by each server or PC. It also stores the information needed to carry out the present invention. The external memory may be a database.

The RAM 202 functions as the main memory, work area, and the like of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and the like from the ROM 203 or the external memory 211 into the RAM 202 and executes the loaded programs to realize various operations.

The input controller 205 controls input from a keyboard (KB) 209 and from a pointing device such as a mouse (not shown).

The video controller 206 controls display on a display device such as the display 210. The display device may be, for example, a liquid crystal display. These are used by the administrator as needed.

The memory controller 207 controls access to the external memory 211, such as an external storage device (hard disk (HD)) or flexible disk (FD) that stores a boot program, various applications, font data, user files, edit files, various data, and the like, or a CompactFlash (registered trademark) memory connected via an adapter to a PCMCIA (Personal Computer Memory Card International Association) card slot.

The communication I/F controller 208 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, communication using TCP/IP (Transmission Control Protocol/Internet Protocol) is possible.

The CPU 201 enables display on the display 210 by, for example, rasterizing outline fonts into a display information area in the RAM 202. The CPU 201 also enables user instructions via a mouse cursor (not shown) or the like on the display 210.

The various programs, described later, for realizing the present invention are recorded in the external memory 211 and are executed by the CPU 201 after being loaded into the RAM 202 as needed.
FIG. 3 is a diagram illustrating an example of a functional configuration according to the embodiment of the present invention.

The functions of the information processing terminal 102 (102a for the speaker, 102b for the reader and keyboard input, and 102c for the proofreader) may be provided on separate terminals or on a shared terminal, so the following description does not distinguish among them.

In the following description, unless a distinction is needed, character strings input through speech recognition and character strings input with a keyboard are both called "utterances". The term is used purely for convenience and is not meant to differ from whatever an application calls them (message, comment, post, and so on).

The voice acquisition unit 311 receives the speaker's spoken utterance as voice data from a microphone or the like built into or connected to the information processing terminal 102, and the voice data transmission unit 312 sends the voice data to the speech recognition server 101.

The speech recognition server 101 passes the voice data received by the voice data receiving unit 321 to the speech recognition unit 322, which converts it into a character string; the recognition result transmitting unit 323 sends the character string back to the information processing terminal 102 as the recognition result. The recognition result management unit 324 also stores the recognition result in the recognition result storage unit 320.

The information processing terminal 102 receives the character string at the recognition result receiving unit 313 and displays it on the display unit 314, presenting it to the reader (the user of the information processing terminal 102).

The keyboard operation receiving unit 315 is a functional unit that gives a reader (for example, a deaf person) the opportunity to make an utterance by typing on a keyboard.

The keyboard input is sent to the speech recognition server 101 by the keyboard input information transmission unit 316 of the information processing terminal 102 and received by the keyboard input information receiving unit 325 of the speech recognition server 101, which updates the utterance information stored in the recognition result storage unit 320. At this point, however, it is not yet determined where among the recognition results already stored the input is to be inserted; the insertion position is determined by the processing described later.
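As an illustration only, the idea of storing a typed utterance whose insertion position is still undetermined could be sketched as follows. The class and field names (`Utterance`, `RecognitionResultStore`, `insert_after`) are hypothetical and do not appear in the specification; the sketch merely assumes that an undetermined position can be represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    text: str
    source: str                         # "voice" or "keyboard" (assumed labels)
    insert_after: Optional[int] = None  # index of the related utterance; None = undetermined


class RecognitionResultStore:
    """Hypothetical stand-in for the recognition result storage unit 320."""

    def __init__(self) -> None:
        self.utterances: List[Utterance] = []

    def add(self, utterance: Utterance) -> int:
        """Store an utterance and return its index."""
        self.utterances.append(utterance)
        return len(self.utterances) - 1

    def set_insert_position(self, index: int, insert_after: int) -> None:
        """Record the insertion position once it has been decided."""
        self.utterances[index].insert_after = insert_after


store = RecognitionResultStore()
store.add(Utterance("Sales of product A rose this quarter.", "voice"))
question = store.add(Utterance("How much did product A sales rise?", "keyboard"))
```

The typed utterance is stored immediately, and `set_insert_position` is called only after the later processing has chosen a position.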

The association processing unit 326 is a functional unit for identifying the position at which an utterance typed by the user is to be inserted. Because the description uses speech-recognition-based meeting support as its example, the unit identifies, from among all utterances (for example, utterances obtained through speech recognition and utterances typed by other users), the position of the utterance to which the user's typed utterance relates.
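The specification states that the related utterance is identified according to similarity but does not fix a similarity measure at this point. The following minimal sketch assumes Jaccard similarity over whitespace-separated words purely for illustration; the function names are hypothetical, and Japanese text would need a morphological analyzer instead of `split()`.

```python
def tokenize(text):
    """Lowercase word set with simple punctuation stripped (assumption: space-delimited text)."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(query, stored):
    """Return (index, score) of the stored utterance most similar to query, or None if empty."""
    if not stored:
        return None
    scores = [jaccard(query, s) for s in stored]
    best = max(range(len(stored)), key=scores.__getitem__)
    return best, scores[best]


stored = [
    "Sales of product A rose sharply",
    "Product B launches next month",
    "We will hire more staff",
]
result = most_similar("Why did sales of product A rise?", stored)
```

Any other text similarity measure could be substituted for `jaccard` without changing the surrounding flow.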

The related candidate transmitting unit 327 is a functional unit that sends the information processing terminal 102 the related position identified by the association processing unit 326 for the typed utterance. When multiple positions are identified, it sends the position information for all of them so that the user can choose. The related candidate receiving unit 317 of the information processing terminal 102 receives the utterance position information, the related candidate selection/transmission unit 318 presents it to the user for selection, and the result is sent to the selection information receiving unit 328 of the speech recognition server 101.
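The candidate handling described above can be sketched as a small function. This is a simplified single-process illustration, not the client/server exchange itself: `resolve_position` and the `choose` callback (standing in for the user's selection on the terminal) are hypothetical names, and returning `None` when no candidate exists is an assumption not stated in the text.

```python
def resolve_position(candidates, choose):
    """Pick the insertion position from candidate utterance positions.

    candidates: list of utterance positions found by the association step.
    choose: callback that asks the user to pick one of several positions.
    """
    if not candidates:
        return None            # assumption: nothing to associate with
    if len(candidates) == 1:
        return candidates[0]   # a single candidate needs no user selection
    return choose(candidates)  # several candidates: let the user decide


# Example: the user picks the most recent candidate.
picked = resolve_position([2, 9, 14], lambda positions: positions[-1])
```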

Based on the received information, the selection information receiving unit 328 updates the utterance information in the recognition result storage unit 320 (in this case, the insertion position) with the insertion position of the typed utterance.

FIG. 4 is a diagram showing an example of utterances used to describe the embodiment of the present invention. In this figure, one person speaks, and the speech is converted into character strings by speech recognition and displayed. No utterance has yet been entered from a keyboard. This is only an example; spoken utterances by several people and typed utterances may already be intermixed.

In the example of FIG. 4, the speech is divided into utterances 1 to 16. In speech recognition, utterances are normally delimited when, for example, voice input pauses for a certain period. This is a well-known speech recognition technique and is unrelated to the essence of the present invention, so a detailed description is omitted.
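For readers unfamiliar with pause-based delimiting, a minimal sketch follows. It assumes the recognizer yields words with start and end times in seconds; the function name `split_by_pause` and the 1.0-second threshold are hypothetical choices, not values from the specification.

```python
def split_by_pause(words, max_gap=1.0):
    """Group (start, end, word) tuples into utterances, cutting where a pause exceeds max_gap."""
    segments, current, last_end = [], [], None
    for start, end, word in words:
        if last_end is not None and start - last_end > max_gap:
            segments.append(" ".join(current))  # pause detected: close the current utterance
            current = []
        current.append(word)
        last_end = end
    if current:
        segments.append(" ".join(current))
    return segments


words = [(0.0, 0.4, "sales"), (0.5, 0.9, "rose"), (2.5, 2.9, "next"), (3.0, 3.4, "topic")]
segments = split_by_pause(words)
```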

As for the content, a company representative is explaining business results at a meeting. The explanation contains many terms such as products A to C, sales, and sales campaigns. From these it can be analyzed, for example, that utterances 1 to 3, 8 to 9, and 13 to 15 concern the sales of product A and that the other parts concern different topics. Techniques for determining topic changes in text are well known and are described in, for example, JP 2016-040660 A and JP 2018-049478 A, so a detailed description is omitted.
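As a rough illustration of how runs of utterances on one topic might be located, the sketch below groups consecutive utterances that mention a keyword. This is a deliberately naive keyword match, not the topic-change techniques of the cited publications; `topic_runs` is a hypothetical name, and utterance numbers are 1-based to match FIG. 4.

```python
def topic_runs(utterances, keyword):
    """Return (first, last) 1-based utterance numbers for each consecutive run containing keyword."""
    runs, start = [], None
    for number, text in enumerate(utterances, start=1):
        hit = keyword in text
        if hit and start is None:
            start = number                     # a run begins
        elif not hit and start is not None:
            runs.append((start, number - 1))   # the run ended at the previous utterance
            start = None
    if start is not None:
        runs.append((start, len(utterances)))  # run continues to the last utterance
    return runs


runs = topic_runs(
    ["product A is up", "product A again", "product B is flat", "back to product A"],
    "product A",
)
```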

FIG. 5 is an example of a flowchart illustrating processing according to the embodiment of the present invention. Each step of the flowchart in FIG. 5 is executed by the CPU 201 of the speech recognition server 101 or the CPU 201 of the information processing terminals 102a to 102c.

In step S501, the information processing terminal 102a receives the speaker's utterance through a connected microphone or the like and converts it into voice data.

In step S502, the information processing terminal 102a sends the voice data to the speech recognition server 101, which receives it in step S503.

In step S504, the speech recognition server 101 converts the speaker's utterance in the voice data into a character string by speech recognition.

In step S505, the speech recognition server 101 sends the character string produced in step S504 to the information processing terminal 102a. When multiple information processing terminals 102 are connected to the system, the character string is sent to all of them, not only to the terminal 102a at which the utterance was input. It may also be sent to the terminal 102a that the speaker used to input the voice data, so that the speaker can check the recognition result.

In step S506, the information processing terminal 102 receives the character string and presents it to the speaker or reader in chronological order of utterance.

In step S507, the speech recognition server 101 stores the speech recognition result in the recognition result storage unit 320.

In step S508, the information processing terminal 102 accepts input when its user (for example, a deaf person) has read other participants' utterances (such as those in FIG. 4) and wishes to make an utterance by keyboard. The utterance entered in step S508 is sent to the speech recognition server 101, and in step S509 the typed character string is inserted at the position of the related utterances (utterances 1 to 3 in FIG. 4). Steps S508 and S509 are described in detail later using the flowchart of FIG. 6 and the example screens (on the information processing terminal 102 side) of FIGS. 7 to 10.

In step S510, the result of inserting the typed utterance is sent to the information processing terminal 102. In step S511, the information processing terminal 102 receives the information sent in step S510 and presents it to the user on its display device.
This concludes the description of the flowchart in FIG. 5.

FIG. 6 is an example of a flowchart illustrating processing for identifying the position at which a character string typed on the keyboard is to be inserted, according to the embodiment of the present invention. Each step of the flowchart in FIG. 6 is executed by the CPU 201 of the speech recognition server 101 or the CPU 201 of the information processing terminal 102b.

In the flowchart, the processing of the speech recognition server 101 (left side) corresponds to step S509 in FIG. 5, and the processing of the information processing terminal 102b (right side) corresponds to step S508 in FIG. 5.

In step S621, the information processing terminal 102b (for keyboard input) accepts the user's operation to start keyboard input. As a concrete example, when the user presses the key input start button 701 on the utterance display screen of FIG. 7 (a screen actually displaying the utterances of FIG. 4), the terminal displays the key input screen 702 and at the same time notifies the speech recognition server 101 that key input has started.

In step S601, the speech recognition server 101 receives notice that keyboard input has started at the information processing terminal 102b. This processing manages the insertion position of the keyboard input by recording the time at which keyboard input started and its positional relationship to utterances made through speech recognition and through other users' keyboard input; details are given later.

In step S622, the user's keyboard input is accepted on the key input screen 702 of FIG. 7. In the example, a question about the utterance at utterance position 2 in FIG. 4 has been entered. After the question is entered, the terminal accepts the user's press of the input completion button 703 and sends the speech recognition server 101 a notice that input is complete.

In step S602, the speech recognition server 101 receives the input completion notice and the entered character string (utterance) sent in step S622.

In step S603, the range in which to search for the insertion position of the utterance is determined, that is, the utterances serving as the start point and end point of the search. This will be described in detail.

To determine the insertion position of the utterance received in step S602 in the processing described later (steps S604 to S611), it is necessary to decide which range of the already registered utterances is to be searched for similar utterances. As a specific method of determining the start point, for example, the last other utterance that had been completed when the notification of the start of keyboard input was received in step S601 may be used. Alternatively, the last other utterance that had already been completed when the input completion notification was received in step S602 may be used. As another example, since a participant who begins an utterance is presumably giving an opinion or asking a question about what was said before, the point in time at which the input start notification was received in step S601 may be used. These are merely examples and are matters of design. In this description, as an example, the last utterance already registered at the time input is completed is used as the start point.

The end point is the boundary beyond which the search for an insertion position does not go back any further. For example, no boundary need be set, and everything back to the first registered utterance may be searched. Alternatively, the number of characters or the elapsed time may be used, for example "do not go back more than 1,000 characters" or "do not go back to utterances made three minutes or more before the completion of keyboard input was accepted". Alternatively, taking FIG. 9 as an example, not all utterances fit on the display device: utterances 1 to 7 are already hidden from view and cannot be seen unless the user scrolls. In such a case, similar utterances may be searched for within utterances 8 to 16, which fit on the display device.

With the start point and end point determined as above, the loop of steps S604 to S611 is executed. As stated, this is merely an example; although not shown in the flowchart of FIG. 6, the end point may, for example, be changed dynamically under some condition.
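The end-point rules described for step S603 can be sketched as follows. This is only an illustration under assumptions: utterances are represented as `(text, completed_at)` tuples, and the function name `search_range` and its parameters are hypothetical. The sketch always keeps at least the start utterance in the range, which is a choice made here and not something the text states.

```python
def search_range(utterances, start_index, now=None,
                 max_chars=None, max_age_s=None, first_visible=None):
    """Return (start_index, end_index), both inclusive, with end_index <= start_index.

    utterances    : list of (text, completed_at) tuples in time order
    max_chars     : do not go back more than this many characters
    max_age_s     : do not go back to utterances older than this many seconds
    first_visible : index of the first utterance still shown on the display
    """
    end = 0  # default: search back to the first registered utterance
    chars = 0
    for i in range(start_index, -1, -1):
        text, completed_at = utterances[i]
        chars += len(text)
        too_long = max_chars is not None and chars > max_chars
        too_old = (max_age_s is not None and now is not None
                   and now - completed_at > max_age_s)
        off_screen = first_visible is not None and i < first_visible
        if too_long or too_old or off_screen:
            end = i + 1  # the previous (later) utterance is the boundary
            break
    return start_index, min(end, start_index)
```

For instance, `max_chars=1000` corresponds to the "1,000 characters" rule and `max_age_s=180` to the "three minutes" rule, while `first_visible` models the case of FIG. 9 where only the utterances on screen are searched.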

Steps S604 to S611 form a loop that determines at which position among the utterances (FIG. 4) the utterance received in step S602 is to be inserted. The loop proceeds backward through the utterances in time order, and where it starts is a matter of design. Several specific examples are presented in the description below.

In step S605, it is determined whether the registered utterance whose similarity is to be calculated next is already past the end point. Specifically, its position is compared with the end point determined in step S603. If it is not past the end point, the process proceeds to step S606. If it is, the process exits the loop and proceeds to step S612.

In step S606, the range used to judge the topic (subject) of the utterance is set. Specifically, the registered utterance currently under consideration may by itself be the range. Alternatively, the process may go back through earlier utterances to the point where the topic changes, treat those utterances as a single unit, and calculate its similarity to the utterance entered from the keyboard (the similarity calculation is described later). Alternatively, topic boundaries may be determined for all utterances before the loop is entered, and the loop may go back one topic segment at a time rather than one utterance at a time as in the present example. Conversely, each time a single utterance is considered, the range that includes that utterance and forms one topic may be determined.

Determining topic boundaries (positions where the topic switches) is a well-known technique, described for example in JP-A-2007-241902 and JP-A-2004-234512, so a detailed description is omitted. For example, as described above, utterance positions 1 to 3 in FIG. 4 each represent a range forming one topic, and all of them include content about the sales of product A.

In step S607, the similarity is calculated between the utterance whose range was determined in step S606 (or the set of utterances determined to share one topic) and the utterance received in step S602.
The similarity will be described with reference to FIG. 8.

<Example 1 of similarity calculation>
The utterance entered from the keyboard contains linguistic features (here, words) in common with utterance positions 1 to 3: the words "product A", "product B", "Christmas", and "sales campaign". One point is given for each match. In this case, utterance positions 1 to 3 score 2, 5, and 3 points, respectively. Because the sentences at utterance position 2 contain the most matching words, position 2 is considered the most similar.
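Example 1 can be sketched as a simple count of shared words. The sketch assumes the utterances have already been split into words, and it counts every occurrence in the registered utterance; that counting rule is an assumption, suggested by the fact that position 2 scores 5 points although only four distinct words are listed. The word lists below are hypothetical and are not the sentences of FIG. 8.

```python
def shared_word_score(typed_words, registered_words):
    """Give one point for each word of the registered utterance that also
    occurs in the typed utterance (repeated occurrences count again)."""
    typed = set(typed_words)
    return sum(1 for word in registered_words if word in typed)


# Hypothetical, pre-tokenized utterances.
typed = ["product A", "product B", "Christmas", "sales campaign", "when"]
registered = ["product A", "product B", "product A",
              "Christmas", "sales campaign", "rose"]
print(shared_word_score(typed, registered))  # 5
```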

<Example 2 of similarity calculation>
In addition, because "product A" and "product B" are proper nouns specific to this organization, raising their weight (for example, to 2 points) makes the similarity higher still. In this case, utterance positions 1 to 3 score 4, 8, and 6 points, respectively.
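The weighting in Example 2 can be sketched by looking up a per-word weight, with 1 point as the default. The function name `weighted_word_score`, the weight table, and the word lists are hypothetical; the registered list is constructed so that it gives 8 points, like utterance position 2 in the text, and is not the actual sentence of FIG. 8.

```python
def weighted_word_score(typed_words, registered_words, weights, default=1):
    """Sum the weight of each registered word that also occurs in the typed
    utterance; words without an entry in `weights` count `default` points."""
    typed = set(typed_words)
    return sum(weights.get(word, default)
               for word in registered_words if word in typed)


# Organization-specific proper nouns get 2 points (assumed weight table).
weights = {"product A": 2, "product B": 2}
typed = ["product A", "product B", "Christmas", "sales campaign"]
registered = ["product A", "product B", "product A",
              "Christmas", "sales campaign"]
print(weighted_word_score(typed, registered, weights))  # 8
```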

<Example 3 of similarity calculation>
In Examples 1 and 2 above, the longer a sentence is (the more characters the character string contains), the more likely it is that matching words appear, so it is also common to adjust the score according to sentence length. For example, the score may simply be divided by the number of characters. In this case (multiplying by 100 and rounding so that the lowest-scoring utterance position 1 remains at 4 points as in Example 2), utterance positions 1 to 3 score 4, 12, and 5 points, respectively.
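The length adjustment in Example 3 can be sketched as a division by the character count followed by scaling and rounding. The character counts below (100, 67, and 120) are hypothetical values chosen so that the weighted scores 4, 8, and 6 become 4, 12, and 5; the text does not give the actual sentence lengths.

```python
def length_normalized_score(raw_score, text_length, scale=100):
    """Divide the score by the number of characters, scale, and round."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return round(raw_score * scale / text_length)


weighted_scores = [4, 8, 6]   # scores of utterance positions 1 to 3 in Example 2
lengths = [100, 67, 120]      # hypothetical character counts
print([length_normalized_score(s, n)
       for s, n in zip(weighted_scores, lengths)])  # [4, 12, 5]
```

With this adjustment the gap between position 2 and the others widens, because position 2 reaches its score with fewer characters.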

The similarity calculation has been described here in simplified form, but various well-known techniques exist for calculating the similarity between texts, described for example in Japanese Patent Application Laid-Open Nos. 2017-091399 and 2017-188039, so a detailed description is omitted.

In step S609, described later, utterances whose similarity exceeds a certain threshold are registered as utterances related to the keyboard-entered utterance. To prepare for the case where no utterance exceeds the threshold, step S607 may keep only the utterance with the highest similarity so far, for later use.

In step S608, it is determined whether the similarity calculated in step S607 exceeds a preset threshold. Specifically, the threshold is stored in a configuration file (not shown) or in the program, and the similarity is compared with it. Although the wording here is "exceeds", this is a matter of design, and the comparison may include equality ("greater than or equal to the threshold"). The comparison may also be made using some calculation formula.

In any case, the determination is based on the threshold. If the similarity exceeds (or is greater than or equal to) the threshold, the process proceeds to step S609, and the utterance is registered as an insertion position candidate. Otherwise, the process returns to the beginning of the loop and continues the similarity calculation with the next utterance (the preceding one, since the loop moves toward the end point).

In step S609, the content of the utterance (or range of utterances) determined in step S608 to exceed the threshold is registered as a candidate insertion position for the keyboard-entered utterance.

In step S610, it is determined whether more than one insertion position is allowed for the keyboard-entered utterance. Specifically, a flag indicating whether multiple candidates are allowed is stored in a configuration file (not shown) or in the program, and the determination is made based on its value.

Alternatively, the determination may be made dynamically. For example, a second threshold larger than the threshold of step S608 is stored in the same way; multiple candidates are allowed while some utterances exceed the first threshold but none exceeds the second threshold, whereas once an utterance exceeds the second threshold, no further candidates are registered. In any case, this is a matter of design.
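The loop of steps S604 to S611 can be sketched as follows, assuming the similarity of each registered utterance has already been computed into a list. The function name `find_candidates` and its parameters are hypothetical, and the strict `>` comparison is only one of the design options mentioned above. The sketch also keeps the highest-scoring utterance as a fallback for the case where nothing exceeds the threshold.

```python
def find_candidates(scores, start, end, threshold,
                    allow_multiple=True, second_threshold=None):
    """Walk back from `start` to `end` (both inclusive) and collect candidates.

    scores : similarity of each registered utterance to the typed utterance
    Returns (candidates, best); each item is an (index, score) tuple.
    """
    candidates = []
    best = None
    for i in range(start, end - 1, -1):      # S604-S611; the bound plays the role of S605
        score = scores[i]                    # S607 (precomputed in this sketch)
        if best is None or score > best[1]:
            best = (i, score)                # fallback when nothing passes the threshold
        if score > threshold:                # S608
            candidates.append((i, score))    # S609
            if not allow_multiple:           # S610: a single candidate is enough
                break
            if second_threshold is not None and score > second_threshold:
                break                        # S610, dynamic variant: stop registering
    return candidates, best
```

For example, `find_candidates([1, 6, 3, 9, 2], start=4, end=0, threshold=5)` collects the utterances at indices 3 and 1, and with `allow_multiple=False` only index 3.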

When the loop (S604 to S611) ends, one or more insertion position candidates have been stored. If there is one, the utterance is inserted at that position; if there are several, the user who made the keyboard utterance may be asked to choose. Alternatively, the utterance may be inserted immediately after the utterance with the highest similarity. Specifically, the utterance display screen 700b in FIG. 7 illustrates the position at which the keyboard-entered sentence is inserted. This is an example of insertion at utterance position 2 (immediately after utterances 8 to 9), which had the highest similarity.
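Inserting the typed utterance immediately after the most similar utterance can be sketched as below. The function name `insert_after_best` is hypothetical, candidates are assumed to be `(index, score)` tuples, and when a candidate stands for a range of utterances the index is assumed to be that of the last utterance in the range.

```python
def insert_after_best(transcript, candidates, typed_text):
    """Return a new transcript with `typed_text` placed right after the
    candidate that has the highest similarity score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    index, _score = max(candidates, key=lambda c: c[1])
    return transcript[:index + 1] + [typed_text] + transcript[index + 1:]


transcript = ["utterance 1", "utterance 2", "utterance 3"]
candidates = [(2, 6), (0, 9)]
print(insert_after_best(transcript, candidates, "typed question"))
# ['utterance 1', 'typed question', 'utterance 2', 'utterance 3']
```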

When there are multiple insertion position candidates and the user is to choose among them, a list of the candidates is transmitted to the information processing terminal 102b in step S612.

In step S623, the insertion position candidates are received and presented to the user. Suppose, for example, that utterance positions 2 and 3 in FIG. 4 have become insertion candidates. Specifically, as in the insertion position selection 1000a in FIG. 10, identifiable markers (such as markers for insertion position candidates 1 and 2) are displayed immediately after utterance positions 2 and 3 on the screen of the information processing terminal 102b on which the utterances are displayed.

Alternatively, a dialog for the selection operation may be displayed separately from the screen showing the utterances (insertion position selection 1000b). In this case, scrolling the dialog brings up candidates that are no longer visible on the display, so that they too can be presented for the user to select.

In step S624, the user's selection is accepted on the insertion position selection screen (FIG. 10) presented in step S623, and the selected position is transmitted to the voice recognition server 101.

In step S613, the selected insertion position and the keyboard utterance are associated with each other and registered in the recognition result storage unit 320, the keyboard utterance is inserted at the selected position, and the result is presented to the users on the information processing terminals 102 (a and b) as in step S511 described above. FIG. 11 shows an example of the display on the information processing terminal 102; it is a display example for the case where insertion position candidate 1 is selected in the insertion position selection of FIG. 10.
This completes the description using the flowchart of FIG. 6.
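One way to picture the association registered in step S613 is a record that stores, for each typed utterance, the identifier of the utterance it was attached to, with the display order derived from that link. This is only a sketch under assumptions: the `Entry` type, the `related_to` field, and `display_order` are hypothetical names, and the actual layout of the recognition result storage unit 320 is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    entry_id: int
    text: str
    related_to: Optional[int] = None  # id of the utterance a typed text is attached to


def display_order(entries):
    """List the entries in time order, placing each typed entry directly
    after the entry it is associated with."""
    attached = {}
    for entry in entries:
        if entry.related_to is not None:
            attached.setdefault(entry.related_to, []).append(entry)
    ordered = []
    for entry in entries:
        if entry.related_to is None:
            ordered.append(entry)
            ordered.extend(attached.get(entry.entry_id, []))
    return ordered


entries = [Entry(1, "first"), Entry(2, "second"), Entry(3, "third"),
           Entry(4, "typed question", related_to=2)]
print([e.entry_id for e in display_order(entries)])  # [1, 2, 4, 3]
```

Storing the link instead of a fixed position keeps the original time order intact, so the same records can still be shown purely chronologically. The sketch simplifies by assuming typed entries are attached only to entries that have no link of their own.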

In the example, the start point of the utterances for which similarity is calculated was taken at the later end in time in step S603, and the loop of steps S604 to S611 went back from the start point to the earlier end point. Conversely, the start point may be taken at the earlier end and the utterances followed forward in time to a later end point. In that case the start point and end point are swapped, but they are determined in the same way as described above. Going from later to earlier was chosen for convenience of explanation and does not limit the method; it is purely a matter of design. This completes the description, with reference to FIGS. 6 to 11, of the processing that identifies and displays the position at which a character string entered from the keyboard is inserted.

The structures and contents of the various data described above are not limited to those shown, and it goes without saying that they may take various forms depending on the use and purpose.

Although several embodiments have been described above, the present invention can be embodied as, for example, a system, an apparatus, a method, a computer program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The computer program of the present invention is a computer program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 5 and 6, and the storage medium of the present invention stores a computer program that enables a computer to execute the processing methods of FIGS. 5 and 6. The computer program of the present invention may be a separate computer program for the processing method of each apparatus in FIGS. 5 and 6.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a recording medium on which a computer program realizing the functions of the above embodiments is recorded to a system or apparatus, and having the computer (or CPU or MPU) of the system or apparatus read and execute the computer program stored on the recording medium.

In this case, the computer program read from the recording medium itself realizes the novel functions of the present invention, and the recording medium storing the computer program constitutes the present invention.

As the recording medium for supplying the computer program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, EEPROM, silicon disk, or solid state drive can be used.

Needless to say, this includes not only the case where the functions of the above embodiments are realized by the computer executing the read computer program, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the computer program, and the functions of the above embodiments are realized by that processing.

Needless to say, this further includes the case where the computer program read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing based on the instructions of the computer program code, and the functions of the above embodiments are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable to the case where it is achieved by supplying a computer program to a system or apparatus. In this case, by reading a recording medium storing the computer program for achieving the present invention into the system or apparatus, the system or apparatus can enjoy the effects of the present invention.

Furthermore, by downloading and reading the computer program for achieving the present invention from a server, database, or the like on a network by means of a communication program, the system or apparatus can enjoy the effects of the present invention.
Configurations obtained by combining the above embodiments and their modifications are all included in the present invention.

101 Voice recognition server
102 Information processing terminal
311 Voice acquisition unit
312 Voice data transmission unit
313 Recognition result reception unit
314 Display unit
315 Keyboard operation reception unit
316 Keyboard input information transmission unit
317 Related candidate reception unit
318 Related candidate selection/transmission unit
320 Recognition result storage unit
321 Voice data reception unit
322 Voice recognition unit
323 Recognition result transmission unit
324 Recognition result management unit
325 Keyboard input result reception unit
326 Association processing unit
327 Related candidate transmission unit
328 Selection information reception unit

Claims

An information processing apparatus comprising acquisition means for acquiring text entered by a user and storage means for storing the text acquired by the acquisition means, the information processing apparatus further comprising:
specifying means for specifying, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and
display means for displaying the acquired text in association with the specified text.

The information processing apparatus according to claim 1, wherein the corresponding text is a text that groups together a plurality of related texts stored in the storage means.

The information processing apparatus according to claim 1 or 2, wherein the specifying means specifies the corresponding text from among the texts stored in the storage means that fall within a predetermined time range.

The information processing apparatus according to any one of claims 1 to 3, wherein the specifying means specifies the corresponding text from among those texts stored in the storage means that can be displayed on the display means.

The information processing apparatus according to any one of claims 1 to 4, wherein, when a plurality of candidates for the corresponding text exist, the specifying means accepts from the user a selection of a position relating to the corresponding text.

An information processing apparatus comprising acquisition means for acquiring text entered by a user and storage means for storing the text acquired by the acquisition means, the information processing apparatus further comprising:
specifying means for specifying, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and
display means for displaying, when the texts stored in the storage means are displayed in sequence, the acquired text in the vicinity of the specified text.

A control method for an information processing apparatus comprising acquisition means for acquiring text entered by a user and storage means for storing the text acquired by the acquisition means, the control method comprising:
a specifying step in which specifying means specifies, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and
a display step in which display means displays the acquired text in association with the specified text.

A program executable in an information processing apparatus comprising acquisition means for acquiring text entered by a user and storage means for storing the text acquired by the acquisition means, the program causing the information processing apparatus to function as:
specifying means for specifying, from among the texts stored in the storage means, a text corresponding to the acquired text in accordance with the similarity between the text acquired by the acquisition means and the texts stored in the storage means; and
display means for displaying the acquired text in association with the specified text.