JP6873805B2

JP6873805B2 - Dialogue support system, dialogue support method, and dialogue support program

Info

Publication number: JP6873805B2
Application number: JP2017085246A
Authority: JP
Inventors: 山本　正明 (Masaaki Yamamoto); 永松　健司 (Kenji Nagamatsu)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2021-05-19
Anticipated expiration: 2037-04-24
Also published as: JP2018185561A

Description

The present invention relates to a dialogue support system, a dialogue support method, and a dialogue support program.

Some conventional text (language) dialogue systems (hereinafter, conventional systems) use an intention understanding unit and an intention understanding model. The intention understanding unit uses the intention understanding model to estimate the speaker's intention (hereinafter also called a topic) expressed by the input text. The intention understanding model defines correspondences between expected input texts and topics. For example, suppose the model associates the expected input text "I want to make a passbook" with the topic "Procedure for opening an account". The intention understanding unit then compares the actual input text (for example, "I want to make a passbook.") with the expected text ("I want to make a passbook") and calculates the similarity between the two. If the similarity is high, the intention understanding unit estimates "Procedure for opening an account", the topic associated with "I want to make a passbook", as the topic of the input.

One such conventional system is the language processing device of Patent Document 1. Patent Document 1 describes a language processing device comprising:
a language understanding model storage unit that stores, as a language understanding model, (i) word transition data, each entry being a set of a pre-transition state, an input word, an output, word weight information (a positive value), and transition destination information; (ii) concept weight data, each entry being a set of a concept, which is a language understanding result corresponding to one or more words, and concept weight information (a positive value) for that concept; and (iii) filler transition data, each entry being a set of a pre-transition state, a filler that matches any word, filler weight information (a negative value), and a transition destination state;
a finite state transducer processing unit that, based on the words in an input word sequence and the current state, outputs the defined output as an understanding result candidate and accumulates word weights according to the word transition data read from the storage unit, accumulates filler weights according to the filler transition data read from the storage unit, and sequentially performs transitions to the transition destination state;
a concept weighting processing unit that, according to the concept weight data read from the storage unit, accumulates the concept weights corresponding to the concepts contained in the understanding result candidates output by the finite state transducer processing unit; and
an understanding result determination unit that selects, from among the output sequences of understanding result candidates, the understanding result that maximizes the cumulative weight, namely a weighted sum of the accumulated word weight, the accumulated concept weight, and the accumulated filler weight.
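The final selection step of Patent Document 1 can be sketched as below, assuming the finite state transducer pass has already produced candidates with their accumulated weights. The coefficients `a`, `b`, `c` and the concept labels are hypothetical; Patent Document 1 only states that a weighted sum is maximized. The sketch makes the limitation discussed next concrete: `max` returns exactly one understanding result.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Candidate:
    """One understanding-result candidate produced by the FST pass."""
    concepts: Tuple[str, ...]   # hypothetical concept labels
    word_weight: float          # accumulated word weights (each positive)
    concept_weight: float       # accumulated concept weights (each positive)
    filler_weight: float        # accumulated filler weights (each negative)


def best_understanding(candidates, a=1.0, b=1.0, c=1.0):
    """Return the single candidate maximizing a*word + b*concept + c*filler.

    The coefficients a, b, c are hypothetical placeholders.
    """
    return max(
        candidates,
        key=lambda k: a * k.word_weight + b * k.concept_weight + c * k.filler_weight,
    )
```

With default coefficients, a candidate scoring 2.0 + 1.0 - 0.5 = 2.5 beats one scoring 2.5 + 1.5 - 2.5 = 1.5, so only the first candidate's concepts survive.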

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-302293

However, with Patent Document 1 it may be difficult to estimate the topic, depending on the content of the input text. Consider the input text "I want to make a passbook. Also, I want to make a deposit.", and suppose that the part "I want to make a passbook" carries the topic "Procedure for opening an account" while the part "I want to make a deposit" carries the topic "Deposit procedure". Suppose further that the intention understanding model associates the input text "I want to make a passbook" with the topic "Procedure for opening an account", and the input text "I want to make a deposit" with the topic "Deposit procedure". In this case, Patent Document 1 extracts only the text with the highest similarity to the input text (for example, "I want to make a passbook"), so only the topic associated with it in the model ("Procedure for opening an account") is output as the estimation result. Because Patent Document 1 outputs exactly one topic regardless of the content of the input text, it may fail to grasp the speaker's intention accurately for some utterances (input texts).

The present invention has been made in view of these points, and its object is to provide a dialogue support system, a dialogue support method, and a dialogue support program capable of accurately grasping the intention of an utterance and conducting a dialogue accordingly.

One aspect of the present invention for solving the above problem is a dialogue support system including a processor and a memory, the system comprising: an intention understanding model storage unit that stores a plurality of pieces of topic information, each indicating a subject of a dialogue; an intention understanding unit that extracts a plurality of intentions from externally uttered words by referring to the topic information; a dialogue generation unit that, for each of the extracted intentions, generates words corresponding to that intention as words constituting the dialogue; and a dialogue output unit that outputs the generated words.

According to the present invention, it is possible to accurately grasp the intention of an utterance and conduct a dialogue accordingly.

FIG. 1 is a diagram showing an example of the configuration of the voice dialogue support system 2000 according to the first embodiment.
FIG. 2 is a diagram showing an example of the hardware included in the dialogue support device 1 and the dialogue scenario creation device 5.
FIG. 3 is a diagram showing an example of the information stored in the intention understanding model 93.
FIG. 4 is a flowchart illustrating a typical flow of processing performed in the voice dialogue support system 2000.
FIG. 5 is a flowchart illustrating an example of the details of the voice recognition process s1.
FIG. 6 is a flowchart illustrating an example of the details of the dialogue control process s2.
FIG. 7 is a diagram showing an example of the dialogue scenario 91.
FIG. 8 is a diagram showing an example of the sub-dialogue scenario 915.
FIG. 9 is a diagram showing an example of the output text list 92.
FIG. 10 is a flowchart illustrating an example of the voice synthesis process s3.
FIG. 11 is a diagram showing an example of the intention understanding model 93.
FIG. 12 is a diagram showing an example of the configuration of the voice dialogue support system 2000 according to the second embodiment.
FIG. 13 is a diagram showing an example of the intention understanding model according to the second embodiment.
FIG. 14 is a diagram showing an example of the sub-dialogue scenario 916 according to the second embodiment.
FIG. 15 is a diagram showing an example of the output text list 96 according to the second embodiment.
FIG. 16 is a diagram showing an example of the configuration of the voice dialogue support system 2000 according to the third embodiment.
FIG. 17 is a diagram showing an example of the dialogue log 99.
FIG. 18 is a diagram showing an example of the dialogue log 99 after it has been updated.

Hereinafter, each embodiment of the present invention will be described in detail with reference to the drawings.
−− Example 1 −−
FIG. 1 is a diagram showing an example of the configuration of the voice dialogue support system 2000 according to the first embodiment. The voice dialogue support system 2000 of this embodiment is, for example, a so-called interactive robot (service robot) that holds spoken dialogues with humans. It includes a voice processing system 3000 that handles voice input and output for the dialogue, and a dialogue support system 1000 that performs information processing for the dialogue.

The voice processing system 3000 includes a voice input device 30 (such as a microphone) that receives voice, and a voice output device 50 (such as a speaker) that outputs predetermined synthesized voice.

The dialogue support system 1000 includes a dialogue support device 1 and a dialogue scenario creation device 5. The dialogue support device 1 is connected to the voice processing system 3000. It performs predetermined information processing on the voice 100 input from the voice input device 30 to generate synthesized voice 500, which is a voice responding to the voice 100, and transmits the generated synthesized voice 500 to the voice output device 50.

The dialogue scenario creation device 5 is an information processing device used by an administrator or user (hereinafter, user) of the voice dialogue support system 2000, and creates the various information processed by the dialogue support device 1. For example, the dialogue scenario creation device 5 edits the contents of the dialogue scenario 91, the sub-dialogue scenario 915, and the output text list 92 described below. The dialogue support device 1 and the dialogue scenario creation device 5 are connected either directly by a predetermined communication line or via a communication network such as a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, or a dedicated line.

FIG. 2 is a diagram showing an example of the hardware included in the dialogue support device 1 and the dialogue scenario creation device 5. As shown in the figure, each of these devices includes a processor 11, such as a CPU (Central Processing Unit), that controls processing; a main storage device 12 such as a RAM (Random Access Memory) or ROM (Read Only Memory); an auxiliary storage device 13 such as an HDD (Hard Disk Drive) or SSD (Solid State Drive); an input device 14 such as a keyboard, mouse, or touch panel; an output device 15 such as a monitor (display); and a communication device 16 such as a wired LAN card, wireless LAN card, or modem.

Next, as shown in FIG. 1, the dialogue support device 1 includes a voice recognition unit 20, a dialogue control unit 60, a preprocessing unit 70, an intention understanding unit 80, an intention understanding model storage unit 85, and a voice synthesis unit 40.

The voice recognition unit 20 converts externally input voice into a character string, which serves as the externally uttered words. Specifically, the voice recognition unit 20 removes sounds other than speech (noise) from the voice 100 acquired from the voice input device 30, and converts the noise-free voice into character-string information (input text 200).

The dialogue control unit 60 exchanges information with the preprocessing unit 70 and the intention understanding unit 80 to generate output text 400 corresponding to the input text 200 received from the voice recognition unit 20, and transmits the generated output text 400 to the voice synthesis unit 40.

Next, the intention understanding model storage unit 85 stores the intention understanding model 93. That is, the intention understanding model storage unit 85 stores a plurality of pieces of topic information, each of which indicates a subject of the dialogue (hereinafter also called a topic).

Here, the intention understanding model 93 will be described.
(Intention understanding model)
FIG. 3 is a diagram showing an example of the information stored in the intention understanding model 93. The intention understanding model 93 stores the topic information: for each of a plurality of subjects (topics), it stores the name of the topic in association with words that are expected to be uttered in that topic. Specifically, the intention understanding model 93 is a database with at least one record, each record having the following items: topic ID 931, which stores a topic identifier (hereinafter, topic ID); topic name 932, which stores the name of the topic indicated by topic ID 931; and input text 933, which stores a character string representing words expected to be uttered in that topic (for example, "I want to make a passbook"; hereinafter, assumed input text).
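The record layout of the intention understanding model 93 can be sketched as a list of immutable records. Only the `I1` row (「口座開設の手続き」, "Procedure for opening an account", with the assumed input 「通帳を作りたい」, "I want to make a passbook") comes from the description; the `I2` row and the helper function names are hypothetical, added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicRecord:
    topic_id: str     # item 931, e.g. "I1"
    topic_name: str   # item 932
    input_text: str   # item 933: assumed input text


# I1 is taken from the description; I2 is a hypothetical example row.
INTENT_MODEL = [
    TopicRecord("I1", "口座開設の手続き", "通帳を作りたい"),  # account opening / "I want to make a passbook"
    TopicRecord("I2", "預金の手続き", "預金したい"),          # deposit / "I want to make a deposit"
]


def topic_name(model, topic_id):
    """Look up the topic name (item 932) for a topic ID (item 931)."""
    for rec in model:
        if rec.topic_id == topic_id:
            return rec.topic_name
    raise KeyError(topic_id)


def inputs_by_topic(model):
    """Group assumed input texts (item 933) by topic ID, since one topic may have several records."""
    grouped = {}
    for rec in model:
        grouped.setdefault(rec.topic_id, []).append(rec.input_text)
    return grouped
```

Keeping one assumed input text per record mirrors the one-row-per-text layout described for FIG. 3; a topic with several phrasings simply has several rows.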

Next, as shown in FIG. 1, the intention understanding unit 80 extracts a plurality of intentions from the externally uttered words (input text 200) by referring to the topic information.

The preprocessing unit 70 (division unit) divides the externally uttered words (input text 200) into a plurality of word portions (divided texts 201).

In this case, the intention understanding unit 80 extracts, for each of the divided word portions, the intention corresponding to that portion. Specifically, the intention understanding unit 80 calculates the similarity between each piece of topic information and a divided word portion, and extracts the intention of that portion by referring to the topic information with the highest calculated similarity.

In the present embodiment, the extracted subjects (topics) are output as a list of topic IDs (topic ID list 300), which is input to the dialogue control unit 60.

The dialogue control unit 60 includes a dialogue generation unit 62 and a dialogue output unit 64.

For each intention extracted by the intention understanding unit 80, the dialogue generation unit 62 generates words corresponding to that intention as words constituting the dialogue (output text 400).

Specifically, for each extracted intention, the dialogue generation unit 62 acquires the dialogue scenario (dialogue scenario 91) associated with that intention, which is information storing the procedure of utterances, and generates words corresponding to the intention according to the procedure indicated by the acquired dialogue scenario.
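A minimal sketch of generating responses for every extracted intention, assuming each topic ID maps to an ordered list of output texts. The `SCENARIOS` contents and the function name are hypothetical; the point is that an utterance carrying two intentions yields responses for both, unlike the single-topic behavior of the conventional system.

```python
# Hypothetical per-topic scenarios: topic ID -> ordered output texts.
SCENARIOS = {
    "I1": ["Let's open an account.", "Do you have an ID document with you?"],
    "I2": ["Let's make a deposit.", "How much would you like to deposit?"],
}


def generate_responses(topic_ids, scenarios):
    """Generate output texts for every extracted intention, in order."""
    outputs = []
    for tid in topic_ids:
        # Each intention is handled by its own scenario. Unknown topic IDs
        # are skipped here; that policy is an assumption of this sketch.
        outputs.extend(scenarios.get(tid, []))
    return outputs
```

For example, `generate_responses(["I1", "I2"], SCENARIOS)` returns the two account-opening texts followed by the two deposit texts.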

The dialogue control unit 60 continuously keeps track of the position of the step it is currently executing (or the step it executed last) within the procedure.

The dialogue control unit 60 also stores the sub-dialogue scenario 915, a database holding the details of the dialogue procedure (hereinafter, sub-dialogue scenario), and the output text list 92, which defines the specific content of each utterance (output text 400) in the sub-dialogue scenario 915.

The dialogue output unit 64 outputs the words generated by the dialogue generation unit 62.

The voice synthesis unit 40 converts the words constituting the dialogue, output by the dialogue output unit 64, into speech.

Next, the dialogue scenario creation device 5 includes a dialogue scenario creation unit 51. The dialogue scenario creation unit 51 accepts input of the dialogue scenarios (for example, information on the dialogue scenario 91, the sub-dialogue scenario 915, and the output text list 92) and outputs the accepted information.

Using a predetermined editing screen displayed on the dialogue scenario creation device 5, the user can create and edit the dialogue scenario 91, the sub-dialogue scenario 915, and the output text list 92. The user can thus freely adapt the dialogue scenarios and output texts to the usage environment.

The functions of the dialogue support device 1 and the dialogue scenario creation device 5 described above are implemented, for example, by hardware in these devices, or by the processor 11 of each device reading and executing a program stored in the main storage device 12 or the auxiliary storage device 13. The program may also be recorded on a recording medium such as a CD-ROM, an SD card, or a DVD.

Next, the processing performed by the voice dialogue support system 2000 will be described.
<Processing>
(Voice dialogue support processing)
FIG. 4 is a flowchart illustrating a typical flow of processing performed in the voice dialogue support system 2000.

First, when the dialogue support device 1 starts, it executes a process (hereinafter, voice recognition process s1) of recognizing, as words, voice uttered from outside (for example, by a person who asked a question; hereinafter also called the dialogue partner) (s1). The dialogue support device 1 then executes a process (hereinafter, dialogue control process s2) of generating words that respond to the words recognized in s1 (s2). Finally, the dialogue support device 1 executes a process (hereinafter, voice output process s3) of outputting voice corresponding to the response words generated in s2, thereby realizing a dialogue with the dialogue partner (s3).
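The s1 → s2 → s3 sequence can be sketched as a loop over incoming utterances. The three callables stand in for the voice recognition unit 20, the dialogue control unit 60, and the voice synthesis unit 40; their names and signatures are hypothetical, and real speech I/O is deliberately left out.

```python
from typing import Any, Callable, Iterable, List


def run_dialogue(
    utterances: Iterable[Any],
    recognize: Callable[[Any], str],         # s1: voice 100 -> input text 200
    control: Callable[[str], List[str]],     # s2: input text 200 -> output texts 400
    synthesize: Callable[[str], None],       # s3: output text 400 -> synthesized voice 500
) -> None:
    """Run s1, s2, and s3 for each utterance in turn."""
    for audio in utterances:
        text = recognize(audio)
        for reply in control(text):
            synthesize(reply)
```

Passing stub functions (for example `str.upper` as the recognizer) makes the control flow testable without any audio hardware.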
Next, the details of the voice recognition process s1 will be described.

(Voice recognition process s1)
FIG. 5 is a flowchart illustrating an example of the details of the voice recognition process s1. As shown in the figure, the voice recognition unit 20 acquires the dialogue partner's voice captured by the voice input device 30 (s11). The voice recognition unit 20 then removes sounds other than the dialogue partner's voice (noise) from the voice acquired in s11, and converts the noise-free voice 100 into input text 200 (s13). The voice recognition unit 20 transmits the input text 200 obtained in s13 to the dialogue support system 1000. This processing is repeated.

Next, the details of the dialogue control process s2 will be described. The dialogue control process s2 is started, for example, when the dialogue control unit 60 receives the input text 200.

(Dialogue control process s2)
FIG. 6 is a flowchart illustrating an example of the details of the dialogue control process s2. As shown in the figure, the dialogue control unit 60 reads the one step of the dialogue scenario 91 that follows the step it executed last, and determines the type of the step read (s21). If all steps have been read, or if the step read in s21 is a step for ending the dialogue control process s2, the dialogue control process s2 ends (not shown). When reading the dialogue scenario 91 for the first time, the dialogue control unit 60 reads its first step.

If the step read is (A) a step that analyzes the input text 200 to estimate a topic (intention) (topic estimation process; s21: topic estimation), the dialogue control unit 60 performs the processing of s23 described below. If it is (B) a step that outputs output text 400 (output process; s21: text output), the dialogue control unit 60 performs the processing of s26 described later. If it is (C) a step that evaluates a predetermined condition (condition determination process; s21: condition determination), the dialogue control unit 60 performs the processing of s27 described later.
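The read-dispatch-return loop of s21 can be sketched as a small interpreter that remembers its position in the scenario. The step kinds (`"estimate"`, `"output"`, `"condition"`, `"end"`) and the convention that a handler may return a new step index to jump to are assumptions of this sketch, not details from the description.

```python
class DialogueController:
    """Reads scenario steps one at a time, remembering the current position."""

    def __init__(self, steps, handlers):
        self.steps = steps          # list of (kind, argument) tuples
        self.handlers = handlers    # kind -> callable(argument) -> optional next index
        self.position = 0           # index of the next step to read (first step on first run)

    def run(self):
        # Stop when every step has been read or an "end" step is reached.
        while self.position < len(self.steps):
            kind, arg = self.steps[self.position]
            self.position += 1
            if kind == "end":
                break
            # "estimate" -> s23/s24, "output" -> s26, "condition" -> s27
            next_pos = self.handlers[kind](arg)
            if next_pos is not None:    # a condition step may redirect the flow
                self.position = next_pos
```

Because `position` is stored on the object, a later call to `run()` resumes after the last executed step, matching the behavior described for s21.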

(A) Topic estimation
In s23, the preprocessing unit 70 divides the input text 200 into a plurality of texts to create divided texts 201.

For example, when the input text 200 is "I want to make a passbook. Also, I want to make a deposit." (「通帳を作りたい。また、預金したい。」), the preprocessing unit 70 estimates the parts of speech contained in the input text 200 by morphological analysis. Morphological analysis divides a text into its smallest meaningful units (generally called morphemes) and determines the part of speech and other attributes of each morpheme.

If the input text 200 contains one conjunction (for example, 「また」, "also"), the preprocessing unit 70 determines that the input text 200 contains two intentions (topics). The preprocessing unit 70 then divides the input text 200 into the text before the conjunction (for example, "I want to make a passbook.") and the text after it (for example, "I want to make a deposit."), and treats each as a divided text 201.
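The conjunction-based split can be sketched as follows. This is a simplified stand-in: instead of running a morphological analyzer to tag conjunctions, it assumes a small hypothetical list of sentence-initial conjunctions and matches them with a regular expression. As in the description, a text is split only when exactly one conjunction is found.

```python
import re

# Hypothetical list of sentence-initial conjunctions ("also", "and then", "furthermore").
# A real implementation would take parts of speech from a morphological analyzer.
CONJUNCTIONS = ("また", "それから", "さらに")

# A conjunction at the start of the text or right after a full stop "。",
# optionally followed by a comma "、".
_CONJ_RE = re.compile(
    r"(?:^|(?<=。))(?:" + "|".join(map(re.escape, CONJUNCTIONS)) + r")、?"
)


def split_on_conjunction(text):
    """Return [before, after] if exactly one conjunction is found, else [text]."""
    matches = list(_CONJ_RE.finditer(text))
    if len(matches) != 1:
        return [text]
    m = matches[0]
    before, after = text[:m.start()].strip(), text[m.end():].strip()
    if not before or not after:
        return [text]  # a conjunction with nothing on one side is not a split
    return [before, after]
```

For the example input, `split_on_conjunction("通帳を作りたい。また、預金したい。")` yields the two divided texts 「通帳を作りたい。」 and 「預金したい。」.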

Although morphological analysis has been described here as the method of creating the divided texts 201, any other method may be used as long as it can divide a character string by topic. In addition, if the preprocessing unit 70 fails to create a plurality of divided texts 201 with one method, it may retry with another method.
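The retry behavior can be sketched as trying a sequence of splitting methods until one produces two or more parts. The sentence-boundary splitter below is a hypothetical fallback chosen for illustration; the description does not name the alternative methods.

```python
import re
from typing import Callable, List, Sequence

Splitter = Callable[[str], List[str]]


def split_sentences(text: str) -> List[str]:
    """Hypothetical fallback: split after each Japanese full stop "。"."""
    return [s for s in re.split(r"(?<=。)", text) if s.strip()]


def split_with_fallback(text: str, splitters: Sequence[Splitter]) -> List[str]:
    """Try each splitter in turn and return the first result with 2+ parts."""
    for split in splitters:
        parts = split(text)
        if len(parts) >= 2:
            return parts
    return [text]  # no method succeeded: treat the whole text as one segment
```

Ordering the splitters from most to least specific (for example, a conjunction-based splitter first, sentence boundaries second) keeps the stricter method in charge whenever it succeeds.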

Next, for each divided text 201 generated in s23, the intention understanding unit 80 calculates the similarity between that divided text 201 and each assumed input text defined in the intention understanding model 93.

The intention understanding unit 80 then estimates the topic of each divided text 201 generated in s23. That is, it finds the assumed input text with the highest similarity to the divided text 201 and takes the topic associated with that assumed input text as the topic of the divided text 201 (s24). This is done for every divided text 201.

For example, when a divided text 201 is "I want to make a passbook", the assumed input text most similar to it among those registered in the intention understanding model 93 (the assumed input texts with topic IDs "I1" to "I3") is "I want to make a passbook", with topic ID "I1". The topic associated with it is "Procedure for opening an account".
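Per-segment topic estimation (s24) can be sketched as an argmax over the assumed input texts, run once per divided text, so a two-intention utterance produces two topic IDs. The similarity measure here is a swapped-in stand-in, `difflib.SequenceMatcher(...).ratio()` from the Python standard library, not necessarily the measure the system uses; the `I2` row is hypothetical.

```python
from difflib import SequenceMatcher

# (topic ID 931, topic name 932, assumed input text 933).
# Only the I1 row comes from the description; the I2 row is hypothetical.
MODEL = [
    ("I1", "口座開設の手続き", "通帳を作りたい"),
    ("I2", "預金の手続き", "預金したい"),
]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] based on difflib's matching ratio."""
    return SequenceMatcher(None, a, b).ratio()


def estimate_topics(segments, model):
    """Return one topic ID per divided text (the topic ID list 300)."""
    topic_ids = []
    for seg in segments:
        # The record whose assumed input text is most similar wins.
        best = max(model, key=lambda row: similarity(seg, row[2]))
        topic_ids.append(best[0])
    return topic_ids
```

For the divided texts 「通帳を作りたい。」 and 「預金したい。」 this returns `["I1", "I2"]`, whereas an argmax over the undivided input would return only one ID.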

Methods for calculating the similarity include, for example, dynamic programming (DP) and matching methods.
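As one illustration of a DP-based similarity, the sketch below computes the Levenshtein edit distance with a row-by-row dynamic programming table and maps it to a score in [0, 1]. The description does not say which DP formulation is used; the normalization by the longer string's length is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed by dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def dp_similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

For instance, 「通帳を作りたい。」 and 「通帳を作りたい」 differ by one character out of eight, giving a similarity of 0.875.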

The intention understanding unit 80 transmits information on each topic estimated in s24 to the dialogue control unit 60; specifically, it transmits the list of the topic IDs of these topics (topic ID list 300). Processing then returns to s21.

(B) Text output
In s26, the dialogue control unit 60 reads, from the output text list 92 (output text 922), the output text 400 specified by the step of the dialogue scenario 91 read in s21, and outputs it. Processing then returns to s21.

(C) Condition determination
In s27, the dialogue control unit 60 performs the condition determination corresponding to the step read in s21. Processing then returns to s21.

Here, an example of the dialogue scenario 91 read in the dialogue control process s2 will be described.

(Dialogue scenario 91)
FIG. 7 is a diagram showing an example of the dialogue scenario 91. As shown in the figure, the dialogue scenario 91 includes processes such as "processing step 0", an output process that outputs the opening words (the first output text 400) to prompt an utterance from outside (input text 200); "processing step 1", a topic estimation process that estimates topics by analyzing the received input text 200; "processing step 2", an output process that selects one of the estimated topics and outputs the output text 400 associated with it according to the procedure given in the sub-dialogue scenario 915; "processing step 3", a condition determination process that determines whether all the output texts 400 for the selected topics have been output; and "processing step 4", an output process that outputs the final output text 400.

Next, an example of the sub-dialogue scenario 915 referred to in processing step 2 above is described.

(Sub-dialogue scenario 915)
FIG. 8 is a diagram showing an example of the sub-dialogue scenario 915. As shown in the figure, the sub-dialogue scenario 915 consists of one or more records, each having the following items: topic ID 9151, which stores a topic ID; output text ID 9152, which stores an identifier (hereinafter, output text ID) specifying the output text 400 to be output for the topic of topic ID 9151; and next process 9153, which stores information specifying the process in the dialogue scenario 91 to be executed after the output text 400 of output text ID 9152 is output.
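
A record of the sub-dialogue scenario 915 maps naturally onto a small record type. The sketch below is hypothetical: the class and function names are illustrative, the "O1" row follows the walkthrough of this embodiment (record 9154), and the output text ID "O2" for topic "I2" is an assumption, since the text does not name it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubDialogueRecord:
    topic_id: str        # topic ID 9151
    output_text_id: str  # output text ID 9152
    next_process: str    # next process 9153: step of dialogue scenario 91 to run afterwards


# "O1" follows the text; "O2" is an assumed ID for topic "I2".
SUB_DIALOGUE_SCENARIO_915 = (
    SubDialogueRecord("I1", "O1", "processing step 3"),
    SubDialogueRecord("I2", "O2", "processing step 3"),
)


def records_for(topic_id, table=SUB_DIALOGUE_SCENARIO_915):
    """All records for a topic, in table order (a topic may have several output texts)."""
    return [record for record in table if record.topic_id == topic_id]
```

Returning a list rather than a single record leaves room for topics with several output texts, which the dialogue control unit 60 outputs one at a time.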

Next, an example of the output text list 92 referred to in the sub-dialogue scenario 915 above is described.

(Output text list 92)
FIG. 9 is a diagram showing an example of the output text list 92. As shown in the figure, the output text list 92 is a database storing the words (texts) output by the dialogue control unit 60. It consists of one or more records, each having output text ID 921, which stores an output text ID, and output text 922, which stores the content of the output text identified by output text ID 921.

In this embodiment, output text ID 921 "OS" denotes the output text 400 that is output first, and output text ID 921 "OE" denotes the output text 400 that is output last.

(Example of dialogue control process s2)
An example of the dialogue control process s2 performed on the basis of the dialogue scenario 91 of FIG. 7, the sub-dialogue scenario 915 of FIG. 8, and the output text list 92 of FIG. 9 is described below.

First, the dialogue control unit 60 reads processing step 0 of the dialogue scenario 91 and determines that a text output process is to be performed. It therefore refers to the output text list 92 and acquires the output text 400 whose output text ID 921 is "OS" ("What can I help you with?"). The dialogue control unit 60 transmits the acquired output text 400 to the voice synthesis unit 40.

When the dialogue control unit 60 receives new input text 200 from the voice recognition unit 20, it reads processing step 1 and performs the topic estimation process.

Specifically, when the dialogue control unit 60 receives, for example, the input text 200 "I want to make a passbook. I also want to make a deposit.", the preprocessing unit 70 and the intention understanding unit 80 estimate two topics: "procedure for opening an account" and "deposit procedure". The intention understanding unit 80 then transmits the topic ID list 300 ("I1" and "I2") to the dialogue control unit 60.

Next, when the dialogue control unit 60 receives new input text 200 from the voice recognition unit 20, it reads processing step 2 and outputs the output text 400 corresponding to the topics estimated in processing step 1 (that is, it carries out the dialogue).

Specifically, the dialogue control unit 60 first selects one topic (topic ID) from the received topic ID list 300 and identifies, from the sub-dialogue scenario 915, each output text 400 associated with the selected topic and the process to be executed after that output text 400 is output.

For example, when the selected topic ID is "I1", the dialogue control unit 60 first acquires, from record 9154 of the sub-dialogue scenario 915 whose topic ID 9151 is "I1", "O1" (the content of output text ID 9152) and "processing step 3" (the content of next process 9153). It then acquires from the output text list 92 the output text 400 corresponding to "O1" ("To open an account, please fill in document A.") and transmits it to the voice synthesis unit 40.

When the selected topic ID has a plurality of corresponding output texts, the dialogue control unit 60, for example, transmits one of them to the voice synthesis unit 40, waits for new input text 200 from the voice recognition unit 20, and after receiving it transmits the next output text 400 to the voice synthesis unit 40. This is repeated until all of these output texts 400 have been output.

The dialogue control unit 60 then reads processing step 3, identified above, and executes the condition determination process. Specifically, for example, it determines whether all the output texts 400 (the contents of output text 922 in the output text list 92) associated with every topic ID ("I1" and "I2") in the topic ID list 300 have been output.

If not all output texts 400 have been output (processing step 3: No), the dialogue control unit 60 reads processing step 2 again and carries out a dialogue on a topic other than the one currently selected.

For example, if the dialogue on topic ID "I1" has finished but the dialogue on topic ID "I2" has not yet taken place, the dialogue control unit 60 acquires from the sub-dialogue scenario 915, in the same manner as above, the output text 400 associated with topic ID "I2" (for example, "For the deposit procedure, please fill in document B.") and the process to be executed after it is output ("processing step 3"). The dialogue control unit 60 transmits the acquired output text 400 to the voice synthesis unit 40.

If, on the other hand, all output texts 400 have been output (processing step 3: Yes), the dialogue control unit 60 reads processing step 4, which follows processing step 3, and outputs the corresponding output text 400.

Specifically, the dialogue control unit 60 refers to the output text list 92 and transmits the output text 400 whose output text ID 921 is "OE" ("That concludes our guidance. Thank you very much.") to the voice synthesis unit 40. This completes the dialogue control process s2.
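
The walkthrough of processing steps 0 to 4 can be condensed into a short control loop. This is a simplified sketch, not the patented implementation: the output text ID "O2" is assumed, topic estimation is injected as a function, the next process of every sub-dialogue record is taken to be processing step 3 (as in the example above), and a user turn is consumed only between output texts of the same topic.

```python
from collections import deque

# Output text list 92. "OS", "O1", and "OE" follow the text; "O2" is an assumed ID.
OUTPUT_TEXTS = {
    "OS": "What can I help you with?",
    "O1": "To open an account, please fill in document A.",
    "O2": "For the deposit procedure, please fill in document B.",
    "OE": "That concludes our guidance. Thank you very much.",
}
# Sub-dialogue scenario 915 reduced to topic ID -> output text IDs;
# every record's next process is assumed to be processing step 3.
SUB_SCENARIO = {"I1": ["O1"], "I2": ["O2"]}


def run_dialogue(user_turns, estimate_topic_ids):
    """Run processing steps 0-4 and return the system's utterances in order."""
    turns = deque(user_turns)
    spoken = [OUTPUT_TEXTS["OS"]]                          # step 0: opening text
    pending = deque(estimate_topic_ids(turns.popleft()))   # step 1: topic ID list 300
    while pending:                                         # step 3: "No" while topics remain
        topic_id = pending.popleft()                       # step 2: select one topic
        for i, text_id in enumerate(SUB_SCENARIO[topic_id]):
            if i > 0 and turns:
                turns.popleft()  # wait for a reply between texts of the same topic
            spoken.append(OUTPUT_TEXTS[text_id])
    spoken.append(OUTPUT_TEXTS["OE"])                      # step 4: closing text ("Yes")
    return spoken
```

Usage: `run_dialogue(["I want to make a passbook. I also want to make a deposit."], lambda text: ["I1", "I2"])` yields the opening text, the texts for "O1" and "O2", and the closing text, in that order.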

Next, the speech synthesis process s3 is described in detail.
(Speech synthesis process s3)
FIG. 10 is a flowchart illustrating an example of the speech synthesis process s3. As shown in the figure, the voice synthesis unit 40 receives the output text 400 from the dialogue support system 1000 (s51). The voice synthesis unit 40 then generates the corresponding synthetic voice 500 from the output text 400 (s53) and transmits it to the voice output device 50, which plays it back (s55). This completes the speech synthesis process s3 (s57).

As described above, the voice dialogue support system 2000 according to this embodiment stores a plurality of pieces of topic information (intention understanding model 93), extracts a plurality of intentions from words uttered from outside (input text 200) on the basis of the topic information, generates for each extracted intention the words corresponding to it as words constituting the dialogue, and outputs the generated words (output text 400). Even when externally uttered words contain several intentions, the system can therefore output suitable words for each of them, so an appropriate dialogue can be conducted even when a speaker utters words containing several intentions. In this way, the voice dialogue support system 2000 can conduct a dialogue after accurately grasping the multiple intentions of an utterance.

The voice dialogue support system 2000 according to this embodiment also divides externally uttered words (input text 200) into a plurality of word parts (divided text 201) and extracts, for each part, the intention corresponding to it, so that multiple intentions can be accurately extracted from the uttered words (input text 200). The multiple intentions of an utterance can therefore be estimated with high accuracy.
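
How the preprocessing unit 70 produces the divided texts 201 is not shown in this section; the sketch below only assumes a simple rule for illustration: split the Japanese input at sentence-final punctuation and drop a leading connective such as "また" ("also") or "それから" ("then"). The function name and the connective list are hypothetical.

```python
import re

# Hypothetical connectives ("also", "then") removed from the start of a fragment.
LEADING_CONNECTIVE = re.compile(r"^(また|それから)[、,]?\s*")


def split_input_text(text):
    """Split input text 200 into divided texts 201 at sentence-final punctuation."""
    divided = []
    for piece in re.split(r"[。！？!?]", text):
        piece = LEADING_CONNECTIVE.sub("", piece.strip())
        if piece:  # skip the empty string left after a trailing "。"
            divided.append(piece)
    return divided
```

This rule only covers utterances shaped like the examples in the text, such as "通帳を作りたい。また、預金したい。", which it splits into "通帳を作りたい" and "預金したい".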

The voice dialogue support system 2000 also calculates the similarity between each piece of topic information and each divided word part (divided text 201), and extracts the intention of the divided words by referring to the topic information with the highest calculated similarity, so that an appropriate intention can be extracted for each divided word part.

In addition, for each extracted intention, the voice dialogue support system 2000 acquires the dialogue scenario associated with that intention (dialogue scenario 91), which is information storing the utterance procedure, and generates the words corresponding to the extracted intention on the basis of the acquired dialogue scenario, so that the dialogue for each of the extracted intentions follows an appropriate procedure.

The voice dialogue support system 2000 also converts externally input voice (voice 100) into a character string (input text 200) (voice recognition unit 20) and converts the words constituting the dialogue (output text 400) into voice (synthetic voice 500), so that it can conduct a spoken dialogue that accurately grasps the multiple intentions of an utterance.

In this embodiment, the voice processing system 3000 is not strictly necessary. For example, the voice recognition unit 20 may generate, as input text 200, a character string (text) entered by the user via the input device 14 and transmit it to the dialogue support system 1000. Similarly, the voice synthesis unit 40 may output the output text 400 received from the dialogue support system 1000 to the output device 15 and present it to the user. This makes it possible to realize text-based dialogue such as a chatbot.

−− Example 2 −−
In the first embodiment, various assumed input texts are defined in the intention understanding model 93; however, when the actual input text 200 corresponds to none of them, the voice dialogue support system 2000 may misestimate the intention contained in the input text 200.

For example, FIG. 11 is a diagram showing an example of the intention understanding model 93. If the input text 200 is "I do not have", the assumed input texts in this model that correspond to it are "I do not have a seal" (topic ID "I1-2") and "I do not have a passbook" (topic ID "I2-2"), because both contain the text (character string) "do not have". Therefore, even if the speaker actually meant "I do not have a passbook", the first embodiment may estimate "seal not possessed", the topic corresponding to "I do not have a seal", as the intention, in which case the estimation is wrong.
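
The failure mode can be reproduced with a single flat model and a best-match rule. In this hypothetical sketch, `difflib.SequenceMatcher` stands in for the similarity measure and the two rows follow FIG. 11 as described above; the input "I do not have" is contained whole in both assumed input texts, so the choice between them has nothing to do with what the speaker meant.

```python
from difflib import SequenceMatcher

# Two rows of the single model of FIG. 11: (topic ID, topic, assumed input text).
FLAT_MODEL = [
    ("I1-2", "Seal not possessed", "I do not have a seal"),
    ("I2-2", "Passbook not possessed", "I do not have a passbook"),
]


def best_topic(text, model=FLAT_MODEL):
    """Return the topic ID of the most similar assumed input text (first one wins ties)."""
    best_id, best_score = None, -1.0
    for topic_id, _topic, assumed in model:
        score = SequenceMatcher(None, text, assumed).ratio()
        if score > best_score:
            best_id, best_score = topic_id, score
    return best_id
```

With these English strings the shorter assumed text ("I do not have a seal") scores higher; with the original Japanese strings, which have equal length, the two scores tie and the entry listed first wins. Either way, `best_topic("I do not have")` returns "I1-2", and the passbook intention cannot be recovered from the text alone.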

The voice dialogue support system 2000 of this embodiment therefore solves this problem by using a plurality of intention understanding models.

FIG. 12 is a diagram showing an example of the configuration of the voice dialogue support system 2000 according to the second embodiment. As shown in the figure, the voice dialogue support system 2000 of this embodiment differs from that of the first embodiment in that the intention understanding model storage unit 85 stores a plurality of intention understanding models 94 (intention understanding models 94(1), 94(2), and 94(3)), each storing words on a different subject (topic).
The intention understanding model according to this embodiment is described next.

(Intention understanding model)
FIG. 13 is a diagram showing an example of the intention understanding model according to the second embodiment. As shown in the figure, the intention understanding model 94 of this embodiment comprises intention understanding model 94(1) (hereinafter, intention understanding model M1), intention understanding model 94(2) (hereinafter, intention understanding model M2), and intention understanding model 94(3) (hereinafter, intention understanding model M3).

Intention understanding model M1 stores a plurality of pieces of topic information. Specifically, it stores two topics, "procedure for opening an account" and "deposit procedure", together with their corresponding words (assumed input texts).

Intention understanding model M2 stores topics related to one of the topics of intention understanding model M1 (hereinafter, related topics). Specifically, it stores two topics related to "procedure for opening an account": "seal not possessed" and "seal possessed".

Intention understanding model M3 stores topics (related topics) related to another topic of intention understanding model M1. Specifically, it stores two topics related to "deposit procedure": "passbook not possessed" and "passbook possessed".

The item structure of each intention understanding model 94 is the same as that of the intention understanding model 93 of the first embodiment.

In this way, the intention understanding model storage unit 85 of this embodiment stores the topic information (intention understanding model M1) in association with related topic information (intention understanding models M2 and M3), which indicates topics related to the topics indicated by the topic information. The intention understanding unit 80 then extracts the intention from the uttered words by referring to the topic information and the related topic information.
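
The association between the topic information (M1) and the related topic information (M2, M3) can be sketched as a dictionary of models plus a link from each M1 topic to its related model. The topic IDs "I1-1", "I2-1", and "I1-2" appear in the text; "I2-2" and the IDs of the "possessed" topics ("I1-3", "I2-3") are hypothetical, as are the English assumed input texts and the use of `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Intention understanding models 94: model -> {topic ID: (topic, assumed input text)}.
MODELS = {
    "M1": {"I1-1": ("Procedure for opening an account", "I want to make a passbook"),
           "I2-1": ("Deposit procedure", "I want to make a deposit")},
    "M2": {"I1-2": ("Seal not possessed", "I do not have a seal"),
           "I1-3": ("Seal possessed", "I have a seal")},
    "M3": {"I2-2": ("Passbook not possessed", "I do not have a passbook"),
           "I2-3": ("Passbook possessed", "I have a passbook")},
}
# Related topic information: topic of M1 -> model that holds its related topics.
RELATED_MODEL_OF = {"I1-1": "M2", "I2-1": "M3"}


def estimate(text, model_name):
    """Estimate a topic using only the entries of one intention understanding model."""
    entries = MODELS[model_name]
    return max(entries,
               key=lambda tid: SequenceMatcher(None, text, entries[tid][1]).ratio())
```

Restricting the search to the related model selected by the current topic is what lets the same reply, "I do not have", resolve to the seal topic ("I1-2") in M2 but to the passbook topic ("I2-2") in M3.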

In this embodiment, the voice dialogue support system 2000 also stores a sub-dialogue scenario 916 and an output text list 96 corresponding to the plurality of intention understanding models 94.

The sub-dialogue scenario 916 of this embodiment is described next.
(Sub-dialogue scenario)
FIG. 14 is a diagram showing an example of the sub-dialogue scenario 916 according to the second embodiment. As shown in the figure, and as with the sub-dialogue scenario 915 of the first embodiment, the sub-dialogue scenario 916 consists of one or more records, each having topic ID 9161, which stores a topic ID; output text ID 9162, which stores an output text ID; and next process 9163, which stores information specifying the process in the dialogue scenario 91 to be executed after the output text 400 of output text ID 9162 is output.

In addition to these items, each record of the sub-dialogue scenario 916 of this embodiment has a next model 9164 item, which stores information specifying the intention understanding model to be used after the output text 400 of output text ID 9162 is output. In other words, the process that follows the output of the text specified by output text ID 9162 is specified by next process 9163 together with next model 9164.
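
The extra next model 9164 item can be sketched as a third field of each record, so that one lookup yields the text to output, the processing step to read next, and the model to use for the reply. Only the "I1-1" row is given in the text; the type, dictionary, and function names are hypothetical.

```python
from typing import NamedTuple


class SubDialogueRow(NamedTuple):
    output_text_id: str  # output text ID 9162
    next_process: str    # next process 9163
    next_model: str      # next model 9164


# Topic ID 9161 -> record of sub-dialogue scenario 916 (only "I1-1" is from the text).
SUB_DIALOGUE_916 = {"I1-1": SubDialogueRow("O1-1", "processing step 1", "M2")}
# Output text list 96: output text ID 961 -> output text 962.
OUTPUT_TEXT_LIST_96 = {"O1-1": "Do you have a seal?"}


def next_action(topic_id):
    """Return (text to output, processing step to read next, model for the reply)."""
    row = SUB_DIALOGUE_916[topic_id]
    return OUTPUT_TEXT_LIST_96[row.output_text_id], row.next_process, row.next_model
```

For topic "I1-1" this yields the question "Do you have a seal?", a return to processing step 1, and model M2 for interpreting the answer.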

The content of the output text 400 corresponding to each output text ID 9162 is stored in the output text list 96.

The output text list 96 of this embodiment is described next.
(Output text list)
FIG. 15 is a diagram showing an example of the output text list 96 according to the second embodiment. As shown in the figure, and as with the output text list 92 of the first embodiment, the output text list 96 consists of one or more records, each having output text ID 961, which stores an output text ID, and output text 962, which stores the content of the output text identified by output text ID 961.

Next, the voice dialogue support process performed by the voice dialogue support system 2000 of this embodiment is described.
<Processing>
In this embodiment, the voice recognition process s1 and the speech synthesis process s3 are the same as in the first embodiment. The dialogue control process s2 of this embodiment is described below on the assumption that it is performed on the basis of the dialogue scenario 91 (FIG. 7) of the first embodiment.

First, as in the first embodiment, the dialogue control unit 60 reads processing step 0, acquires the output text 400 whose output text ID 921 is "OS" ("What can I help you with?"), and transmits it to the voice synthesis unit 40.

Next, as in the first embodiment, when the dialogue control unit 60 receives new input text 200 from the voice recognition unit 20, it reads processing step 1 and performs topic estimation.

Suppose the dialogue control unit 60 receives the input text 200 "I want to make a passbook. I also want to make a deposit." The preprocessing unit 70 and the intention understanding unit 80 generate divided texts 201 from this input text 200 and, on the basis of the divided texts 201, estimate each topic contained in the input text 200.

Specifically, on the basis of intention understanding model 94(1), the preprocessing unit 70 and the intention understanding unit 80 estimate "procedure for opening an account" (topic ID "I1-1") as one topic and "deposit procedure" (topic ID "I2-1") as another. The intention understanding unit 80 then transmits the list of estimated topics (topic ID list 300 containing "I1-1" and "I2-1") to the dialogue control unit 60.

Subsequently, when the dialogue control unit 60 receives new input text 200 from the voice recognition unit 20, it reads processing step 2 and outputs the output text 400 for the topics estimated in processing step 1 (that is, it carries out the dialogue).

For example, the dialogue control unit 60 selects the topic with topic ID "I1-1" and, from the record of the sub-dialogue scenario 916 whose topic ID 9161 is "I1-1", identifies "O1-1" (the content of output text ID 9162), "processing step 1" (the content of next process 9163), and "intention understanding model M2" (the content of next model 9164). The dialogue control unit 60 then acquires the output text 400 ("Do you have a seal?") from the record of the output text list 96 whose output text ID 961 is "O1-1", and transmits it to the voice synthesis unit 40.

When the selected topic ID has a plurality of corresponding output texts 400, the dialogue control unit 60 outputs all of them, as in the first embodiment.

After transmitting the output text 400 to the voice synthesis unit 40, the dialogue control unit 60 waits for new input text 200 from the voice recognition unit 20. Having received it, the dialogue control unit 60 reads "processing step 1", identified above, and performs topic estimation using "intention understanding model M2".

Specifically, suppose that after transmitting "Do you have a seal?" as output text 400, the dialogue control unit 60 receives "I do not have" as the new input text 200. The preprocessing unit 70 and the intention understanding unit 80 refer to intention understanding model M2 and identify the assumed input text most similar to the new input text 200. For example, the intention understanding unit 80 identifies, among the records of intention understanding model M2, the record storing the assumed input text most similar to the new input text 200 ("I do not have a seal"), and takes the topic of that record ("seal not possessed") as the estimated topic.

In this way, the intention understanding unit 80 can correctly estimate that the input text 200 "I do not have" expresses the intention of not having a seal.

The subsequent processing is the same as in the first embodiment. Finally, the dialogue control unit 60 acquires the output text 400 ("That concludes our guidance. Thank you very much.") from the record of the output text list 96 whose output text ID 961 is "OE", and transmits it to the voice synthesis unit 40.

As described above, the voice dialogue support system 2000 of this embodiment stores topic information (intention understanding model M1) in association with information indicating topics related to the topics indicated by that topic information (related topic information; intention understanding models M2 and M3), and extracts multiple intentions from the uttered words (input text 200) by referring to both, so the intention of an utterance can be estimated with high accuracy.

−− Example 3 −−
In the second embodiment, a plurality of intention understanding models 94 are provided; depending on the content of the input text 200, however, none of the intention understanding models 94 may contain a suitable assumed input text.

Suppose, for example, that in the second embodiment only the input text 200 "I want to make a passbook.", which contains a single topic, is first input to the dialogue control unit 60, and that while the dialogue on that topic is in progress, input text 200 containing a different topic, "I want to make a deposit.", is input (for example, because the speaker suddenly remembers the next topic). That is, the input text 200 "I want to make a passbook. I also want to make a deposit.", containing both topics, is not input to the dialogue control unit 60 at one time.

In this case, the voice dialogue support system 2000 according to the second embodiment first refers to the intention understanding model M1. From it, the system estimates the topic "procedure for opening an account" (topic ID "I1-1") for the input text 200 "I want to make a passbook." It then refers to the intention understanding model M2 and executes the dialogue corresponding to that topic (sub-dialogue scenario 916).

Suppose that the input text 200 "I want to make a deposit." reaches the dialogue control unit 60 while the dialogue based on the intention understanding model M2 is running. For example, during the sub-dialogue scenario 916, the system sends the output text 400 "Do you have your seal?" and then receives the input text 200 "I do not have it. Then, I want to make a deposit."

The model currently in use, M2, lets the system estimate the topic "no seal" (topic ID "I1-2") for "I do not have it." However, M2 defines no topic corresponding to "I want to make a deposit." The system therefore cannot estimate the appropriate topic, which would be "deposit procedure" (topic ID "I2-1" in the intention understanding model M1).

The cause is that each intention understanding model 94 defines only a small amount of topic information. One countermeasure would be for the user to store a large amount of topic information and assumed input text in every intention understanding model 94. Such work, however, is burdensome for the user.

The voice dialogue support system 2000 according to the present embodiment therefore estimates an appropriate topic by using the usage history of the intention understanding models.

FIG. 16 shows an example of the configuration of the voice dialogue support system 2000 according to the third embodiment. As shown in the figure, it differs from the system of the second embodiment in one respect: it includes a dialogue log storage unit 82. The dialogue log storage unit 82 stores a dialogue log 99, which records the intention understanding models 94 referred to in the past.

That is, the dialogue log storage unit 82 stores reference topic information (dialogue log 99). This is the topic information that was referred to in order to extract intentions.

When the intention understanding unit 80 extracts an intention from the spoken words, it proceeds as follows:

1. It obtains a first intention candidate by referring to the topic information and the related topic information.
2. It obtains a second intention candidate by referring to the reference topic information.
3. It compares the two candidates according to a predetermined criterion and takes one of them as the extracted intention.

The dialogue log 99 is described in detail below.

(Dialogue log)
The dialogue log 99 records information on the input text 200 and the output text 400 (hereinafter collectively referred to as input/output text).

FIG. 17 shows an example of the dialogue log 99. As shown in the figure, the dialogue log 99 consists of one or more records. Each record has the following items:

- Date and time 991: when the input/output text in text 993 was used (for example, the date and time of topic estimation).
- Data attribute 992: whether the input/output text in text 993 is an output text 400 or an input text 200.
- Text 993: the content of the input/output text.
- Topic ID 994: the ID of the topic associated with the input/output text in text 993.
- Topic name 995: the name of the topic with topic ID 994.
- Intention understanding model used 996: the intention understanding model referred to in the processing in which the input/output text in text 993 was input or output.
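The record layout of FIG. 17 can be pictured as a small data structure. This Python sketch is illustrative only:

- The class and field names are hypothetical.
- "input"/"output" is an assumed encoding of data attribute 992.
- The sample row was made up for the example and is not taken from FIG. 17.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class DataAttribute(Enum):
    """Hypothetical encoding of data attribute 992."""
    INPUT = "input"    # the text is an input text 200
    OUTPUT = "output"  # the text is an output text 400


@dataclass(frozen=True)
class DialogueLogRecord:
    """One record of the dialogue log 99 (field names are assumptions)."""
    date_time: datetime            # 991: when the text was used, e.g. topic estimation
    data_attribute: DataAttribute  # 992: input text or output text
    text: str                      # 993: content of the input/output text
    topic_id: str                  # 994: ID of the associated topic
    topic_name: str                # 995: name of that topic
    model_used: str                # 996: intention understanding model referred to


# Illustrative row only; not the actual contents of FIG. 17.
record = DialogueLogRecord(
    date_time=datetime(2016, 12, 1, 14, 0, 1),
    data_attribute=DataAttribute.INPUT,
    text="I want to make a passbook.",
    topic_id="I1-1",
    topic_name="procedure for opening an account",
    model_used="M1",
)
print(record.model_used, record.data_attribute.value)
```

The record is frozen because a log entry should not change after it is written.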

<Processing>
Next, the voice dialogue support processing performed by the voice dialogue support system 2000 in this embodiment is described.

In this embodiment, the speech recognition processing s1 and the speech output processing s3 are the same as in the first and second embodiments.

The dialogue control processing s2 of this embodiment is described on the following premises:

- It is based on the dialogue scenario 91 shown in the first and second embodiments (FIG. 7).
- It is also based on the following elements of the second embodiment: the intention understanding models 94 (FIG. 13), the sub-dialogue scenario 916 (FIG. 14), and the output text list 96 (FIG. 15).
- The dialogue log 99 shown in FIG. 17 has already been stored.

First, as shown in FIG. 7, the dialogue control unit 60 receives an input text 200 from the speech recognition unit 20. As in the second embodiment, it estimates multiple topics by referring to the intention understanding model M1 (processing steps 0 and 1). Next, for one selected topic, the dialogue control unit 60 refers to the intention understanding model M2 and acquires the output text 400 corresponding to that topic. It then sends the acquired output text to the speech synthesis unit 40 (processing step 2).

Suppose the dialogue control unit 60 then receives a new input text 200 from the speech recognition unit 20 that contains multiple topics: "I do not have it. I want to make a deposit."

The intention understanding unit 80 splits this input text 200 into a first split text 201 ("I do not have it.") and a second split text 201 ("I want to make a deposit.").
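This section does not specify how the text is split. A minimal sketch, assuming the text is split at sentence-ending punctuation (the Japanese full stop "。" as well as ".", "!" and "?"), could look like the following. The function name `split_input_text` is hypothetical.

```python
import re

# Assumed sentence delimiters: Japanese full stop plus common ASCII punctuation.
_DELIMITERS = re.compile(r"[。.!?]\s*")


def split_input_text(input_text: str) -> list[str]:
    """Split an input text 200 into split texts 201 (hypothetical helper)."""
    parts = _DELIMITERS.split(input_text)
    # Drop empty pieces left by trailing punctuation, and trim whitespace.
    return [p.strip() for p in parts if p.strip()]


print(split_input_text("I do not have it. I want to make a deposit."))
print(split_input_text("持っていない。預金したい。"))
```

A punctuation rule this simple would mis-split abbreviations such as "Mr.". A deployed system would likely need a proper sentence segmenter.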

Next, the intention understanding unit 80 refers to the dialogue log 99 and acquires the intention understanding model 94 that was referred to in the most recent topic estimation. Specifically, as shown in FIG. 17, it acquires "intention understanding model M1", which was referred to in the topic estimation performed at "December 1, 2016 14:00:01".
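Looking up the model used in the most recent topic estimation can be sketched as a scan over the log. This Python sketch rests on the following assumptions:

- "The most recent topic estimation" is the newest record whose data attribute marks an input text. Output records, such as a prompt generated while M2 is in use, are skipped.
- The record type, its fields, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    date_time: datetime
    attribute: str   # "input" or "output" (assumed encoding of item 992)
    text: str
    model_used: str  # intention understanding model referred to (item 996)


def previous_model(log: list[LogRecord]) -> Optional[str]:
    """Return the model referred to in the newest input-side record, if any."""
    inputs = [r for r in log if r.attribute == "input"]
    if not inputs:
        return None
    return max(inputs, key=lambda r: r.date_time).model_used


# Hypothetical log: a topic estimation with M1, then a prompt issued under M2.
log = [
    LogRecord(datetime(2016, 12, 1, 14, 0, 1), "input", "I want to make a passbook.", "M1"),
    LogRecord(datetime(2016, 12, 1, 14, 0, 5), "output", "Do you have your seal?", "M2"),
]
print(previous_model(log))  # M1
```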

The intention understanding unit 80 then estimates two topics for the first split text 201 ("I do not have it."), both using the similarity described in the first embodiment:

- First intention candidate: the topic estimated by referring to the intention understanding model M1 acquired above.
- Second intention candidate: the topic estimated by referring to the intention understanding model M2, the model currently being referred to.

Of the two estimates based on M1 and M2, the intention understanding unit 80 takes the topic from the one with the higher similarity as the topic for the first split text 201 ("I do not have it."). In this example, that topic is assumed to be "no seal" (topic ID "I1-2"), estimated by referring to M2.

Likewise, the intention understanding unit 80 estimates two topics for the second split text 201 ("I want to make a deposit."), again using the similarity described in the first embodiment:

- First intention candidate: the topic estimated by referring to the intention understanding model M1.
- Second intention candidate: the topic estimated by referring to the intention understanding model M2.

Of the two estimates based on M1 and M2, the intention understanding unit 80 takes the topic from the one with the higher similarity as the topic for the second split text 201 ("I want to make a deposit."). In this example, that topic is assumed to be "deposit procedure" (topic ID "I2-1"), estimated by referring to M1.

The intention understanding unit 80 then sends the estimated topics to the dialogue control unit 60 as a topic ID list 300 containing "I1-2" and "I2-1", as in the second embodiment.
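The walk-through above can be sketched end to end as follows. This is a hedged illustration, not the patented implementation, and it makes three assumptions:

- The similarity of the first embodiment is not restated in this section. `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for it.
- The models are reduced to hypothetical dictionaries that map topic IDs to one assumed input text each.
- When the two similarities are equal, the model currently in use wins.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, heavily reduced intention understanding models:
# topic ID -> assumed input text.
M1 = {"I1-1": "I want to make a passbook", "I2-1": "I want to make a deposit"}
M2 = {"I1-2": "I do not have it"}


def similarity(a: str, b: str) -> float:
    # Stand-in for the similarity of the first embodiment.
    return SequenceMatcher(None, a, b).ratio()


def best_topic(segment: str, model: dict[str, str]) -> tuple[str, float]:
    """Return the topic ID in `model` most similar to `segment`, with its score."""
    return max(((tid, similarity(segment, text)) for tid, text in model.items()),
               key=lambda pair: pair[1])


def estimate_topics(input_text, previous, current):
    """For each split text, keep the candidate with the higher similarity.

    `previous` and `current` are (model name, model) pairs for the model found
    in the dialogue log and the model currently in use.
    """
    prev_name, prev_model = previous
    cur_name, cur_model = current
    segments = [s.strip() for s in re.split(r"[。.!?]\s*", input_text) if s.strip()]
    results = []
    for seg in segments:
        prev_tid, prev_score = best_topic(seg, prev_model)  # candidate from logged model
        cur_tid, cur_score = best_topic(seg, cur_model)     # candidate from current model
        # Assumed tie-break: prefer the model currently in use.
        if prev_score > cur_score:
            results.append((prev_tid, prev_name))
        else:
            results.append((cur_tid, cur_name))
    return results


print(estimate_topics("I do not have it. I want to make a deposit.",
                      previous=("M1", M1), current=("M2", M2)))
# [('I1-2', 'M2'), ('I2-1', 'M1')]
```

The model name is returned alongside each topic ID so that the log update can record which model produced each estimate.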

The intention understanding unit 80 also updates the dialogue log 99. For each topic estimated above, it appends the following:

- the topic ID and topic name;
- information on the referenced intention understanding model;
- information on the input text 200 used for the estimation;
- the estimation date and time;
- the attribute information of the input text 200;
- the content of the input text 200.
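Appending the new estimates to the log could look like the sketch below. The assumptions are:

- The record fields mirror items 991 to 996 of FIG. 17.
- The function name, the fixed timestamp, and the topic-name table are hypothetical.
- Each record stores the split text. This section does not say whether the split text or the full input text 200 is stored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DialogueLogRecord:
    date_time: datetime  # 991
    attribute: str       # 992: "input" for an input text 200 (assumed encoding)
    text: str            # 993
    topic_id: str        # 994
    topic_name: str      # 995
    model_used: str      # 996


# Hypothetical topic-name table for the topics used in this example.
TOPIC_NAMES = {"I1-2": "no seal", "I2-1": "deposit procedure"}


def append_estimates(log, estimates, now):
    """Append one input-side record per (split text, topic ID, model) estimate."""
    for split_text, topic_id, model_name in estimates:
        log.append(DialogueLogRecord(
            date_time=now,
            attribute="input",
            text=split_text,
            topic_id=topic_id,
            topic_name=TOPIC_NAMES[topic_id],
            model_used=model_name,
        ))
    return log


log = []
append_estimates(log,
                 [("I do not have it", "I1-2", "M2"),
                  ("I want to make a deposit", "I2-1", "M1")],
                 now=datetime(2016, 12, 1, 14, 0, 10))
for rec in log:
    print(rec.topic_id, rec.topic_name, rec.model_used)
```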

FIG. 18 shows an example of the dialogue log 99 after the update. As shown in the figure, records 997 have been added for the intention understanding models referred to in estimating the topics "I1-2" and "I2-1". These are "intention understanding model M2" and "intention understanding model M1", respectively.

As described above, when the voice dialogue support system 2000 according to this embodiment extracts an intention from spoken words, it:

1. obtains a first intention candidate by referring to the topic information and the related topic information;
2. obtains a second intention candidate by referring to the reference topic information;
3. compares the two candidates according to a predetermined criterion (for example, the similarity described in the first embodiment) and takes one of them as the extracted intention.

The system can therefore estimate topics reasonably, based on the topics discussed earlier in the dialogue.

That is, within the dialogue scenario 91, the system uses both the intention understanding model 94 currently being referred to and the model referred to immediately before. As a result, it can hold an appropriate dialogue on each topic even when the topic changes mid-conversation. For example, the dialogue partner may be discussing one topic and then remember, and bring up, a topic that should have been raised just before. Moreover, the topic information stored in the intention understanding models does not need to be increased in preparation for such topic changes.

The above description of each embodiment is intended to make the present invention easier to understand and does not limit it. The present invention may be modified and improved without departing from its spirit, and it includes its equivalents.

For example, in each embodiment the dialogue scenario 91 is common to all topics, but a different dialogue scenario 91 may be used for each topic.

In addition, in each embodiment the voice dialogue support system 2000 is a voice-interactive robot (service robot), but it may instead be a tablet terminal or a chatbot. The voice dialogue support system 2000 can be widely applied to dialogue systems that take speech or text as input.

2000 voice dialogue support system, 1000 dialogue support system, 62 dialogue generation unit, 64 dialogue output unit, 80 intention understanding unit, 85 intention understanding model storage unit

Claims

A dialogue support system comprising a processor and a memory, the system further comprising:
an intention understanding model storage unit that stores a plurality of intention understanding models for different subjects, each intention understanding model being information that associates topic information indicating a subject of a dialogue with words related to that subject;
a splitting unit that splits externally uttered words into a plurality of word parts;
an intention understanding unit that calculates a similarity between the split words and the topic information of each of the intention understanding models, and extracts an intention corresponding to the split words by referring to the intention understanding model for which the largest of the calculated similarities was obtained; and
a dialogue generation unit that, for each of the extracted intentions, acquires a dialogue scenario, which is information associated with that intention that stores an utterance procedure, and, based on the procedure indicated by the acquired dialogue scenario, generates words corresponding to that intention as words constituting the dialogue,
wherein, when extracting the intention corresponding to the split words, the intention understanding unit calculates the similarity between the topic information of each of the plurality of intention understanding models and the split words and identifies the intention understanding model for which the largest similarity was calculated; also calculates the similarity between the split words and the topic information of the intention understanding model that was used to extract the intention of the split words uttered immediately before; compares this calculated similarity with the largest similarity; and extracts the intention corresponding to the split words by referring to the intention understanding model with the higher similarity.

The dialogue support system according to claim 1, further comprising a dialogue scenario creation unit that accepts input of the dialogue scenario and outputs information on the accepted dialogue scenario.

The dialogue support system according to claim 1, further comprising:
a speech recognition unit that converts externally input speech into a character string to produce the externally uttered words; and
a speech synthesis unit that converts the words constituting the dialogue into speech.

The dialogue support system according to claim 1, wherein
the intention understanding model storage unit stores the topic information in a first intention understanding model in association with a second intention understanding model containing related topic information, which is information indicating topics related to the topic indicated by that topic information, and
the intention understanding unit extracts the intention for the split words by referring to the topic information and the related topic information.

The dialogue support system according to claim 1, further comprising:
a speech recognition unit that converts externally input speech into a character string to produce the externally uttered words; and
a speech synthesis unit that converts the words constituting the dialogue into speech,
wherein the intention understanding model storage unit stores the topic information in a first intention understanding model in association with a second intention understanding model containing related topic information, which is information indicating topics related to the topic indicated by that topic information, and
the intention understanding unit extracts the intention for the split words by referring to the topic information and the related topic information.

A dialogue support method in which an information processing device comprising a processor and a memory executes:
intention understanding model storage processing of storing a plurality of intention understanding models for different subjects, each intention understanding model being information that associates topic information indicating a subject of a dialogue with words related to that subject;
splitting processing of splitting externally uttered words into a plurality of word parts;
intention understanding processing of calculating a similarity between the split words and the topic information of each of the intention understanding models, and extracting an intention corresponding to the split words by referring to the intention understanding model for which the largest of the calculated similarities was obtained;
dialogue generation processing of acquiring, for each of the extracted intentions, a dialogue scenario, which is information associated with that intention that stores an utterance procedure, and generating, based on the procedure indicated by the acquired dialogue scenario, words corresponding to that intention as words constituting the dialogue; and
dialogue output processing of outputting the generated words,
wherein the intention understanding processing, when extracting the intention corresponding to the split words, calculates the similarity between the topic information of each of the plurality of intention understanding models and the split words and identifies the intention understanding model for which the largest similarity was calculated; also calculates the similarity between the split words and the topic information of the intention understanding model that was used to extract the intention of the split words uttered immediately before; compares this calculated similarity with the largest similarity; and extracts the intention corresponding to the split words by referring to the intention understanding model with the higher similarity.

A dialogue support program that causes an information processing device comprising a processor and a memory to execute:
intention understanding model storage processing of storing a plurality of intention understanding models for different subjects, each intention understanding model being information that associates topic information indicating a subject of a dialogue with words related to that subject;
splitting processing of splitting externally uttered words into a plurality of word parts;
intention understanding processing of calculating a similarity between the split words and the topic information of each of the intention understanding models, and extracting an intention corresponding to the split words by referring to the intention understanding model for which the largest of the calculated similarities was obtained;
dialogue generation processing of acquiring, for each of the extracted intentions, a dialogue scenario, which is information associated with that intention that stores an utterance procedure, and generating, based on the procedure indicated by the acquired dialogue scenario, words corresponding to that intention as words constituting the dialogue; and
dialogue output processing of outputting the generated words,
wherein the intention understanding processing, when extracting the intention corresponding to the split words, calculates the similarity between the topic information of each of the plurality of intention understanding models and the split words and identifies the intention understanding model for which the largest similarity was calculated; also calculates the similarity between the split words and the topic information of the intention understanding model that was used to extract the intention of the split words uttered immediately before; compares this calculated similarity with the largest similarity; and extracts the intention corresponding to the split words by referring to the intention understanding model with the higher similarity.