JP6755509B2

JP6755509B2 - Dialogue method, dialogue system, dialogue scenario generation method, dialogue scenario generator, and program

Info

Publication number: JP6755509B2
Application number: JP2018518374A
Authority: JP
Inventors: Hiroaki Sugiyama; Toyomi Meguro; Junji Yamato; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2016-05-20
Filing date: 2017-05-19
Publication date: 2020-09-16
Anticipated expiration: 2037-05-19
Also published as: JPWO2017200075A1; WO2017200075A1

Description

The present invention relates to technology in which a computer converses with a human in natural language. It is applicable to robots and other systems that communicate with people.

In recent years, research and development of robots that communicate with people has advanced, and such robots have been put to practical use in various settings. In communication therapy, for example, a robot can serve as a conversation partner for a person who feels lonely. A robot acting as a listener for residents of an elderly care facility can help ease their loneliness, and the sight of residents talking with the robot can spark conversations between the residents and the people around them, such as family members and caregivers. In communication training, a robot can serve as a practice partner; for example, a robot acting as a practice partner for learners at a foreign language school can make language learning more efficient. In information presentation systems, people mainly listen to a dialogue between robots, and the robots occasionally address them. This draws people into the dialogue without boring them and presents information in a form they readily accept. Efficient presentation of news, product introductions, trivia and general knowledge, and education (for example, childcare and education for children, liberal-arts teaching for adults, and moral awareness) can be expected when people have time to kill at meeting places, bus stops, or station platforms in town, or when they have time to join a dialogue at home or in a classroom. In information collection systems, a robot collects information while talking with a person. Because communicating with the robot preserves the feel of a conversation, information can be collected without making the person feel pressured by being questioned. Expected applications include personal information surveys, market research, product evaluation, and preference surveys for product recommendation. Communication between people and robots thus has many potential applications, and robots that can converse with users more naturally are anticipated.

In addition, with the spread of smartphones, chat services such as LINE (registered trademark) let multiple users enjoy conversations by chatting in near real time. Applying user-robot conversation technology to such a chat service would make it possible to offer a chat service in which a user can converse naturally even when no other user is available to chat with. In this specification, the hardware that serves as the user's conversation partner in these services, such as a robot or a chat partner, and the computer software that causes a computer to function as such hardware, are collectively called agents. Because an agent is the user's conversation partner, it may be anthropomorphized or personified, or may have a personality or individuality, like a robot or a chat partner.

The key to realizing these services is technology that enables agents implemented in hardware or computer software to converse naturally with people.

Non-Patent Documents 1 and 2 are known as prior art for dialogue systems. Non-Patent Document 1 generates utterances according to a predetermined scenario. Regardless of what the person says, it also generates backchannel responses such as "I see" (sokka) and "hmm" (fūn) and utterances that give ambiguous answers. Non-Patent Document 2 generates the next utterance based only on one or more preceding utterances by the person or the dialogue system.

Non-Patent Document 1: Yoshihiro Arimoto, Yuichiro Yoshikawa, Hiroshi Ishiguro, "Impression Evaluation of Dialogue without Speech Recognition by Multiple Robots," Annual Conference of the Robotics Society of Japan, 2016.
Non-Patent Document 2: Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, "Generation of Response Sentences Using Dependencies and Examples for User Utterances with Arbitrary Topics," Transactions of the Japanese Society for Artificial Intelligence, 2015, 30(1), 183-194.

Continuing a dialogue between a person and a dialogue system can yield benefits such as (i) mental health care, (ii) entertainment, (iii) communication practice, and (iv) greater familiarity with the dialogue system.

However, when utterances are generated according to a predetermined scenario as in Non-Patent Document 1, the system cannot answer unexpected questions, and the conversation does not continue. Also, in Non-Patent Document 1, the robot that asked a question responds to the person's answer only with an ambiguous response such as "I see." After prompting the person to speak in this way, another robot makes an utterance that shifts the topic slightly. This is intended to keep the person from feeling that their utterance was ignored. However, if ambiguous responses such as "I see" keep coming, the person feels that their remarks are constantly being brushed aside, and the conversation does not continue. When response sentences are generated as in Non-Patent Document 2, the exchange reduces to one question and one answer, and the conversation does not continue.

An object of the present invention is to provide a dialogue method, a dialogue system, a dialogue scenario generation method, a dialogue scenario generation device, and a program that can increase the number of dialogue turns by temporarily making part of an utterance of the dialogue system ambiguous and then inserting utterances that hold a dialogue to clarify the ambiguous part.

To solve the above problem, according to one aspect of the present invention, a dialogue method performed by a dialogue system includes: an utterance generation step in which the dialogue system generates an utterance; an utterance determination step in which the dialogue system obtains, as a converted utterance, an utterance produced by making at least part of the utterance generated in the utterance generation step ambiguous and/or by replacing a word contained in that utterance with a word that does not carry the word's meaning; and an utterance presentation step in which the dialogue system presents the converted utterance obtained in the utterance determination step.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance presentation step in which, after presenting the first utterance, the dialogue system presents a second utterance from which it can be read that the first utterance could not be uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance presentation step in which, after presenting the first utterance, the dialogue system presents a second utterance that includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents an utterance at least part of which has been made ambiguous and/or an utterance containing a meaningless word; and a second utterance presentation step in which, after the presentation in the first utterance presentation step, the dialogue system presents an utterance containing the specific content corresponding to the ambiguous part and/or an utterance containing a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance, presented after the first utterance, from which it can be read that the first utterance could not be uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance, presented after the first utterance, that includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, which is an utterance at least part of which has been made ambiguous and/or an utterance containing a meaningless word; and a second utterance, presented after the first utterance, which is an utterance containing the specific content corresponding to the ambiguous part and/or an utterance containing a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: an utterance generation unit that generates an utterance; an utterance determination unit that obtains, as a converted utterance, an utterance produced by making at least part of the utterance generated by the utterance generation unit ambiguous and/or by replacing a word contained in that utterance with a word that does not carry the word's meaning; and an utterance presentation unit that presents the converted utterance obtained by the utterance determination unit.
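The three-unit structure above (generation, determination, presentation) can be pictured as a small pipeline. The following is a minimal sketch, assuming that the generator and the determiner are plain callables and that "presenting" means appending to a log; the class name `DialogueSystem`, the helper `replace_with_demonstrative`, and the English example sentence are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DialogueSystem:
    """Toy model of the three units: generation, determination, presentation."""
    generate: Callable[[], str]          # utterance generation unit
    determine: Callable[[str], str]      # utterance determination unit
    presented: List[str] = field(default_factory=list)  # stands in for speech/text output

    def step(self) -> str:
        original = self.generate()
        converted = self.determine(original)  # the converted utterance
        self.presented.append(converted)      # "present" it
        return converted


def replace_with_demonstrative(sentence: str, target: str) -> str:
    """Hypothetical determiner: swap the first occurrence of target for 'that'."""
    return sentence.replace(target, "that", 1)


system = DialogueSystem(
    generate=lambda: "For a car, fuel economy is important, isn't it?",
    determine=lambda s: replace_with_demonstrative(s, "a car"),
)
print(system.step())  # For that, fuel economy is important, isn't it?
```

Keeping the determiner as a swappable callable mirrors the text's "and/or": any ambiguation or meaning-removing replacement can be plugged in without touching generation or presentation.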

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: a first utterance presentation unit that presents a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance presentation unit that, after the first utterance is presented, presents a second utterance from which it can be read that the first utterance could not be uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: a first utterance presentation unit that presents a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance presentation unit that, after the first utterance is presented, presents a second utterance that includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes a presentation unit that, after presenting an utterance at least part of which has been made ambiguous and/or an utterance containing a meaningless word, presents an utterance containing the specific content corresponding to the ambiguous part and/or an utterance containing a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance, presented after the first utterance, from which it can be read that the first utterance could not be uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, produced by making at least part of a predetermined utterance ambiguous and/or by replacing a word contained in the predetermined utterance with a word that does not carry the word's meaning; and a second utterance, presented after the first utterance, that includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue performed by a dialogue system. The dialogue scenario generation device generates a dialogue scenario that includes: a first utterance, which is an utterance at least part of which has been made ambiguous and/or an utterance containing a meaningless word; and a second utterance, presented after the first utterance, which is an utterance containing the specific content corresponding to the ambiguous part and/or an utterance containing a meaningful word corresponding to the meaningless word.

The present invention has the effect of increasing the number of dialogue turns.

Functional block diagram of the dialogue system according to the first embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the first embodiment.
Functional block diagram of the dialogue system according to the second embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the second embodiment.
Functional block diagram of the dialogue system according to the third embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the third embodiment.
Diagram showing the dialogue system according to Modification 3.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference signs, and redundant description is omitted.

<Key points of the embodiments of the present invention>
In the embodiments of the present invention, the dialogue system converses with a user and includes multiple robots, robot R1 and robot R2. Rather than having a robot utter the utterance sentence generated by the dialogue system (the original utterance sentence) as is, the system converts it into a sentence produced by making at least part of the original utterance sentence ambiguous and/or by replacing a word contained in the original utterance sentence with a word that does not carry the word's meaning (hereinafter, a sentence produced by these methods is also called an "ambiguated sentence"), and has one robot utter the converted sentence. After that, the system has another robot utter a sentence expressing that it could not uniquely interpret the ambiguated sentence, and/or has the robot that uttered the ambiguated sentence utter (rephrase) the original utterance sentence. When a robot makes an utterance expressing that it could not uniquely interpret something, the user can read from that utterance that the robot could not interpret it uniquely; in other words, a sentence expressing that something could not be uniquely interpreted is a sentence from which this can be read. In this way, the robots can make more utterances that the user finds convincing without the dialogue system generating more utterance sentences, and as a result the number of dialogue turns between the user and the dialogue system increases. Examples of ambiguated sentences are sentences in which part of the original utterance sentence is (i) replaced with a demonstrative, (ii) replaced with a misspoken word, or (iii) omitted.
(i) A sentence in which part of the original utterance sentence is replaced with a demonstrative, presented without the original utterance sentence, can be interpreted in two or more ways depending on what the demonstrative refers to.
(ii) A sentence in which part of the original utterance sentence is replaced with a misspoken word, presented without the original utterance sentence, can be interpreted in at least two ways: (a) the intended meaning without the slip, inferred from the surrounding context, and (b) the literal sentence containing the misspoken word. If the misspoken word differs too much from the original word, the natural feel of the dialogue suffers, so it is desirable to use a similar-sounding word as the misspoken word, such as a meaningful word that differs from the original by one sound, as in the example below.
(iii) A sentence in which part of the original utterance sentence is omitted, presented without the original utterance sentence, can be interpreted in two or more ways depending on what fills the omitted part.
Examples of an original utterance sentence and the converted sentences follow. In the misstatement example, "kurumi" (walnut) differs by one sound from "kuruma" (car).
Original utterance sentence: "'Car' (kuruma), fuel economy is important, isn't it?"
(i) Replaced with a demonstrative: "'That', fuel economy is important, isn't it?"
(ii) Replaced with a misspoken word: "'Kurumi' (walnut), fuel economy is important, isn't it?"
(iii) Omitted: "'[omitted]' fuel economy is important, isn't it?"
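The three conversions, and the preference for a misspoken word that is one sound away from the original, can be sketched as follows. This is a minimal sketch, assuming that words are available as romanized mora sequences and that a real-word lexicon exists; `LEXICON`, `one_mora_apart`, `misstated_word`, and `ambiguate` are hypothetical names, and the three-word lexicon is a toy.

```python
from typing import Dict, List, Optional, Tuple

Morae = Tuple[str, ...]

# Hypothetical toy lexicon: romanized word -> its morae (Japanese sound units).
LEXICON: Dict[str, Morae] = {
    "kuruma": ("ku", "ru", "ma"),  # car
    "kurumi": ("ku", "ru", "mi"),  # walnut
    "sakana": ("sa", "ka", "na"),  # fish
}


def one_mora_apart(a: Morae, b: Morae) -> bool:
    """True if a and b have the same length and differ in exactly one mora."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def misstated_word(target: str, lexicon: Dict[str, Morae]) -> Optional[str]:
    """Return a real word that sounds one mora different from target, or None."""
    for word, morae in lexicon.items():
        if word != target and one_mora_apart(morae, lexicon[target]):
            return word
    return None


def ambiguate(words: List[str], index: int, mode: str,
              lexicon: Dict[str, Morae] = LEXICON) -> List[str]:
    """Return a copy of words with words[index] obscured by one of three strategies."""
    out = list(words)
    if mode == "demonstrative":    # (i) swap in the demonstrative "are" ("that")
        out[index] = "are"
    elif mode == "misstatement":   # (ii) swap in a similar-sounding real word
        replacement = misstated_word(words[index], lexicon)
        if replacement is None:
            raise ValueError(f"no word one mora away from {words[index]!r}")
        out[index] = replacement
    elif mode == "omission":       # (iii) drop the word entirely
        del out[index]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out


original = ["kuruma", "fuel economy is important, isn't it?"]
for mode in ("demonstrative", "misstatement", "omission"):
    print(mode, ambiguate(original, 0, mode))
```

Counting differences per mora rather than per letter is a design choice meant to approximate "one sound different"; a character-level edit distance would treat "kuruma"/"kurumi" the same way here but would not generalize as well to other Japanese words.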

To increase the number of dialogue turns, the dialogue system of this embodiment converts an utterance sentence generated for robot R1 to utter into an ambiguated sentence, and robot R1 utters the ambiguated sentence. After robot R1 utters the ambiguated sentence, another robot, R2, utters an utterance sentence that checks the content of the ambiguated sentence. However, when the dialogue system includes only robot R1, robot R1 itself may, after uttering the ambiguated sentence, utter an utterance sentence that checks its content.
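The choice of who checks the ambiguated sentence depends only on how many robots the system has. A minimal sketch, assuming robots are identified by name strings; the function `assign_turns` and the example utterances are hypothetical.

```python
from typing import List, Tuple


def assign_turns(robots: List[str], ambiguated: str,
                 confirmation: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs for an ambiguated turn and its check."""
    if not robots:
        raise ValueError("the dialogue system needs at least one robot")
    speaker = robots[0]  # robot R1 utters the ambiguated sentence
    # With two or more robots, a different robot (R2) checks the content;
    # with only R1, R1 checks the content of its own ambiguated sentence.
    checker = robots[1] if len(robots) > 1 else robots[0]
    return [(speaker, ambiguated), (checker, confirmation)]


print(assign_turns(["R1", "R2"], "For that, fuel economy is important.", "You mean cars?"))
print(assign_turns(["R1"], "For that, fuel economy is important.", "I mean cars, of course."))
```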

An ambiguated sentence may be inserted at any point in the dialogue between the user and the dialogue system, but care must be taken not to make the dialogue too long. Inserting an ambiguated sentence is particularly effective when it is judged that, even if the original utterance were uttered as is, the user would have difficulty understanding or empathizing with the robot's utterance. For example, a robot should utter an ambiguated sentence (A) when the dialogue system changes the topic (for example, when it starts a scenario dialogue), (B) when the user's reply to an utterance of the dialogue system deviates from the reply the dialogue system predicted, or (C) when the dialogue system detects a change of topic. One way for the dialogue system to detect a change of topic is to use the sentences or words in the dialogue to compute, for example, (a) the distance between topic words using word2vec, (b) the distance between sentences, where each sentence is represented by the average of the word2vec vectors of all its words, or (c) the cosine similarity of words, and to determine that the topic has changed when the distance is at least a predetermined value or the cosine similarity is at most a predetermined value (in short, when a predetermined indicator shows that two utterances are unrelated or only weakly related). Timings (A) to (C) are points at which the user has difficulty understanding what the dialogue system is saying. Having robot R1 utter an ambiguated sentence and inserting a dialogue between robot R1 and robot R2 at these points therefore both increases the number of dialogue turns between the person and the dialogue system and helps the person understand what the dialogue system is saying.
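Approach (b) and the threshold test, together with the three insertion timings, can be sketched as below. This is a minimal sketch, assuming tiny hand-made 3-dimensional vectors in place of a trained word2vec model and an arbitrary threshold of 0.5; `EMBEDDINGS`, `topic_changed`, and `should_insert_ambiguated` are hypothetical names, and the patent does not specify the threshold value.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional toy vectors standing in for a trained word2vec model.
EMBEDDINGS: Dict[str, List[float]] = {
    "car": [0.9, 0.1, 0.0],
    "sedan": [0.8, 0.2, 0.1],
    "fuel": [0.7, 0.3, 0.0],
    "walnut": [0.0, 0.1, 0.9],
    "cake": [0.1, 0.0, 0.8],
}


def sentence_vector(words: Sequence[str]) -> List[float]:
    """Average the vectors of the known words in a sentence (approach (b))."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("sentence has no words with known vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def topic_changed(prev: Sequence[str], curr: Sequence[str],
                  threshold: float = 0.5) -> bool:
    """Timing (C): similarity at or below the threshold counts as a topic change."""
    return cosine(sentence_vector(prev), sentence_vector(curr)) <= threshold


def should_insert_ambiguated(topic_switch: bool, reply_unexpected: bool,
                             topic_change_detected: bool) -> bool:
    """Insert an ambiguated sentence at timing (A), (B), or (C)."""
    return topic_switch or reply_unexpected or topic_change_detected


print(topic_changed(["car", "fuel"], ["sedan"]))           # False
print(topic_changed(["car", "fuel"], ["walnut", "cake"]))  # True
```

In practice the vectors would come from a trained embedding model and the threshold would be tuned on real dialogues; the toy values only make the comparison easy to follow.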

As described above, the original utterance sentence is converted into (i) a sentence in which a word is replaced with a demonstrative, (ii) a sentence in which a word is replaced with a misspoken word, or (iii) a sentence in which a word is omitted. The word targeted for replacement by a demonstrative, for misstatement, or for omission is not particularly limited; for example, a key word is targeted. For instance, based on tf-idf (a weight of a word within a document), a word with a large weight among the words in the original utterance sentence may be selected as the target word. Alternatively, among the words in the original utterance sentence, a word that is a superordinate concept (hypernym) of another word in the sentence may be selected as the target word. For example, when the original utterance sentence contains both "sedan" and "car", the word "car", which is the superordinate concept of "sedan", can be selected as the target word.
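Both target-word selection rules can be sketched in a few lines. This is a minimal sketch, assuming a small tokenized reference corpus for document frequencies, the common smoothed idf variant log((1 + N) / (1 + df)) + 1 (the patent does not fix a tf-idf formula), and a hand-written hypernym map; `pick_by_tfidf` and `pick_by_hypernym` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tfidf_weights(sentence: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """tf-idf of each word in sentence, using smoothed idf log((1+N)/(1+df)) + 1."""
    n_docs = len(corpus)
    counts = Counter(sentence)
    weights = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = (count / len(sentence)) * idf
    return weights


def pick_by_tfidf(sentence: List[str], corpus: List[List[str]]) -> str:
    """Select the word with the largest tf-idf weight as the target."""
    weights = tfidf_weights(sentence, corpus)
    return max(weights, key=weights.get)


def pick_by_hypernym(sentence: List[str], hypernym_of: Dict[str, str]) -> Optional[str]:
    """Select a word that is the hypernym of another word in the same sentence."""
    words = set(sentence)
    for word in sentence:
        parent = hypernym_of.get(word)
        if parent is not None and parent in words:
            return parent
    return None


corpus = [["i", "like", "cake"], ["i", "like", "tea"], ["i", "drive", "car"]]
sentence = ["i", "like", "car", "sedan"]
print(pick_by_tfidf(sentence, corpus))           # sedan
print(pick_by_hypernym(sentence, {"sedan": "car"}))  # car
```

The two rules can disagree, as here: tf-idf favours the rarest word ("sedan"), while the hypernym rule picks the broader word ("car") that the patent's example singles out.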

Examples of dialogue are shown below. In each example, utterances t(1), t(2), ... are spoken in that order. "X → Y" means that X is speaking to Y. The demonstrative, mistaken word, or omission is marked with quotation marks.

(Example 1: Demonstrative)
Utterance t(1): Robot R1 → Robot R2: I like "that", sedans, you know.
Utterance t(2): Robot R2 → Robot R1: Are you talking about cars?
Utterance t(3): Robot R1 → Robot R2: Right, cars. I like cars, sedans, you know.

(Example 2: Omission)
Utterance t(1): Robot R1 → Robot R2: I like "(omitted)" sedans, you know.
Utterance t(2): Robot R2 → Robot R1: What are you talking about?
Utterance t(3): Robot R1 → Robot R2: Yeah, cars. I like cars, sedans, you know.

(Example 3: Mistaken word)
Utterance t(1): Robot R1 → Robot R2: I like "walnuts" (kurumi, a slip for kuruma, "car"), sedans, you know.
Utterance t(2): Robot R2 → Robot R1: Huh, what are you talking about?
Utterance t(3): Robot R1 → Robot R2: Sorry, cars. I like cars, sedans, you know.

In Examples 1 to 3, the dialogue system makes an utterance immediately after the ambiguated utterance t(1); here, this is robot R2's utterance t(2). That utterance contains a word that narrows the ambiguated part of the first utterance t(1) down to a single meaning. However, the follow-up utterance is not limited to this kind. Any utterance that expresses that t(1) could not be interpreted uniquely will do, that is, any utterance from which it can be read that t(1) could not be interpreted uniquely. The following is one example.

(Example 4: Mistaken word)
Utterance t(1): Robot R1 → Robot R2: I like "walnuts", sedans, you know.
Utterance t(2): Robot R2 → Robot R1: Sorry, I don't understand what you mean.
Utterance t(3): Robot R1 → Robot R2: Sorry, cars. I like cars, sedans, you know.

In this example, robot R2's utterance "Sorry, I don't understand what you mean." cannot be said to contain a word that narrows the ambiguated part of the first utterance t(1) down to a single meaning. It does, however, compel robot R1 (the robot whose utterance t(2) responds to) to utter a word that pins down the ambiguated utterance. In summary, the utterance t(2) that the dialogue system makes immediately after the ambiguated utterance t(1) in Examples 1 to 4 can be described in three equivalent ways:
- an utterance that expresses that t(1) could not be interpreted uniquely;
- an utterance from which it can be read that t(1) could not be interpreted uniquely;
- an utterance that prompts another utterance containing a word that pins t(1) down to a single meaning.

The example dialogues above state to whom each utterance is addressed, but there is no need to restrict the addressee. For example, Example 1 is a dialogue between robot R1 and robot R2, but it may also be a dialogue among robot R1, robot R2, and a person. When the addressee is to be restricted, the robot may indicate who the addressee is, for example through the movement of its head or gaze.

<First Embodiment>
FIG. 1 is a functional block diagram of the dialogue system 100 according to the first embodiment, and FIG. 2 shows its processing flow.

The dialogue system 100 includes robots R1 and R2 and a dialogue device 190. The dialogue device 190 includes a speech synthesis unit 110, an utterance generation unit 150, and an utterance determination unit 120. Robot R1 includes a presentation unit 101-1, and robot R2 includes a presentation unit 101-2. The presentation units 101-1 and 101-2 emit acoustic signals into the surroundings of robots R1 and R2; they are, for example, loudspeakers.

The dialogue system 100 lets a person, the user, converse with two robots, R1 and R2. Robots R1 and R2 utter the speech (synthesized speech data) generated by the dialogue device 190. The flow of operations performed by the dialogue system 100 is described below.

The utterance generation unit 150 generates an utterance sentence (text data) (S1) and outputs it to the utterance determination unit 120 and the speech synthesis unit 110. This utterance sentence is hereinafter also called the original utterance sentence. The utterance generation unit 150 contains a dialogue system that is triggered by an input word and generates and outputs utterance text according to rules written in advance. An example is the dialogue system called a "chat dialogue system" described in Non-Patent Document 2. This dialogue system generates and outputs the original utterance sentence based on a preset word.

Alternatively, the utterance generation unit 150 contains a dialogue system that works from scenarios stored in advance. When a preset word corresponds to an option of a stored scenario, this system selects and outputs the pre-stored utterance text corresponding to that option. An example is the dialogue system called a "scenario dialogue system" described in Non-Patent Document 1. In this case, the dialogue system selects the original utterance sentence from its pre-stored text and outputs it. The description so far has assumed that the original utterance sentence is generated from a preset word, but the word need not be set in advance. For example, when the original utterance sentence is generated at some point in an ongoing dialogue, a word from the earlier part of that dialogue (a topic, etc.) may be used instead of a preset word.

The utterance determination unit 120 receives the original utterance sentence from the utterance generation unit 150. It ambiguates at least part of that sentence, takes the result as the converted utterance sentence (text data) (S2), and outputs it to the speech synthesis unit 110. Ambiguating the utterance sentence means performing one of the operations described above on at least part of it: (i) replacing it with a demonstrative, (ii) replacing it with a mistaken word, or (iii) omitting it. Operation (ii) can also be described as replacing a word in the sentence with a word that does not carry that word's meaning.
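The three ambiguation operations (i) to (iii) can be sketched as simple word-list edits. This minimal sketch assumes the sentence is already tokenized into words and the target word has already been chosen. The default demonstrative "that" and the mistaken word "walnut" are hypothetical parameters. The latter echoes Example 3, where kurumi ("walnut") is a slip for kuruma ("car").

```python
from typing import List

def ambiguate(words: List[str], target: str, mode: str,
              demonstrative: str = "that", mistaken: str = "walnut") -> List[str]:
    """Return a copy of `words` with `target` ambiguated.

    mode: "demonstrative" for (i), "mistaken" for (ii), "omit" for (iii).
    The default replacement words are hypothetical.
    """
    if target not in words:
        raise ValueError(f"target {target!r} not in sentence")
    i = words.index(target)  # only the first occurrence is ambiguated
    if mode == "demonstrative":
        return words[:i] + [demonstrative] + words[i + 1:]
    if mode == "mistaken":
        return words[:i] + [mistaken] + words[i + 1:]
    if mode == "omit":
        return words[:i] + words[i + 1:]
    raise ValueError(f"unknown mode: {mode}")

original = ["I", "like", "car", "sedan"]
for mode in ("demonstrative", "mistaken", "omit"):
    print(mode, ambiguate(original, "car", mode))
```

The function returns a new list rather than editing in place, so the original utterance sentence remains available for the later steps that need it.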

The speech synthesis unit 110 performs speech synthesis on the converted utterance sentence (text data) input from the utterance determination unit 120 to obtain synthesized speech data (S3). It outputs that data to presentation unit 101-1 of robot R1.

Presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the converted utterance sentence input from the speech synthesis unit 110. That is, it presents the converted utterance sentence as an utterance of robot R1 (S4). If the addressee of this speech is the robot that plays it back, the utterance may be handled as if the robot were talking to itself.

The utterance generation unit 150 generates an utterance sentence that confirms the content of the converted utterance sentence input from the utterance determination unit 120 (S6); this is hereinafter also called the confirmation utterance sentence. It outputs the confirmation utterance sentence to the speech synthesis unit 110. The confirmation utterance sentence contains a question aimed at pinning the converted utterance sentence down to a single meaning.

There are three types of confirmation utterance sentence:
(i) one that confirms by naming the correct content, e.g. "You mean XX?" (XX is the correct content, a word that pins the converted utterance sentence down to a single meaning);
(ii) one that confirms without naming any content, e.g. "What do you mean?";
(iii) one that confirms by naming incorrect content, e.g. "You mean YY?", "Did you say YY?", or "What's YY?" (YY is incorrect).
Which type the utterance generation unit 150 generates, and exactly how, may be fixed in advance within the unit, or it may be specifiable by the operator of the dialogue system from outside the unit. The correct content is determined from the original utterance sentence (generated by the utterance generation unit 150) and the converted utterance sentence (generated by the utterance determination unit 120). Specifically, the word in the original utterance sentence that corresponds to the ambiguated part is taken as the correct content. The incorrect content may be generated from that same word, taken from the original utterance sentence in the same way. Although the confirmation utterance sentence contains a question aimed at pinning the converted utterance sentence down to a single meaning, it does not itself pin the meaning down.

The speech synthesis unit 110 performs speech synthesis on the confirmation utterance sentence input from the utterance generation unit 150 to obtain synthesized speech data (S7). It outputs that data to presentation unit 101-2 of robot R2.

Presentation unit 101-2 plays back the speech corresponding to the synthesized speech data of the confirmation utterance sentence input from the speech synthesis unit 110. That is, it presents the confirmation utterance sentence as an utterance of robot R2 (S8).

The utterance generation unit 150 then generates an utterance sentence that responds to the confirmation utterance sentence (S9); this is hereinafter also called the response utterance sentence. It outputs the response utterance sentence to the speech synthesis unit 110. The response utterance sentence answers the question contained in the confirmation utterance sentence and contains a word that pins the converted utterance sentence down to a single meaning.

The speech synthesis unit 110 performs speech synthesis on the response utterance sentence input from the utterance generation unit 150 to obtain synthesized speech data (S10). It outputs that data to presentation unit 101-1 of robot R1. The form of the response depends on the type of confirmation:
- For a type (i) confirmation, which names the correct content, the response affirms the confirmation and then repeats the correct content, e.g. "Yeah, XX."
- For a type (ii) confirmation, which names no content, or a type (iii) confirmation, which names incorrect content, the response states the correct content, e.g. "XX."

Presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the response utterance sentence input from the speech synthesis unit 110. That is, it presents the response utterance sentence as an utterance of robot R1 (S11).

The speech synthesis unit 110 performs speech synthesis on the original utterance sentence input from the utterance generation unit 150 to obtain synthesized speech data (S12). It outputs that data to presentation unit 101-1 of robot R1.

Presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the original utterance sentence input from the speech synthesis unit 110. That is, it presents the original utterance sentence as an utterance of robot R1 (S13).

<Processing of Each Unit>
The processing of each unit of the dialogue system 100 is described below. The example shown here performs speech synthesis of every utterance sentence before the dialogue starts.

[Robots R1 and R2]
Robots R1 and R2 converse with the user. They are placed near the user and make the utterances generated by the dialogue device 190.

[Utterance Generation Unit 150]
The utterance generation unit 150 generates the original utterance sentence and outputs it to the utterance determination unit 120 and the speech synthesis unit 110.

The utterance generation unit 150 also finds the part that the utterance determination unit 120 ambiguated, using the converted utterance sentence and the original utterance sentence. It then generates a confirmation utterance sentence for confirming that part and outputs it to the speech synthesis unit 110. The ambiguated part can be found from the difference between the converted utterance sentence and the original utterance sentence. Alternatively, the utterance generation unit 150 may receive information indicating the ambiguated part directly from the utterance determination unit 120.
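Recovering the ambiguated part from the difference between the two sentences could be done with Python's standard `difflib.SequenceMatcher` applied to word lists, as sketched below. The sketch assumes both sentences are tokenized the same way and that only one span was ambiguated; it returns the first differing span. An omission shows up as an empty replacement.

```python
import difflib
from typing import List, Tuple

def ambiguated_span(original: List[str],
                    converted: List[str]) -> Tuple[List[str], List[str]]:
    """Return (original words, replacement words) for the first differing span.

    Returns ([], []) if the two sentences are identical.
    """
    matcher = difflib.SequenceMatcher(a=original, b=converted, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            return original[i1:i2], converted[j1:j2]
    return [], []

original = ["I", "like", "car", "sedan"]
print(ambiguated_span(original, ["I", "like", "that", "sedan"]))  # demonstrative
print(ambiguated_span(original, ["I", "like", "sedan"]))          # omission
```

The first element of the returned pair is the word that serves as the correct content XX when building the confirmation utterance sentence.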

The utterance generation unit 150 further generates a response utterance sentence to the confirmation utterance sentence and outputs it to the speech synthesis unit 110.

When outputting the original, confirmation, and response utterance sentences to the speech synthesis unit 110, the utterance generation unit 150 attaches to each one information indicating its position in the utterance order. For example, the confirmation utterance sentence is at position N+2, the response utterance sentence at N+3, and the original utterance sentence at N+4, where N is an integer greater than or equal to 0. These positions need not be consecutive, but their relative order must not change. The utterance generation unit 150 may also decide which robot utters each of these sentences. In that case, it also outputs information indicating the speaking robot to the speech synthesis unit 110.

[Utterance Determination Unit 120]
The utterance determination unit 120 receives the original utterance sentence generated by the utterance generation unit 150. It ambiguates at least part of that sentence, takes the result as the converted utterance sentence, and outputs it to the speech synthesis unit 110. It also outputs either the converted utterance sentence or information indicating the ambiguated part to the utterance generation unit 150.

When outputting the converted utterance sentence to the speech synthesis unit 110, the utterance determination unit 120 attaches information indicating its position in the utterance order. The converted utterance sentence is, for example, at position N+1, ahead of the confirmation, response, and original utterance sentences. The utterance determination unit 120 may also decide which robot utters the converted utterance sentence. In that case, it also outputs information indicating the speaking robot to the speech synthesis unit 110.

[Speech Synthesis Unit 110]
The speech synthesis unit 110 performs speech synthesis on four sentences: the confirmation, response, and original utterance sentences from the utterance generation unit 150, and the converted utterance sentence from the utterance determination unit 120. It outputs the resulting synthesized speech data to presentation unit 101-1 of robot R1 or presentation unit 101-2 of robot R2. The speech synthesis unit 110 outputs the synthesized speech data according to the utterance-order information. In this embodiment, the output order is therefore: converted utterance sentence, confirmation utterance sentence, response utterance sentence, original utterance sentence. A sentence may arrive together with information indicating the robot that is to utter it, from either the utterance generation unit 150 or the utterance determination unit 120. In that case, the speech synthesis unit 110 outputs the synthesized speech data to the presentation unit of the corresponding robot.
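The ordering rule can be sketched as sorting tagged sentences by their order key and routing each one to the presentation unit of its tagged robot. This is a hypothetical sketch. The `TaggedUtterance` record, the `synthesize` stand-in (which returns the text instead of audio), and the presenter callbacks are illustrative names, not interfaces defined in this document. Because only the relative order matters, the keys need not be consecutive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class TaggedUtterance:
    order: int   # position in the utterance order, e.g. N+1 ... N+4
    robot: str   # robot that should speak it, e.g. "R1" or "R2"
    text: str

def synthesize(text: str) -> str:
    # Hypothetical stand-in for speech synthesis; a real unit would return audio.
    return text

def dispatch(utterances: List[TaggedUtterance],
             presenters: Dict[str, Callable[[str], None]]) -> None:
    """Synthesize and present the utterances in ascending order of their keys."""
    for u in sorted(utterances, key=lambda u: u.order):
        presenters[u.robot](synthesize(u.text))

# Hypothetical presenters that just record what each robot "says".
log: List[Tuple[str, str]] = []
presenters = {
    "R1": lambda audio: log.append(("R1", audio)),
    "R2": lambda audio: log.append(("R2", audio)),
}
N = 0
dispatch([
    TaggedUtterance(N + 4, "R1", "I like cars, sedans."),  # original
    TaggedUtterance(N + 2, "R2", "You mean cars?"),        # confirmation
    TaggedUtterance(N + 1, "R1", "I like that, sedans."),  # converted
    TaggedUtterance(N + 3, "R1", "Yeah, cars."),           # response
], presenters)
for robot, audio in log:
    print(robot, audio)
```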

[Presentation Units 101-1 and 101-2]
Presentation units 101-1 and 101-2 play back the speech corresponding to the synthesized speech data input from the speech synthesis unit 110. The user thus hears the utterances of robot R1 or R2, which realizes the dialogue between the user and the dialogue system 100.

<Effects>
With the above configuration, the number of turns in the dialogue can be increased.

In a conversation between a dialogue system and a person, an utterance of the dialogue system may be interpreted as belonging to a context beyond what the person can anticipate or empathize with. For example, the dialogue system's utterance may come so abruptly that the person cannot immediately grasp its intent. In this embodiment, part of a sentence is first made ambiguous. Another robot is then made to interject an utterance that starts a dialogue confirming that ambiguity. Interjecting such utterances makes it easier for the person to understand the intent of the dialogue system's utterances.

<Second Embodiment>
FIG. 3 is a functional block diagram of the dialogue system 100 according to the second embodiment, and FIG. 4 shows its processing flow.

Like the dialogue system 100 of the first embodiment, the dialogue system 100 of the second embodiment includes robots R1 and R2 and a dialogue device 190. It differs from the first embodiment in three respects:
- the dialogue device 190 also includes an utterance end detection unit 140;
- robot R1 also includes an input unit 102-1;
- robot R2 also includes an input unit 102-2.
Input units 102-1 and 102-2 pick up acoustic signals produced around the robots; they are, for example, microphones. The input unit only needs to be able to pick up the user's speech, so one of input units 102-1 and 102-2 may be omitted. Alternatively, a microphone installed somewhere other than on robots R1 and R2, such as near the user, may serve as the input unit, in which case neither input unit 102-1 nor 102-2 is needed.

The flow of operations performed by the dialogue system 100 of the second embodiment is described below, focusing on how it differs from that of the first embodiment.

First, the dialogue system 100 of the second embodiment performs steps S1 to S4.

After the converted utterance sentence is presented in step S4, the user's utterance is picked up by at least one of input units 102-1 and 102-2. The corresponding speech data is output to the utterance end detection unit 140.

The utterance end detection unit 140 uses the speech data picked up by at least one of input units 102-1 and 102-2 to detect one of two events (S5): the end of the user's utterance, or the elapse of a predetermined time without the user speaking (a timeout). It then outputs to the utterance generation unit 150 a control signal reporting the end of the utterance or the timeout.
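A simple energy-based version of the end-of-utterance and timeout detection in step S5 is sketched below. The document does not say how the end of an utterance is detected. The per-frame energy input, the 10 ms frame length, the energy threshold, the 0.5 s trailing-silence rule, and the 3 s timeout are all assumptions made for illustration. Frame counts are compared as integers to avoid floating-point drift from summing frame durations.

```python
from typing import Iterable, Optional

def detect_end(frame_energies: Iterable[float], frame_sec: float = 0.01,
               energy_threshold: float = 0.1, end_silence_sec: float = 0.5,
               timeout_sec: float = 3.0) -> Optional[str]:
    """Return "end", "timeout", or None if the input runs out first.

    All thresholds are hypothetical defaults, not values from the document.
    """
    end_frames = round(end_silence_sec / frame_sec)
    timeout_frames = round(timeout_sec / frame_sec)
    speech_started = False
    silent_run = 0  # consecutive low-energy frames since the last speech frame
    for n, energy in enumerate(frame_energies, start=1):
        if energy >= energy_threshold:
            speech_started = True
            silent_run = 0
        else:
            silent_run += 1
        if speech_started and silent_run >= end_frames:
            return "end"       # user spoke, then fell silent long enough
        if not speech_started and n >= timeout_frames:
            return "timeout"   # user never started speaking
    return None

print(detect_end([0.0] * 300))               # silence only
print(detect_end([0.5] * 20 + [0.0] * 50))   # short speech, then silence
```

Either returned event would trigger the control signal to the utterance generation unit 150. Since this embodiment does not recognize the user's speech, what the user said does not affect the next step.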

When the control signal from the utterance end detection unit 140 reaches the utterance generation unit 150, the dialogue system 100 of the second embodiment performs steps S6 to S13.

That is, this embodiment gives the user time to speak after the converted utterance sentence is presented, but the dialogue system 100 does not perform speech recognition on the user's utterance. It presents the confirmation utterance sentence when the user's utterance ends or when the predetermined time has elapsed. The confirmation and response utterance sentences may be the same as in case (i) above, regardless of whether the user's utterance contains the correct content of the ambiguated part or incorrect content. For example, the dialogue system 100 presents "You mean XX?" as the confirmation utterance sentence and "Yeah, XX." as the response utterance sentence.

An example of a dialogue in this embodiment is shown below.
(Example 5)
Utterance t(1): Robot R1 → User: "That", what type do you like?
Utterance t(2): User → Robot R1: Huh, what?
Utterance t(3): Robot R2 → Robot R1: Are you talking about cars?
Utterance t(4): Robot R1 → Robot R2: Right, cars. What type of car do you like?
In Example 5, robot R1 utters the converted utterance sentence t(1), after which time is provided to accept the user's utterance. When the user's utterance t(2) ends, robot R2 utters the confirmation utterance sentence t(3). Robot R1 then utters the response utterance sentence and the original utterance sentence together as utterance t(4).

In this embodiment, the confirmation and response utterance sentences presented by the dialogue system 100 do not depend on the content of the user's utterance. The dialogue system 100 of this embodiment therefore need not have a speech recognition function.

<Third Embodiment>
FIG. 5 is a functional block diagram of the dialogue system 100 according to the third embodiment, and FIG. 6 shows its processing flow.

Like the dialogue system 100 of the second embodiment, the dialogue system 100 of the third embodiment includes robots R1 and R2 and a dialogue device 190. The dialogue device 190 of the third embodiment differs from that of the second embodiment in that it includes a speech recognition unit 141 instead of the utterance end detection unit 140.

The flow of operations performed by the dialogue system 100 of the third embodiment is described below, focusing on how it differs from that of the second embodiment.

First, the dialogue system 100 of the third embodiment performs steps S1 to S4.

After the converted utterance sentence is presented in step S4, the user's utterance is picked up by at least one of input units 102-1 and 102-2. The corresponding speech data is output to the speech recognition unit 141.

The speech recognition unit 141 performs speech recognition on the speech data picked up by at least one of input units 102-1 and 102-2. This yields a recognized utterance sentence, i.e. the utterance sentence corresponding to the user's utterance (S51), which is output to the utterance generation unit 150.

The utterance generation unit 150 determines whether the recognized utterance sentence has the same content as the confirmation utterance sentence it generated (S52). If it does, the dialogue system 100 of the third embodiment skips steps S6 to S8 and performs steps S9 to S13. If it does not, the dialogue system 100 performs steps S6 to S13. In other words, when the user has already uttered a sentence confirming the content of the ambiguous sentence, the dialogue system 100 of the third embodiment does not utter its own confirmation utterance sentence. It instead utters the response utterance sentence after the user's utterance.
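The S52 branch can be sketched as follows. This is a minimal sketch, not the patent's implementation. The step labels come from the description, but what each step does is defined in earlier embodiments and is not modeled here. The test for "same content" is an assumption: it ignores whitespace and trailing punctuation. A deployed system might need a looser, semantic comparison.

```python
# Step labels from the description; the contents of each step are not modeled.
ALL_STEPS = tuple(f"S{i}" for i in range(6, 14))  # S6 ... S13
SKIPPED_IF_USER_CONFIRMED = ("S6", "S7", "S8")


def normalize(text: str) -> str:
    """Assumed notion of 'same content': ignore whitespace and final punctuation."""
    return "".join(text.split()).rstrip("?？.。!！")


def steps_after_s52(recognized: str, confirmation: str) -> tuple:
    """S52: compare the recognized user utterance with the generated
    confirmation utterance sentence and return the steps to run next."""
    if normalize(recognized) == normalize(confirmation):
        # The user already asked the confirmation question, so S6-S8 are skipped.
        return tuple(s for s in ALL_STEPS if s not in SKIPPED_IF_USER_CONFIRMED)
    return ALL_STEPS
```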

In this embodiment, the user's utterance is accepted after the dialogue system 100 presents the converted utterance sentence. The user's utterance may instead be accepted after the dialogue system 100 presents any of its utterance sentences. Next, consider the case where the user says something other than what the dialogue system 100 anticipated in advance. An example is when the recognized utterance sentence of the user's utterance after the converted utterance sentence differs from the generated confirmation utterance sentence. In such a case, the dialogue system 100 may utter a sentence that is none of the confirmation utterance sentence, the response utterance sentence, or the original utterance sentence described in the first embodiment. For example, if the speech recognition result is something that can be affirmed, the utterance generation unit 150 generates "Yeah, XX" as the response utterance sentence. If the result is something that must be denied, it generates "Sorry, XX" as the response utterance sentence. The utterance generation unit 150 then presents whichever response utterance sentence it generated as the utterance of the robot R1.
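The following is a minimal sketch of this fallback response. It assumes a caller-supplied predicate `can_affirm` (hypothetical) that decides whether the recognized utterance can be affirmed. It also assumes a caller-supplied string for the "XX" slot. The source specifies neither decision, so both are left as parameters.

```python
from typing import Callable


def fallback_response(recognized: str, xx: str,
                      can_affirm: Callable[[str], bool]) -> str:
    """Build robot R1's reply to an unanticipated user utterance.

    `xx` fills the "XX" placeholder from the description; how it is chosen,
    and how `can_affirm` judges the recognized text, are assumptions.
    """
    if can_affirm(recognized):
        return f"Yeah, {xx}"
    return f"Sorry, {xx}"
```

In practice `can_affirm` might be a classifier or a rule set. A toy rule such as "does not start with *no*" is enough to exercise the two branches.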

When the dialogue system 100 accepts the user's utterance, it may prompt the user to speak, for example by turning the robot's head or gaze toward the user.

<Modification 1>
In the embodiments above, the dialogue system generates each robot utterance sentence (original utterance sentence, converted utterance sentence, confirmation utterance sentence, or response utterance sentence) just before it is uttered. Alternatively, the utterance sentences may be generated and speech-synthesized before the first utterance, and the synthesized speech data stored in a storage unit (not shown). During the actual dialogue, the presentation unit 101-1 or 101-2 then reproduces each item of synthesized speech data at a predetermined timing. As another alternative, the robot utterance sentences may be generated before the first utterance and stored in a storage unit (not shown). During the actual dialogue, each utterance sentence is speech-synthesized at a predetermined timing, and the presentation unit 101-1 or 101-2 reproduces the resulting synthesized speech data.
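The two pre-generation variants differ only in when speech synthesis runs. The sketch below rests on several assumptions. `synthesize` is a hypothetical function that turns text into audio bytes. `play` is a callback standing in for presentation units 101-1 and 101-2. An in-memory list stands in for the storage unit (not shown). The predetermined timing is simplified to sequential playback.

```python
from typing import Callable, List, Tuple, Union

Line = Tuple[str, str]  # (presentation unit such as "101-1", utterance sentence)
Stored = Tuple[str, Union[str, bytes]]


def prepare(script: List[Line], synthesize: Callable[[str], bytes],
            eager: bool) -> List[Stored]:
    """Run before the first utterance; returns what the storage unit keeps.

    eager=True stores synthesized speech data (first variant);
    eager=False stores only the utterance sentences (second variant).
    """
    if eager:
        return [(unit, synthesize(text)) for unit, text in script]
    return list(script)


def run(store: List[Stored], synthesize: Callable[[str], bytes],
        play: Callable[[str, bytes], None]) -> None:
    """Replay stored items in order during the actual dialogue.

    Text items (second variant) are synthesized just before playback;
    audio items (first variant) are played as-is.
    """
    for unit, item in store:
        audio = item if isinstance(item, bytes) else synthesize(item)
        play(unit, audio)
```

The eager variant moves all synthesis latency before the dialogue starts, at the cost of storing audio. The lazy variant stores only text.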

<Modification 2>
The embodiments above describe a dialogue system including two robots. However, as noted above, there are also forms in which the utterance determination unit 120 does not decide which robot speaks. Such forms do not necessarily require two robots, so the dialogue system 100 may include only one robot. Conversely, as also noted above, there is a form in which the utterance determination unit 120 designates two robots as the robots that speak. This form may be operated with a dialogue system 100 that includes three or more robots.

<Modification 3>
When the dialogue system 100 includes a plurality of robots, the number of presentation units need not equal the number of robots, provided the user can tell which robot is speaking. The presentation units also need not be mounted on the robots. Well-known techniques can be used to let the user tell which robot is speaking. Examples include giving each robot a different synthesized voice quality, or using multiple speakers to give each robot a different sound localization.
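Modification 3 leaves the localization technique open. One well-known option, assumed here and not named in the source, is the constant-power pan law. It places each robot's voice at a fixed left/right position using two speakers. The positions in `ROBOT_POSITIONS` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical stage positions: -1.0 is full left, +1.0 is full right.
ROBOT_POSITIONS: Dict[str, float] = {"R1": -0.8, "R2": 0.8}


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power pan law: left**2 + right**2 == 1 for every position."""
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1.0, 1.0]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def render_stereo(robot: str, mono: List[float]) -> List[Tuple[float, float]]:
    """Place one robot's mono synthesized speech in the stereo field."""
    left, right = pan_gains(ROBOT_POSITIONS[robot])
    return [(s * left, s * right) for s in mono]
```

Keeping total power constant means a robot sounds equally loud wherever it is placed; only its apparent direction changes.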

<Modification 4>
The embodiments above describe an example in which robots serve as agents for spoken dialogue. These robots may be humanoid robots with a body or the like, or robots without a body. However, the dialogue technique of the present invention is not limited to these. Dialogue may also be carried out using an agent that, unlike a robot, has no physical body and no vocalization mechanism. One example is an agent displayed on a computer screen. More specifically, the dialogue system can be applied to a group chat in which multiple accounts converse through text messages, such as "LINE" or "2channel (registered trademark)". In that setting, the user's account converses with the dialogue device's account. In this form, the computer whose screen displays the agent needs to be near the person, but that computer and the dialogue device may be connected via a network such as the Internet. That is, the dialogue system applies not only to face-to-face dialogue between speakers such as a human and a robot. It also applies to conversations in which the speakers communicate over a network.

As shown in FIG. 7, the dialogue device of this modification includes at least the utterance generation unit 150, the utterance determination unit 120, and the presentation unit 101. The utterance determination unit 120 has an interface capable of communicating with an external chat dialogue system and an external scenario dialogue system. Processing units with the same functions as these dialogue systems may instead be configured within the dialogue device. The utterance generation unit 150 and the utterance determination unit 120 may also have interfaces capable of communicating with an external information processing device. In that case, part of each unit, or a processing unit with equivalent functions, may be configured in that external information processing device.

The dialogue device of this modification is an information processing device. Examples include a mobile terminal such as a smartphone or tablet, or a desktop or laptop personal computer. The following assumes the dialogue device is a smartphone. The presentation unit 101 is the smartphone's liquid crystal display. A chat application window appears on this display, and the window shows the group chat conversation in chronological order. A group chat is a chat function in which multiple accounts post text messages to one another and carry on a dialogue. This group chat is assumed to include several virtual accounts, each corresponding to a virtual personality controlled by the dialogue device, together with the user's account. In other words, this modification is an example in which the agent is a virtual account displayed on the liquid crystal display of the smartphone serving as the dialogue device.
In the dialogue device of this modification corresponding to the second and third embodiments, a software keyboard shown on the smartphone's liquid crystal display serves as the input unit 102. The user enters utterance content through it and posts to the group chat via the user's own account. Alternatively, the smartphone's microphone may serve as the input unit 102 so that the user enters utterance content by speaking. In that configuration, the dialogue device either includes the utterance end detection unit 140 or the speech recognition unit 141 itself, or it has an interface capable of communicating with an external information processing device. In the latter case, a processing unit with the same functions as the utterance end detection unit 140 or the speech recognition unit 141 is configured in that external information processing device. The smartphone's speaker and speech synthesis function may also be used to output the utterance content obtained from each dialogue system through the speaker, in a voice assigned to each virtual account.
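In this modification each agent is a virtual account in a group chat. The sketch below shows how a scenario of (agent, utterance) pairs could be posted through the virtual accounts, with the user posting through their own account. It assumes a hypothetical in-memory `GroupChat` in place of a real chat service; no LINE API is used or implied. The account names are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GroupChat:
    """Minimal stand-in for the chat window: posts are kept in time order."""
    posts: List[Tuple[str, str]] = field(default_factory=list)

    def post(self, account: str, text: str) -> None:
        self.posts.append((account, text))

    def render(self) -> str:
        return "\n".join(f"{account}: {text}" for account, text in self.posts)


def post_scenario(chat: GroupChat, scenario: List[Tuple[str, str]],
                  accounts: Dict[str, str]) -> None:
    """Post each (agent, utterance) pair through that agent's virtual account."""
    for agent, text in scenario:
        chat.post(accounts[agent], text)
```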

<Modification 5>
As described in Modification 1, the utterance generation unit 150 and the utterance determination unit 120 can obtain a plurality of utterance sentences for having a plurality of robots converse. Likewise, the utterance generation unit 150, the utterance determination unit 120, and the speech synthesis unit 110 can obtain synthesized speech data of a plurality of utterances for having a plurality of robots converse. Furthermore, as described in Modification 4, the generated utterance sentences need not be presented by robots. They may instead be presented by agents without a vocalization mechanism, such as agents displayed on a computer screen. Accordingly, a device consisting of the utterance generation unit 150 and the utterance determination unit 120 can function as a dialogue scenario generation device that generates a plurality of utterance sentences for having a plurality of agents converse. Similarly, a device consisting of the utterance generation unit 150, the utterance determination unit 120, and the speech synthesis unit 110 can function as a dialogue scenario generation device that generates synthesized speech data of a plurality of utterances for having a plurality of agents converse.
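Claims 6 and 7 describe the three-utterance scenarios such a dialogue scenario generation device produces. The sketch below builds both variants under simplifying assumptions. The replaced or omitted `part` is passed in directly; claim 5 suggests it may be a topic word from earlier in the dialogue. Matching is plain substring replacement. The English question and answer templates are hypothetical, since the original targets Japanese.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (agent, utterance)


def demonstrative_scenario(utterance: str, part: str,
                           demonstrative: str = "that") -> List[Turn]:
    """Variant of claim 6: replace `part` with a demonstrative."""
    if part not in utterance:
        raise ValueError("part must occur in the utterance")
    first = utterance.replace(part, demonstrative, 1)
    return [
        ("agent1", first),
        # Includes the replaced part and asks what the demonstrative refers to.
        ("agent2", f"By {demonstrative}, do you mean {part}?"),
        ("agent1", f"Yes, {part}."),  # affirms the referent
    ]


def omission_scenario(utterance: str, part: str, question: str) -> List[Turn]:
    """Variant of claim 7: omit `part`; the original utterance comes last."""
    if part not in utterance:
        raise ValueError("part must occur in the utterance")
    first = " ".join(utterance.replace(part, "", 1).split())  # tidy spacing
    return [
        ("agent1", first),
        ("agent2", question),   # asks about the omitted content
        ("agent1", utterance),  # the predetermined (original) utterance
    ]
```

Feeding either list to a presentation layer, or to speech synthesis as in Modification 1, yields the multi-agent dialogue described above.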

<Other Modifications>
The present invention is not limited to the embodiments and modifications above. For example, the various processes described above, other than the order of utterances presented by the presentation units, need not be executed in the described chronological order. They may be executed in parallel or individually, according to the processing capacity of the executing device or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and Recording Medium>
The various processing functions of each device described in the embodiments and in Modifications 1 to 3 and 5 may be implemented by a computer. In that case, a program describes the processing content of the functions each device should have. Likewise, the various processing functions of the dialogue system described in Modification 4 may be implemented by a computer. In that case, a program describes the processing content of the functions the dialogue system should have. Executing this program on a computer implements the various processing functions of each device above on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium of any kind. Examples include a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores the program in its own storage unit. The program may, for example, be recorded on a portable recording medium or transferred from the server computer. When executing the processing, the computer reads the program stored in its own storage unit and executes processing according to it. As another embodiment, the computer may read the program directly from the portable recording medium and execute processing according to it. Further, each time a program is transferred to this computer from the server computer, the computer may successively execute processing according to the received program. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service. Such a service implements the processing functions solely through execution instructions and result acquisition, without transferring the program from the server computer to this computer. The term program here also includes information that is used for processing by an electronic computer and is equivalent to a program. An example is data that is not a direct command to the computer but has the property of defining the computer's processing.

Although each device is described here as configured by executing a predetermined program on a computer, at least part of the processing content may be implemented in hardware.

Claims

A dialogue method performed by a dialogue system including a first agent and a second agent different from the first agent, the method comprising:
an utterance generation step in which the dialogue system generates an utterance;
an utterance determination step in which the dialogue system obtains, as a converted utterance, an utterance generated by replacing at least part of the utterance generated in the utterance generation step with a demonstrative;
a converted utterance presentation step in which the first agent presents the converted utterance obtained in the utterance determination step;
a confirmation utterance presentation step in which the second agent presents an utterance that includes the part and asks what the demonstrative refers to; and
an affirmative utterance presentation step in which the first agent presents an utterance affirming what the demonstrative refers to.

A dialogue method performed by a dialogue system including a first agent and a second agent different from the first agent, the method comprising:
an utterance generation step in which the dialogue system generates an utterance;
an utterance determination step in which the dialogue system obtains, as a converted utterance, an utterance generated by omitting at least part of the utterance generated in the utterance generation step;
a converted utterance presentation step in which the first agent presents the converted utterance obtained in the utterance determination step;
a confirmation utterance presentation step in which the second agent presents an utterance asking about the content omitted from the converted utterance; and
an utterance presentation step in which the first agent presents the utterance generated in the utterance generation step.

The dialogue method according to claim 1 or claim 2, wherein
a predetermined period is provided between the converted utterance presentation step and the confirmation utterance presentation step, and the confirmation utterance presentation step is executed without performing speech recognition on any user utterance made during the predetermined period.

The dialogue method according to claim 1 or claim 2, further comprising
a step of providing a predetermined period between the converted utterance presentation step and the confirmation utterance presentation step, and performing speech recognition on a user utterance made during the predetermined period to obtain a speech recognition result,
wherein the confirmation utterance presentation step is omitted if the speech recognition result is identical to the utterance asking what the demonstrative refers to or asking about the omitted content, and is executed if they differ.

The dialogue method according to claim 1 or claim 2, wherein
when the utterance generation step is executed at a point within an ongoing dialogue, the part is a word that was a topic of the dialogue before that point.

A dialogue scenario generation method in which a dialogue scenario generation device generates a dialogue scenario used for a dialogue performed by a dialogue system including a first agent and a second agent different from the first agent, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by replacing at least part of a predetermined utterance with a demonstrative;
a second utterance, which is presented by the second agent after the first agent presents the first utterance, includes the part, and asks what the demonstrative refers to; and
an utterance that is presented by the first agent after the second utterance is presented and affirms what the demonstrative refers to.

A dialogue scenario generation method in which a dialogue scenario generation device generates a dialogue scenario used for a dialogue performed by a dialogue system including a first agent and a second agent different from the first agent, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by omitting at least part of a predetermined utterance;
a second utterance, which is presented by the second agent after the first agent presents the first utterance and asks about the content omitted from the first utterance; and
the predetermined utterance, which is presented by the first agent after the second utterance is presented.

A dialogue system comprising:
a first agent;
a second agent different from the first agent;
an utterance generation unit that generates an utterance; and
an utterance determination unit that obtains, as a converted utterance, an utterance generated by replacing at least part of the utterance generated by the utterance generation unit with a demonstrative,
wherein the first agent presents the converted utterance obtained by the utterance determination unit,
the second agent presents an utterance that includes the part and asks what the demonstrative refers to, and
the first agent presents an utterance affirming what the demonstrative refers to.

A dialogue system comprising:
a first agent;
a second agent different from the first agent;
an utterance generation unit that generates an utterance; and
an utterance determination unit that obtains, as a converted utterance, an utterance generated by omitting at least part of the utterance generated by the utterance generation unit,
wherein the first agent presents the converted utterance obtained by the utterance determination unit,
the second agent presents an utterance asking about the content omitted from the converted utterance, and
the first agent presents the utterance generated by the utterance generation unit.

The dialogue system according to claim 8 or claim 9, wherein
a predetermined period is provided after the converted utterance is presented and before the utterance asking what the demonstrative refers to or asking about the omitted content is presented, and that utterance is presented without performing speech recognition on any user utterance made during the predetermined period.

The dialogue system according to claim 8 or claim 9, further comprising
a speech recognition unit that obtains a speech recognition result by performing speech recognition on a user utterance made during a predetermined period, the predetermined period being provided after the converted utterance is presented and before the utterance asking what the demonstrative refers to or asking about the omitted content is presented,
wherein presentation of the utterance asking what the demonstrative refers to or asking about the omitted content is omitted if the speech recognition result is identical to that utterance, and the utterance is presented if they differ.

The dialogue system according to claim 8 or claim 9, wherein
when the utterance generation unit operates at a point within an ongoing dialogue, the part is a word that was a topic of the dialogue before that point.

A dialogue scenario generation device that generates a dialogue scenario used for a dialogue performed by a dialogue system including a first agent and a second agent different from the first agent, the device generating a dialogue scenario including:
a first utterance, which is an utterance generated by replacing at least part of a predetermined utterance with a demonstrative;
a second utterance, which is presented by the second agent after the first agent presents the first utterance, includes the part, and asks what the demonstrative refers to; and
an utterance that is presented by the first agent after the second utterance is presented and affirms what the demonstrative refers to.

A dialogue scenario generation device that generates a dialogue scenario used for a dialogue performed by a dialogue system including a first agent and a second agent different from the first agent, the device generating a dialogue scenario including:
a first utterance, which is an utterance generated by omitting at least part of a predetermined utterance;
a second utterance, which is presented by the second agent after the first agent presents the first utterance and asks about the content omitted from the first utterance; and
the predetermined utterance, which is presented by the first agent after the second utterance is presented.

A program for causing a computer to function as the dialogue system according to any one of claims 8 to 12.

A program for causing a computer to function as the dialogue scenario generation device according to claim 13 or claim 14.