
JP6682149B2 - Dialog system, method, and program

Info

Publication number: JP6682149B2
Application number: JP2017040958A
Authority: JP
Inventors: 東中 竜一郎; 杉山 弘晃; 成松 宏美; 福冨 隆朗; 松尾 義博
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2020-04-15
Anticipated expiration: 2037-03-03
Also published as: JP2018147189A

Description

The present invention relates to a dialogue system, method, and program for responding to a user's utterance.

In a dialogue system, a human converses with a computer to obtain various kinds of information or to have requests fulfilled. There are also dialogue systems that go beyond accomplishing predetermined tasks and carry on everyday conversation, through which people gain emotional stability, satisfy their need for approval, and build relationships of trust. The types of dialogue systems are described in detail in Non-Patent Document 1 and Non-Patent Document 2.

Meanwhile, research is also under way on using computers to realize deeper discussion, as opposed to task achievement or everyday conversation. Discussion can change a person's value judgments and help organize their thinking, and so plays an important role for humans.

For example, some studies propose models of discussion. According to Toulmin's model (Non-Patent Document 3), an argument consists of a fact (data), a warrant, and a conclusion. For example, given the fact "the air is good in the mountains" and the warrant "good air is generally a good thing", the model derives the conclusion "it is good to go to the mountains". This discussion structure is illustrated in FIG. 4.

In the model of discussion proposed by Walton, Premises are connected by arrows that carry either support (plus) or non-support (minus), or an argumentation scheme (a pattern of support or non-support) (Non-Patent Document 4).

The Premise at the head of an arrow is sometimes called a Conclusion, and the Conclusion at the center of the discussion is called the Main Issue. The discussion structure in Walton's model is illustrated in FIG. 5. Each ellipse is a node representing a Premise, and each Premise is given a character string expressing its content. "The flowers are beautiful" is connected by a plus to "It is good to go to the mountains" and supports it. "Climbing is tiring" is connected by a minus to "It is good to go to the mountains" and is a rebuttal of it. "Climbing is tiring" is also connected by a minus with "It strengthens the legs and hips", which is likewise a rebuttal. The figure uses an argumentation scheme called Practical Reasoning (PR); the plus in front of PR indicates support. PR represents the pattern of argument in which "grounds that something is good in general are used to support or not support". Non-Patent Document 4 lists 29 argumentation schemes. Besides PR, these include Expert Opinion (the grounds that an expert says so are used to support or not support).

There have been many attempts to automatically extract, from text data, discussion structures based on the models proposed by Toulmin and Walton. Such research is surveyed in Non-Patent Document 5.

[Non-Patent Document 1] Tatsuya Kawahara, Masahiro Araki, "Spoken Dialog System", Ohmsha, 2006.
[Non-Patent Document 2] Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Yukiko Nakano, Manabu Okumura (Supervision), "Dialogue System", Corona Publishing Co., 2016.
[Non-Patent Document 3] Stephen E. Toulmin, "The Uses of Argument (updated edition)", Cambridge University Press, 2003.
[Non-Patent Document 4] Douglas Walton, "Methods of Argumentation", Cambridge University Press, 2013.
[Non-Patent Document 5] Lippi, M., Torroni, P., "Argumentation Mining: State of the Art and Emerging Trends", ACM Transactions on Internet Technology, 16(2): 10, 2016.

Although discussion is important for humans, and although research on models and on extracting discussion structures from text data is progressing, no automatic system that can discuss freely with humans has been realized.

The present invention has been made in view of the above circumstances, and its object is to provide a dialogue system, method, and program capable of holding a discussion with a user.

To achieve the above object, the dialogue system of the present invention includes: a dialogue act conversion unit that estimates the dialogue act of a user utterance; a domain determination unit that determines whether the user utterance belongs to a predetermined domain; a discussion determination unit that determines whether the user utterance relates to a discussion; a discussion structure determination unit that determines, among the nodes of a discussion structure that includes nodes each representing either the premise at the center of the discussion or a premise that supports or does not support another premise, the node corresponding to the user utterance; and a dialogue management unit that determines the next action of the system based on the estimation result of the dialogue act conversion unit, the determination result of the domain determination unit, the determination result of the discussion determination unit, and the determination result of the discussion structure determination unit.

In the dialogue method of the present invention, a dialogue act conversion unit estimates the dialogue act of a user utterance; a domain determination unit determines whether the user utterance belongs to a predetermined domain; a discussion determination unit determines whether the user utterance relates to a discussion; a discussion structure determination unit determines, among the nodes of a discussion structure that includes nodes each representing either the premise at the center of the discussion or a premise that supports or does not support another premise, the node corresponding to the user utterance; and a dialogue management unit determines the next action of the system based on the estimation result of the dialogue act conversion unit, the determination result of the domain determination unit, the determination result of the discussion determination unit, and the determination result of the discussion structure determination unit.

The program according to the present invention is a program for causing a computer to function as each unit of the dialogue system of the present invention.

As described above, the dialogue system, method, and program of the present invention determine the next action of the system based on the estimation result of the dialogue act conversion unit, the determination result of the domain determination unit, the determination result of the discussion determination unit, and the determination result of the discussion structure determination unit, and thereby provide the effect of being able to hold a discussion with a user.

FIG. 1 is a diagram showing an example of the configuration of a dialogue system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining how the next action is determined.
FIG. 3 is an example of a flowchart of a dialogue processing routine according to the embodiment of the present invention.
FIG. 4 is a diagram showing an example of a discussion structure in Toulmin's theory.
FIG. 5 is a diagram showing an example of a discussion structure in Walton's theory.

<Outline>
First, an outline of the embodiment of the present invention will be described.

The embodiment of the present invention makes discussion possible when a predetermined discussion structure exists for a given domain. Specifically, for a user's input utterance, it estimates the dialogue act, whether the utterance matches the domain, whether it relates to the discussion, and whether a corresponding node exists in the discussion structure. The dialogue management unit then decides the next utterance while referring to the discussion structure and multimodal information. Realizing an automatic system that discusses with humans in this way promotes human intellectual activity.

<Configuration of the dialogue system>
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing an example of the configuration of the dialogue system according to the present embodiment. In this embodiment, the discussion is limited to the topic (called a domain) "If you are going out for fun, is the sea better or are the mountains better?". The user speaks to the dialogue system, and the utterance is input to the dialogue system. The dialogue system, in turn, speaks to the user.

The dialogue system 100 according to the present embodiment is configured as a computer including a CPU, a RAM, and a ROM that stores a program for executing a dialogue processing routine described later and various data. An HDD may also be provided as storage means.

Functionally, as shown in FIG. 1, this computer can be represented as a configuration including an input unit 10, a calculation unit 20, and an output unit 90.

The input unit 10 is, for example, a microphone, and accepts input of the user's utterance.

The sensor 12 detects the user's motion. In the present embodiment, an acceleration sensor and a gyro sensor attached to the microphone used by the user are used.

The calculation unit 20 includes an utterance section detection unit 22, a voice recognition unit 24, a motion detection unit 26, an utterance input unit 28, an utterance determination unit 30, a dialogue management unit 32, a discussion structure storage unit 34, a multimodal information storage unit 36, a chat response unit 38, an utterance generation unit 40, and a voice synthesis unit 42.

The utterance determination unit 30 includes a dialogue act conversion unit 50, a discussion determination unit 52, a discussion structure determination unit 54, and a domain determination unit 56. Each part is described in detail below.

(Utterance section detection unit 22)
The utterance section detection unit 22 detects the utterance section of the speech based on the input user utterance. The start and end points of speech can be detected using features such as the speech power, the number of zero crossings, and the pause length. Utterance section detection is a basic technique that is also built into common speech recognition engines. When a speech section starts, the utterance section detection unit 22 sends a signal indicating the start (VAD_START) to the multimodal information storage unit 36. When the speech section ends, it sends the corresponding signal (VAD_END) to the multimodal information storage unit 36.
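
As a rough illustration of the idea, the following Python sketch marks a frame as speech when both its power and its zero-crossing count exceed a threshold, and ends the section after a run of silent frames (the pause length). The function names, the frame length, the thresholds, and the rule for combining the features are all assumptions made for this example; the patent does not specify them.

```python
def frame_features(frame):
    """Return (mean power, number of zero crossings) for one frame of samples."""
    power = sum(s * s for s in frame) / len(frame)
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return power, zero_crossings


def detect_utterance_events(samples, frame_len=160, power_thresh=0.01,
                            zc_thresh=5, pause_frames=3):
    """Emit ("VAD_START", index) and ("VAD_END", index) events.

    All thresholds are hypothetical values chosen for the toy example.
    """
    events = []
    in_speech = False
    silent_run = 0
    for idx in range(0, len(samples) - frame_len + 1, frame_len):
        power, zc = frame_features(samples[idx:idx + frame_len])
        active = power >= power_thresh and zc >= zc_thresh
        if active:
            silent_run = 0
            if not in_speech:
                in_speech = True
                events.append(("VAD_START", idx))
        elif in_speech:
            silent_run += 1
            if silent_run >= pause_frames:
                # The section ended at the first frame of the silent run.
                in_speech = False
                events.append(("VAD_END", idx - (pause_frames - 1) * frame_len))
    return events
```

In the described system, these events would be forwarded to the multimodal information storage unit 36 rather than returned as a list.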

(Voice recognition unit 24)
The voice recognition unit 24 works together with the utterance section detection unit 22 and converts the speech waveform contained in the detected speech section into text.

Here, the applicant's speech recognition engine VoiceRex is used, but another commercially available speech recognition engine may be used. At a short pause (a pause of several hundred ms) during recognition, the voice recognition unit 24 sends a signal indicating that a short pause has occurred (RECG_SP) to the multimodal information storage unit 36.

When speech recognition finishes and text is obtained, it sends a signal indicating that recognition has finished (RECG_LP) to the multimodal information storage unit 36 and sends the recognized text to the utterance input unit 28.

(Motion detection unit 26)
The motion detection unit 26 detects the user's motion based on the data detected by the sensor 12. Specifically, it uses the acceleration sensor and gyro sensor attached to the user's microphone, which sense raising and lowering, to detect the motion of the user lifting the microphone to start speaking. When the user lifts the microphone, it sends a signal indicating this (MIC_UP) to the multimodal information storage unit 36; when the user puts the microphone down, it sends a signal indicating this (MIC_DOWN) to the multimodal information storage unit 36.

(Utterance input unit 28)
The utterance input unit 28 receives the utterance text from the voice recognition unit 24 and sends it to each of the dialogue act conversion unit 50, the discussion determination unit 52, the discussion structure determination unit 54, and the domain determination unit 56.

(Dialogue act conversion unit 50)
The dialogue act conversion unit 50 takes the utterance text as input and estimates the dialogue act. A dialogue act is a symbol representing the intention of a user utterance. Four types of dialogue act are used here.

These are Assertion, Question, Retraction, and Concession. Each is described below with example utterances.

Assertion is an utterance that states a proposition, for example "I think the mountains are better." or "The flowers are beautiful, aren't they?"

Question is an utterance that asks about a proposition, for example "Are the mountains better?" or "Isn't climbing tiring?"

Retraction is an utterance that withdraws a proposition the speaker has stated, for example "I withdraw the claim that the air in the mountains is good." or "I cannot argue against that."

Concession is an utterance that accepts a proposition stated by the other party, for example "Indeed, the flowers are beautiful." or "I have to admit that."

The dialogue act conversion unit 50 classifies the utterance text into these four types of dialogue act. A general text classification method may be used for this. For example, using word n-grams obtained from the utterance text through morphological analysis as features, a classifier may be trained by a machine learning method on training data (that is, data in which each of a number of utterances is labeled with one of the four types above). As the learning algorithm, logistic regression, a support vector machine, or a deep learning method may be used. General algorithms for text classification are described in Non-Patent Document 6, and a deep learning method is described in Non-Patent Document 7.

[Non-Patent Document 6] Daiya Takamura (Author), Manabu Okumura (Supervision), "Introduction to Machine Learning for Language Processing (Natural Language Processing Series)", Corona Publishing Co., 2010.
[Non-Patent Document 7] Yoon Kim, "Convolutional Neural Networks for Sentence Classification", Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1746-1751, 2014.

The output of the dialogue act conversion unit 50 is the estimated dialogue act and its confidence, for example "Assertion:0.8", where 0.8 is the confidence. A confidence can be attached to the classification result here because logistic regression is used.
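
The following is a minimal sketch of such a classifier, written as a small multinomial logistic regression in plain Python so that it is self-contained. It is not the implementation used in the embodiment: whitespace splitting stands in for morphological analysis, the English training utterances are toy data invented for this example, and every function name and hyperparameter is an assumption.

```python
import math
from collections import defaultdict

LABELS = ["Assertion", "Question", "Retraction", "Concession"]

# Toy training data, invented for illustration only.
TRAIN = [
    ("i think the mountains are better", "Assertion"),
    ("the flowers are beautiful", "Assertion"),
    ("are the mountains better ?", "Question"),
    ("is climbing not tiring ?", "Question"),
    ("i withdraw that claim", "Retraction"),
    ("i cannot argue back", "Retraction"),
    ("indeed the flowers are beautiful", "Concession"),
    ("i have to admit that", "Concession"),
]


def ngrams(text, n_max=2):
    """Word 1..n_max-grams; whitespace splitting replaces morphological analysis."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)}


def predict_proba(weights, text):
    """Softmax over per-label sums of feature weights."""
    feats = ngrams(text)
    scores = [sum(weights[k].get(f, 0.0) for f in feats) for k in range(len(LABELS))]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(data, epochs=300, lr=0.1):
    """Stochastic gradient ascent on the log-likelihood."""
    weights = [defaultdict(float) for _ in LABELS]
    for _ in range(epochs):
        for text, label in data:
            feats = ngrams(text)
            probs = predict_proba(weights, text)
            for k, name in enumerate(LABELS):
                grad = (1.0 if name == label else 0.0) - probs[k]
                for f in feats:
                    weights[k][f] += lr * grad
    return weights


def classify(weights, text):
    """Return (dialogue act, confidence), mirroring the "Assertion:0.8" style output."""
    probs = predict_proba(weights, text)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Calling `classify(train(TRAIN), utterance)` returns a pair that corresponds to the dialogue act and confidence described above. With realistic data, an established library implementation of logistic regression would be the more practical choice.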

(Discussion structure determination unit 54)
The discussion structure determination unit 54 determines whether any node in the discussion structure corresponds to (is semantically similar to) what the user said. If there is one, it outputs the node number and the similarity. The discussion structure is described later.

To obtain a semantically similar node from the discussion structure, the semantic similarity between the user utterance and the character string of each node in the discussion structure must be computed. The embodiment of the present invention uses a sentence similarity based on word2vec. word2vec is a technique that converts each word into a vector representing its meaning, and is a common way of obtaining vector representations of words (Non-Patent Document 8). To compute the similarity, the vectors of the words obtained from the utterance text through morphological analysis are averaged, the average vector of each node's character string is computed in the same way, and the cosine similarity between these vectors is taken. The node with the highest similarity can thus be obtained. Any other method of computing the semantic similarity between sentences may be used instead, for example the number of overlapping words, or the number of overlapping Synsets (IDs representing synonym sets) in WordNet (an English thesaurus).

[Non-Patent Document 8] Tomas Mikolov et al., "Distributed representations of words and phrases and their compositionality", Advances in Neural Information Processing Systems, 2013.
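
The averaging and cosine-similarity steps can be sketched as follows. The three-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for real word2vec vectors, whitespace splitting again replaces morphological analysis, and the function names are assumptions made for this example.

```python
import math

# Hypothetical toy "word vectors"; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "flowers": [0.9, 0.1, 0.0],
    "beautiful": [0.8, 0.2, 0.1],
    "pretty": [0.8, 0.3, 0.1],
    "climbing": [0.1, 0.9, 0.2],
    "tiring": [0.0, 0.8, 0.4],
}


def average_vector(text):
    """Average the vectors of the known words in the text (None if there are none)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar_node(utterance, nodes):
    """nodes maps node number -> proposition string; returns (node number, similarity)."""
    u = average_vector(utterance)
    best_id, best_sim = None, 0.0
    if u is None:
        return best_id, best_sim
    for node_id, text in nodes.items():
        v = average_vector(text)
        if v is None:
            continue
        sim = cosine(u, v)
        if sim > best_sim:
            best_id, best_sim = node_id, sim
    return best_id, best_sim
```

Unknown words are simply skipped here, which is one possible design choice among several; the source does not say how out-of-vocabulary words are treated.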

(Domain determination unit 56)
The domain determination unit 56 estimates whether the user utterance belongs to the predetermined domain. In this example the domain is "If you are going out for fun, is the sea better or are the mountains better?", so it determines how well the topic of the user utterance matches this domain. As with the dialogue act conversion unit 50, a classifier may be trained by a machine learning method using in-domain text and out-of-domain text as training data. If there is not enough training data, then, for example, an average vector like the one used in the discussion structure determination unit 54 is computed for the character string attached to each node in the discussion structure, the similarity between the average vector of the utterance text and the average vector of every node is computed, and the utterance is judged to be outside the domain if none of the similarities exceeds a predetermined threshold. The threshold is, for example, 0.5.

The output of the domain determination unit 56 is the result of determining whether the utterance belongs to the predetermined domain handled by the system. When logistic regression is used, a confidence can also be obtained as output of the domain determination unit 56, as in the earlier example. When cosine similarity is used, the similarity (for example, the cosine similarity with the node that had the highest cosine similarity) may be used directly as the confidence.
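
The similarity-based fallback can be sketched as below, assuming the average vectors of the utterance and of each node have already been computed. The function name `judge_domain` and its signature are assumptions for this example; only the threshold value 0.5 and the use of the highest cosine similarity as the confidence come from the description.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def judge_domain(utterance_vec, node_vecs, threshold=0.5):
    """Return (in_domain, confidence).

    The utterance is in the domain only if at least one node similarity
    exceeds the threshold; the highest similarity doubles as the confidence.
    """
    best = max((cosine(utterance_vec, v) for v in node_vecs), default=0.0)
    return best > threshold, best
```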

(Dialogue management unit 32)
The dialogue management unit 32 receives the character string of the user utterance and the results of the dialogue act conversion unit 50, the discussion determination unit 52, the discussion structure determination unit 54, and the domain determination unit 56, and determines the next action as shown in FIG. 2. The confidence information is not used here, and the next action is determined from the results alone; however, a precision-oriented decision may be made by adjusting a confidence threshold so that, for example, a given action is taken only when the confidence is high.

There are four actions: discussion structure update, out-of-domain processing, out-of-discussion-structure processing, and chat processing. Each performs the following processing.

Discussion structure update updates the discussion structure, described later, using the dialogue act and the node number. Specifically, for example, when an Assertion is made, the Mentioned flag of the node with that node number is set to 1. When a Question is made, the Questioned flag of that node is set to 1. For a Retraction, if the corresponding node is on the user's own side, the Accepted flag is set to defeat. For a Concession, if the corresponding node is not on the user's own side, the Accepted flag is set to accept. The updates described here are only an example, and other update rules may be used to perform different updates. The side indicates whether a node is on the side that supports the Main Issue of the discussion structure or on the side that does not.

Out-of-domain processing handles an utterance in a domain that the system cannot deal with, and responds that it cannot be handled.

Out-of-discussion-structure processing handles the case where the utterance is a discussion and is within the domain but raises a point that is not in the discussion structure, so that the discussion cannot be continued even if desired. In this case, the system responds that it cannot discuss the point.

Chat processing handles input that is not discussion but is within the domain, and responds to the effect that chat unrelated to the discussion has been input.
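
The choice among the four actions can be sketched as a small decision function. FIG. 2 is not reproduced in this text, so the order of the checks below is inferred from the descriptions of the four actions and should be read as an assumption; the names `Action` and `decide_action` are likewise made up for this example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    UPDATE_STRUCTURE = "discussion structure update"
    OUT_OF_DOMAIN = "out-of-domain processing"
    OUT_OF_STRUCTURE = "out-of-discussion-structure processing"
    CHAT = "chat processing"


def decide_action(in_domain: bool, is_discussion: bool,
                  node_id: Optional[int]) -> Action:
    """Map the determination results to the next action (assumed ordering)."""
    if not in_domain:
        return Action.OUT_OF_DOMAIN       # domain the system cannot handle
    if not is_discussion:
        return Action.CHAT                # in-domain, but not a discussion
    if node_id is None:
        return Action.OUT_OF_STRUCTURE    # discussion with no matching node
    return Action.UPDATE_STRUCTURE        # discussion matching a known node
```

The dialogue act itself does not affect which action is chosen in this sketch; it is used afterwards, when the discussion structure is updated.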

After the discussion structure has been updated or other processing has been performed, the system takes its own action. Before doing so, however, because the user may be about to take the next action, the system refers to the multimodal information storage unit 36 so that utterances do not collide. The multimodal information storage unit 36 is described later. Querying the multimodal information storage unit 36 yields information on whether the user is speaking or is likely to speak soon. If it is determined that the user is not speaking, or will not speak in the near future, the dialogue system 100 actually carries out the action according to its current understanding.

Specifically, when the discussion structure has been updated, if the Questioned flag is not set on the node corresponding to the user utterance, a question about that node is created and sent to the utterance generation unit 40. Alternatively, if there is a node with an opposing opinion (a node connected by a minus) to the node corresponding to the user utterance, the utterance character string of that node is obtained and sent to the utterance generation unit 40. If no such node is found, a node on the system's own side is selected at random and its utterance character string is sent to the utterance generation unit 40. These are the basic behaviors, but additional rules may be created so that the system behaves otherwise. Note that, together with this action, the discussion structure must be updated just as when a user utterance is made.

In the case of out-of-domain processing, an utterance that draws the user back to the subject of the discussion, such as "That is a different domain" or "Let's talk about the sea and the mountains", is sent to the utterance generation unit 40. In the case of out-of-discussion-structure processing, an utterance to the effect that no opinion or rebuttal comes to mind, such as "I can't think of an opinion", is sent to the utterance generation unit 40. In the case of chat processing, the user's utterance character string is sent to the chat response unit 38.
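
The basic behavior after a discussion structure update might look like the sketch below. Several points are assumptions rather than statements from the source: the priority order (question first, then an opposing node, then a random own-side node), the question template, the dictionary-based node representation, and the treatment of a minus link in either direction as "opposing".

```python
import random


def choose_system_utterance(nodes, minus_links, user_node_id, system_side,
                            user_may_speak, rng=random):
    """Pick the system's next utterance text, or None if it should stay silent.

    nodes: {node_id: {"side", "text", "mentioned", "questioned"}}
    minus_links: list of (child_id, parent_id) pairs joined by a minus arrow
    user_may_speak: result of querying the multimodal information
    """
    if user_may_speak:
        return None  # hold back so that utterances do not collide

    node = nodes[user_node_id]
    if not node["questioned"]:
        node["questioned"] = 1  # the system's own question also updates the structure
        return 'Could you tell me more about "{}"?'.format(node["text"])

    # Nodes joined to the user's node by a minus arrow, in either direction.
    opposing = [b if a == user_node_id else a
                for a, b in minus_links if user_node_id in (a, b)]
    if opposing:
        chosen = nodes[opposing[0]]
    else:
        own_side = [n for n in nodes.values() if n["side"] == system_side]
        if not own_side:
            return None
        chosen = rng.choice(own_side)

    chosen["mentioned"] = 1  # the system's assertion is recorded as well
    return chosen["text"]
```

The returned string corresponds to what would be sent to the utterance generation unit 40.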

（議論構造記憶部３４）
議論構造記憶部３４は、議論の中心となる論拠と、他の論拠を支持する、または支持しない論拠との各々を表すノードを含む議論構造を記憶している。議論構造にはウォルトンの議論構造を用いる。ただし、ここでは、議論スキームは扱わない。Premise を表すノードはプラスかマイナスの矢印によってのみ接続されている。矢印元と矢印先のノードがある時、矢印元は子ノード、矢印先は親ノードと呼ぶ。Main Issue 以外のノードは必ず親ノードが存在する。各ノードは少なくとも以下のノード番号、サイド、命題を表す文字列、Mentionedフラグ、Questionedフラグ、及びAcceptedフラグを持つものとする。 (Discussion structure storage unit 34)
The argument structure storage unit 34 stores an argument structure including nodes that represent arguments that are central to an argument and arguments that support or not support other arguments. Walton's argument structure is used as the argument structure. However, the discussion scheme is not dealt with here. The nodes representing Premises are connected only by plus or minus arrows. When there is an arrow source and an arrow destination node, the arrow source is called a child node and the arrow destination is called a parent node. Parent nodes always exist for nodes other than Main Issue. Each node has at least the following node numbers, sides, character strings representing propositions, Mentioned flags, Questioned flags, and Accepted flags.

ノード番号は、議論構造中のノードについてユニークに付与された番号である。 The node number is a number uniquely assigned to each node in the discussion structure.

サイドは、上述したように、議論構造におけるMain Issue について、支持側か不支持側かを表す。ここでは、一方をAサイド、もう一方をBサイドと呼ぶ。 As described above, the side indicates whether the node is on the supporting side or the non-supporting side with respect to the Main Issue of the discussion structure. Here, one is called the A side and the other the B side.

命題を表す文字列は、Premise の内容を表す文字列である。たとえば、「草花がきれい」や「足腰が鍛えられる」などである。 The string representing the proposition is a string that expresses the content of the Premise, for example, "the flowers are beautiful" or "it strengthens the legs and lower back".

Mentioned フラグ（Mフラグ）は、当該ノードが議論において言及されたかどうかを表す2 値フラグである。0 だと言及されていない状態、1 だと言及された状態を表す。 The Mentioned flag (M flag) is a binary flag indicating whether the node has been mentioned in the discussion. 0 indicates that it has not been mentioned, and 1 indicates that it has been mentioned.

Questioned フラグ（Q フラグ）当該ノードが議論において質問されたかどうかを表す2 値フラグである。0 だと質問されていない状態、1 だと質問された状態を表す。 The Questioned flag (Q flag) is a binary flag indicating whether the node has been questioned in the discussion. 0 indicates that it has not been questioned, and 1 indicates that it has been questioned.

Accepted フラグ（A フラグ）は、当該ノードが議論の参加者によってアクセプトされたかどうかを表す3 値フラグである。undef は未定義、accept はアクセプトされた状態、defeat はディフィートされた状態を表す。アクセプトされた状態とは、議論においてPremise の内容が受理された（認められた）ことを表す。ディフィートされた状態とは、議論においてPremise の内容が不受理となった（認められなかった）ことを表す。undef はアクセプトでもディフィートでもない状態を指す。 The Accepted flag (A flag) is a ternary flag that indicates whether the node has been accepted by the participants in the discussion. undef is undefined, accept is accepted, and defeat is defeated. The accepted state means that the content of the Premise has been accepted (acknowledged) in the discussion. The defeated state means that the content of the Premise has been rejected (not accepted) in the discussion. undef refers to a state that is neither accept nor defeat.
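The node attributes listed above can be sketched as a small data structure. This is only an illustrative sketch, not the embodiment's implementation: the class names `Node`, `DiscussionStructure`, and `Accepted`, the `parent_id` and `polarity` fields, the convention that the first node added is the Main Issue, and the example propositions other than "flowers are beautiful" are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Accepted(Enum):
    """Three-valued Accepted flag (A flag)."""
    UNDEF = "undef"
    ACCEPT = "accept"
    DEFEAT = "defeat"


@dataclass
class Node:
    node_id: int                      # unique within the discussion structure
    side: str                         # "A" or "B"
    proposition: str                  # text of the Premise
    parent_id: Optional[int] = None   # None only for the Main Issue (assumed field)
    polarity: Optional[str] = None    # "+" or "-" arrow to the parent (assumed field)
    mentioned: int = 0                # M flag: 0 = not mentioned, 1 = mentioned
    questioned: int = 0               # Q flag: 0 = not questioned, 1 = questioned
    accepted: Accepted = Accepted.UNDEF


class DiscussionStructure:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def add(self, node: Node) -> None:
        if node.node_id in self.nodes:
            raise ValueError(f"duplicate node number {node.node_id}")
        if self.nodes:
            # Assumption: the first node added is the Main Issue; every later
            # node needs an existing parent and a plus or minus arrow.
            if node.parent_id not in self.nodes:
                raise ValueError("parent node must already exist")
            if node.polarity not in ("+", "-"):
                raise ValueError("arrow must be '+' or '-'")
        self.nodes[node.node_id] = node

    def children(self, node_id: int) -> List[Node]:
        return [n for n in self.nodes.values() if n.parent_id == node_id]


structure = DiscussionStructure()
structure.add(Node(0, "A", "the mountains are better than the sea"))  # hypothetical Main Issue
structure.add(Node(1, "A", "flowers are beautiful", parent_id=0, polarity="+"))
structure.add(Node(2, "B", "climbing is tiring", parent_id=0, polarity="-"))  # hypothetical
```

Keeping the flags on the node itself means that an update triggered by a user or system utterance only has to look up one node number and change its fields.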

（マルチモーダル情報記憶部３６）
マルチモーダル情報記憶部３６は、VAD_START、VAD_END、RECG_SP、RECG_LP、MIC_UP、MIC_DOWN、TTS_START、TTS_END の信号を受け取り、マルチモーダル情報記憶部３６が管理するマルチモーダル情報が更新される。マルチモーダル情報には、ユーザとシステムを表す状態が含まれており、入力にしたがって、ユーザとシステムの状態を更新する。本実施の形態では、マルチモーダル情報記憶部３６は、ユーザの状態を、入力された信号に応じて以下のようにマルチモーダル情報を更新する。 (Multimodal information storage unit 36)
The multi-modal information storage unit 36 receives signals of VAD_START, VAD_END, RECG_SP, RECG_LP, MIC_UP, MIC_DOWN, TTS_START, and TTS_END, and the multi-modal information managed by the multi-modal information storage unit 36 is updated. The multimodal information includes states representing the user and the system, and updates the states of the user and the system according to the input. In the present embodiment, the multi-modal information storage unit 36 updates the multi-modal information of the user's state as follows according to the input signal.

VAD_STARTの信号を受け取ると、話者の状態を話している状態にする。
VAD_ENDの信号を受け取ると、話者の状態を話していない状態にする。
RECG_SPの信号を受け取ると、話者のショートポーズが行われたので、発話生成部４０に相槌命令を送る。
RECG_LPの信号を受け取ると、話者の状態を話していない状態にする。
MIC_UPの信号を受け取ると、話者の状態を話す可能性があるという状態にする。
MIC_DOWNの信号を受け取ると、話者の状態を話す可能性がないという状態にする。
TTS_STARTの信号を受け取ると、システムの状態を話している状態にする。
TTS_ENDの信号を受け取ると、システムの状態を話していない状態にする。 When the VAD_START signal is received, the state of the speaker is changed to the talking state.
When the VAD_END signal is received, the state of the speaker is changed to the non-speaking state.
When the RECG_SP signal is received, the speaker has made a short pause, so a back-channel (aizuchi) command is sent to the utterance generation unit 40.
When the signal of RECG_LP is received, the state of the speaker is changed to the non-speaking state.
When the MIC_UP signal is received, the state of the speaker is changed to the state in which there is a possibility of speaking.
When the signal of MIC_DOWN is received, the state of the speaker is set to the state that there is no possibility of speaking.
When the signal of TTS_START is received, the state of the system is changed to the talking state.
When the signal of TTS_END is received, the state of the system is changed to the non-speaking state.

マルチモーダル情報記憶部３６は、対話管理部３２から問い合わせを受けた場合、話者の状態を話している状態、もしくは、話者の状態を話す可能性があるという状態であれば、ユーザが発話中、もしくは、ユーザが近い時刻に発話しそうであることを伝達する。 When the multimodal information storage unit 36 receives an inquiry from the dialogue management unit 32, if the speaker's state is "speaking" or "may speak", it reports that the user is speaking or is likely to speak soon.
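The signal handling and the inquiry described above can be sketched as a small state holder. This is a minimal sketch under stated assumptions: the class name `MultimodalState`, its attribute and method names, and the callback used to deliver the back-channel command are hypothetical, and modeling the speaker's state as two independent booleans (speaking, may speak) is one possible reading of the text rather than the embodiment's actual representation.

```python
from typing import Callable, Optional


class MultimodalState:
    """Tracks user and system state from the eight signals named in the text."""

    def __init__(self, on_backchannel: Optional[Callable[[], None]] = None) -> None:
        self.user_speaking = False    # set by VAD_START, cleared by VAD_END / RECG_LP
        self.user_may_speak = False   # set by MIC_UP, cleared by MIC_DOWN
        self.system_speaking = False  # set by TTS_START, cleared by TTS_END
        self._on_backchannel = on_backchannel or (lambda: None)

    def handle(self, signal: str) -> None:
        if signal == "VAD_START":
            self.user_speaking = True
        elif signal in ("VAD_END", "RECG_LP"):
            self.user_speaking = False
        elif signal == "RECG_SP":
            # Short pause: ask the utterance generator for a back-channel.
            self._on_backchannel()
        elif signal == "MIC_UP":
            self.user_may_speak = True
        elif signal == "MIC_DOWN":
            self.user_may_speak = False
        elif signal == "TTS_START":
            self.system_speaking = True
        elif signal == "TTS_END":
            self.system_speaking = False
        else:
            raise ValueError(f"unknown signal: {signal}")

    def user_busy(self) -> bool:
        """Answer to the dialogue manager's inquiry: speaking or likely to speak soon."""
        return self.user_speaking or self.user_may_speak


backchannel_requests = []
state = MultimodalState(on_backchannel=lambda: backchannel_requests.append("aizuchi"))
for sig in ("MIC_UP", "VAD_START", "RECG_SP", "VAD_END", "MIC_DOWN"):
    state.handle(sig)
```

After this sequence the user is neither speaking nor holding the microphone up, so `state.user_busy()` is false and the system would be free to speak.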

（雑談応答部３８）
雑談応答部３８は、対話管理部３２から入力される発話文字列から、雑談応答を生成する。ここでは、用例ベースの雑談応答を行う。具体的には、想定される入力発話（input部）とそれに対する出力発話(output部)のペアを大量に準備しておき、入力された発話文字列と最も意味的に類似するinput部を求め（たとえば、前述のword2vecによる手法や単語のオーバーラップ数などを用いればよい）、そのinput部のペアとなっているoutput部を得て、それを発話生成部４０に出力する。なお、雑談応答として、いわゆるif-then ルールで応答してもよいし、その他の雑談対話システムでよく用いられる抽出ベースの手法や深層学習（特に、再帰型ニューラルネットワークによる手法）を用いてもよい。入力発話に関係し、話を継続させることが可能と思われる発話を生成できるものであればよい。 (Chat response section 38)
The chat response unit 38 generates a chat response from the utterance string input from the dialogue management unit 32. Here, an example-based chat response is used. Specifically, a large number of pairs of an expected input utterance (input part) and a corresponding output utterance (output part) are prepared in advance; the input part most semantically similar to the input utterance string is found (for example, using the word2vec-based method described above or the number of overlapping words); and the output part paired with that input part is output to the utterance generation unit 40. The chat response may instead be produced by so-called if-then rules, or by the extraction-based methods or deep learning (in particular, recurrent neural network methods) commonly used in other chat dialogue systems. Any method is acceptable as long as it can generate an utterance that relates to the input utterance and is likely to keep the conversation going.
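The example-based lookup described above can be sketched with the simpler of the two similarity measures the text mentions, the number of overlapping words. This is a minimal sketch, not the embodiment's implementation: the function names, the regular-expression tokenizer, the tie-breaking rule (first best match wins), and the example pairs are all assumptions for illustration, and a real system would need a Japanese morphological analyzer rather than whitespace-style tokenization.

```python
import re
from typing import List, Set, Tuple

# Hypothetical (input part, output part) pairs; a real system would hold many more.
EXAMPLES: List[Tuple[str, str]] = [
    ("what did you eat today", "I had curry. How about you?"),
    ("do you like hiking", "Yes, I go hiking on weekends."),
    ("the weather is nice today", "It really is. Perfect for a walk."),
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def chat_response(utterance: str, examples: List[Tuple[str, str]]) -> str:
    """Return the output part paired with the input part sharing the most words."""
    words = tokenize(utterance)
    # max() keeps the first pair on ties, so earlier examples take priority.
    best_input, best_output = max(
        examples, key=lambda pair: len(words & tokenize(pair[0]))
    )
    return best_output


reply = chat_response("Do you like hiking in summer?", EXAMPLES)
```

Swapping the overlap count for a vector-based similarity would only change the `key` function, which is why the selection step is kept separate from tokenization.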

（発話生成部４０）
発話生成部４０は、対話管理部３２から送られた発話文字列、マルチモーダル情報から送られた相槌命令、雑談応答部３８から送られた発話文字列を入力とし、これらを音声合成部４２に送る。相槌命令の場合は「はい」や「ええ」といった発話文字列（これらは事前に定義しておく）を音声合成部４２に送る。 (Utterance generator 40)
The utterance generation unit 40 takes as input the utterance string sent from the dialogue management unit 32, the back-channel (aizuchi) command sent from the multimodal information storage unit 36, and the utterance string sent from the chat response unit 38, and sends them to the voice synthesis unit 42. For a back-channel command, a predefined utterance string such as "hai" ("yes") or "ee" ("yeah") is sent to the voice synthesis unit 42.
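The three kinds of input handled by the utterance generation unit can be sketched as a single function that returns the string to synthesize. This is an illustrative sketch only: the function name, the `kind` labels, and choosing a back-channel string at random are assumptions, since the text states only that the back-channel strings are defined in advance.

```python
import random
from typing import Optional

# Predefined back-channel strings named in the text.
BACKCHANNELS = ("はい", "ええ")


def generate_utterance(kind: str, payload: Optional[str] = None, rng=random) -> str:
    """Map an incoming message to the string passed on to speech synthesis.

    kind is a hypothetical label: "dialogue" (from the dialogue manager),
    "chat" (from the chat response unit), or "backchannel" (aizuchi command).
    """
    if kind == "backchannel":
        # Assumption: pick one of the predefined strings at random.
        return rng.choice(BACKCHANNELS)
    if kind in ("dialogue", "chat"):
        if not payload:
            raise ValueError("an utterance string is required")
        return payload
    raise ValueError(f"unknown message kind: {kind}")


text_for_tts = generate_utterance("chat", "Yes, I go hiking on weekends.")
```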

（音声合成部４２）
音声合成部４２は、発話生成部４０から送られた発話文字列を基に音声波形を生成し、システム発話として音声で出力部９０により出力する。本発明の実施の形態では、音声合成エンジンとして、出願人のFutureVoiceを用いるが、その他の市販の音声合成エンジンを用いてもよい。なお、音声合成を開始する際には、TTS_STARTという信号をマルチモーダル情報記憶部３６に送る。また、音声合成が終了した際には、TTS_ENDという信号をマルチモーダル情報記憶部３６に送る。 (Speech synthesizer 42)
The voice synthesizing unit 42 generates a voice waveform based on the utterance character string sent from the utterance generating unit 40, and outputs the voice waveform as a system utterance by the output unit 90. Although FutureVoice of the applicant is used as the voice synthesis engine in the embodiment of the present invention, other commercially available voice synthesis engines may be used. When starting the voice synthesis, a signal TTS_START is sent to the multimodal information storage unit 36. Further, when the voice synthesis is completed, a signal TTS_END is sent to the multimodal information storage unit 36.

＜対話システムの動作＞
次に、本実施の形態に係る対話システム１００の作用について説明する。まず、ユーザと対話システム１００との対話が開始され、入力部１０により、ユーザ発話の入力を受け付けると、対話システム１００によって、図３に示す対話処理ルーチンが実行される。対話処理ルーチンは、ユーザの発話が発せられる毎に実行される。 <Operation of dialogue system>
Next, the operation of the dialogue system 100 according to the present embodiment will be described. First, the dialogue between the user and the dialogue system 100 is started, and when the input unit 10 receives the input of the user's utterance, the dialogue system 100 executes the dialogue processing routine shown in FIG. 3. The dialogue processing routine is executed every time a user's utterance is made.

まず、ステップＳ１００において、ユーザ発話に基づいて、音声の発話区間を検知し、ステップＳ１０２において、検知した音声区間に含まれる音声波形をテキスト化する。 First, in step S100, the utterance section of the voice is detected based on the user's utterance, and in step S102, the voice waveform included in the detected voice section is converted into text.

ステップＳ１０４では、上記ステップＳ１０２で得られた発話テキストを受け取り、対話行為変換部５０、議論判定部５２、議論構造内判定部５４、及びドメイン判定部５６にそれぞれ送る。 In step S104, the utterance text obtained in step S102 is received and sent to the dialogue act conversion unit 50, the discussion determination unit 52, the discussion structure determination unit 54, and the domain determination unit 56, respectively.

ステップＳ１０６では、ユーザ発話についての対話行為を推定する。 In step S106, the dialogue act about the user's utterance is estimated.

ステップＳ１０８では、ユーザ発話について、議論に関するものであるか否かを判定する。 In step S108, it is determined whether or not the user utterance is related to discussion.

ステップＳ１１０では、議論構造の各ノードのうち、ユーザ発話に対応するノードを判定する。 In step S110, of the nodes in the discussion structure, the node corresponding to the user utterance is determined.

ステップＳ１１２では、ユーザ発話について、所定のドメインに属しているか否かを判定する。 In step S112, it is determined whether or not the user utterance belongs to a predetermined domain.

ステップＳ１１４では、発話テキストと、上記ステップＳ１０６〜Ｓ１１２の結果とに基づいて、上記図２に示す通り次の動作（アクション）を決定する。 In step S114, the next action (action) is determined as shown in FIG. 2 based on the uttered text and the results of steps S106 to S112.

ステップＳ１１６では、上記ステップＳ１１４で決定された次のアクションが、議論構造更新であれば、ステップＳ１１８へ移行し、決定された次のアクションが、雑談処理であれば、ステップＳ１２２へ移行し、決定された次のアクションが、ドメイン外処理又は議論構造外であれば、ステップＳ１２６へ移行する。 In step S116, if the next action determined in step S114 is the discussion structure update, the process proceeds to step S118; if it is chat processing, the process proceeds to step S122; and if it is out-of-domain processing or processing outside the discussion structure, the process proceeds to step S126.
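The decision in step S114 and the branch in step S116 can be sketched as follows, using the conditions recited in the claims of this document (domain membership, whether the utterance relates to the discussion, and whether a matching node exists). This is a minimal sketch: the names `Action`, `decide_action`, and `NEXT_STEP` are hypothetical, and a missing node is represented here by `None`.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    UPDATE_STRUCTURE = "discussion structure update"
    OUT_OF_DOMAIN = "out-of-domain processing"
    OUTSIDE_STRUCTURE = "processing outside the discussion structure"
    CHAT = "chat processing"


def decide_action(in_domain: bool, is_discussion: bool, node_id: Optional[int]) -> Action:
    """Pick the system's next action from the three determination results."""
    if not in_domain:
        return Action.OUT_OF_DOMAIN
    if not is_discussion:
        return Action.CHAT
    if node_id is None:
        return Action.OUTSIDE_STRUCTURE
    return Action.UPDATE_STRUCTURE


# Step S116: which step of the routine handles each action.
NEXT_STEP = {
    Action.UPDATE_STRUCTURE: "S118",
    Action.CHAT: "S122",
    Action.OUT_OF_DOMAIN: "S126",
    Action.OUTSIDE_STRUCTURE: "S126",
}

action = decide_action(in_domain=True, is_discussion=True, node_id=1)
step = NEXT_STEP[action]
```

Checking the domain first mirrors the claims, where out-of-domain processing depends only on the domain determination and the other three actions all require the utterance to be in the domain.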

ステップＳ１１８では、対話行為とノード番号を用いて議論構造を更新し、ステップＳ１２０において、ユーザ発話に対応するノードについての質問文、ノードの発話文字列、又はランダムに選択されたシステム自身のサイドのノードの発話文字列を、発話生成部４０に送る。 In step S118, the discussion structure is updated using the dialogue act and the node number, and in step S120, the question sentence about the node corresponding to the user utterance, the utterance character string of the node, or the randomly selected side of the system itself. The utterance character string of the node is sent to the utterance generation unit 40.

ステップＳ１２２では、ユーザ発話を雑談応答部３８に送り、ステップＳ１２４において、ユーザ発話の文字列から、雑談応答を表す発話文字列を生成し、発話生成部４０へ送る。 In step S122, the user utterance is sent to the chat response unit 38, and in step S124, the utterance character string representing the chat response is generated from the user utterance character string and sent to the utterance generation unit 40.

ステップＳ１２６では、次のアクションがドメイン外処理の場合、「ドメインが違います」や「海と山について話しましょう」といったユーザを議論のコンテンツに引き戻すような発話を発話生成部４０に送る。次のアクションが、議論構造外処理の場合は、「意見が思いつきません」といった意見や反論が思いつかないという旨の発話を発話生成部４０に送る。 In step S126, if the next action is out-of-domain processing, an utterance that steers the user back to the topic of the discussion, such as "That is a different domain" or "Let's talk about the sea and the mountains," is sent to the utterance generation unit 40. If the next action is processing outside the discussion structure, an utterance indicating that the system cannot come up with an opinion or counterargument, such as "I can't think of an opinion," is sent to the utterance generation unit 40.

ステップＳ１２８では、マルチモーダル情報記憶部３６に記憶されているマルチモーダル情報を参照して、ユーザが発話中、もしくは、ユーザが近い時刻に発話しそうであるか否かを判定する。ユーザが発話中、もしくは、ユーザが近い時刻に発話しそうであると判定された場合には、対話システム１００が発話せずに、対話処理ルーチンを終了する。 In step S128, the multimodal information stored in the multimodal information storage unit 36 is consulted to determine whether the user is speaking or is likely to speak soon. If it is determined that the user is speaking or is likely to speak soon, the dialogue system 100 does not speak and the dialogue processing routine ends.

一方、ユーザが発話中でもなく、ユーザが近い時刻に発話しそうでないと判定された場合には、ステップＳ１３０において、発話生成部４０は、対話管理部３２から送られた発話文字列、マルチモーダル情報から送られた相槌命令、又は雑談応答部３８から送られた発話文字列を入力とし、これらを音声合成部４２に送る。 On the other hand, if it is determined that the user is neither speaking nor likely to speak soon, then in step S130 the utterance generation unit 40 takes as input the utterance string sent from the dialogue management unit 32, the back-channel (aizuchi) command sent from the multimodal information, or the utterance string sent from the chat response unit 38, and sends it to the voice synthesis unit 42.

ステップＳ１３２では、音声合成部４２は、発話生成部４０から送られた発話文字列を基に音声波形を生成し、システム発話として音声で出力部９０により出力する。そして、ステップＳ１３４では、上記ステップＳ１３２で出力したシステム発話に応じて、議論構造を更新し、対話処理ルーチンを終了する。例えば、上記ステップＳ１３２で出力したシステム発話に対応する対話行為（Assertion、Question、Retraction、Concession）に応じて、上述した議論構造更新と同様に、議論構造のうちの、現在の対象となっているノード（当該システム発話に対応するノード）のフラグを更新する。 In step S132, the voice synthesis unit 42 generates a voice waveform based on the uttered character string sent from the utterance generation unit 40, and outputs the voice waveform as a system utterance by the output unit 90 as a voice. Then, in step S134, the discussion structure is updated according to the system utterance output in step S132, and the dialogue processing routine is ended. For example, according to the dialogue act (Assertion, Question, Retraction, Concession) corresponding to the system utterance output in step S132, the discussion structure is the current target of the discussion structure in the same manner as the discussion structure update described above. The flag of the node (node corresponding to the system utterance) is updated.

以上説明したように、本実施の形態に係る対話システム１００によれば、ユーザ発話についての対話行為の推定結果、所定のドメインに属しているか否かの判定結果、議論に関するものであるか否かの判定結果、及び議論構造の各ノードのうちのユーザ発話に対応するノードの判定結果に基づいて、システム側の次の行動を決定することにより、ユーザと議論を行うことができる。 As described above, the dialogue system 100 according to the present embodiment can hold a discussion with the user by determining the next action on the system side based on the estimated dialogue act of the user utterance, the result of determining whether the utterance belongs to the predetermined domain, the result of determining whether the utterance relates to the discussion, and the result of determining which node of the discussion structure corresponds to the user utterance.

なお、本発明は、上述した実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made without departing from the spirit of the present invention.

また、上記実施の形態では、議論構造記憶部３４及びマルチモーダル情報記憶部３６を備えている場合について説明したが、例えば、議論構造記憶部３４及びマルチモーダル情報記憶部３６の少なくとも１つが対話システムの外部装置に設けられ、対話システムは、外部装置と通信手段を用いて通信することにより、議論構造記憶部３４及びマルチモーダル情報記憶部３６の少なくとも１つを参照するようにしてもよい。 Further, although the above embodiment describes the case where the dialogue system includes the discussion structure storage unit 34 and the multimodal information storage unit 36, at least one of the discussion structure storage unit 34 and the multimodal information storage unit 36 may, for example, be provided in a device external to the dialogue system, and the dialogue system may refer to it by communicating with the external device through communication means.

また、上述の対話システムは、内部にコンピュータシステムを有しているが、コンピュータシステムは、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。 Further, although the above-mentioned dialogue system has a computer system inside, the computer system also includes a homepage providing environment (or a display environment) if the WWW system is used.

また、本願明細書中において、プログラムが予めインストールされている実施形態として説明したが、当該プログラムを、コンピュータ読取り可能な記録媒体に格納して提供することも可能である。 Further, in the specification of the present application, the embodiment in which the program is pre-installed has been described, but the program can be stored in a computer-readable recording medium and provided.

１０入力部
１２センサ
２０演算部
２２発話区間検知部
２４音声認識部
２６動作検知部
２８発話入力部
３０発話判定部
３２対話管理部
３４議論構造記憶部
３６マルチモーダル情報記憶部
３８雑談応答部
４０発話生成部
４２音声合成部
５０対話行為変換部
５２議論判定部
５４議論構造内判定部
５６ドメイン判定部
９０出力部
１００対話システム
10 input unit
12 sensor
20 arithmetic unit
22 speech section detection unit
24 speech recognition unit
26 motion detection unit
28 speech input unit
30 speech determination unit
32 dialogue management unit
34 discussion structure storage unit
36 multimodal information storage unit
38 chat response unit
40 utterance generation unit
42 voice synthesis unit
50 dialogue act conversion unit
52 discussion determination unit
54 discussion structure determination unit
56 domain determination unit
90 output unit
100 dialogue system

Claims

ユーザ発話を表すテキストに基づいて、テキストを複数種類の対話行為に分類する分類器を用いて、前記ユーザ発話についての対話行為を推定する対話行為変換部と、
前記ユーザ発話を表すテキストに基づいて、テキストが所定のドメインに属しているか否かを分類する分類器を用いて、前記ユーザ発話について、前記所定のドメインに属しているか否かを判定するドメイン判定部と、
前記ユーザ発話を表すテキストに基づいて、前記ユーザ発話について、議論に関するものであるか否かを判定する議論判定部と、
議論の中心となる論拠と、他の論拠を支持する、または支持しない論拠との各々を表すノードを含む議論構造の各ノードのうち、前記ユーザ発話に対応するノードがあるか否か、及び前記ユーザ発話に対応するノードが何れであるかを、前記ノードのテキストと、前記ユーザ発話を表すテキストとの意味的な類似度に基づいて判定する議論構造内判定部と、
前記ドメイン判定部による判定結果、前記議論判定部による判定結果、及び前記議論構造内判定部による判定結果に基づいて、システム側の次の行動として、前記推定された前記対話行為と前記判定されたノードとに基づいて前記議論構造を更新する議論構造更新、前記所定のドメインに属していない旨を応答するドメイン外処理、議論ができない旨を応答する議論構造外処理、及び議論とは関係ない雑談に対する応答を行う雑談処理の何れかを決定する対話管理部と、
を含む対話システム。 A dialogue system comprising:
a dialogue act conversion unit that, based on text representing a user utterance, estimates a dialogue act for the user utterance by using a classifier that classifies text into a plurality of types of dialogue acts;
a domain determination unit that, based on the text representing the user utterance, determines whether the user utterance belongs to a predetermined domain by using a classifier that classifies whether text belongs to the predetermined domain;
a discussion determination unit that determines, based on the text representing the user utterance, whether the user utterance relates to a discussion;
a discussion structure determination unit that determines, based on the semantic similarity between the text of a node and the text representing the user utterance, whether there is a node corresponding to the user utterance, and which node corresponds to the user utterance, among the nodes of a discussion structure including nodes that each represent the argument central to the discussion or an argument that supports or does not support another argument; and
a dialogue management unit that, based on the determination result of the domain determination unit, the determination result of the discussion determination unit, and the determination result of the discussion structure determination unit, determines, as the next action on the system side, one of: a discussion structure update that updates the discussion structure based on the estimated dialogue act and the determined node; out-of-domain processing that responds that the user utterance does not belong to the predetermined domain; processing outside the discussion structure that responds that a discussion is not possible; and chat processing that responds to chat unrelated to the discussion.

前記対話管理部は、前記ユーザ発話が議論に関するものであり、かつ、前記ユーザ発話に対応するノードがあり、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記議論構造更新を決定し、
前記ユーザ発話が前記所定のドメインに属していない場合、システム側の次の行動として、前記ドメイン外処理を決定し、
前記ユーザ発話が議論に関するものであり、かつ、前記ユーザ発話に対応するノードがなく、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記議論構造外処理を決定し、
前記ユーザ発話が議論に関するものでなく、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記雑談処理を決定する請求項１記載の対話システム。 The dialogue system according to claim 1, wherein the dialogue management unit:
determines the discussion structure update as the next action on the system side when the user utterance relates to a discussion, there is a node corresponding to the user utterance, and the user utterance belongs to the predetermined domain;
determines the out-of-domain processing as the next action on the system side when the user utterance does not belong to the predetermined domain;
determines the processing outside the discussion structure as the next action on the system side when the user utterance relates to a discussion, there is no node corresponding to the user utterance, and the user utterance belongs to the predetermined domain; and
determines the chat processing as the next action on the system side when the user utterance does not relate to a discussion and the user utterance belongs to the predetermined domain.

ユーザの状態を表すマルチモーダル情報を格納するマルチモーダル情報記憶部と、
前記ユーザの動作に基づいて、前記マルチモーダル情報を更新するユーザ動作検知部と、を更に含み、
前記対話管理部は、更に、前記マルチモーダル情報に応じて、決定したシステム側の次の行動を行う請求項１又は２記載の対話システム。 The dialogue system according to claim 1 or 2, further comprising:
a multimodal information storage unit that stores multimodal information representing a state of the user; and
a user motion detection unit that updates the multimodal information based on a motion of the user,
wherein the dialogue management unit further performs the determined next action on the system side in accordance with the multimodal information.

ユーザ発話を表すテキストに基づいて、テキストを複数種類の対話行為に分類する分類器を用いて、前記ユーザ発話についての対話行為を推定する対話行為変換部と、
前記ユーザ発話を表すテキストに基づいて、テキストが所定のドメインに属しているか否かを分類する分類器を用いて、前記ユーザ発話について、前記所定のドメインに属しているか否かを判定するドメイン判定部と、
議論の中心となる論拠と、他の論拠を支持する、または支持しない論拠との各々を表すノードを含む議論構造の各ノードのうち、前記ユーザ発話に対応するノードがあるか否か、及び前記ユーザ発話に対応するノードが何れであるかを、前記ノードのテキストと、前記ユーザ発話を表すテキストとの意味的な類似度に基づいて判定する議論構造内判定部と、
前記ドメイン判定部による判定結果、前記議論構造内判定部による判定結果、及び前記ユーザ発話が議論に関するものであるか否かを表す情報に基づいて、システム側の次の行動として、前記推定された前記対話行為と前記判定されたノードとに基づいて前記議論構造を更新する議論構造更新、前記所定のドメインに属していない旨を応答するドメイン外処理、議論ができない旨を応答する議論構造外処理、及び議論とは関係ない雑談に対する応答を行う雑談処理の何れかを決定する対話管理部と、
を含む対話システム。 A dialogue system comprising:
a dialogue act conversion unit that, based on text representing a user utterance, estimates a dialogue act for the user utterance by using a classifier that classifies text into a plurality of types of dialogue acts;
a domain determination unit that, based on the text representing the user utterance, determines whether the user utterance belongs to a predetermined domain by using a classifier that classifies whether text belongs to the predetermined domain;
a discussion structure determination unit that determines, based on the semantic similarity between the text of a node and the text representing the user utterance, whether there is a node corresponding to the user utterance, and which node corresponds to the user utterance, among the nodes of a discussion structure including nodes that each represent the argument central to the discussion or an argument that supports or does not support another argument; and
a dialogue management unit that, based on the determination result of the domain determination unit, the determination result of the discussion structure determination unit, and information indicating whether the user utterance relates to a discussion, determines, as the next action on the system side, one of: a discussion structure update that updates the discussion structure based on the estimated dialogue act and the determined node; out-of-domain processing that responds that the user utterance does not belong to the predetermined domain; processing outside the discussion structure that responds that a discussion is not possible; and chat processing that responds to chat unrelated to the discussion.

対話システムが実行する対話方法であって、
対話行為変換部が、ユーザ発話を表すテキストに基づいて、テキストを複数種類の対話行為に分類する分類器を用いて、前記ユーザ発話についての対話行為を推定し、
ドメイン判定部が、前記ユーザ発話を表すテキストに基づいて、テキストが所定のドメインに属しているか否かを分類する分類器を用いて、前記ユーザ発話について、前記所定のドメインに属しているか否かを判定し、
議論判定部が、前記ユーザ発話を表すテキストに基づいて、前記ユーザ発話について、議論に関するものであるか否かを判定し、
議論構造内判定部が、議論の中心となる論拠と、他の論拠を支持する、または支持しない論拠との各々を表すノードを含む議論構造の各ノードのうち、前記ユーザ発話に対応するノードがあるか否か、及び前記ユーザ発話に対応するノードが何れであるかを、前記ノードのテキストと、前記ユーザ発話を表すテキストとの意味的な類似度に基づいて判定し、
対話管理部が、前記ドメイン判定部による判定結果、前記議論判定部による判定結果、及び前記議論構造内判定部による判定結果に基づいて、システム側の次の行動として、前記推定された前記対話行為と前記判定されたノードとに基づいて前記議論構造を更新する議論構造更新、前記所定のドメインに属していない旨を応答するドメイン外処理、議論ができない旨を応答する議論構造外処理、及び議論とは関係ない雑談に対する応答を行う雑談処理の何れかを決定する
対話方法。 A dialogue method executed by a dialogue system, the method comprising:
estimating, by a dialogue act conversion unit, based on text representing a user utterance, a dialogue act for the user utterance by using a classifier that classifies text into a plurality of types of dialogue acts;
determining, by a domain determination unit, based on the text representing the user utterance, whether the user utterance belongs to a predetermined domain by using a classifier that classifies whether text belongs to the predetermined domain;
determining, by a discussion determination unit, based on the text representing the user utterance, whether the user utterance relates to a discussion;
determining, by a discussion structure determination unit, based on the semantic similarity between the text of a node and the text representing the user utterance, whether there is a node corresponding to the user utterance, and which node corresponds to the user utterance, among the nodes of a discussion structure including nodes that each represent the argument central to the discussion or an argument that supports or does not support another argument; and
determining, by a dialogue management unit, based on the determination result of the domain determination unit, the determination result of the discussion determination unit, and the determination result of the discussion structure determination unit, as the next action on the system side, one of: a discussion structure update that updates the discussion structure based on the estimated dialogue act and the determined node; out-of-domain processing that responds that the user utterance does not belong to the predetermined domain; processing outside the discussion structure that responds that a discussion is not possible; and chat processing that responds to chat unrelated to the discussion.

前記対話管理部が決定することでは、前記ユーザ発話が議論に関するものであり、かつ、前記ユーザ発話に対応するノードがあり、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記議論構造更新を決定し、
前記ユーザ発話が前記所定のドメインに属していない場合、システム側の次の行動として、前記ドメイン外処理を決定し、
前記ユーザ発話が議論に関するものであり、かつ、前記ユーザ発話に対応するノードがなく、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記議論構造外処理を決定し、
前記ユーザ発話が議論に関するものでなく、かつ、前記ユーザ発話が前記所定のドメインに属している場合、システム側の次の行動として、前記雑談処理を決定する請求項５記載の対話方法。 The dialogue method according to claim 5, wherein, in the determining by the dialogue management unit:
the discussion structure update is determined as the next action on the system side when the user utterance relates to a discussion, there is a node corresponding to the user utterance, and the user utterance belongs to the predetermined domain;
the out-of-domain processing is determined as the next action on the system side when the user utterance does not belong to the predetermined domain;
the processing outside the discussion structure is determined as the next action on the system side when the user utterance relates to a discussion, there is no node corresponding to the user utterance, and the user utterance belongs to the predetermined domain; and
the chat processing is determined as the next action on the system side when the user utterance does not relate to a discussion and the user utterance belongs to the predetermined domain.

ユーザ動作検知部が、ユーザの動作に基づいて、前記ユーザの状態を表すマルチモーダル情報を格納するマルチモーダル情報記憶部の前記マルチモーダル情報を更新すること、及び
前記対話管理部が、前記マルチモーダル情報に応じて、決定したシステム側の次の行動を行うこと
を更に含む請求項５又は６記載の対話方法。 The dialogue method according to claim 5 or 6, further comprising: updating, by a user motion detection unit, based on a motion of the user, the multimodal information in a multimodal information storage unit that stores multimodal information representing a state of the user; and performing, by the dialogue management unit, the determined next action on the system side in accordance with the multimodal information.

コンピュータを、請求項１〜請求項４の何れか１項記載の対話システムを構成する各部として機能させるためのプログラム。 A program for causing a computer to function as each unit constituting the dialogue system according to any one of claims 1 to 4 .