JP6790894B2

JP6790894B2 - Dialogue device

Info

Publication number: JP6790894B2
Application number: JP2017027331A
Authority: JP
Inventors: 生聖渡部; 侑司大沼
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2017-02-16
Filing date: 2017-02-16
Publication date: 2020-11-25
Anticipated expiration: 2037-02-16
Also published as: JP2018132704A

Description

The present invention relates to a dialogue device.

As disclosed in Patent Document 1, a typical dialogue device generates a plurality of responses based on a user's utterance. It then selects one of them based on an evaluation result obtained with a preset evaluation function, and outputs it.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2003-255990

In the dialogue device of Patent Document 1, every possible situation must be anticipated when the evaluation function is set in advance. As a result, it is difficult for the device to respond appropriately when the user makes an unexpected utterance, and the dialogue device of Patent Document 1 has difficulty conducting a natural dialogue.

The present invention has been made in view of these problems. It provides a dialogue device that realizes natural dialogue.

A dialogue device according to one aspect of the present invention determines a response sentence to a user's input words based on a dialogue model, and outputs that sentence. The dialogue device includes:
a response generation unit that generates, based on the user's words, response sentences each associated with a response type indicating the kind of response;
an intention estimation unit that estimates the user's intention based on the user's words;
a dialogue model storage unit that stores a dialogue model representing the relationship between the user's intention and the response type;
a response selection unit that refers to the dialogue model to select a response type corresponding to the user's intention, and selects the response sentence to be output;
an evaluation point storage unit that stores evaluation points preset for each combination of the user's intention and the response type; and
a dialogue model learning unit that assigns evaluation points based on the estimated intention of the user and the selected response type, and learns the suitability of the dialogue model based on the assigned evaluation points.
With this configuration, evaluation points are assigned based on the user's intention and the selected response type, and the suitability of the dialogue model is learned from the assigned evaluation points. Therefore, even when the user makes an unexpected utterance, the device can respond appropriately once enough learning has accumulated. This allows the device to conduct a natural dialogue with the user.

According to the present invention, a dialogue device that realizes natural dialogue can be provided.

FIG. 1 is a block diagram schematically showing the dialogue device of the embodiment.
FIG. 2 is a flowchart showing the flow of learning the dialogue model in the dialogue device of the embodiment.
FIG. 3 is an evaluation table showing reward evaluation points preset for each combination of the user's intention and the response type.
FIG. 4 is an evaluation table showing penalty evaluation points preset for each combination of the user's intention and the response type.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. However, the present invention is not limited to the following embodiments. For clarity, the following description and drawings have been simplified as appropriate.

The basic configuration of the dialogue device of the present embodiment is described first. The dialogue device of the present embodiment realizes spoken dialogue by responding to the user's utterances. However, it can also be applied to text-based dialogue.

FIG. 1 is a block diagram schematically showing the dialogue device of the present embodiment. As shown in FIG. 1, the dialogue device 1 includes the following units: an utterance input unit 2, a speech recognition unit 3, a response generation unit 4, an intention estimation unit 5, a dialogue model storage unit 6, a response selection unit 7, and a response output unit 8.

The utterance input unit 2 includes a microphone or the like that picks up the user's utterance (voice). It converts the analog signal of the picked-up voice into digital data to obtain speech waveform data. The utterance input unit 2 then outputs the speech waveform data to the speech recognition unit 3 and the intention estimation unit 5.

The speech recognition unit 3 obtains a recognized character string based on the speech waveform represented by the speech waveform data input from the utterance input unit 2. For example, the speech recognition unit 3 first converts the speech waveform into text data. It then estimates the type (part of speech) of each morpheme and the dependency relations in that text, and thereby obtains the recognized character string. The speech recognition unit 3 outputs recognized-string data representing the recognized character string to the response generation unit 4 and the intention estimation unit 5. Any common method can be used to estimate morpheme types and dependency relations.

The response generation unit 4 generates responses (response sentences) based on the recognized character string represented by the recognized-string data input from the speech recognition unit 3. It includes a plurality of response generators 4a so that multiple responses can be generated for a single recognized character string.

Each response generator 4a refers to relationships between recognized character strings and responses that are stored in advance in a response storage unit (not shown). Using these relationships, it generates a response corresponding to the input recognized character string according to preset response generation rules.

For example, the response generation unit 4 includes the following response generators 4a:
- a QA (Question/Answer) response generator 4a1;
- an empathy response generator 4a2;
- a missing-case question generator 4a3, which asks about an element missing from the utterance, such as who did something;
- a topic guidance generator 4a4;
- a parroting generator 4a5; and
- a backchannel (aizuchi) generator 4a6.

Suppose the recognized character string is "I got a present yesterday". In that case, the QA response generator 4a1 generates no response, and the other generators respond as follows: the empathy response generator 4a2 generates "That must have made you happy", the missing-case question generator 4a3 generates "Who gave it to you?", the topic guidance generator 4a4 generates "I wonder if it will be sunny tomorrow?", the parroting generator 4a5 generates "A present, huh", and the backchannel generator 4a6 generates "Uh-huh". Each response generator 4a outputs response data representing its generated response to the response selection unit 7. The responses generated by each response generator 4a can be changed as appropriate.
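The following is a minimal Python sketch of this fan-out step. Every generator runs on the same recognized string, and only the generators that produce something contribute a candidate tagged with its response type. The rule tables, the function name `generate_candidates`, and the response-type labels are hypothetical stand-ins, not the patent's implementation. The canned strings reproduce the example above.

```python
from typing import Dict, List, Tuple

# Hypothetical rule tables standing in for generators 4a1-4a6. Each maps a
# recognized string to that generator's response; a missing entry means the
# generator produces nothing for that input (like the QA generator here).
EXAMPLE = "I got a present yesterday"

GENERATOR_RULES: Dict[str, Dict[str, str]] = {
    "qa": {},  # a statement, not a question: no QA response
    "empathy": {EXAMPLE: "That must have made you happy."},
    "missing_case_question": {EXAMPLE: "Who gave it to you?"},
    "topic_guidance": {EXAMPLE: "I wonder if it will be sunny tomorrow?"},
    "parroting": {EXAMPLE: "A present, huh."},
    "backchannel": {EXAMPLE: "Uh-huh."},
}


def generate_candidates(recognized: str) -> List[Tuple[str, str]]:
    """Run every generator; keep (response_type, response) for those that fire."""
    return [
        (response_type, rules[recognized])
        for response_type, rules in GENERATOR_RULES.items()
        if recognized in rules
    ]


for response_type, text in generate_candidates(EXAMPLE):
    print(f"{response_type}: {text}")
```

Keeping the response type attached to each candidate is what later allows the selection unit to choose by type rather than by text.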

The intention estimation unit 5 estimates the user's intention, for example the user's emotion and the intent the user has toward the dialogue device 1. It bases this estimate on two inputs: the speech waveform represented by the speech waveform data from the utterance input unit 2, and the recognized character string represented by the recognized-string data from the speech recognition unit 3. For example, the intention estimation unit 5 includes an emotion estimator 5a and an intent estimator 5b.

The emotion estimator 5a estimates the user's emotion (for example, positive, neutral, or negative) from the same two inputs: the speech waveform from the utterance input unit 2 and the recognized character string from the speech recognition unit 3. For example, the emotion estimator 5a estimates the user's emotion from prosodic features, that is, changes in the fundamental frequency and amplitude of the speech waveform. As another example, the emotion estimator 5a computes a feature vector for each morpheme set (a group of several morphemes in the recognized character string). It then classifies these feature vectors with SVM (Support Vector Machines) to estimate the user's emotion. The emotion estimator 5a outputs emotion data representing the estimated emotion to the response selection unit 7. Any common method may be used for the emotion estimator 5a, as long as it can estimate the user's emotion.
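The prosodic cues mentioned above can be made concrete with a short Python sketch. It extracts a per-frame fundamental-frequency (F0) track and an RMS amplitude track from raw samples. The sketch assumes plain autocorrelation pitch picking, 640-sample frames, and a 60 to 400 Hz search range. The patent does not specify an F0 method, and the downstream classifier (for example, an SVM) is omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame: Sequence[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate F0 (Hz) as the lag with the largest raw autocorrelation in [fmin, fmax].

    Returns 0.0 when no lag has positive correlation (e.g. silence).
    """
    n = len(frame)
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def prosodic_features(samples: Sequence[float], sample_rate: int,
                      frame_len: int = 640) -> Tuple[List[float], List[float]]:
    """Per-frame F0 and RMS tracks; their changes serve as prosodic cues."""
    f0s, amps = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        f0s.append(estimate_f0(frame, sample_rate))
        amps.append(rms(frame))
    return f0s, amps


sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(2 * 640)]
f0_track, rms_track = prosodic_features(tone, sr)
print(f0_track, rms_track)
```

An emotion classifier would then consume summaries of these tracks, such as the F0 range or the amplitude change across an utterance.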

The intent estimator 5b uses the recognized character string from the speech recognition unit 3 to estimate the intent the user has toward the dialogue device 1. Possible intents include question, greeting, request for agreement, narration, answer to a question, and invalid input (that is, noise and the like). For example, the intent estimator 5b uses machine learning to infer which intent a given combination of part-of-speech structure and predicate-argument structure in the recognized character string carries for the user. The intent estimator 5b then outputs intent data representing the estimated intent to the response selection unit 7. Any common method may be used for the intent estimator 5b, as long as it can estimate the user's intent.

The dialogue model storage unit 6 stores dialogue model data representing a dialogue model. The dialogue model expresses the relationship between the user's intention and the response types of the response generators 4a (for example, QA (Question/Answer), empathy, parroting, missing-case question, backchannel, and topic guidance). For example, the dialogue model consists of coefficients assigned to each response type for each combination of emotion and intent.
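The coefficient-table form of the dialogue model can be sketched as a nested dictionary. Each (emotion, intent) pair maps to one coefficient per response type. The label strings, the uniform initial value, and the hand-tuned entry are hypothetical. The patent gives no initial coefficients.

```python
from itertools import product
from typing import Dict, Tuple

EMOTIONS = ("positive", "neutral", "negative")
INTENTS = ("question", "greeting", "agreement_request", "narration",
           "answer", "invalid")
RESPONSE_TYPES = ("qa", "empathy", "parroting", "missing_case_question",
                  "backchannel", "topic_guidance")

# Dialogue model: for each (emotion, intent) pair, one coefficient per response type.
DialogueModel = Dict[Tuple[str, str], Dict[str, float]]


def make_model(initial: float = 1.0) -> DialogueModel:
    """Build a model with every coefficient set to the same hypothetical initial value."""
    return {
        (emotion, intent): {rtype: initial for rtype in RESPONSE_TYPES}
        for emotion, intent in product(EMOTIONS, INTENTS)
    }


model = make_model()
# Hypothetical hand-tuning: favor QA answers to neutral questions.
model[("neutral", "question")]["qa"] = 2.0
```

A flat table like this is easy to persist and overwrite in the storage unit, which matches how the learned model is written back at the end of learning.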

The response selection unit 7 selects a response in two steps. First, it refers to the coefficients of the dialogue model stored in the dialogue model storage unit 6 and selects the response type that corresponds to the user's emotion (from the emotion data supplied by the emotion estimator 5a) and the user's intent (from the intent data supplied by the intent estimator 5b). For example, it selects the response type with the highest coefficient among those associated with the estimated emotion and intent. Second, from the responses in the response data supplied by the response generators, it selects the response produced by the generator of the selected response type. The response selection unit 7 then outputs response data representing the selected response to the response output unit 8.
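The selection rule ("highest coefficient wins") can be sketched as follows. The function takes the model row for the estimated (emotion, intent) and the generators' candidates. One assumption is added that the patent does not address: response types that produced no candidate are skipped, so a silent QA generator cannot be chosen. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_response(coefficients: Dict[str, float],
                    candidates: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the (response_type, response) whose type has the highest coefficient.

    `coefficients` is the model row for the estimated (emotion, intent);
    `candidates` are the (response_type, response) pairs the generators produced.
    Types without a candidate are never considered (an assumption). Ties go to
    the earliest candidate, because max() returns the first maximal element.
    """
    scored = [(coefficients.get(rtype, float("-inf")), rtype, text)
              for rtype, text in candidates]
    if not scored:
        return None
    _, rtype, text = max(scored, key=lambda s: s[0])
    return rtype, text


row = {"qa": 3.0, "empathy": 2.0, "backchannel": 0.5}
cands = [("backchannel", "Uh-huh."), ("empathy", "That must have made you happy.")]
print(select_response(row, cands))
```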

The response output unit 8 outputs the response represented by the response data input from the response selection unit 7. For example, the response output unit 8 includes a speaker. It converts the digital response data into an analog signal and outputs it through the speaker.

The dialogue device 1 is configured to learn (correct) the dialogue model through dialogue with the user. For this purpose, it also includes a dialogue model learning unit 9. As detailed later, the dialogue model learning unit 9 refers to an evaluation point table, which is stored as evaluation point table data in an evaluation point storage unit (not shown). It assigns evaluation points (by addition and subtraction) based on two factors: the user's intention estimated from the user's utterance, and the response type of the response generator that produced the output response. It then learns the suitability of the dialogue model from the assigned evaluation points.

Next, the flow of learning the dialogue model in the dialogue device 1 of the present embodiment is described. FIG. 2 is a flowchart showing this flow. The dialogue model learning unit 9 of the present embodiment learns the dialogue model by assigning rewards and penalties as evaluation points. It also takes into account the number of exchanges (turns) with the user. For this purpose, the intention estimation unit 5 includes a turn counter 5c, which counts turns and outputs turn count data representing the number of turns to the dialogue model learning unit 9.

First, as shown in FIG. 2, the dialogue model learning unit 9 receives two inputs: emotion data representing the user's emotion from the emotion estimator 5a, and intent data representing the user's intent toward the dialogue device 1 from the intent estimator 5b. It then assigns a reward and a penalty (S1 to S3).

The evaluation point storage unit stores evaluation point table data for two tables: a reward evaluation point table and a penalty evaluation point table. FIG. 3 is an evaluation table showing reward evaluation points preset for each combination of the user's intention and the response type. FIG. 4 is an evaluation table showing penalty evaluation points preset for each combination of the user's intention and the response type.

As shown in FIG. 3, the dialogue model learning unit 9 assigns a reward of "1", "0", or "-1". The value depends on the relationship between the user's intention, estimated from the user's utterance, and the response type of the response output from the response output unit 8.

Similarly, as shown in FIG. 4, the dialogue model learning unit 9 assigns a penalty of "2", "1", or "0". This value also depends on the relationship between the user's intention, estimated from the user's utterance, and the response type of the response output from the response output unit 8.
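Steps S1 to S3 amount to two table lookups per turn, with the results added to running totals. The Python sketch below makes that concrete. FIG. 3 and FIG. 4 are not reproduced here, so the table entries are hypothetical placeholders. They respect only the stated value ranges: rewards in {1, 0, -1} and penalties in {2, 1, 0}. Pairs that are not listed score 0 (an assumption).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (emotion, intent, response_type)

# Hypothetical entries only; the real values are in FIG. 3 and FIG. 4.
REWARD_TABLE: Dict[Key, int] = {
    ("positive", "narration", "empathy"): 1,
    ("neutral", "question", "backchannel"): -1,
}
PENALTY_TABLE: Dict[Key, int] = {
    ("neutral", "question", "backchannel"): 2,
    ("positive", "narration", "parroting"): 1,
}


@dataclass
class EvaluationTotals:
    reward: int = 0
    penalty: int = 0

    def add(self, emotion: str, intent: str, response_type: str) -> Tuple[int, int]:
        """Look up both tables for one turn and add the points to the totals (S1 to S3)."""
        key = (emotion, intent, response_type)
        r = REWARD_TABLE.get(key, 0)
        p = PENALTY_TABLE.get(key, 0)
        self.reward += r
        self.penalty += p
        return r, p


totals = EvaluationTotals()
totals.add("neutral", "question", "backchannel")  # backchannel to a question
totals.add("positive", "narration", "empathy")    # empathy for a happy story
print(totals)
```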

Next, the dialogue model learning unit 9 determines whether the number of turns represented by the turn count data from the turn counter 5c has reached a certain number, for example 10 turns (S4). The turn counter 5c counts turns as follows. It receives response type data representing the response type selected by the response selection unit 7. If speech waveform data then arrives from the utterance input unit 2 within a preset period, the turn counter 5c determines that one turn of the dialogue with the user has been completed. In this way it counts (accumulates) the number of turns since the dialogue with the user began. If no speech waveform data arrives from the utterance input unit 2 within the preset period, the turn counter 5c determines that the dialogue with the user has been interrupted and resets the turn count.
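The turn-counting rule can be sketched as a small state machine driven by explicit timestamps in seconds. Using explicit timestamps instead of a real clock keeps the sketch deterministic. The class name, method names, and the `timeout` parameter are hypothetical; the text only speaks of "a preset period". The sketch also assumes that an utterance arriving after the period has expired resets the count rather than starting a new turn.

```python
from typing import Optional


class TurnCounter:
    """Counts completed turns: a response followed by a timely user utterance.

    Silence longer than `timeout` after a response resets the count to zero.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.turns = 0
        self._response_time: Optional[float] = None

    def on_response(self, now: float) -> None:
        """Called when the response selection unit reports the selected response type."""
        self._response_time = now

    def on_utterance(self, now: float) -> int:
        """Called when speech waveform data arrives from the utterance input unit."""
        if self._response_time is not None:
            if now - self._response_time <= self.timeout:
                self.turns += 1
            else:
                self.turns = 0  # the user answered too late: dialogue was interrupted
            self._response_time = None
        return self.turns

    def check_timeout(self, now: float) -> int:
        """Reset the count if no utterance arrived within `timeout` of the last response."""
        if self._response_time is not None and now - self._response_time > self.timeout:
            self.turns = 0
            self._response_time = None
        return self.turns


tc = TurnCounter(timeout=5.0)
tc.on_response(0.0)
tc.on_utterance(2.0)   # one turn completed
tc.on_response(3.0)
tc.check_timeout(9.0)  # no reply within 5 s: count is reset
```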

If the number of turns has reached the certain number (YES in S4), the responses to the user are likely appropriate and the conversation is likely going well. In that case, the dialogue model learning unit 9 adds a fixed value (for example, 20) as a reward (S5).

If the number of turns represented by the turn count data from the turn counter 5c is less than the certain number (NO in S4), the dialogue model learning unit 9 skips S5 and proceeds to S6.

Next, the dialogue model learning unit 9 determines whether the cumulative penalty has reached a certain value, for example 10 (S6). If it has (YES in S6), the responses to the user are likely inappropriate and the conversation is likely not going well. In that case, the dialogue model learning unit 9 subtracts a fixed value (for example, 20) from the cumulative reward (S7).

If the cumulative penalty is less than the certain value (NO in S6), the responses to the user are likely appropriate. S7 is therefore skipped, and the process proceeds to S8.
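Steps S4 to S7 reduce to two independent threshold checks on the running totals. The defaults in this Python sketch are the example values from the text: 10 turns, a +20 bonus, a penalty threshold of 10, and a -20 deduction. The function name is hypothetical.

```python
def adjust_reward_total(reward_total: int, penalty_total: int, turns: int,
                        turn_threshold: int = 10, bonus: int = 20,
                        penalty_threshold: int = 10, deduction: int = 20) -> int:
    """Return the cumulative reward after steps S4 to S7."""
    if turns >= turn_threshold:             # S4 YES: the dialogue is likely going well
        reward_total += bonus               # S5
    if penalty_total >= penalty_threshold:  # S6 YES: responses are likely inappropriate
        reward_total -= deduction           # S7
    return reward_total


print(adjust_reward_total(5, 10, 3))  # short dialogue, many penalties
```

Both checks can fire on the same pass. A long dialogue that has also accumulated many penalties then gains and loses 20, so its reward total ends up unchanged.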

Next, the dialogue model learning unit 9 corrects the response-type coefficients of the dialogue model based on the cumulative reward and the cumulative penalty (S8). For example, it adjusts the coefficient that the dialogue model assigns to the response type selected for the estimated user intention: the coefficient is lowered according to the cumulative penalty and raised according to the cumulative reward.
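The patent gives no formula for S8. The Python sketch below therefore assumes a linear rule: the selected coefficient moves up by `reward_rate` times the cumulative reward and down by `penalty_rate` times the cumulative penalty. Both rates are hypothetical parameters. The function returns a corrected copy, because the stored model is only overwritten later, at S10.

```python
import copy
from typing import Dict, Tuple

DialogueModel = Dict[Tuple[str, str], Dict[str, float]]


def correct_model(model: DialogueModel, emotion: str, intent: str, response_type: str,
                  reward_total: int, penalty_total: int,
                  reward_rate: float = 0.01, penalty_rate: float = 0.01) -> DialogueModel:
    """S8 (assumed linear rule): return a copy with the selected coefficient corrected.

    The coefficient of the selected response type rises with the cumulative reward
    and falls with the cumulative penalty; all other entries are left unchanged.
    """
    corrected = copy.deepcopy(model)
    corrected[(emotion, intent)][response_type] += (
        reward_rate * reward_total - penalty_rate * penalty_total)
    return corrected


base = {("neutral", "narration"): {"empathy": 1.0, "backchannel": 1.0}}
updated = correct_model(base, "neutral", "narration", "empathy",
                        reward_total=20, penalty_total=10)
print(updated)
```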

Next, the dialogue model learning unit 9 determines whether the number of turns represented by the turn count data from the turn counter 5c has reached the upper limit of learning turns (S9). If it has (YES in S9), the dialogue model learning unit 9 outputs dialogue model data representing the coefficient-corrected dialogue model to the dialogue model storage unit 6 (S10). The dialogue model storage unit 6 overwrites the stored dialogue model data with the new data. This completes the learning of the dialogue model in the dialogue device 1 of the present embodiment.

If the number of turns is less than the upper limit of learning turns (NO in S9), the dialogue model learning unit 9 returns to S1.
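For reference, the whole S1 to S10 loop can be assembled into one Python sketch. Several assumptions apply. Each observation is a hypothetical tuple (emotion, intent, selected response type, current turn count). S8 uses the same assumed linear rule as above, but always starts from the stored coefficient so that the cumulative totals are not applied twice to the same entry. The caller is responsible for writing the result back (S10) when the function reports completion. None of these names come from the patent.

```python
import copy
from typing import Dict, Iterable, Tuple

DialogueModel = Dict[Tuple[str, str], Dict[str, float]]
Table = Dict[Tuple[str, str, str], int]
Observation = Tuple[str, str, str, int]  # (emotion, intent, selected type, turn count)


def learn_dialogue_model(model: DialogueModel, observations: Iterable[Observation],
                         reward_table: Table, penalty_table: Table,
                         max_learning_turns: int, turn_threshold: int = 10,
                         bonus: int = 20, penalty_threshold: int = 10,
                         deduction: int = 20, rate: float = 0.01
                         ) -> Tuple[DialogueModel, bool]:
    """Run the S1 to S9 loop; return (corrected model, True) once S9 is satisfied.

    If the observations run out first, (working copy, False) is returned and the
    stored model should not be overwritten.
    """
    working = copy.deepcopy(model)
    reward_total = penalty_total = 0
    for emotion, intent, rtype, turns in observations:
        key = (emotion, intent, rtype)
        reward_total += reward_table.get(key, 0)    # S1 to S3
        penalty_total += penalty_table.get(key, 0)
        if turns >= turn_threshold:                 # S4, S5
            reward_total += bonus
        if penalty_total >= penalty_threshold:      # S6, S7
            reward_total -= deduction
        # S8 (assumed linear rule), shifted from the stored coefficient.
        working[(emotion, intent)][rtype] = (
            model[(emotion, intent)][rtype] + rate * (reward_total - penalty_total))
        if turns >= max_learning_turns:             # S9 YES: caller performs S10
            return working, True
    return working, False
```

A caller would feed one observation per completed turn. It would replace the contents of the dialogue model storage unit only when the returned flag is `True`.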

As described above, the dialogue device 1 of the present embodiment assigns evaluation points based on the user's intention and the selected response type, and learns the suitability of the dialogue model from those points. Therefore, even when the user makes an unexpected utterance, the dialogue device 1 can respond appropriately once enough learning has accumulated. The dialogue device 1 of the present embodiment can thus conduct a natural dialogue with the user.

The present invention is not limited to the above embodiment and can be modified as appropriate without departing from its spirit. For example, the above embodiment describes the present invention as a hardware configuration, but the invention is not limited to this. Any of the processing can also be realized by having a CPU (Central Processing Unit) execute a computer program.

In the above example, the program can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples include:
- magnetic recording media (e.g., flexible disks, magnetic tapes, and hard disk drives);
- magneto-optical recording media (e.g., magneto-optical disks);
- CD-ROM (Read Only Memory), CD-R, and CD-R/W;
- DVD (Digital Versatile Disc) and BD (Blu-ray (registered trademark) Disc); and
- semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)).

The program may also be supplied to the computer by various types of transitory computer readable media, such as electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the computer through a wired communication path, such as an electric wire or optical fiber, or through a wireless communication path.

1 Dialogue device
2 Utterance input unit
3 Speech recognition unit
4 Response generation unit, 4a Response generator, 4a1 QA response generator, 4a2 Empathy response generator, 4a3 Missing-case question generator, 4a4 Topic guidance generator, 4a5 Parroting generator, 4a6 Backchannel (aizuchi) generator
5 Intention estimation unit, 5a Emotion estimator, 5b Intent estimator, 5c Turn counter
6 Dialogue model storage unit
7 Response selection unit
8 Response output unit
9 Dialogue model learning unit

Claims

A dialogue device that determines a response sentence to a user's input words based on a dialogue model and outputs the response sentence, the dialogue device comprising:
a response generation unit that generates, based on the user's words, a response sentence associated with a response type indicating the kind of response;
an intention estimation unit that estimates the user's intention based on the user's words;
a dialogue model storage unit that stores a dialogue model consisting of coefficients assigned to the response types based on the user's intention;
a response selection unit that refers to the coefficients of the dialogue model to select a response type corresponding to the user's intention, and selects the response sentence to be output;
an evaluation point storage unit that stores an evaluation table showing reward evaluation points and penalty evaluation points preset based on combinations of the user's intention and the response type; and
a dialogue model learning unit that refers to the evaluation table and accumulates the reward and the penalty based on the estimated intention of the user and the selected response type, raises, according to the accumulated reward, the coefficient assigned to the response type selected based on the estimated intention of the user, and lowers, according to the accumulated penalty, the coefficient assigned to the response type selected based on the estimated intention of the user.