JP4246548B2

JP4246548B2 - Dialogue method and apparatus using statistical information, dialogue program and recording medium recording the program

Info

Publication number: JP4246548B2
Application number: JP2003153596A
Authority: JP
Inventors: 竜一郎東中; 幹生中野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-30
Filing date: 2003-05-30
Publication date: 2009-04-02
Anticipated expiration: 2023-05-30
Also published as: JP2004354787A

Description

【０００１】
【発明の属する技術分野】
本発明は、過去の対話データから得られる統計的基準を用いることでユーザの情報要求内容を高精度に特定することができるようにする統計情報を用いた対話方法及びその装置と、その対話方法の実現に用いられる対話プログラム及びそのプログラムを記録した記録媒体とに関する。
【０００２】
【従来の技術】
ユーザが文字列または音声を用いてシステムに情報要求を入力するたびに、ユーザとシステムのやり取りの履歴を用いて、システムが保持するユーザ情報要求内容を更新し、その更新したユーザ情報要求内容に応じて、ユーザに対して文字列または音声により応答する処理を行う対話システムがある。
【０００３】
ユーザが対話システムに情報要求を入力する際に用いるユーザの文字列または音声をユーザ発話と呼び、ユーザ発話によって対話システムに伝達される情報要求内容のことを対話行為と呼ぶ。ユーザ発話から対話行為を導出することを言語理解といい、音声対話システムの場合は特に音声理解という。
【０００４】
なお、対話行為は、対話行為タイプとそれに付随する情報とに分けられる。対話行為タイプとは対話行為の大まかな意味種別を表す記号である。
【０００５】
対話システムが保持するユーザ情報要求内容は対話状態と呼ばれ、対話システムはユーザ発話の入力のたびに、ユーザとシステムのやり取りの履歴を用いて、対話状態を更新する。この更新プロセスのことを談話理解と呼ぶ。言語理解、談話理解など、ユーザ発話の入力により、システムの内部状態が変更されることを一般に理解という。ユーザの入力が音声に限られた対話システムのことを、特に音声対話システムと呼ぶ。
【０００６】
対話システムにおける、談話理解に関する技術として以下のものが挙げられる。
【０００７】
第１に、あるユーザ発話の対話行為タイプを直前のいくつかの対話行為タイプから統計的に推定する技術がある。これを統計的対話行為タイプの推定と呼び、下記の非特許文献１に記載の技術が挙げられる。この非特許文献１に記載の技術は、あるユーザ発話がどの対話行為タイプに属するかを推定することによって、音声認識器の受理すべき文法規則や語彙を絞り込み、音声認識精度が向上できることを示した。
【０００８】
第２に、プラン認識を用いて談話理解の精度向上を行う技術が提案されている（例えば、非特許文献２，３参照）。これは、ユーザがある目的を達成するために行うユーザ発話群の構造（プランと呼ぶ）をあらかじめ規定しておき、ユーザ発話に曖昧性があるとき、プランに最も合致する解釈を選ぶことによって、曖昧性を解消する技術のことである。プランに基づく談話理解は、主にキーボードを用いた対話システムを対象に研究されてきたが、近年では音声対話システムにも応用されている。
【０００９】
第３に、ＩＳＳＳ（Incremental Sentence Sequence Search）という手法が下記の非特許文献４及び特許文献１に示されている。この手法は対話における現象の一つである、幾つかの発話区間にまたがる対話行為に対処するための手法であり、入力として、対話行為を構成するユーザ発話の断片（単語、フレーズ）を受け付け、逐次的に対話状態を更新する。ある時点までのユーザ発話の断片が対話行為を構成するのかしないのか明確でない場合には、その双方を考慮して、複数の対話状態をあらかじめ用意されたスコア規則を用いてスコア付きで保持することにより、ユーザ発話入力後に、最尤な対話状態を決定する。
【００１０】
音声対話システムにおいて、音声認識の複数の候補とＩＳＳＳとを組み合わせる方法により、談話理解の精度が向上することが報告されている（ＩＳＳＳ−Ｎ法。例えば、非特許文献５参照）。
【００１１】
上記の従来技術はすべて、ユーザ発話における理解の曖昧性を解消する技術である。ユーザ発話に対応する対話行為候補は複数あるが、対話行為タイプの推定、プラン認識を用いた談話理解はともに、複数の候補から最尤な候補を選びだすための技術であり、ＩＳＳＳは複数の対話行為候補とある時点での対話状態から得られる複数の対話状態をスコア付きで表すことによって、複数の対話行為から最尤の候補を選ぶ問題を、複数の対話行為と対話状態の組み合わせから最もよいものを選ぶ問題として、対話行為の曖昧性を解消する技術である。
【００１２】
【非特許文献１】
Masaaki Nagata and Tsuyoshi Morimoto. First steps toward statistical modeling of dialogue to predict the speech act type of the next utterance. Speech Communication, Vol.15, pp.193-203, 1994.
【非特許文献２】
Charles Rich, Neal Lesh, and Candace Sidner. COLLAGEN: Applying collaborative discourse theory. Vol.22, No.4, pp.15-25, 2001.
【非特許文献３】
Yusuke Shinyama, Takenobu Tokunaga, and Hozumi Tanaka. Kairai-software robots understanding natural language. Third International Workshop on Human-Computer Conversation, 2000.
【非特許文献４】
Mikio Nakano, Noboru Miyazaki, Jun-ichi Hirasawa, Kohji Dohsaka, and Takeshi Kawabata. Understanding unsegmented user utterances in realtime spoken dialogue systems. In Proc. 37th ACL, pp.200-207, 1999 ．
【非特許文献５】
宮崎昇, 中野幹生, 相川清明,"ｎ−ｂｅｓｔ音声認識と逐次理解法によるロバストな音声理解",SIG-SLP-40, 2002, 121-126
【特許文献１】
特開平１１−２３７８９４号公報
【００１３】
【発明が解決しようとする課題】
ＩＳＳＳでは、複数の対話行為と対話状態との組み合わせから最もよいものを選ぶことが必要である。
【００１４】
従来方法では、手作業によるルールを作成し、対話状態をスコアリングして、例えば、スコアが最も高い対話状態を最尤であるとしていた。
【００１５】
しかしながら、スコアリングのルールは、設計者の経験則に基づくものであるため、場合によってはスコアリングを誤ることも多く、ルール作成およびチューニングには高いコストと専門知識が必要であるという問題がある。
【００１６】
本発明はかかる事情に鑑みてなされたものであって、複数の対話行為と対話状態との組み合わせから得られる複数の対話状態のスコアリングに、手作業によるルールを用いるのではなく、人間と機械との対話記録から得られる統計情報を元にスコアリングを行うことで、ユーザの情報要求内容を高精度に特定することができるようにする新たな対話技術の提供を目的とする。
【００１７】
【課題を解決するための手段】
この目的を達成するために、本発明の対話装置は、ユーザとの対話機能を有して、ユーザからの情報要求を入力するたびに、ユーザとのやり取りの履歴を用いてユーザの情報要求内容を特定し、それに応じてユーザに対して応答する処理を行うために、（１）ユーザの情報要求内容の候補を記憶する記憶手段と、（２）入力したユーザの情報要求に応答して、そのユーザ情報要求に１つ以上の解釈が存在することで得られるユーザ情報要求内容のそれぞれと、記憶手段に記憶されるユーザ情報要求内容のそれぞれとの組み合わせから、１つ以上の新たなユーザ情報要求内容の候補を生成する生成手段と、（３）生成手段の生成した新たなユーザ情報要求内容の候補に対して、（ｉ）新たなユーザ情報要求内容の候補の生成元となった記憶手段に記憶されるユーザ情報要求内容が持つ第１のスコアと、（ ii ）入力したユーザ情報要求を解釈したときに得られる第２のスコアと、 (iii) 過去の対話データから統計処理により得られる、ユーザ入力の情報要求とそれに対する応答とが連鎖するパターンの出現確率による第３のスコアと、 (iv) 過去の対話データから統計処理により得られる、記憶手段に記憶されるユーザ情報要求内容と後続するユーザ入力の情報要求とが共起するパターンの出現確率による第４のスコアとについての重み付け計算によって算出されるスコアを付与する付与手段と、（４）付与手段の付与したスコアに応じて、生成手段の生成した新たなユーザ情報要求内容の候補の中からユーザの情報要求内容を特定する特定手段と、（５）生成手段の生成した新たなユーザ情報要求内容の候補を記憶手段に書き込むことで、記憶手段に記憶されるユーザ情報要求内容の候補を更新する更新手段とを備えるように構成する。
【００１８】
以上の各処理手段が動作することで実現される本発明の対話方法はコンピュータプログラムで実現できるものであり、このコンピュータプログラムは、半導体メモリなどのような適当な記録媒体に記録して提供されたり、ネットワークを介して提供され、本発明を実施する際にインストールされてＣＰＵなどの制御手段上で動作することにより本発明を実現することになる。
【００１９】
このように構成される本発明の対話装置では、ユーザからの情報要求を入力すると、そのユーザ情報要求に１つ以上の解釈が存在することで得られるユーザ情報要求内容のそれぞれと、記憶手段に記憶されるユーザ情報要求内容のそれぞれとの組み合わせから、１つ以上の新たなユーザ情報要求内容の候補を生成して、それらのユーザ情報要求内容の候補に対して、過去の対話データから得られる統計的基準を用いてスコアを付与する。
【００２０】
具体的には、（ｉ）新たなユーザ情報要求内容の候補の生成元となった記憶手段に記憶されるユーザ情報要求内容が持つ第１のスコアと、（ ii) 入力したユーザ情報要求を解釈したときに得られる第２のスコアと、 (iii) 過去の対話データから統計処理により得られる、ユーザ入力の情報要求とそれに対する応答とが連鎖するパターンの出現確率による第３のスコアと、 (iv) 過去の対話データから統計処理により得られる、記憶手段に記憶されるユーザ情報要求内容と後続するユーザ入力の情報要求とが共起するパターンの出現確率による第４のスコアとについての重み付け計算によって算出されるスコアを付与する。
【００２１】
そして、その付与したスコアに応じて、生成したユーザ情報要求内容の候補の中からユーザの情報要求内容を特定するとともに、生成したユーザ情報要求内容の候補を記憶手段に書き込むことで、記憶手段に記憶されるユーザ情報要求内容の候補を更新する。
【００２２】
このようにして、本発明では、ユーザが文字列または音声を用いてシステムに情報要求を入力するたびに、ユーザとシステムとのやり取りの履歴を用いて、システムが保持するユーザ情報要求内容を更新し、その更新したユーザ情報要求内容に応じて、ユーザに対して文字列または音声により応答するときに、ユーザの情報要求に１つ以上の解釈が存在し、システムが保持するユーザ情報要求内容に１つ以上の解釈が存在する場合において、ユーザの情報要求のそれぞれと、システムが保持するユーザ情報要求内容のそれぞれとの組み合わせから得られる、１つ以上のユーザ情報要求内容候補のそれぞれに対し、過去の対話データから得られる統計的基準を用いて、正解と考えられるものに高いスコアを、そうでないものには低いスコアを与え、スコアの高い候補を優先的なユーザ情報要求内容とすることで、ユーザの情報要求内容を高精度で取得し、同時に、その時点ではスコアの低いユーザ情報要求内容候補であっても、次のユーザ情報要求との組み合わせによっては、高いスコアを持つユーザ情報要求内容候補を生成する可能性があることを考慮して、スコアの低いユーザ情報要求内容候補についても次のユーザ入力まで保持し続けるようにするのである。
【００２３】
この構成に従って、本発明によれば、話の流れ（文脈）を考えてユーザの伝えたいことを理解して、それに対して応答することを実現するときにあって、複数の対話行為と対話状態との組み合わせから得られる複数の対話状態のスコアリングに、手作業によるルールを用いるのではなく、人間と機械との対話記録から得られる統計情報を元にスコアリングを行うことで、ユーザの情報要求内容を高精度に特定することができるようになる。
【００２４】
【発明の実施の形態】
以下、実施の形態に従って本発明を詳細に説明する。
【００２５】
図１に、本発明を具備する音声対話システム１の一実施形態例を図示する。
【００２６】
この図に示すように、本発明を具備する音声対話システム１は、ユーザ発話を受け付け、対話状態の内容が一定の条件を満たせば、ユーザに応答を返すように処理するものであって、大きく分けて、対話装置１０と統計情報管理装置２０とで構成される。
【００２７】
この対話装置１０は、ユーザ発話を受け付けるユーザ発話受付部１１と、ユーザ発話を対話行為に変換する対話行為変換部１２と、対話状態と対話行為とをもとに対話状態を更新する対話状態更新部１３と、対話状態を管理する対話状態管理部１４と、更新された対話状態をもとにユーザに情報を伝達する応答生成部１５とを備える。
【００２８】
一方、統計情報管理装置２０は、人間と機械との対話記録を保持する対話記録保持部２１と、対話記録保持部２１に保持される対話記録から統計情報を抽出する統計情報抽出部２２と、統計情報抽出部２２の抽出した統計情報を格納する統計情報保持部２３とを備える。
【００２９】
以下、会議室予約システムへの適用を具体例にして、このように構成される本発明の音声対話システム１の処理について詳細に説明する。
【００３０】
会議室予約システムでは、ユーザがシステムに会議室名、日にち、時間を伝え終わった後に、システムは会議室の予約を行い、その旨をユーザに伝える。図２に示すように、ユーザが一方的にシステムに要件を伝えるだけではなく、状況によっては、システムが発話によってユーザ発話を促す場合もある。
【００３１】
このとき、会議室予約システムは、図３に示すように、ユーザの発話のすべてを聞き取れるとは限らない。
【００３２】
そこで、本発明の音声対話システム１は、話の流れ（文脈）を考えて、ユーザの伝えたいことを理解するようにしている。
【００３３】
これを実現するために、本発明の音声対話システム１は、過去の対話データから得られる統計的基準を用いて、ユーザの対話行為によって変化していく対話状態に対してスコアを付与し、図４に示すように、ユーザが伝えたと思われる内容をすべて考慮して、その付与したスコアに従って、話の流れからもっともらしいものを選ぶようにしている。
【００３４】
このとき付与するスコアとして、ユーザ発話の対話行為タイプとそれに対するシステム発話の対話行為タイプとが連鎖するパターンの出現確率（以下、連鎖確率と称することがある）と、システムが保持する対話状態とそれに対するユーザ発話とが共起するパターンの出現確率（以下、共起確率と称することがある）とに応じたスコアを付与するようにしている。
【００３５】
すなわち、大まかな対話の流れを表す図５に示すような連鎖確率と、局所的で詳細な対話の流れを表す図６に示すような共起確率とに応じたスコアを付与するようにしている。
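[0036]
As described above, a dialogue act is the content conveyed to the dialogue system by a user utterance, and the dialogue state is the user information request content held by the dialogue system.
[0037]
In the present invention, the dialogue state refers to the various dialogue-related information that the system holds internally, that is, the flow of the conversation (the context). It includes the result of understanding the user's intention at each point in the dialogue (described as a frame representation made up of attribute-value pairs called slots), the history of utterances between the user and the system, and the history of the dialogue states from which the state was generated.
[0038]
Taking application to a conference room reservation system as a specific example, the dialogue state has a data structure such as that shown in FIG. 7, and the dialogue state management unit 14 shown in FIG. 1 manages dialogue states having this data structure.
[0039]
Next, each of the means provided in the spoken dialogue system 1 of the present invention is described.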
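[0040]
[A] Statistical information management device 20
[A-1] Dialogue record holding unit 21
The dialogue record holding unit 21 holds records of dialogues between humans and machines.
[0041]
The dialogue records held in the dialogue record holding unit 21 are assumed to be data recording the history of dialogues between humans and a spoken dialogue system. For each dialogue conducted, the record stores the user utterances, the start and end times of the system utterances, and the dialogue state of the system before and after each user utterance.
[0042]
All user speech and system speech is recorded, and all user speech is transcribed. The transcribed user speech is assumed to have been converted into dialogue act representations by automatic processing with a program followed by manual correction.
[0043]
[A-2] Statistical information extraction unit 22
The statistical information extraction unit 22 extracts two kinds of statistical information from the dialogue records held in the dialogue record holding unit 21.
[0044]
One is the chain probability of dialogue act types (a dialogue act type represents the semantic category of a dialogue act). The other is the co-occurrence probability of the dialogue state at each point in the dialogue and the dialogue act that immediately follows it.
[0045]
An n-gram probability is used as the chain probability. An n-gram probability indicates how frequently a given combination of n dialogue act types appears in the dialogue records. The embodiment uses trigram (3-gram) probabilities.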
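The trigram statistic described here can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: it assumes each recorded dialogue has already been reduced to a list of dialogue act types, and it assumes the probability is estimated as a plain relative frequency over all trigrams in the corpus, since the text does not name an estimator. The function name `trigram_log_probs` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def trigram_log_probs(dialogues):
    """Map each dialogue-act-type trigram to the log of its relative frequency.

    Assumption: relative frequency over all trigrams in the corpus; the
    source text does not say which estimator is used.
    """
    counts = Counter()
    for act_types in dialogues:
        # Slide a window of three consecutive dialogue act types.
        for i in range(len(act_types) - 2):
            counts[tuple(act_types[i:i + 3])] += 1
    total = sum(counts.values())
    return {trigram: math.log(c / total) for trigram, c in counts.items()}


# Hypothetical toy corpus: one list of dialogue act types per dialogue.
dialogues = [
    ["ask-date", "refer-day", "backchannel", "refer-start-time"],
    ["ask-date", "refer-day", "backchannel", "request"],
]
table = trigram_log_probs(dialogues)
```

Storing logarithms rather than raw probabilities lets the later score update add the value directly to the other score terms.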
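[0046]
FIG. 8 shows an example of the logarithms of the 3-gram probabilities of dialogue act types. In the figure, "refer-start-time" denotes the dialogue act type of an utterance that specifies the start time in the conference room reservation system, "backchannel" that of a backchannel response, "ask-date" that of an utterance asking for the date, "ask-start-time" that of an utterance asking for the start time, "request" that of a user request utterance such as "onegaishimasu" ("please"), "refer-day" that of an utterance mentioning the day, "refer-month" that of an utterance mentioning the month, and "refer-room" that of an utterance mentioning the conference room.
[0047]
The co-occurrence probability, on the other hand, is obtained by extracting from the dialogue records the pairs consisting of the dialogue state at each point in the dialogue and the dialogue act immediately following it, and determining how frequently each pair appears among all pairs in the dialogue records.
[0048]
The pair mentioned above is not necessarily a simple bigram (2-gram) of dialogue state and dialogue act, because the complexity of the content of a dialogue state is expected to make the data sparse.
[0049]
For this reason, the embodiment represents the manner of co-occurrence, such as how the understanding result of the user's intention held in the dialogue state is updated by the dialogue act and what state the updated item was in at the time of the update, by a set of classes such as those shown in FIG. 10 (17 classes in the example of FIG. 10), which are distinguished using the seven item values shown in FIG. 9. It determines which class a pair of a dialogue state and the immediately following dialogue act belongs to, and uses the co-occurrence probability of that class in place of the co-occurrence probability of the pair.
[0050]
If it is known that data sparseness will not arise, however, this device is unnecessary.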
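[0051]
Next, FIG. 9 is described in detail.
[0052]
As shown in FIG. 7, the dialogue state is a frame representation composed of multiple attribute-value pairs called slots. It also holds the history of utterances between the user and the system, and the system's questions and confirmations are associated with slots of the user's information request content. For example, in the conference room reservation system, when the dialogue state has a start-time slot, the system's confirmation utterance "From 3 o'clock?" corresponds to the start time in the information request content, and the system's question utterance "From what time?" likewise corresponds to the start time.
[0053]
In the present invention, pairs of a dialogue state and a dialogue act are classified by the following seven item values shown in FIG. 9.
[0054]
(1) Does the act update the slot that was just asked about?
This value indicates whether the slot the dialogue act updates is the slot associated with what the system asked immediately before. It is 1 if so and 0 otherwise.
[0055]
(2) Does the act update the slot under confirmation?
This value indicates whether the slot the dialogue act updates is the slot associated with what the system confirmed immediately before. It is 1 if so and 0 otherwise.
[0056]
(3) Does the act concern a slot confirmed so far?
This value indicates whether the slot the dialogue act updates is a slot associated with something the system has confirmed up to that point. It is 1 if so and 0 otherwise.
[0057]
(4) Does the act concern a slot that has no value?
This value indicates whether the slot the dialogue act updates has not yet been given a value by a past dialogue act. It is 1 if so and 0 otherwise.
[0058]
(5) Does the act concern a slot that has a value, leaving the value unchanged?
This value indicates whether the slot the dialogue act updates has already been given a value by a past dialogue act and the dialogue act leaves that value unchanged. It is 1 if so and 0 otherwise.
[0059]
(6) Does the act concern a slot that has a value, changing the value?
This value indicates whether the slot the dialogue act updates has already been given a value by a past dialogue act and the dialogue act changes that value. It is 1 if so and 0 otherwise.
[0060]
(7) Is the act directed at the initial dialogue state?
In the system of the embodiment, the system says "Please state your request" only at the start of the dialogue. A question like this, to which the user may reply with anything, is called an open question. Because the distribution of dialogue acts immediately after such a question differs from the distribution in other cases, the two need to be distinguished. The value is 1 for a dialogue act directed at the initial dialogue state and 0 otherwise.
[0061]
Of the seven items shown in FIG. 9 and described above, items 1, 2, 3, and 7 are independent events. Items 4, 5, and 6, on the other hand, allow four combinations: either exactly one of them is 1 or all of them are 0. The number of possible events is therefore 2^4 × 4 = 64. Of these, the 17 that actually occurred in dialogues are shown in FIG. 10.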
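The count of 64 can be checked by enumeration. The sketch below is illustrative only: the function name `possible_patterns` is hypothetical, and each pattern is encoded as a 7-character bit string ordered by item number.

```python
from itertools import product


def possible_patterns():
    """Enumerate every 7-bit pattern allowed by the constraints on items 1-7."""
    # Items 4, 5 and 6: exactly one of them is 1, or all three are 0.
    exclusive = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    patterns = []
    # Items 1, 2, 3 and 7 vary independently.
    for b1, b2, b3, b7 in product((0, 1), repeat=4):
        for b4, b5, b6 in exclusive:
            patterns.append(f"{b1}{b2}{b3}{b4}{b5}{b6}{b7}")
    return patterns


patterns = possible_patterns()
```

Patterns such as `0001100`, in which items 4 and 5 are both 1, never appear in the list, which is what reduces the 2^7 = 128 raw bit strings to 64.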
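[0062]
Each bit of the co-occurrence patterns shown in FIG. 10 corresponds, from left to right, to the item numbers of the seven items described for FIG. 9, and each bit takes one of the two values, 1 or 0, described for FIG. 9. The co-occurrence probability shown in FIG. 10 is the probability with which the corresponding co-occurrence pattern occurred.
[0063]
In the present invention, once the co-occurrence pattern exhibited by a pair of a dialogue state and the immediately following dialogue act has been obtained, the table of FIG. 10 is searched using that pattern as the search key to determine which of the 17 classes shown in FIG. 10 the pair belongs to, and the co-occurrence probability of that class is used in place of the co-occurrence probability of the pair.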
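The class lookup can be sketched as follows. This is an assumption-laden illustration, not the patent's data model: the dictionary layout of `state` and `act`, the treatment of acts that target no slot (all of items 1 to 6 set to 0), the `floor` probability for patterns missing from the table, and the single table entry are all hypothetical. Item 4 is taken to mean that the slot has no value yet, which is what makes items 4, 5, and 6 mutually exclusive.

```python
import math


def cooccurrence_pattern(state, act):
    """Return the 7-bit pattern (items 1 to 7, left to right) as a string."""
    slot = act.get("slot")
    if slot is None:
        # Assumption: an act that targets no slot leaves items 1-6 at 0.
        bits = [False] * 6
    else:
        current = state["slots"].get(slot)
        bits = [
            slot == state["last_asked"],                      # item 1
            slot == state["confirming"],                      # item 2
            slot in state["confirmed"],                       # item 3
            current is None,                                  # item 4
            current is not None and current == act["value"],  # item 5
            current is not None and current != act["value"],  # item 6
        ]
    bits.append(state["is_initial"])                          # item 7
    return "".join("1" if b else "0" for b in bits)


def s_col(state, act, class_probs, floor=1e-6):
    """Log co-occurrence probability of the class the pair falls into.

    Assumption: patterns missing from the table get a small floor
    probability; the source does not say how unseen patterns are handled.
    """
    return math.log(class_probs.get(cooccurrence_pattern(state, act), floor))


# Hypothetical state, act and class table (not the values of FIG. 10).
state = {
    "slots": {"start": None, "end": None},
    "last_asked": "start",
    "confirming": None,
    "confirmed": set(),
    "is_initial": False,
}
act = {"type": "refer-start-time", "slot": "start", "value": "14:00"}
class_probs = {"1001000": 0.25}
```

Here the user answers the question the system just asked about an empty start-time slot, so items 1 and 4 are set and the pair falls into class `1001000`.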
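[0064]
[A-3] Statistical information holding unit 23
The statistical information holding unit 23 stores the statistical information extracted by the statistical information extraction unit 22. Specifically, it stores the logarithm of each chain probability extracted by the statistical information extraction unit 22 (corresponding to the score s_ngram described later) in association with its chain pattern, and stores the logarithm of each co-occurrence probability extracted by the statistical information extraction unit 22 (the score s_col described later) in association with its co-occurrence pattern.
[0065]
[B] Dialogue device 10
[B-1] User utterance receiving unit 11
The user utterance receiving unit 11 receives user utterances and consists of a microphone and a speech recognizer.
[0066]
Each time the user speaks, the user's speech is input to the speech recognizer through the microphone, and the speech recognizer outputs speech recognition results. In addition to the most likely recognition result, the speech recognizer outputs the top n recognition hypotheses (the n-best shown in FIG. 1).
[0067]
[B-2] Dialogue act conversion unit 12
The dialogue act conversion unit 12 converts user utterances into dialogue acts. It takes the recognition hypotheses output by the user utterance receiving unit 11 as input and outputs the corresponding dialogue acts.
[0068]
Specifically, it generates dialogue acts by applying program processing that performs syntactic and semantic analysis to each of the top n recognition results output by the user utterance receiving unit 11. When multiple interpretations exist, it generates multiple dialogue acts.
[0069]
As a simple example, when the user specifies a time by saying only "2 o'clock", two dialogue acts are generated, one for each case, as also shown in FIG. 4, so that both a dialogue act concerning the start time and a dialogue act concerning the end time can be handled. For convenience of explanation, it is assumed here that m dialogue acts are generated from the top n recognition results (the m-best shown in FIG. 1).
[0070]
Each dialogue act is assumed to be scored from the viewpoint of the speech recognition score and the validity of the syntactic and semantic analysis. In the embodiment, the logarithm of the reciprocal of the rank of the recognition result is used as this dialogue act score (the score s_act described later). Recognition results that rank lower therefore receive lower scores.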
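[0071]
[B-3] Dialogue state management unit 14
The dialogue state management unit 14 holds dialogue states such as that shown in FIG. 7, which describe, among other things, the result of understanding the user's intention at each point in the dialogue, and provides information on the dialogue states it holds on request. One or more dialogue states are held together with their scores, and at the start of the dialogue a single initial dialogue state is held.
[0072]
As described above, the dialogue states managed by the dialogue state management unit 14 refer to the various dialogue-related information that the system holds internally, and include the result of understanding the user's intention at each point in the dialogue, the history of utterances between the user and the system, and the history of the dialogue states from which each state was generated.
[0073]
[B-4] Dialogue state update unit 13
The dialogue state update unit 13 receives the m scored dialogue acts output by the dialogue act conversion unit 12 and the k scored dialogue states held by the dialogue state management unit 14, and generates m × k new scored dialogue states from the combinations of these dialogue acts and dialogue states.
[0074]
The dialogue state update unit 13 calculates the score S_{t+1} of each of the m × k dialogue states according to the formula
S_{t+1} = S_t + α × s_act + β × s_ngram + γ × s_col
[0075]
Here, S_t is the score of the dialogue state before the update, s_act is the score of the dialogue act (the score assigned by the dialogue act conversion unit 12), s_ngram is the score for the chain probability of dialogue act types, and s_col is the score for the co-occurrence probability of the dialogue state and the dialogue act. α, β, and γ are weighting coefficients.
[0076]
In this embodiment, s_ngram is the logarithm of the 3-gram probability of the dialogue act in question (as shown in FIG. 8), and s_col is the logarithm of the co-occurrence probability (as shown in FIG. 10) of the class to which the pair of dialogue state and dialogue act in question belongs. α, β, and γ are, for example, all set to 1.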
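The score update can be sketched as below. This is a minimal illustration under stated assumptions: the class names `DialogueAct` and `DialogueState`, the function `update_states`, and the act type `refer-end-time` are hypothetical, and the two lambda functions return fixed values in place of real lookups in the chain-probability and co-occurrence tables. Only the formula itself, the use of log(1/rank) for s_act, and the default weights of 1 come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class DialogueAct:
    act_type: str
    rank: int  # 1-based rank of the recognition hypothesis it came from


@dataclass
class DialogueState:
    history: tuple      # dialogue act types seen so far
    score: float = 0.0  # S_t


def s_act(act):
    # Log of the reciprocal of the recognition rank, as in the embodiment.
    return math.log(1.0 / act.rank)


def update_states(states, acts, s_ngram, s_col, alpha=1.0, beta=1.0, gamma=1.0):
    """Combine k states with m acts into m*k states scored by the formula."""
    new_states = []
    for state in states:
        for act in acts:
            score = (state.score
                     + alpha * s_act(act)
                     + beta * s_ngram(state, act)
                     + gamma * s_col(state, act))
            new_states.append(DialogueState(state.history + (act.act_type,), score))
    return new_states


# Hypothetical inputs; the constant lookups stand in for the real tables.
states = [DialogueState(("ask-start-time",), 0.0), DialogueState(("ask-date",), -1.0)]
acts = [DialogueAct("refer-start-time", 1), DialogueAct("refer-end-time", 2)]
new_states = update_states(states, acts,
                           s_ngram=lambda s, a: -0.5,
                           s_col=lambda s, a: -0.25)
```

Because every term is a log-probability or a log-rank, the sum behaves like the logarithm of a product of likelihoods, and each of the k prior states spawns one successor per dialogue act.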
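[0077]
The m × k dialogue states with scores calculated in this way are passed to the dialogue state management unit 14 and held in place of the dialogue states held until then.
[0078]
If, as the dialogue proceeds, the number of dialogue state combinations grows so large that it interferes with processing, a maximum number of dialogue states to hold (called the dialogue state beam width) is set. Only high-scoring dialogue states are kept and low-scoring ones are discarded, which makes real-time processing possible.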
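Pruning to a beam width amounts to keeping the top-scoring states. The sketch below assumes, for illustration only, that each state is a `(label, score)` pair; the function name `prune` and the sample values are hypothetical.

```python
import heapq


def prune(states, beam_width):
    """Keep the beam_width highest-scoring (label, score) pairs, best first."""
    return heapq.nlargest(beam_width, states, key=lambda pair: pair[1])


# Hypothetical scored states.
states = [("A", -1.0), ("B", -0.2), ("C", -3.0), ("D", -0.9)]
kept = prune(states, beam_width=2)
```

With a beam width of 1 the system keeps only the single best hypothesis at every turn, which is the setting the experiments compare against wider beams.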
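[0079]
[B-5] Response generation unit 15
The response generation unit 15 receives the m × k scored dialogue states from the dialogue state management unit 14, takes the one with the highest score as the most likely dialogue state, determines the response to the user on the basis of that dialogue state, and conveys the response by speech.
[0080]
The system utterance used for the response is registered in the history of utterances between the user and the system that is recorded in the dialogue state shown in FIG. 7, and is thereby reflected in all of the dialogue states the system holds.
[0081]
Next, the processing executed by the dialogue device 10 configured in this way is described in detail according to the processing flows of FIGS. 11 and 12.
[0082]
FIG. 11 is the processing flow executed by the user utterance receiving unit 11, the dialogue act conversion unit 12, and the dialogue state update unit 13, and FIG. 12 is the processing flow executed by the response generation unit 15.
[0083]
When the user speaks, the dialogue device 10, as shown in the processing flow of FIG. 11, first inputs the user's speech in step 10 and then, in step 11, performs speech recognition on the input speech to obtain the top n recognition results.
[0084]
Next, in step 12, it performs syntactic and semantic analysis on each of the top n recognition results to generate m dialogue acts, and assigns each of them a score s_act from the viewpoint of the speech recognition score and the validity of the syntactic and semantic analysis.
[0085]
For example, when the user specifies a time by saying only "2 o'clock", it generates a dialogue act concerning the start time and a dialogue act concerning the end time, as also shown in FIG. 4, so that both can be handled, and assigns a score s_act to each of the two dialogue acts from the viewpoint of the speech recognition score and the validity of the syntactic and semantic analysis.
[0086]
As an example of the validity of the syntactic and semantic analysis, compare a case in which the analysis result 「第３会議室予約します」 is obtained with a case in which 「第３会議室を予約します」 is obtained. The former omits the object particle を, so the latter is the more accurate sentence and is judged to be more valid as the dialogue act "reserve conference room 3".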
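[0087]
Next, in step 13, the k dialogue states with the highest scores S (the score at this point is written S_t below) are read from the dialogue states held in the dialogue state management unit 14. Through the processing of step 18 described later, each dialogue state held in the dialogue state management unit 14 has been given a score S, and, as shown in FIG. 7, each dialogue state records this score, so the k dialogue states are read in descending order of the recorded score S.
[0088]
Next, in step 14, m × k new dialogue states that replace the previous ones are generated from the k dialogue states read in step 13 and the m dialogue acts generated in step 12, and are registered in the dialogue state management unit 14.
[0089]
Next, in step 15, one unprocessed dialogue state is selected from the newly generated dialogue states.
[0090]
Next, in step 16, the "dialogue state → dialogue act" co-occurrence pattern (as shown in FIG. 10) from which the selected dialogue state was generated is identified, and the statistical information holding unit 23 is searched using that pattern as the search key to obtain the score s_col for the co-occurrence probability of the pattern.
[0091]
Next, in step 17, the chain pattern of dialogue act types is identified from the chain of dialogue act types described in the utterance history recorded in the selected dialogue state, and the statistical information holding unit 23 is searched using that pattern as the search key to obtain the score s_ngram for the chain probability of the pattern.
[0092]
Next, in step 18, the score S_{t+1} of the selected dialogue state is calculated according to
S_{t+1} = S_t + α × s_act + β × s_ngram + γ × s_col
and recorded in the selected dialogue state (which is registered in the dialogue state management unit 14).
[0093]
Next, in step 19, it is determined whether all of the dialogue states generated in step 14 have been processed. If unprocessed dialogue states remain, the processing returns to step 15; if none remain, the processing ends.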
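[0094]
In this way, when the user speaks, the dialogue device 10 updates the dialogue states (which record the "start slot, end slot, score" shown in the figure) in the manner shown in FIG. 4.
[0095]
When responding to a user utterance, on the other hand, the dialogue device 10, as shown in the processing flow of FIG. 12, first waits in step 20 for the scores of the m × k dialogue states newly generated according to the processing flow of FIG. 11 to be calculated. Once they have been calculated, it reads the m × k dialogue states with their scores from the dialogue state management unit 14 in step 21.
[0096]
Next, in step 22, it identifies the dialogue state with the highest score among those read as the most likely dialogue state, and in step 23 it determines the response to the user on the basis of that dialogue state.
[0097]
Next, in step 24, it outputs the determined response as speech, and in step 25 it records in each of the m × k dialogue states, as utterance history, that the dialogue act of the system utterance was performed, and then ends the processing.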
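Steps 22 to 25 can be sketched as follows. The dictionary layout of a state, the function names `respond` and `ask_first_empty_slot`, the response policy, and the act names it returns are all hypothetical; the text specifies only that the highest-scoring state drives the response and that the system's act is recorded in every retained state.

```python
def respond(states, decide_response):
    """Pick the best state, decide a response, and log the act in every state."""
    best = max(states, key=lambda s: s["score"])   # step 22
    system_act = decide_response(best)             # step 23
    # Step 24 (speech output) is omitted in this sketch.
    for s in states:                               # step 25
        s["history"].append(("system", system_act))
    return best, system_act


def ask_first_empty_slot(state):
    # Hypothetical policy: ask about the first slot that has no value yet.
    for name, value in state["slots"].items():
        if value is None:
            return "ask-" + name
    return "confirm-reservation"


# Hypothetical scored dialogue states.
states = [
    {"slots": {"start-time": "14:00", "date": None}, "score": -0.8, "history": []},
    {"slots": {"start-time": None, "date": None}, "score": -2.1, "history": []},
]
best, system_act = respond(states, ask_first_empty_slot)
```

Recording the system act in the low-scoring states as well keeps their histories comparable, so a state that is behind now can still overtake the leader after the next user utterance.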
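[0098]
In this way, in the spoken dialogue system 1 of the present invention, when the user's information request has one or more interpretations and the user information request content held by the system also has one or more interpretations, each of the one or more candidates obtained by combining each interpretation of the request with each user information request content held by the system is scored using statistical criteria obtained from past dialogue data, with high scores given to candidates considered correct and low scores to the others. Taking high-scoring candidates as the preferred user information request content allows the user's information request content to be obtained with high accuracy. At the same time, because a candidate with a low score at one point may, in combination with the next user information request, produce a candidate with a high score, low-scoring candidates are also retained until the next user input.
[0099]
With this configuration, the spoken dialogue system 1 of the present invention can understand what the user wants to convey by taking the flow of the conversation (the context) into account, and can respond accordingly.
[0100]
Next, the results of dialogue experiments conducted to verify the effectiveness of the present invention are described. Three experiments were conducted in total. An outline of each experiment and its results are given below.
[0101]
(1) Dialogue experiment 1
The speech recognizer output was set to 5-best and the dialogue state beam width (the maximum number of dialogue states held) to 15. A total of 256 dialogues were collected from 16 subjects. Counting dialogues that took 5 minutes or more to complete the task as failures, the task completion rate was 88.3% (226/256).
[0102]
(2) Dialogue experiment 2
The speech recognizer output was set to 1-best and the dialogue state beam width to 1. A total of 224 dialogues were collected from 28 subjects. Counting dialogues that took 5 minutes or more to complete the task as failures, the task completion rate was 88.3%. The mean task completion time of the dialogues in which the task was completed was 107.66 seconds.
[0103]
(3) Dialogue experiment 3
The speech recognizer output was set to 1-best and the dialogue state beam width to 30. A total of 224 dialogues were collected from 28 subjects (the same subjects as in dialogue experiment 2). Counting dialogues that took 5 minutes or more to complete the task as failures, the task completion rate was 91.0%. The mean task completion time of the dialogues in which the task was completed was 95.86 seconds. When dialogues by the same subject on the same task (the same reservation details) were compared with dialogue experiment 2, the reduction in task completion time was statistically significant.
[0104]
The following was verified from the above dialogue experiments.
[0105]
The relatively good completion rate obtained in dialogue experiment 1 shows that the system functions adequately as a spoken dialogue system without scoring dialogue states by hand-written rules. In addition, the results of dialogue experiments 2 and 3 demonstrate the value of holding more than one dialogue state.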
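[0106]
[Effects of the invention]
As described above, when understanding what the user wants to convey by taking the flow of the conversation into account and responding to it, the present invention scores the multiple dialogue states obtained from combinations of multiple dialogue acts and dialogue states on the basis of statistical information obtained from records of human-machine dialogue rather than with hand-written rules, so that the content of the user's information request can be identified with high accuracy.
[0107]
Furthermore, by eliminating hand-written rules, the present invention also makes it possible to keep the cost of building the system low.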
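[Brief description of the drawings]
[FIG. 1] An example of an embodiment of the spoken dialogue system of the present invention.
[FIG. 2] A diagram explaining the form of conversation in the conference room reservation system.
[FIG. 3] A diagram explaining a problem in conversation with the conference room reservation system.
[FIG. 4] A diagram explaining the processing executed by the spoken dialogue system of the present invention.
[FIG. 5] A diagram explaining the chain probability.
[FIG. 6] A diagram explaining the co-occurrence probability.
[FIG. 7] A diagram showing an example of the data structure of a dialogue state.
[FIG. 8] A diagram showing an example of chain probability data values.
[FIG. 9] A diagram explaining the items used to classify co-occurrence patterns.
[FIG. 10] A diagram showing an example of co-occurrence probability data values.
[FIG. 11] A processing flow executed by the dialogue device.
[FIG. 12] A processing flow executed by the dialogue device.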
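[Explanation of reference numerals]
1 Spoken dialogue system
10 Dialogue device
11 User utterance receiving unit
12 Dialogue act conversion unit
13 Dialogue state update unit
14 Dialogue state management unit
15 Response generation unit
20 Statistical information management device
21 Dialogue record holding unit
22 Statistical information extraction unit
23 Statistical information holding unit
[0001]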
[Technical field of the invention]
The present invention relates to a dialogue method and apparatus using statistical information, which use statistical criteria obtained from past dialogue data so that the content of a user's information request can be identified with high accuracy, and to a dialogue program used to implement the dialogue method and a recording medium on which the program is recorded.
[0002]
[Prior art]
There are dialogue systems that, each time a user inputs an information request to the system as a character string or speech, update the user information request content held by the system using the history of exchanges between the user and the system, and respond to the user by character string or speech according to the updated content.
[0003]
The character string or speech that a user uses to input an information request to a dialogue system is called a user utterance, and the information request content conveyed to the dialogue system by a user utterance is called a dialogue act. Deriving a dialogue act from a user utterance is called language understanding; in a spoken dialogue system it is specifically called speech understanding.
[0004]
A dialogue act consists of a dialogue act type and the information accompanying it. The dialogue act type is a symbol representing the rough semantic category of the dialogue act.
[0005]
The user information request content held by the dialogue system is called the dialogue state, and each time a user utterance is input the dialogue system updates the dialogue state using the history of exchanges between the user and the system. This update process is called discourse understanding. In general, a change in the internal state of the system caused by the input of a user utterance, as in language understanding and discourse understanding, is called understanding. A dialogue system whose user input is limited to speech is specifically called a spoken dialogue system.
[0006]
The following technologies are related to discourse understanding in dialogue systems.
[0007]
First, there is a technique for statistically estimating the dialogue act type of a user utterance from the several dialogue act types immediately preceding it. This is called statistical dialogue act type estimation; an example is the technique described in Non-Patent Document 1 below. That work showed that estimating which dialogue act type a user utterance belongs to makes it possible to narrow down the grammar rules and vocabulary the speech recognizer should accept, thereby improving speech recognition accuracy.
[0008]
Second, techniques that use plan recognition to improve the accuracy of discourse understanding have been proposed (see, for example, Non-Patent Documents 2 and 3). In these techniques, the structure of the group of user utterances a user makes to achieve some goal (called a plan) is defined in advance, and when a user utterance is ambiguous, the ambiguity is resolved by choosing the interpretation that best matches the plan. Plan-based discourse understanding has been studied mainly for keyboard-based dialogue systems, but in recent years it has also been applied to spoken dialogue systems.
[0009]
Third, a method called ISSS (Incremental Sentence Sequence Search) is presented in Non-Patent Document 4 and Patent Document 1 below. This method addresses dialogue acts that span several utterance segments, one of the phenomena that occur in dialogue. It accepts as input the fragments (words and phrases) of user utterances that make up a dialogue act and updates the dialogue state incrementally. When it is unclear whether the fragments received up to a given point constitute a dialogue act, both possibilities are considered: multiple dialogue states are held with scores assigned by score rules prepared in advance, and the most likely dialogue state is determined after the user utterance has been input.
[0010]
It has been reported that, in spoken dialogue systems, combining multiple speech recognition candidates with ISSS improves the accuracy of discourse understanding (the ISSS-N method; see, for example, Non-Patent Document 5).
[0011]
All of the above prior art consists of techniques for resolving ambiguity in the understanding of user utterances. A user utterance can correspond to multiple dialogue act candidates. Dialogue act type estimation and plan-based discourse understanding are both techniques for selecting the most likely candidate from among them. ISSS, by representing with scores the multiple dialogue states obtained from the multiple dialogue act candidates and the dialogue states at a given point, recasts the problem of selecting the most likely of several dialogue acts as the problem of selecting the best combination of dialogue act and dialogue state, and resolves the ambiguity of dialogue acts in that way.
[0012]
[Non-Patent Document 1]
Masaaki Nagata and Tsuyoshi Morimoto. First steps toward statistical modeling of dialogue to predict the speech act type of the next utterance. Speech Communication, Vol.15, pp.193-203, 1994.
[Non-Patent Document 2]
Charles Rich, Neal Lesh, and Candace Sidner. COLLAGEN: Applying collaborative discourse theory. Vol.22, No.4, pp.15-25, 2001.
[Non-Patent Document 3]
Yusuke Shinyama, Takenobu Tokunaga, and Hozumi Tanaka. Kairai-software robots understanding natural language. Third International Workshop on Human-Computer Conversation, 2000.
[Non-Patent Document 4]
Mikio Nakano, Noboru Miyazaki, Jun-ichi Hirasawa, Kohji Dohsaka, and Takeshi Kawabata. Understanding unsegmented user utterances in realtime spoken dialogue systems. In Proc. 37th ACL, pp.200-207, 1999.
[Non-Patent Document 5]
Noboru Miyazaki, Mikio Nakano, Kiyoaki Aikawa, "Robust Speech Understanding by n-best Speech Recognition and Sequential Understanding", SIG-SLP-40, 2002, 121-126
[Patent Document 1]
JP-A-11-237894
[0013]
[Problems to be solved by the invention]
In ISSS, the best candidate must be selected from the combinations of multiple dialogue acts and dialogue states.
[0014]
Conventionally, rules were written by hand to score the dialogue states, and, for example, the dialogue state with the highest score was taken to be the most likely.
[0015]
However, because scoring rules are based on the designer's rules of thumb, they often score incorrectly in some cases, and creating and tuning the rules requires high cost and specialist knowledge.
[0016]
The present invention was made in view of these circumstances. Its object is to provide a new dialogue technique that can identify the content of a user's information request with high accuracy by scoring the multiple dialogue states obtained from combinations of multiple dialogue acts and dialogue states on the basis of statistical information obtained from records of human-machine dialogue, rather than with hand-written rules.
[0017]
[Means for Solving the Problems]
To achieve this object, the dialogue apparatus of the present invention has a function for dialogue with a user and, in order to identify the content of the user's information request using the history of exchanges with the user each time an information request from the user is input and to respond to the user accordingly, comprises: (1) storage means for storing candidates for the user's information request content; (2) generation means for generating, in response to an input user information request, one or more new candidates for the user information request content from combinations of each user information request content obtained because the user information request has one or more interpretations with each user information request content stored in the storage means; (3) assignment means for assigning, to each new candidate generated by the generation means, a score calculated by a weighted calculation over (i) a first score held by the user information request content stored in the storage means from which the new candidate was generated, (ii) a second score obtained when the input user information request is interpreted, (iii) a third score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which a user's information request and the response to it occur in sequence, and (iv) a fourth score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which user information request content stored in the storage means co-occurs with the subsequent user information request; (4) identification means for identifying the user's information request content from among the new candidates generated by the generation means according to the scores assigned by the assignment means; and (5) update means for updating the candidates stored in the storage means by writing the new candidates generated by the generation means into the storage means.
[0018]
The dialogue method of the present invention, which is realized by the operation of the processing means described above, can be implemented as a computer program. This computer program is provided recorded on a suitable recording medium such as a semiconductor memory, or via a network; when the invention is put into practice, it is installed and runs on control means such as a CPU, thereby realizing the invention.
[0019]
In the dialogue apparatus of the present invention configured in this way, when an information request from the user is input, one or more new candidates for the user information request content are generated from combinations of each user information request content obtained because the request has one or more interpretations with each user information request content stored in the storage means, and a score is assigned to each candidate using statistical criteria obtained from past dialogue data.
[0020]
Specifically, the assigned score is calculated by a weighted calculation over (i) a first score held by the user information request content stored in the storage means from which the new candidate was generated, (ii) a second score obtained when the input user information request is interpreted, (iii) a third score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which a user's information request and the response to it occur in sequence, and (iv) a fourth score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which user information request content stored in the storage means co-occurs with the subsequent user information request.
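In the embodiment described in this document, the weighted calculation takes the form of a weighted sum. Written out in the notation of that embodiment, where the first score is S_t, the second is s_act, the third is s_ngram, and the fourth is s_col:

```latex
S_{t+1} = S_t + \alpha\, s_{\mathrm{act}} + \beta\, s_{\mathrm{ngram}} + \gamma\, s_{\mathrm{col}}
```

The embodiment gives α = β = γ = 1 as an example setting; other weightings would be a tuning choice that the text leaves open.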
[0021]
Then, according to the assigned scores, the user's information request content is identified from among the generated candidates, and the generated candidates are written into the storage means, thereby updating the candidates stored there.
[0022]
In this way, in the present invention, each time a user inputs an information request to the system as a character string or speech, the user information request content held by the system is updated using the history of exchanges between the user and the system, and a response is given to the user by character string or speech according to the updated content. When the user's information request has one or more interpretations and the user information request content held by the system also has one or more interpretations, each of the one or more candidates obtained by combining each interpretation of the request with each user information request content held by the system is scored using statistical criteria obtained from past dialogue data, with high scores given to candidates considered correct and low scores to the others. Taking high-scoring candidates as the preferred user information request content allows the user's information request content to be obtained with high accuracy. At the same time, because a candidate with a low score at one point may, in combination with the next user information request, produce a candidate with a high score, low-scoring candidates are also retained until the next user input.
[0023]
With this configuration, when understanding what the user wants to convey by taking the flow of the conversation (the context) into account and responding to it, the present invention scores the multiple dialogue states obtained from combinations of multiple dialogue acts and dialogue states on the basis of statistical information obtained from records of human-machine dialogue rather than with hand-written rules, so that the content of the user's information request can be identified with high accuracy.
[0024]
[Embodiments of the invention]
Hereinafter, the present invention will be described in detail according to embodiments.
[0025]
FIG. 1 shows an example of an embodiment of a spoken dialogue system 1 incorporating the present invention.
[0026]
As shown in this figure, the spoken dialogue system 1 incorporating the present invention accepts user utterances and returns a response to the user if the content of the dialogue state satisfies certain conditions. It consists broadly of a dialogue device 10 and a statistical information management device 20.
[0027]
The dialogue device 10 includes a user utterance receiving unit 11 that receives user utterances, a dialogue act conversion unit 12 that converts user utterances into dialogue acts, a dialogue state update unit 13 that updates the dialogue state on the basis of the dialogue state and the dialogue act, a dialogue state management unit 14 that manages the dialogue states, and a response generation unit 15 that conveys information to the user on the basis of the updated dialogue state.
[0028]
The statistical information management device 20, on the other hand, includes a dialogue record holding unit 21 that holds records of dialogues between humans and machines, a statistical information extraction unit 22 that extracts statistical information from the dialogue records held in the dialogue record holding unit 21, and a statistical information holding unit 23 that stores the statistical information extracted by the statistical information extraction unit 22.
[0029]
Hereinafter, the processing of the speech dialogue system 1 of the present invention configured as described above will be described in detail, taking its application to a conference room reservation system as a specific example.
[0030]
In the conference room reservation system, once the user has told the system the conference room name, the date, and the time, the system reserves the conference room and informs the user that it has done so. As shown in FIG. 2, the user does not simply state requirements one-sidedly; depending on the situation, the system may also speak to prompt the user's next utterance.
[0031]
At this time, as shown in FIG. 3, the conference room reservation system cannot always correctly hear everything the user says.
[0032]
Therefore, the speech dialogue system 1 of the present invention understands what the user wants to convey by taking the flow of the conversation (context) into account.
[0033]
To realize this, the speech dialogue system 1 of the present invention uses statistical criteria obtained from past dialogue data to assign a score to each dialogue state, which changes according to the user's dialogue acts. As shown in FIG. 4, it keeps all the content that the user may have conveyed under consideration and, according to the assigned scores, selects the interpretation that is most plausible given the flow of the conversation.
[0034]
The score assigned at this time reflects two quantities: the occurrence probability of a pattern in which the dialogue act types of user utterances and of the corresponding system utterances follow one another (hereinafter, the chain probability), and the occurrence probability of a pattern in which a dialogue state held by the system co-occurs with a user utterance (hereinafter, the co-occurrence probability).
[0035]
That is, the score reflects the chain probability, which represents the rough flow of the conversation as shown in FIG. 5, and the co-occurrence probability, which represents the detailed local flow of the conversation as shown in FIG. 6.
[0036]
As described above, the dialogue action is content transmitted to the dialogue system by the user utterance, and the dialogue state is user information request content held by the dialogue system.
[0037]
The dialogue state referred to in the present invention is the collection of dialogue-related information held in the system. It comprises the result of understanding the user's intention at each point in the dialogue, taking the flow of the conversation (context) into account (described as a frame representation composed of attribute-value pairs called slots), the utterance history between the user and the system, the history of the dialogue states from which the state was generated, and so on.
[0038]
Taking the application to the conference room reservation system as a specific example, the dialogue state has a data structure as shown in FIG. 7, and the dialogue state management unit 14 shown in FIG. 1 manages dialogue states having this data structure.
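As a rough illustration of the kind of structure described above, the following Python sketch models a dialogue state as a slot frame together with an utterance history, a link to the state it was generated from, and a score. The class name, field names, and slot names are assumptions made for illustration only; FIG. 7 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


def empty_slots():
    # Hypothetical slot names for a conference room reservation task.
    return {"room": None, "date": None, "start": None, "end": None}


@dataclass
class DialogueState:
    slots: dict = field(default_factory=empty_slots)   # attribute-value pairs
    history: list = field(default_factory=list)        # (speaker, act_type) pairs
    parent: Optional["DialogueState"] = None            # state this one came from
    score: float = 0.0                                   # accumulated score


# One initial state is held at the beginning of the dialogue.
initial = DialogueState()

# A state derived from it after the user refers to a start time.
child = DialogueState(
    slots={**initial.slots, "start": "14:00"},
    history=[("user", "refer-start-time")],
    parent=initial,
    score=-1.2,
)
```

Keeping a reference to the parent state is one simple way to retain the history of the states from which a state was generated.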
[0039]
Next, each unit of the speech dialogue system 1 of the present invention will be described.
[0040]
[A] Statistical information management device 20
[A-1] Dialogue record holding unit 21
The dialogue record holding unit 21 holds a dialogue record between a human and a machine.
[0041]
The dialogue records held in the dialogue record holding unit 21 are assumed to be data recording the history of dialogues between humans and a speech dialogue system. For each dialogue, the record stores the start and end times of the user's and the system's utterances, and the dialogue state of the system before and after each user utterance.
[0042]
All user and system speech is recorded, and all user speech is transcribed. The transcribed user speech is assumed to have been converted into dialogue act representations by automatic processing with a program followed by manual correction.
[0043]
[A-2] Statistical information extraction unit 22
The statistical information extraction unit 22 extracts two types of statistical information from the dialogue record held in the dialogue record holding unit 21.
[0044]
One is the chain probability of dialogue act types (a dialogue act type represents the semantic category of a dialogue act); the other is the co-occurrence probability of the dialogue state at each point in the dialogue and the dialogue act that immediately follows it.
[0045]
An n-gram probability is used as the chain probability. The n-gram probability expresses how often a given sequence of n dialogue act types appears in the dialogue record. In the embodiment, the trigram (3-gram) probability is used.
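One plausible way to obtain such trigram statistics is relative-frequency counting over the sequences of dialogue act types in the record. The sketch below assumes conditional probabilities P(c | a, b) with no smoothing; the specification does not state the estimator, and the function name and the toy dialogues are invented for illustration.

```python
import math
from collections import Counter


def trigram_log_probs(dialogues):
    """Return log P(c | a, b) for every act-type trigram seen in the data."""
    trigrams = Counter()
    contexts = Counter()
    for acts in dialogues:
        for a, b, c in zip(acts, acts[1:], acts[2:]):
            trigrams[(a, b, c)] += 1
            contexts[(a, b)] += 1
    return {tri: math.log(n / contexts[tri[:2]]) for tri, n in trigrams.items()}


# Toy dialogue record: each dialogue is a sequence of dialogue act types.
dialogues = [
    ["ask-date", "refer-month", "refer-day", "ask-start-time", "refer-start-time"],
    ["ask-date", "refer-month", "refer-day", "backchannel"],
]
log_p = trigram_log_probs(dialogues)
```

With real data, unseen trigrams would need some fallback value; how the embodiment handles them is not stated in the text.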
[0046]
FIG. 8 shows an example of the logarithms of the 3-gram probabilities of dialogue act types. In the figure, "refer-start-time" denotes the dialogue act type of an utterance that specifies the start time in the conference room reservation system, "backchannel" that of a backchannel response (aizuchi), "ask-date" that of an utterance asking for the date, "ask-start-time" that of an utterance asking for the start time, "request" that of a user request utterance such as "please", "refer-day" that of an utterance referring to the day, "refer-month" that of an utterance referring to the month, and "refer-room" that of an utterance referring to the conference room.
[0047]
The co-occurrence probability, on the other hand, is obtained by extracting from the dialogue record every pair consisting of the dialogue state at a point in the dialogue and the dialogue act that immediately follows it, and examining how often each pair appears among all such pairs in the dialogue record.
[0048]
These pairs are not necessarily treated as simple bigrams (2-grams) of dialogue state and dialogue act, because the data are expected to be sparse owing to the complexity of the content held in the dialogue state.
[0049]
Therefore, as a workaround, the embodiment represents the manner of co-occurrence (how the dialogue act updates the understanding result of the user's intention held in the dialogue state, what state the updated item was in at the time of the update, and so on) by a set of classes as shown in FIG. 10 (17 classes in the example of FIG. 10), which are distinguished using the values of the seven items shown in FIG. 9. The class to which each pair belongs is determined, and the co-occurrence probability of that class is used in place of the co-occurrence probability of the pair.
[0050]
However, if it is known that the data will not be sparse, this workaround is unnecessary.
[0051]
Next, FIG. 9 will be described in detail below.
[0052]
As shown in FIG. 7, the dialogue state is a frame representation composed of multiple attribute-value pairs called slots, and it also holds the utterance history between the user and the system. The system's questions and confirmations are associated with slots of the user's information request content. For example, when the dialogue state of the conference room reservation system has a start time slot, the system's confirmation utterance "From 3 o'clock?" is associated with the start time of the information request content, and the system's question utterance "From what time?" is likewise associated with the start time.
[0053]
In the present invention, the pairs of dialogue state and dialogue act are classified according to the following seven item values shown in FIG. 9.
[0054]
(1) Whether to update the last questioned slot
It is a value indicating whether or not the slot that is the update target of the dialogue action is a slot that is associated with the content that the system has questioned immediately before. 1 if applicable, 0 if not.
[0055]
(2) Whether to update the slot being confirmed
It is a value indicating whether or not the slot that is the update target of the dialogue action is a slot associated with the content that the system has confirmed immediately before. 1 if applicable, 0 if not.
[0056]
(3) Whether the dialogue act relates to a slot already confirmed
A value indicating whether the slot targeted for update by the dialogue act is associated with content that the system has already confirmed earlier in the dialogue. 1 if applicable, 0 if not.
[0057]
(4) Whether the dialogue act relates to a slot that has no value
A value indicating whether the slot targeted for update by the dialogue act has not yet been given a value by any past dialogue act. 1 if applicable, 0 if not.
[0058]
(5) Whether the dialogue act relates to a slot that has a value and leaves the value unchanged
A value indicating whether the slot targeted for update by the dialogue act already holds a value entered by a past dialogue act and the dialogue act does not change that value. 1 if applicable, 0 if not.
[0059]
(6) Whether the dialogue act relates to a slot that has a value and changes the value
A value indicating whether the slot targeted for update by the dialogue act already holds a value entered by a past dialogue act and the dialogue act changes that value. 1 if applicable, 0 if not.
[0060]
(7) Whether the dialogue act follows the initial dialogue state
In the system of the embodiment, the system says "Please state your request" only at the beginning of the dialogue. A question like this, to which the user may say anything in response, is called an open question. Because the distribution of dialogue acts immediately after such a question differs from the distribution at other times, the two cases need to be distinguished. The value is 1 if the dialogue act is made in response to the initial dialogue state, and 0 otherwise.
[0061]
Among the seven items shown in FIG. 9, items 1, 2, 3, and 7 are independent events. Items 4, 5, and 6, on the other hand, admit only four combinations: exactly one of them is 1, or all of them are 0. The number of possible patterns is therefore $2^4 \times 4 = 64$. Of these, the 17 patterns that actually occurred in the dialogues are shown in FIG. 10.
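The count of 64 can be checked by brute-force enumeration. The short Python sketch below lists every 7-bit pattern and keeps those in which at most one of items 4, 5, and 6 is set; the function name is an assumption made for illustration.

```python
from itertools import product


def valid_patterns():
    """Enumerate 7-bit patterns in which at most one of items 4, 5, 6 is 1."""
    patterns = []
    for bits in product((0, 1), repeat=7):
        # Items 4, 5 and 6 occupy positions 3, 4 and 5 (0-based, left to right).
        if sum(bits[3:6]) <= 1:
            patterns.append("".join(str(b) for b in bits))
    return patterns


patterns = valid_patterns()
print(len(patterns))
```

The four free bits contribute 2^4 = 16 combinations and the constrained triple contributes 4, which gives the 64 patterns counted by the enumeration.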
[0062]
Each bit of a co-occurrence pattern shown in FIG. 10 corresponds, from left to right, to items 1 through 7 described in FIG. 9, and each bit takes the value 1 or 0 as described for FIG. 9. The co-occurrence probability shown in FIG. 10 is the probability with which the corresponding co-occurrence pattern occurred.
[0063]
In the present invention, once the co-occurrence pattern of a pair of a dialogue state and the immediately following dialogue act has been obtained, the table of FIG. 10 is searched with that pattern as the key to determine which of the 17 classes shown in FIG. 10 the pair belongs to, and the co-occurrence probability of that class is used in place of the co-occurrence probability of the pair.
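The classification and table lookup can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function and parameter names, the slot names, the table entries, and the floor value returned for patterns missing from the table are all hypothetical, since the actual values of FIG. 10 and the handling of unseen patterns are not given in the text.

```python
import math


def cooccurrence_pattern(slots, act_slot, act_value, questioned=None,
                         confirming=None, confirmed=(), is_first=False):
    """Return the 7-bit pattern (items 1 to 7, left to right) as a string."""
    current = slots.get(act_slot)
    bits = [
        act_slot == questioned,                          # (1) last questioned slot
        act_slot == confirming,                          # (2) slot being confirmed
        act_slot in confirmed,                           # (3) slot confirmed so far
        current is None,                                 # (4) slot has no value
        current is not None and current == act_value,    # (5) value unchanged
        current is not None and current != act_value,    # (6) value changed
        is_first,                                        # (7) initial dialogue state
    ]
    return "".join("1" if b else "0" for b in bits)


# Hypothetical class table: pattern -> log co-occurrence probability.
LOG_P_COL = {"1001000": math.log(0.30), "0000010": math.log(0.02)}


def s_col(pattern, table=LOG_P_COL, floor=math.log(1e-6)):
    # The floor for unseen patterns is an assumption, not from the source.
    return table.get(pattern, floor)


slots = {"start": None, "end": None, "room": "room 3"}
answer = cooccurrence_pattern(slots, "start", "14:00", questioned="start")
change = cooccurrence_pattern(slots, "room", "room 5")
```

Here `answer` describes a user filling the slot the system just asked about, and `change` describes an unprompted change to an already filled slot.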
[0064]
[A-3] Statistical information holding unit 23
The statistical information holding unit 23 stores the statistical information extracted by the statistical information extraction unit 22. Specifically, it stores the logarithm of the chain probability extracted by the statistical information extraction unit 22 (the score $s_{ngram}$ described later) and the logarithm of the co-occurrence probability extracted by the statistical information extraction unit 22 (the score $s_{col}$ described later), the latter in association with the co-occurrence pattern.
[0065]
[B] Dialogue device 10
[B-1] User utterance reception unit 11
The user utterance reception unit 11 receives a user utterance and includes a microphone and a voice recognizer.
[0066]
Each time the user speaks, the user's speech is input to the speech recognizer through the microphone, and the speech recognizer outputs speech recognition results in response. In addition to the most likely speech recognition result, the speech recognizer outputs the top n speech recognition hypotheses (the n-best shown in FIG. 1).
[0067]
[B-2] Dialogue action conversion unit 12
The dialogue act conversion unit 12 converts a user utterance into dialogue acts: it takes as input the speech recognition hypotheses output by the user utterance reception unit 11 and outputs the corresponding dialogue acts.
[0068]
Specifically, for each of the top n speech recognition results output by the user utterance reception unit 11, it generates a dialogue act by running a program that performs syntactic and semantic analysis. If multiple interpretations exist, it generates multiple dialogue acts.
[0069]
To give a simple example, if the user specifies a time by saying only "2 o'clock", then, as shown in FIG. 4, two dialogue acts are generated, one concerning the start time and one concerning the end time, so that both cases can be handled. For convenience of explanation, it is assumed here that m dialogue acts are generated from the top n speech recognition results (the m-best shown in FIG. 1).
[0070]
Each dialogue act is scored in terms of the plausibility of the speech recognition and of the syntactic and semantic analysis. In the embodiment, this dialogue act score (the score $s_{act}$ described later) is the logarithm of the reciprocal of the rank of the recognition result in speech recognition. A dialogue act derived from a lower-ranked recognition result therefore receives a lower score.
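The conversion from n-best hypotheses to m scored dialogue acts might look like the following Python sketch. It assumes each hypothesis has already been parsed into a (kind, value) pair, which stands in for the syntactic and semantic analysis; the class name, the function name, and the act type "refer-end-time" are hypothetical, and the natural logarithm is assumed because the text does not specify a base.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueAct:
    act_type: str
    value: str
    score: float  # s_act = log(1 / rank)


def acts_from_nbest(nbest):
    """nbest: parsed recognition hypotheses, best first, as (kind, value)."""
    acts = []
    for rank, (kind, value) in enumerate(nbest, start=1):
        s_act = math.log(1.0 / rank)
        if kind == "time":
            # A bare time is ambiguous: it may be a start time or an end time.
            types = ["refer-start-time", "refer-end-time"]
        else:
            types = [f"refer-{kind}"]
        acts.extend(DialogueAct(t, value, s_act) for t in types)
    return acts


acts = acts_from_nbest([("time", "2 o'clock"), ("room", "room 2")])
```

In this example n = 2 hypotheses yield m = 3 dialogue acts, and the two acts from the top hypothesis share the score log(1/1) = 0.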
[0071]
[B-3] Dialogue state management unit 14
The dialogue state management unit 14 holds dialogue states as shown in FIG. 7, each of which describes the result of understanding the user's intention at a point in the dialogue, and provides them to the other units. One or more dialogue states are held together with their scores, and a single initial dialogue state is held at the beginning of the dialogue.
[0072]
As described above, a dialogue state managed by the dialogue state management unit 14 is the collection of dialogue-related information held in the system. It includes the result of understanding the user's intention at each point in the dialogue, the utterance history between the user and the system, and the history of the dialogue states from which it was generated.
[0073]
[B-4] Dialog state update unit 13
The dialogue state update unit 13 receives the m scored dialogue acts output by the dialogue act conversion unit 12 and the k scored dialogue states held by the dialogue state management unit 14, and combines them to generate m × k new scored dialogue states.
[0074]
At this time, the dialogue state update unit 13 calculates the score $S_{t+1}$ of each of the m × k dialogue states according to the following formula:
$$S_{t+1} = S_t + \alpha \times s_{act} + \beta \times s_{ngram} + \gamma \times s_{col}$$
[0075]
Here, $S_t$ is the score of the dialogue state before the update, $s_{act}$ is the score of the dialogue act (assigned by the dialogue act conversion unit 12), $s_{ngram}$ is the score for the chain probability of dialogue act types, and $s_{col}$ is the score for the co-occurrence probability of the dialogue state and the dialogue act. $\alpha$, $\beta$, and $\gamma$ are weighting coefficients.
[0076]
In the embodiment, $s_{ngram}$ is the logarithm of the 3-gram probability of the dialogue act in question (as shown in FIG. 8), and $s_{col}$ is the logarithm of the co-occurrence probability of the class to which the pair of dialogue state and dialogue act in question belongs (the logarithm of the value shown in FIG. 10). The weights $\alpha$, $\beta$, and $\gamma$ are, for example, all set to 1.
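As a worked illustration of the score update, consider two candidate dialogue states derived from the same previous state with the same second-ranked dialogue act. All numeric values below are hypothetical and chosen only to show the arithmetic; the natural logarithm is assumed, and the weights are set to 1 as in the embodiment.

$$S_t = -2.0,\quad s_{act} = \ln\tfrac{1}{2} \approx -0.693,\quad s_{ngram} = -1.2,\quad \alpha = \beta = \gamma = 1$$

$$\text{candidate A } (s_{col} = -0.9):\quad S_{t+1} = -2.0 - 0.693 - 1.2 - 0.9 = -4.793$$

$$\text{candidate B } (s_{col} = -3.0):\quad S_{t+1} = -2.0 - 0.693 - 1.2 - 3.0 = -6.893$$

Because the scores are sums of log-probabilities, candidate A, whose co-occurrence class is the more frequent one, ends up with the higher (less negative) score.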
[0077]
The m × k dialog states having the scores calculated in this way are transferred to the dialog state management unit 14 and are held in place of the dialog states held so far.
[0078]
If the number of dialogue state combinations grows as the dialogue progresses to the point where it hinders processing, a maximum number of dialogue states to retain (called the dialogue state beam width) is set, and only the high-scoring dialogue states are kept while the low-scoring ones are discarded. This makes real-time processing possible.
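Pruning to a beam width can be sketched in a few lines of Python. The representation of a dialogue state as a (label, score) pair and the function name are assumptions made for illustration.

```python
import heapq


def prune(states, beam_width):
    """Keep the beam_width highest-scoring (state, score) pairs, best first."""
    return heapq.nlargest(beam_width, states, key=lambda pair: pair[1])


candidates = [("a", -3.0), ("b", -1.0), ("c", -2.0), ("d", -7.5)]
kept = prune(candidates, beam_width=2)
```

A beam width of 1 reduces the method to tracking a single dialogue state, which corresponds to the setting used in dialogue experiment 2.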
[0079]
[B-5] Response generator 15
The response generation unit 15 receives the m × k scored dialogue states from the dialogue state management unit 14, takes the one with the highest score as the most likely dialogue state, determines the response to the user based on that dialogue state, and conveys the response by voice.
[0080]
The system utterance used for the response is registered in the user-system utterance history recorded in the dialogue state shown in FIG. 7, and is thereby reflected in all the dialogue states held by the system.
[0081]
Next, the processing executed by the dialogue device 10 configured as described above will be described in detail according to the processing flows of FIGS. 11 and 12.
[0082]
Here, FIG. 11 is the processing flow executed by the user utterance reception unit 11, the dialogue act conversion unit 12, and the dialogue state update unit 13, and FIG. 12 is the processing flow executed by the response generation unit 15.
[0083]
When the user speaks, the dialogue device 10 first receives the user's speech in step 10, as shown in the processing flow of FIG. 11, and in the following step 11 performs speech recognition on the input speech to obtain the top n speech recognition results.
[0084]
Subsequently, in step 12, syntactic and semantic analysis is performed on each of the top n speech recognition results to generate m dialogue acts, and each dialogue act is assigned a score $s_{act}$ reflecting the speech recognition score and the plausibility of the syntactic and semantic analysis.
[0085]
For example, when the user specifies a time by saying only "2 o'clock", a dialogue act concerning the start time and a dialogue act concerning the end time are both generated, as shown in FIG. 4, so that both cases can be handled, and each of the two dialogue acts is assigned a score $s_{act}$ reflecting the plausibility of the speech recognition and of the syntactic and semantic analysis.
[0086]
Here, the plausibility of the syntactic and semantic analysis is judged as follows. Suppose, for example, that two analysis results are obtained that both correspond to "reservation of the third conference room" but differ in wording. The one that forms the more well-formed sentence is then judged the more appropriate basis for the dialogue act "reservation of the third conference room".
[0087]
Subsequently, in step 13, k dialogue states are read from the dialogue states held in the dialogue state management unit 14 in descending order of score S (the score at this point is hereinafter denoted $S_t$). Through the processing of step 18 described later, each dialogue state held in the dialogue state management unit 14 has been assigned a score S, which is recorded in the dialogue state as shown in FIG. 7, so the k dialogue states are read in descending order of this recorded score.
[0088]
Subsequently, in step 14, m × k new dialogue states are generated from the k dialogue states read in step 13 and the m dialogue acts generated in step 12, and are registered in the dialogue state management unit 14 in place of the previous dialogue states.
[0089]
Subsequently, in step 15, one unprocessed dialogue state is selected from the newly generated dialogue states.
[0090]
Subsequently, in step 16, for the selected dialogue state, the co-occurrence pattern (as shown in FIG. 10) of the "dialogue state → dialogue act" pair from which it was generated is identified, and the statistical information holding unit 23 is searched with that pattern as the key to obtain the score $s_{col}$ for the co-occurrence probability of the pattern.
[0091]
Subsequently, in step 17, the chain pattern of dialogue act types is identified from the sequence of dialogue act types in the utterance history recorded in the selected dialogue state, and the statistical information holding unit 23 is searched with that chain pattern as the key to obtain the score $s_{ngram}$ for the chain probability of the pattern.
[0092]
Subsequently, in step 18, the score $S_{t+1}$ of the selected dialogue state is calculated as
$$S_{t+1} = S_t + \alpha \times s_{act} + \beta \times s_{ngram} + \gamma \times s_{col}$$
and recorded in the selected dialogue state (that is, registered in the dialogue state management unit 14).
[0093]
Subsequently, in step 19, it is determined whether all the dialogue states generated in step 14 have been processed. If unprocessed dialogue states remain, the process returns to step 15; if none remain, the process ends.
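Steps 13 through 19 can be summarized by the following Python sketch, which combines k dialogue states with m dialogue acts and scores each of the m × k results. The tuple layouts, the function name, and the two scoring callables (stand-ins for lookups in the statistical information holding unit 23) are assumptions made for illustration, with stub values in place of real statistics.

```python
from itertools import product


def expand_states(states, acts, s_ngram, s_col, alpha=1.0, beta=1.0, gamma=1.0):
    """states: list of (slots, S_t); acts: list of (slot, value, s_act).

    s_ngram and s_col are callables (slots, act) -> log-probability score.
    Returns the m x k new (slots, S_t_plus_1) pairs.
    """
    new_states = []
    for (slots, s_t), act in product(states, acts):
        slot, value, s_act = act
        new_slots = {**slots, slot: value}          # apply the dialogue act
        score = (s_t + alpha * s_act
                 + beta * s_ngram(slots, act)
                 + gamma * s_col(slots, act))
        new_states.append((new_slots, score))
    return new_states


# One previous state (k = 1) and two interpretations of "2 o'clock" (m = 2).
states = [({"start": None, "end": None}, 0.0)]
acts = [("start", "14:00", 0.0), ("end", "14:00", 0.0)]

# Stub statistics: filling the start time first is assumed to be more common.
new_states = expand_states(
    states, acts,
    s_ngram=lambda slots, act: -1.0,
    s_col=lambda slots, act: -0.5 if act[0] == "start" else -2.0,
)
best_slots, best_score = max(new_states, key=lambda pair: pair[1])
```

Both interpretations are retained with their scores, so a later utterance can still promote the currently lower-scoring one.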
[0094]
In this way, when the user speaks, the dialogue device 10 updates the dialogue states (each recording "start slot, end slot, score" as shown in the figure) in the manner shown in FIG. 4.
[0095]
When the dialogue device 10 responds to the user's utterance, on the other hand, as shown in the processing flow of FIG. 12, it first waits in step 20 for the scores of the m × k newly generated dialogue states to be calculated according to the processing flow of FIG. 11. Once these scores have been calculated, in the following step 21 it reads the m × k dialogue states, together with their scores, from the dialogue state management unit 14.
[0096]
Subsequently, in step 22, the dialogue state with the highest score among those read is identified as the most likely dialogue state, and in step 23 the response to the user is determined based on that most likely dialogue state.
[0097]
Subsequently, in step 24, the determined response is output by voice, and in step 25 the dialogue act of the system utterance is recorded as utterance history in each of the m × k dialogue states, after which the process ends.
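Steps 21 through 25 can be sketched in Python as follows. The dictionary layout of a dialogue state, the function names, and the response policy are assumptions made for illustration; the actual response strategy of the embodiment is not described in the text.

```python
def respond(states, decide_response):
    """Pick the highest-scoring state, decide a response from it, and record
    the system's dialogue act in the history of every held state."""
    best = max(states, key=lambda s: s["score"])
    system_act = decide_response(best)
    for s in states:
        s["history"].append(("system", system_act))
    return best, system_act


def ask_missing(state):
    # Hypothetical policy: ask about the first empty slot, else confirm.
    for name, value in state["slots"].items():
        if value is None:
            return f"ask-{name}-time"
    return "confirm"


states = [
    {"slots": {"start": "14:00", "end": None}, "score": -1.5, "history": []},
    {"slots": {"start": None, "end": "14:00"}, "score": -3.0, "history": []},
]
best, act = respond(states, ask_missing)
```

Recording the system act in every state, not only the best one, keeps the utterance histories aligned so that chain patterns can be looked up for any state on the next turn.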
[0098]
In this way, in the speech dialogue system 1 of the present invention, when the user's information request admits one or more interpretations and the system holds one or more candidates for the user's information request content, statistical criteria obtained from past dialogue data are used to score each new candidate obtained by combining an interpretation of the request with a candidate held by the system: candidates considered correct receive high scores, and the others receive low scores. By giving priority to the high-scoring candidate as the user's information request content, that content is obtained with high accuracy. At the same time, because even a candidate with a low score at that point may, in combination with the next user information request, give rise to a high-scoring candidate, low-scoring candidates are also retained until the next user input.
[0099]
With this configuration, the speech dialogue system 1 of the present invention can understand what the user wants to convey by taking the flow of the conversation (context) into account, and can respond accordingly.
[0100]
Next, the results of dialogue experiments conducted to verify the effectiveness of the present invention will be described. Three dialogue experiments were conducted in total; the outline and results of each are given below.
[0101]
(1) Dialogue experiment 1
The speech recognizer output was set to 5-best, and the dialogue state beam width (the maximum number of dialogue states retained) was set to 15. A total of 256 dialogues were collected from 16 subjects. Treating any dialogue that took 5 minutes or more to accomplish the task as a failure, the task achievement rate was 88.3% (226/256).
[0102]
(2) Dialogue experiment 2
The speech recognizer output was set to 1-best, and the dialogue state beam width was set to 1. A total of 224 dialogues were collected from 28 subjects. Treating any dialogue that took 5 minutes or more to accomplish the task as a failure, the task achievement rate was 88.3%. The average task completion time for the dialogues in which the task was accomplished was 107.66 seconds.
[0103]
(3) Dialogue experiment 3
The speech recognizer output was set to 1-best, and the dialogue state beam width was set to 30. A total of 224 dialogues were collected from 28 subjects (the same as in dialogue experiment 2). Treating any dialogue that took 5 minutes or more to accomplish the task as a failure, the task achievement rate was 91.0%. The average task completion time for the dialogues in which the task was accomplished was 95.86 seconds. When dialogues by the same subject on the same task (the same conference room reservation content) were compared with those of dialogue experiment 2, the task completion time was reduced by a statistically significant margin.
[0104]
The following was verified by the above dialogue experiments.
[0105]
The relatively good achievement rate obtained in dialogue experiment 1 indicates that the speech dialogue system functions satisfactorily even though dialogue states are not scored with hand-written rules. In addition, the results of dialogue experiments 2 and 3 demonstrate the value of holding multiple dialogue states.
[0106]
[Effects of the Invention]
As described above, when understanding what the user wants to convey in light of the flow of the conversation and responding to it, the present invention scores the multiple dialogue states obtained from combinations of multiple dialogue acts and dialogue states not with hand-written rules but with statistical information obtained from records of human-machine dialogue. The content of the user's information request can thereby be identified with high accuracy.
[0107]
Furthermore, according to the present invention, eliminating hand-written rule descriptions keeps the cost of building a system low.
[Brief description of the drawings]
FIG. 1 is an example embodiment of the speech dialogue system of the present invention.
FIG. 2 is a diagram for explaining the dialogue mode of the conference room reservation system.
FIG. 3 is a diagram for explaining problems in dialogue with the conference room reservation system.
FIG. 4 is a diagram for explaining the processing executed by the speech dialogue system of the present invention.
FIG. 5 is a diagram illustrating the chain probability.
FIG. 6 is a diagram illustrating the co-occurrence probability.
FIG. 7 is a diagram illustrating an example of the data structure of a dialogue state.
FIG. 8 is a diagram illustrating example data values of chain probabilities.
FIG. 9 is a diagram illustrating the items used for classifying co-occurrence patterns.
FIG. 10 is a diagram illustrating example data values of co-occurrence probabilities.
FIG. 11 is a processing flow executed by the dialogue device.
FIG. 12 is a processing flow executed by the dialogue device.
[Explanation of symbols]
1 Speech dialogue system
10 Dialogue device
11 User utterance reception unit
12 Dialogue act conversion unit
13 Dialogue state update unit
14 Dialogue state management unit
15 Response generation unit
20 Statistical information management device
21 Dialogue record holding unit
22 Statistical information extraction unit
23 Statistical information holding unit

Claims

A dialogue method using statistical information, executed by a dialogue device that has a function of interacting with a user and that, each time an information request from the user is input, identifies the content of the user's information request using the history of exchanges with the user and responds to the user accordingly, the method comprising:
a step of generating, in response to the input user information request, one or more new candidates for the user information request content from combinations of each of the user information request contents obtained from the one or more interpretations of the user information request with each of the user information request contents stored in storage means that stores candidates for the user's information request content;
a step of assigning a score to each new candidate for the user information request content using statistical criteria obtained from past dialogue data;
a step of identifying the user's information request content from among the new candidates for the user information request content according to the scores; and
a step of updating the candidates for the user information request content stored in the storage means by writing the new candidates for the user information request content into the storage means,
wherein, in the step of assigning the score, each new candidate for the user information request content is assigned a score calculated by a weighted calculation over:
a first score held by the user information request content, stored in the storage means, from which the new candidate for the user information request content was generated;
a second score obtained when the input user information request is interpreted;
a third score based on the occurrence probability, obtained by statistical processing of past dialogue data, of a pattern in which a user-input information request and the response to it are chained; and
a fourth score based on the occurrence probability, obtained by statistical processing of past dialogue data, of a pattern in which the user information request content stored in the storage means co-occurs with a subsequent user-input information request.

An interactive device using statistical information, the device having a function for dialogue with a user, identifying the content of the user's information request by using the history of exchanges with the user each time an information request from the user is input, and responding to the user accordingly, the device comprising:
storage means for storing candidates for the user's information request content;
generating means for generating, in response to the input user information request, one or more new user information request content candidates from combinations of each of the user information request contents obtained from the one or more interpretations of that user information request with each of the user information request contents stored in the storage means;
assigning means for assigning a score to each new user information request content candidate by using statistical criteria obtained from past dialogue data;
identifying means for identifying the user's information request content from among the new user information request content candidates according to the score; and
updating means for updating the user information request content candidates stored in the storage means by writing the new user information request content candidates into the storage means,
wherein the assigning means assigns, to each new user information request content candidate, a score calculated by a weighted calculation over:
a first score held by the user information request content stored in the storage means from which the new user information request content candidate was generated;
a second score obtained when the input user information request is interpreted;
a third score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which a user-input information request and a response to it occur in sequence; and
a fourth score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which the user information request content stored in the storage means and a subsequent user-input information request co-occur.
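How the storage, generating, assigning, identifying, and updating means cooperate can be shown with a short sketch. It is not the claimed implementation: the slot/value representation of the information request content, the `Candidate` type, the `score_fn` callable (standing in for the weighted calculation), and the `max_candidates` pruning limit are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Slots = Dict[str, str]  # assumed representation of one interpretation


@dataclass(frozen=True)
class Candidate:
    content: Tuple[Tuple[str, str], ...]  # sorted slot/value pairs (assumed)
    score: float


def update_dialogue_state(
    stored: Sequence[Candidate],
    interpretations: Sequence[Tuple[Slots, float]],
    score_fn: Callable[[Candidate, Slots, float], float],
    max_candidates: int = 5,  # pruning is an assumption, not part of the claim
) -> Tuple[Candidate, List[Candidate]]:
    """Run one update: generate, score, identify, and write back."""
    new_candidates = []
    # Generating means: combine every stored candidate with every interpretation.
    for state, (slots, interp_score) in product(stored, interpretations):
        merged = dict(state.content)
        merged.update(slots)
        # Assigning means: score_fn stands in for the weighted calculation.
        score = score_fn(state, slots, interp_score)
        new_candidates.append(Candidate(tuple(sorted(merged.items())), score))
    if not new_candidates:
        raise ValueError("at least one stored candidate and one interpretation are required")
    # Identifying means: the highest-scoring candidate is taken as the user's request.
    new_candidates.sort(key=lambda c: c.score, reverse=True)
    # Updating means: the retained candidates replace the stored ones.
    next_stored = new_candidates[:max_candidates]
    return next_stored[0], next_stored
```

A dialogue would start from a single empty candidate, for example `[Candidate((), 0.0)]`, and call `update_dialogue_state` once per user input, passing the returned list back in as `stored` on the next turn.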

An interactive program for causing a computer to execute an interactive method executed by an interactive device that has a function for dialogue with a user, identifies the content of the user's information request by using the history of exchanges with the user each time an information request from the user is input, and responds to the user accordingly, the program causing the computer to execute:
a procedure for generating, in response to the input user information request, one or more new user information request content candidates from combinations of each of the user information request contents obtained from the one or more interpretations of that user information request with each of the user information request contents stored in storage means that stores candidates for the user's information request content;
a procedure for assigning, to each new user information request content candidate, a score calculated by a weighted calculation over a first score held by the user information request content stored in the storage means from which the new user information request content candidate was generated, a second score obtained when the input user information request is interpreted, a third score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which a user-input information request and a response to it occur in sequence, and a fourth score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which the user information request content stored in the storage means and a subsequent user-input information request co-occur;
a procedure for identifying the user's information request content from among the new user information request content candidates according to the score; and
a procedure for updating the user information request content candidates stored in the storage means by writing the new user information request content candidates into the storage means.

A recording medium on which is recorded an interactive program for causing a computer to execute an interactive method executed by an interactive device that has a function for dialogue with a user, identifies the content of the user's information request by using the history of exchanges with the user each time an information request from the user is input, and responds to the user accordingly, the program causing the computer to execute:
a procedure for generating, in response to the input user information request, one or more new user information request content candidates from combinations of each of the user information request contents obtained from the one or more interpretations of that user information request with each of the user information request contents stored in storage means that stores candidates for the user's information request content;
a procedure for assigning, to each new user information request content candidate, a score calculated by a weighted calculation over a first score held by the user information request content stored in the storage means from which the new user information request content candidate was generated, a second score obtained when the input user information request is interpreted, a third score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which a user-input information request and a response to it occur in sequence, and a fourth score based on the appearance probability, obtained by statistical processing of past dialogue data, of a pattern in which the user information request content stored in the storage means and a subsequent user-input information request co-occur;
a procedure for identifying the user's information request content from among the new user information request content candidates according to the score; and
a procedure for updating the user information request content candidates stored in the storage means by writing the new user information request content candidates into the storage means.