JP6824795B2

JP6824795B2 - Correction device, correction method and correction program

Info

Publication number: JP6824795B2
Application number: JP2017052980A
Authority: JP
Inventors: 峻平佐野; 伸裕鍜治; 颯々野　学; 学颯々野
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2021-02-03
Anticipated expiration: 2037-03-17
Also published as: JP2018156418A

Description

The present invention relates to a correction device, a correction method, and a correction program.

Conventionally, techniques for outputting a response to a user's utterance have been known. One example is a technique that generates a dialogue model by learning dialogue data and then uses the generated dialogue model to generate a response to a user's utterance.

Japanese Unexamined Patent Publication No. 2013-105436

“Characterizing and Predicting Voice Query Reformulation”, Ahmed Hassan Awadallah, Ranjitha Gurunath Kulkarni, Umut Ozertem and Rosie Jones, Microsoft Redmond, WA USA

However, the conventional technique described above cannot always achieve efficient correction when an error occurs.

For example, in the conventional technique, a response is generated by using, in stages, a speech recognition model that converts the user's utterance into text, an intention analysis model that analyzes the intention of the utterance from the text, and a response generation model that generates a response from the analyzed intention; together these serve as the dialogue model. In such a technique, an erroneous response is generated even when an error occurs in only some of the models. Consequently, if the entire dialogue model is corrected using the erroneous response, the models that processed correctly are corrected as well, and an appropriate correction may not be achieved.

The present application has been made in view of the above, and its object is to achieve efficient correction when an error occurs.

The correction device according to the present application includes a specifying unit that specifies, among a plurality of processes, a process to be corrected, based on the user's reaction to output information generated by executing the plurality of processes in stages on input information entered by the user, and a correction unit that corrects the content of the process specified by the specifying unit.

According to one aspect of the embodiment, efficient correction can be achieved when an error occurs.

FIG. 1 is a diagram showing an example of processing executed by the information providing device according to the embodiment.
FIG. 2 is a diagram showing a configuration example of the information providing device according to the embodiment.
FIG. 3 is a diagram showing an example of information registered in the dialogue model database according to the embodiment.
FIG. 4 is a diagram showing an example of information registered in the learning data database according to the embodiment.
FIG. 5 is a diagram showing an example of processing in which the information providing device according to the embodiment classifies errors.
FIG. 6 is a flowchart showing an example of the flow of correction processing executed by the information providing device according to the embodiment.
FIG. 7 is a diagram showing an example of a hardware configuration.

Hereinafter, modes for implementing the correction device, correction method, and correction program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the correction device, correction method, or correction program according to the present application. In the following embodiments, identical parts are given the same reference numerals, and duplicate descriptions are omitted.

[1-1. Overview of the information providing device]
First, with reference to FIG. 1, an example of the determination process executed by an information providing device, which is an example of a correction device that executes correction processing, will be described. FIG. 1 is a diagram showing an example of processing executed by the information providing device according to the embodiment. The following description covers, as processes executed by the information providing device 10, a response generation process that generates a response to an utterance of user U, and a correction process that corrects the response generation process based on user U's utterance made in reply to that response. That is, the information providing device 10 is a dialogue system that realizes a dialogue with user U.

The information providing device 10 is an information processing device capable of communicating with the user terminal 100 via a predetermined network N such as the Internet (see, for example, FIG. 2), and is realized by, for example, a server device or a cloud system. The information providing device 10 may be able to communicate with any number of user terminals 100 via the network N.

The user terminal 100 is an information processing device used by user U, who converses with the dialogue system, and is realized by an information processing device such as a PC (Personal Computer), a server device, or a smart device. For example, when the user terminal 100 acquires speech uttered by user U, it transmits the voice data to the information providing device 10 as an utterance. The user terminal 100 may instead transmit a character string entered by user U to the information providing device 10 as an utterance.

In such a case, the information providing device 10 generates a response to the utterance by executing a speech recognition process that analyzes the voice data and converts it into text, an intention analysis process that uses the resulting text to perform various intention analyses, such as analyzing the intention of user U's utterance, and a response generation process that generates a response from the result of the intention analysis process. That is, the information providing device 10 generates a response from user U's utterance by executing a response process that includes a plurality of processes executed in stages: the speech recognition process, the intention analysis process, and the response generation process. The information providing device 10 may generate a response in text form, or may generate voice data in which the text is read aloud. The information providing device 10 then transmits the response to the user terminal 100. As a result, the user terminal 100 realizes a dialogue with user U by reading aloud the text generated as the response or by playing back the voice data.
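
The staged response process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Trace` record, the `respond` function, and the three stand-in callables are hypothetical names, and real models would take the place of the lambdas. The sketch keeps every intermediate result, because correcting a single stage later requires that stage's input and output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """Intermediate results of one response (hypothetical record type)."""
    audio: bytes    # input to speech recognition
    text: str       # output of speech recognition, input to intention analysis
    intent: dict    # output of intention analysis, input to response generation
    response: str   # output of response generation


def respond(audio: bytes,
            recognize: Callable[[bytes], str],
            analyze: Callable[[str], dict],
            generate: Callable[[dict], str]) -> Trace:
    """Run the three stages in order and record what each stage produced."""
    text = recognize(audio)
    intent = analyze(text)
    response = generate(intent)
    return Trace(audio, text, intent, response)


# Toy stand-ins for the three models (assumptions for illustration only).
trace = respond(
    b"raw-voice-data",
    recognize=lambda audio: "what is the highest mountain?",
    analyze=lambda text: {"action": "search", "adjective": "highest", "noun": "mountain"},
    generate=lambda intent: "Everest",
)
print(trace.response)
```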

[1-2. Determination process]
In dialogue processing, there are cases where an appropriate response cannot be output for user U's utterance, that is, cases where an erroneous response is output. Such an error is more likely to arise in one of the individual processes included in the response process than in the response process as a whole.

For example, in the speech recognition process, text is generated from speech by using a speech recognition model. However, in such a process, the final "i" sound of an utterance such as "highest (ichibantakai)" may not be recognized properly, and the utterance may be recognized as "first hawk (itibantaka)". When an error occurs in the speech recognition process in this way, the subsequent intention analysis process and response generation process are executed on the basis of the incorrect recognition result, so errors accumulate and an inappropriate response may be output.

Even when the speech recognition process is performed properly, the intention analysis process may analyze the intention incorrectly. For example, the intention analysis process often analyzes the intention of user U's utterance by using an intention analysis model that analyzes the intention of an utterance from text. In such intention analysis, however, an incorrect intention may be derived, for example when learning is insufficient. For example, the text "What is the highest mountain?" contains the adjective "highest" and the noun "mountain", but if the intention analysis model is not accurate enough, the intention of the noun "mountain" may be analyzed ambiguously, and the utterance may be analyzed as asking about the "highest" "building". More specifically, such an error may occur when, as a result of insufficient learning, the intention analysis model has learned (classified) the concept of "mountain" and the concept of "building" as similar or common concepts. When such an intention analysis is performed, the response generation process described later generates a response based on the incorrect analysis result, and may therefore generate a response such as "Skytree".

In the response generation process, the response text may be generated by using a response generation model that generates the corresponding response from the result of the intention analysis process. For example, in the intention analysis process, the intention of an utterance may be analyzed by using a technique known as slot filling. In slot filling, each process for generating a response is associated in advance with the items required to execute that process, and the intention is analyzed item by item. When the intention has been analyzed for all items required to execute a process, the processing content and the intentions of all analyzed items are output as the intention analysis result.
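
The slot filling behaviour described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the `REQUIRED_SLOTS` table, the slot names, and the `fill_slots` function are hypothetical, and the candidate values are assumed to have been extracted from the text beforehand.

```python
from typing import Optional

# Hypothetical table: each processing content ("action") and the items it requires.
REQUIRED_SLOTS = {"search": ("adjective", "noun")}


def fill_slots(action: str, candidates: dict) -> Optional[dict]:
    """Return an intention analysis result only when every required item is filled."""
    required = REQUIRED_SLOTS[action]
    filled = {name: candidates[name] for name in required if candidates.get(name)}
    if len(filled) < len(required):
        return None  # at least one required item has not been analyzed yet
    return {"action": action, **filled}


complete = fill_slots("search", {"adjective": "highest", "noun": "mountain"})
partial = fill_slots("search", {"adjective": "highest"})
print(complete, partial)
```

In this sketch an incomplete analysis simply yields `None`; how the embodiment handles missing items is not stated in the text.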

For example, when the intention analysis model receives the speech recognition result "What is the highest mountain?", it outputs the adjective "highest" and the noun "mountain" as the intention analysis result for the processing content "search". As a result, the response generation model searches for the highest mountain and outputs the search result "Everest". However, such a response is an error if user U wanted to search for "the highest mountain in Japan". That is, when the intention analysis model or the response generation model has not been trained sufficiently, an intention that user U did not state explicitly cannot be reflected in the response, and an incorrect result may be output.

One conceivable approach is to estimate, from the content of the utterance received from user U after a response is output, whether the response was correct, and then to retrain each model using the response as correct-answer data when it is estimated to be correct and as incorrect-answer data when it is estimated to be wrong. With such an approach, however, learning with incorrect-answer data is applied not only to the process in which the error actually occurred but also to the processes that worked correctly, and the dialogue accuracy as a whole may decrease.

Therefore, the information providing device 10 executes the following correction process. First, the information providing device 10 specifies, among a plurality of processes, the process to be corrected, based on the user's reaction to output information generated by executing the plurality of processes in stages on input information entered by the user. The information providing device 10 then corrects the content of the specified process.

For example, the information providing device 10 determines from user U's reaction to a response whether the response was correct and, when it determines that the response was wrong, specifies from user U's reaction which of the processes executed in stages as the response process caused the error. The information providing device 10 then corrects the specified process. For example, the information providing device 10 retrains the model used in the specified process, using the response determined to be wrong and the utterance of user U received before that response as an incorrect-answer pair.

In this way, the information providing device 10 corrects only the process in which the error occurred, among the processes executed in stages as the generation process, rather than the generation process as a whole. It can therefore correct the error without lowering the accuracy of the generation process as a whole, which makes the correction efficient.

[1-3. An example of the determination process]
Next, an example of the determination process executed by the information providing device 10 will be described with reference to FIG. 1. First, the information providing device 10 receives utterance #1 from the user terminal 100 (step S1). In such a case, the information providing device 10 executes the response process that generates a response from the utterance (step S2).

For example, on receiving the voice data of the utterance, the information providing device 10 executes the speech recognition process, which generates text from the voice data by using a speech recognition model that generates the text of an utterance. On receiving the text that results from the speech recognition process, the information providing device 10 executes the intention analysis process, which analyzes the intention of utterance #1 by using an intention analysis model that analyzes the intention of user U's utterance from the text and outputs parameters representing the analysis result. The information providing device 10 also executes the response generation process, which generates response #1 to utterance #1 by using a response generation model that generates response content from the parameters resulting from the intention analysis process. The speech recognition model, the intention analysis model, and the response generation model are each realized by any learner or classifier model, such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network). The information providing device 10 then outputs the generated response #1 to the user terminal 100 (step S3).

Next, the information providing device 10 receives utterance #2, which follows response #1, from the user terminal 100 (step S4). The content of utterance #2 can serve as an indicator for judging whether response #1 contained an error and what kind of error it was, for example whether user U is satisfied with the content of response #1 or whether the content of utterance #1 was analyzed incorrectly. The information providing device 10 therefore determines from the content of utterance #2 whether response #1 contains an error and, when an error is presumed, classifies the error and corrects the model of the process corresponding to the classification result (step S5).

For example, the information providing device 10 generates response #2 to utterance #2 by executing the speech recognition process, the intention analysis process, and the response generation process in stages on utterance #2. Alongside this response process, the information providing device 10 acquires the text resulting from the speech recognition process for utterance #2 and executes a classification process that classifies the type of error from the acquired text. More specifically, the information providing device 10 specifies the process to be corrected from among the speech recognition process that recognizes the speech of an utterance, the intention analysis process that analyzes the intention of the utterance from the speech recognition result, and the response generation process that generates a response to the utterance from the result of the intention analysis process.

For example, the information providing device 10 classifies the error contained in response #1 from the text of utterance #2 by using an error classification model that has learned the types of errors contained in responses. Such an error classification model is realized by, for example, a DNN that has learned the co-occurrence among the features of an utterance, the features of the response to that utterance, and the features of the type of error contained in that response. The error classification model may also be a classifier trained to classify the type of error contained in a response from the features of an utterance and the features of the response to that utterance. That is, the information providing device 10 may adopt a model trained in any manner as the error classification model, as long as the model can estimate, from an utterance and the response to that utterance, the type of error contained in the response.

The information providing device 10 may also classify the error directly from the voice data of utterance #2 and response #1, rather than from the result of the speech recognition process. For example, the information providing device 10 may classify the type of error contained in response #1 from the voice data of utterance #2 by using an error classification model that has learned the relationship between voice data and labels, where each label is assigned based on the content of the utterance contained in the voice data and on various voice-specific information, and indicates what kind of error is present (see, for example, Non-Patent Document 1).

The information providing device 10 does not need to identify the specific type of error. For example, it may perform any classification that can distinguish whether the error occurred in the speech recognition process, in the intention analysis process, or in the response generation process. That is, the information providing device 10 may classify errors in any manner, as long as it can classify which of the processes executed in stages caused the error.

The information providing device 10 then corrects the process that corresponds to the error classification result. For example, when the type of error contained in response #1 is attributable to the speech recognition process, such as a speech recognition error, the information providing device 10 corrects the speech recognition process. For example, the information providing device 10 retrains the speech recognition model using, as an incorrect-answer pair, the voice data input to the speech recognition model when response #1 was generated, that is, the voice data of utterance #1, and the text that the speech recognition model output when response #1 was generated.

Likewise, when the type of error contained in response #1 is attributable to the intention analysis process, such as an intention analysis error, the information providing device 10 corrects the intention analysis process. For example, the information providing device 10 retrains the intention analysis model using, as an incorrect-answer pair, the text that the speech recognition model output when response #1 was generated, that is, the text of utterance #1, and the parameters that the intention analysis model output when response #1 was generated.

Likewise, when the type of error contained in response #1 is attributable to the response generation process, such as insufficient analysis of the utterance intention, the information providing device 10 corrects the response generation process. For example, the information providing device 10 retrains the response generation model using, as an incorrect-answer pair, the parameters that the intention analysis model output when response #1 was generated, that is, the parameters indicating the intention of utterance #1, and response #1 output by the response generation model.
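
The three cases above share one pattern: the incorrect-answer pair is the input and output of the stage judged to be at fault. The following is a minimal sketch of that selection. The stage names and the `incorrect_pair` function are hypothetical, and the sketch assumes that an intention analysis error leads to retraining of the intention analysis model with the pair (text, parameters).

```python
from typing import Any, Tuple


def incorrect_pair(stage: str, audio: bytes, text: str,
                   intent: dict, response: str) -> Tuple[Any, Any]:
    """Pick the (input, output) pair of the faulty stage as incorrect-answer data."""
    pairs = {
        "speech_recognition": (audio, text),        # voice data -> recognized text
        "intent_analysis": (text, intent),          # text -> intention parameters
        "response_generation": (intent, response),  # parameters -> response
    }
    return pairs[stage]


pair = incorrect_pair(
    "intent_analysis",
    audio=b"raw-voice-data",
    text="what is the highest mountain?",
    intent={"action": "search", "adjective": "highest", "noun": "building"},
    response="Skytree",
)
print(pair)
```

Only the model of the selected stage would then be retrained with this pair; the other models are left as they are.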

In this way, the information providing device 10 specifies the process to be corrected, among a plurality of processes, based on user U's utterance made in reply to a response that was generated by executing the plurality of processes in stages on an earlier utterance of user U. For example, the information providing device 10 estimates the type of error contained in the response based on user U's utterance, and specifies, among the plurality of processes, the process corresponding to the estimated type as the correction target.

The information providing device 10 then corrects the content of the specified process. For example, based on user U's utterance made in reply to a response generated by executing, in stages, a plurality of processes that each use a different model, the information providing device 10 specifies the process to be corrected among the plurality of processes and retrains the model used in the specified process.

As a result, the information providing device 10 retrains only the model of the process in which an error is presumed to have actually occurred, among the plurality of processes executed in stages. It can therefore correct only the model in which the error occurred while leaving the models that processed correctly unchanged, and so achieves efficient correction without degrading the accuracy of the plurality of processes as a whole. In addition, because only the model in which the error occurred is corrected, the processing cost and time required for corrections such as retraining can be reduced.

[1-4. Variations of correction]
In the example described above, the information providing device 10 corrected the model of the process corresponding to the type of error. However, the embodiment is not limited to this. For example, the information providing device 10 may specify a plurality of processes corresponding to the type of error and correct the specified processes.

For example, the information providing device 10 uses an error classification model that calculates, for each of the speech recognition process, the intention analysis process, and the response generation process, the probability that an error occurred, and calculates from user U's utterance #2 the probability that an error occurred in each of those processes when response #1 was generated. The information providing device 10 then specifies the processes for which the probability calculated by the error classification model exceeds a predetermined threshold. For example, when the probabilities that an error occurred in the intention analysis process and in the response generation process both exceed the predetermined threshold, the information providing device 10 treats the intention analysis process and the response generation process as correction targets.
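
The threshold-based selection above can be sketched as follows. This is an illustration only: the function name, the stage names, the score values, and the default threshold of 0.5 are assumptions, and the scores stand in for the output of an error classification model.

```python
from typing import Dict, List


def stages_to_correct(error_scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every stage whose error probability exceeds the threshold."""
    return [stage for stage, score in error_scores.items() if score > threshold]


# Hypothetical classifier output for response #1, estimated from utterance #2.
scores = {
    "speech_recognition": 0.1,
    "intent_analysis": 0.7,
    "response_generation": 0.8,
}
targets = stages_to_correct(scores)
print(targets)
```

With these example scores, the intention analysis process and the response generation process are both selected, matching the case described in the text.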

The information providing device 10 then corrects the intention analysis process and the response generation process. For example, the information providing device 10 may retrain the intention analysis model using, as incorrect-answer data, the parameters that the intention analysis model output when response #1 was generated, and retrain the response generation model using response #1 as incorrect-answer data. Alternatively, the information providing device 10 may retrain the intention analysis model and the response generation model together, using the text output by the speech recognition process when response #1 was generated and response #1 as an incorrect-answer pair.

Here, the information providing device 10 may perform the predetermined correction by any method. For example, the information providing device 10 may execute supervised learning in which the data last output by the model used in the process classified as erroneous is treated as incorrect answer data. The information providing device 10 may also retrain the model through dialogue with the user U; that is, it may retrain the model by reinforcement learning.

For example, the information providing device 10 sets a negative reward when the probability that an error occurred in a given process is higher than a predetermined threshold value, and sets a positive reward when the probability is lower than the predetermined threshold value. The information providing device 10 may then advance the reinforcement learning of each model individually by treating the data last input to the model as the controller's state observation in reinforcement learning, treating the data last output by the model as the controller's action, and setting for the controller a reward based on whether or not the user U is satisfied with the response that was last output. That is, the information providing device 10 may identify the process to be trained from the error classification result and execute reinforcement learning of the model used in the identified process.
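
As a rough sketch of how one reinforcement-learning sample might be assembled from this description, the following treats the last input as the state, the last output as the action, and derives the sign of the reward from the error probability. The `Transition` class, the reward magnitudes of plus or minus 1.0, the threshold 0.5, and the handling of the boundary case are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Transition:
    state: Any     # data last input to the model
    action: Any    # data last output by the model
    reward: float  # sign follows the estimated error probability


def make_transition(model_input: Any, model_output: Any,
                    error_prob: float, threshold: float = 0.5) -> Transition:
    """Build one reinforcement-learning sample for a single model."""
    if error_prob > threshold:
        reward = -1.0   # error likely: penalize the last output
    elif error_prob < threshold:
        reward = 1.0    # error unlikely: reinforce the last output
    else:
        reward = 0.0    # boundary case is not specified in the text (assumption)
    return Transition(model_input, model_output, reward)


sample = make_transition("what is the highest mountain", "Skytree", error_prob=0.9)
```

Because a separate sample is built per model, each model's reinforcement learning can proceed independently, as the paragraph suggests.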

The information providing device 10 may also change the learning method based on the probability that an error occurred. For example, when the probability that an error occurred in a given process is equal to or greater than a first threshold value and less than a second threshold value, the information providing device 10 may execute supervised learning in which the data last output by the model used in that process is treated as incorrect answer data; when the probability is equal to or greater than the second threshold value, it may execute so-called reinforcement learning, in which learning is performed through dialogue with the user U.
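
The two-threshold rule can be written down as a small selection function. This is a sketch under assumptions: the function name, the returned labels, and the threshold values 0.5 and 0.8 are hypothetical, and the behaviour below the first threshold is left open because this passage does not specify it.

```python
from typing import Optional


def choose_learning_method(error_prob: float,
                           first_threshold: float = 0.5,
                           second_threshold: float = 0.8) -> Optional[str]:
    """Pick a learning method from the error probability of one process."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be below second_threshold")
    if error_prob >= second_threshold:
        return "reinforcement"                   # learn through dialogue with the user
    if error_prob >= first_threshold:
        return "supervised_with_incorrect_data"  # last output used as incorrect answer data
    return None  # below the first threshold: not specified in this passage
```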

The information providing device 10 may also retrain the models using correct answer data. For example, when the information providing device 10 determines that no error is included, or when the error classification model indicates that the probability that an error occurred is lower than the predetermined threshold value for every process, it may execute supervised learning or reinforcement learning of each model using the data last output by each model as correct answer data.

[1-5. Classification variations]
Here, as long as the information providing device 10 can classify the type of error included in a response, it may classify the type of error based not only on the response and the utterance but also on other information. For example, some users may exhibit dialogue patterns in which their speech is hard to recognize or their utterances tend to be incomplete. The information providing device 10 may therefore classify errors based on the utterance of the user U and the attributes of the user U, and identify the process corresponding to the classification result as the correction target.

For example, the information providing device 10 trains an error classification model that learns the relationships and co-occurrences among the features of the demographic and psychographic attributes of the user U, the features of the utterance that the user U makes in reaction to a response containing an error, and the process in which the error occurred. The information providing device 10 may then estimate the process in which an error occurred based on the features of the demographic and psychographic attributes of the user U and on the utterance of the user U.

[1-6. Consideration of domains]
Here, as the intention analysis process, the information providing device 10 may estimate the field (that is, the domain) to which an utterance belongs and analyze the intention of the utterance using an intention analysis model that differs for each estimated domain. For example, the information providing device 10 may estimate the domain to which the utterance belongs, such as whether it belongs to the domain "chat", the domain "weather forecast", or the domain "route guidance", and analyze the intention of the utterance using the intention analysis model for the estimated domain.

In such a case, the information providing device 10 need only correct, among the intention analysis models, the model that was used to generate the response. For example, in the intention analysis process, the information providing device 10 classifies the domain to which utterance #1 belongs using a domain classification model, and analyzes the intention of utterance #1 using the intention analysis model corresponding to the classification result (for example, intention analysis model #1). In this case, when the information providing device 10 estimates that response #1 contains an error caused by an intention analysis error, it need only retrain intention analysis model #1. The information providing device 10 may also retrain both the domain classification model and intention analysis model #1, each individually. Further, when it can be estimated that the domain classification was erroneous, the information providing device 10 may retrain only the domain classification model.
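
One way to picture the choice of retraining target is the sketch below. It covers only two of the options mentioned above (retraining the intention analysis model that was used, or retraining only the domain classification model); the option of retraining both individually is omitted for brevity. The function name, the boolean flags, and the identifier `"domain_classifier"` are hypothetical.

```python
from typing import List


def models_to_retrain(used_intent_model_id: str,
                      intent_error: bool,
                      domain_error: bool,
                      domain_model_id: str = "domain_classifier") -> List[str]:
    """Choose which model to retrain after an intention analysis failure."""
    if domain_error:
        # The utterance was routed to the wrong domain: fix the router only.
        return [domain_model_id]
    if intent_error:
        # Routing was fine: fix the per-domain model that produced the intention.
        return [used_intent_model_id]
    return []
```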

The information providing device 10 may also prepare a response generation model for each domain and generate a response using the response generation model corresponding to the domain to which the utterance belongs. In this case as well, the information providing device 10 need only retrain the response generation model that was used when the response was generated. The information providing device 10 may also, for example, simultaneously retrain the intention analysis model and the response generation model corresponding to the domain into which utterance #1 was classified.

[1-7. Error classification model]
Here, as long as the type of error included in the previous response, or the process in which the error occurred, can be estimated from the utterance of the user U, the information providing device 10 may use an error classification model trained by any learning method. For example, the information providing device 10 may identify the process to be corrected using an error classification model that has learned the relationship between a set of three data items (that is, a triple), consisting of an utterance of the user U, the response to that utterance, and a new utterance of the user U in reaction to that response, and the type of error included in the response.

For example, the information providing device 10 acquires the above-mentioned triple through dialogue with the user U and determines whether or not the response in the acquired triple contains an error. For example, when the information providing device 10 acquires a triple such as (utterance #1, response #1, utterance #2), it estimates, based on the content of utterance #2, whether or not the user U is satisfied with response #1. That is, the information providing device 10 estimates whether or not the user U is satisfied with response #1 from the user U's reaction to response #1, which was given in reply to utterance #1.

When the information providing device 10 determines that the user U is not satisfied with response #1, it provides the triple to a predetermined administrator or the like and acquires a label indicating the type of error presumed to be included in the response. Such a label may be assigned by a model that has already learned the characteristics of errors, or may be collected by crowdsourcing or the like. The information providing device 10 then has the error classification model learn the relationship between the triple and the type of error indicated by the label. The information providing device 10 may then estimate the type of error included in response #1 by inputting a triple such as (utterance #1, response #1, utterance #2) acquired in the dialogue process into the error classification model.
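
The data collection step described above might look like the following sketch. The `Triple` type, the function `build_training_set`, and the two callbacks are hypothetical; `get_label` stands in for whatever supplies the label (an administrator, an existing model, or crowdsourcing), and the satisfaction check here is a toy rule rather than an actual estimator.

```python
from typing import Callable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    utterance: str   # utterance #1
    response: str    # response #1
    follow_up: str   # utterance #2


def build_training_set(triples: List[Triple],
                       is_satisfied: Callable[[Triple], bool],
                       get_label: Callable[[Triple], str]) -> List[Tuple[Triple, str]]:
    """Collect (triple, error type) pairs for responses the user was not satisfied with."""
    dataset = []
    for triple in triples:
        if is_satisfied(triple):
            continue  # no error presumed, so no error label is requested
        dataset.append((triple, get_label(triple)))
    return dataset


triples = [
    Triple("utterance A", "response A", "thank you"),
    Triple("utterance B", "response B", "that's wrong"),
]
dataset = build_training_set(
    triples,
    is_satisfied=lambda t: t.follow_up == "thank you",  # toy satisfaction check
    get_label=lambda t: "intention analysis",           # stand-in for a human label
)
```

The resulting pairs are what a supervised learner for the error classification model would consume.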

The information providing device 10 may train the error classification model by any learning method. For example, the information providing device 10 may have the model learn the relationship between triples and labels by supervised learning, or by reinforcement learning through dialogue with the user U. For example, the information providing device 10 estimates, based on how the content of the user U's utterances changes over time, whether the satisfaction of the user U has decreased or whether the accuracy of the responses has decreased. When it is estimated that the satisfaction of the user U or the accuracy of the responses has decreased, the information providing device 10 may realize reinforcement learning of the error classification model by setting the reward in that reinforcement learning to a negative value.

[1-8. Inquiries to the user]
The information providing device 10 may also interactively acquire the data needed to train the error classification model by disclosing its estimate of the erroneous process to the user U at appropriate times and confirming whether the estimate is correct. For example, the information providing device 10 estimates, from the utterance of the user U, the probability that the response contains an error and the probability that an error occurred in each process. Then, when the probability that the response contains an error is higher than a predetermined threshold value even though the probability that an error occurred in each process is lower than a predetermined threshold value, the information providing device 10 outputs a response asking about the type of error, such as "What kind of error was there?"
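
The condition for asking the user can be expressed as a simple predicate. This is a sketch only: the function name, the dictionary keys, and both threshold values are assumptions, and the probabilities are made up rather than produced by any model.

```python
from typing import Dict


def should_ask_user(response_error_prob: float,
                    process_error_probs: Dict[str, float],
                    response_threshold: float = 0.5,
                    process_threshold: float = 0.5) -> bool:
    """Ask about the error type when the response looks wrong but no process is implicated."""
    response_is_wrong = response_error_prob > response_threshold
    no_process_blamed = all(p < process_threshold
                            for p in process_error_probs.values())
    return response_is_wrong and no_process_blamed


ask = should_ask_user(
    0.9,
    {"voice recognition": 0.2, "intention analysis": 0.3, "response generation": 0.1},
)
```

When the predicate holds, the device would output an inquiry such as "What kind of error was there?" instead of silently selecting a correction target.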

The information providing device 10 also estimates which process was erroneous by analyzing the user U's reply to the inquiry, and corrects the process based on the estimation result. The information providing device 10 may also retrain the error classification model using the result of the inquiry. For example, the information providing device 10 may retrain the error classification model using the type of error estimated from the result of the inquiry as correct answer data.

[1-9. Content of the processing]
In the example described above, the information providing device 10 identified the erroneous process in a response process in which the voice recognition process, the intention analysis process, and the response generation process are performed in stages, and corrected the identified process. However, the embodiment is not limited to this. For example, besides a response process with these three stages, the information providing device 10 may identify the erroneous process in a response process in which any number of processes are performed in stages, and correct the identified process.

[2. Configuration of the information providing device]
An example of the functional configuration of the information providing device 10 described above is explained below. FIG. 2 is a diagram showing a configuration example of the information providing device according to the embodiment. As shown in FIG. 2, the information providing device 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

The communication unit 20 is realized by, for example, a NIC (Network Interface Card) or the like. The communication unit 20 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the user terminal 100 and the external server 200.

The storage unit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 30 stores the dialogue model database 31, the error classification model 32, and the learning data database 33.

The various models used for the response process, that is, the plurality of models used in stages, are registered in the dialogue model database 31 as dialogue models. For example, FIG. 3 is a diagram showing an example of the information registered in the dialogue model database according to the embodiment. As shown in FIG. 3, information having items such as "model type", "model ID (Identifier)", "domain", and "model data" is registered in the dialogue model database 31.

Here, the "model type" is information indicating the type of the model, that is, which process the model is used for. The "model ID" is an identifier of the model. The "domain" is information indicating the domain that corresponds to the model indicated by the associated "model ID". The "model data" is data, such as the various parameters, that constitutes the model indicated by the associated "model ID".

For example, in the example shown in FIG. 3, the model type "voice recognition model", the model ID "SIM#1", the domain "ALL", and the model data "data #1" are registered in association with each other. This information indicates that the model type of the model indicated by the model ID "SIM#1" is "voice recognition model", that the corresponding domain is "ALL" (that is, all domains), and that the model data is "data #1". Although the example shown in FIG. 3 uses conceptual values such as "domain #1" and "data #1", in practice the dialogue model database 31 holds the identifier of the domain corresponding to each model and data such as the model parameters.
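
To make the record layout concrete, the following sketch models one row of the dialogue model database and a lookup by model type and domain. Only the first row ("SIM#1") comes from the description; the class `ModelRecord`, the function `find_models`, and the other rows and IDs are hypothetical, and the rule that "ALL" matches every domain is an interpretation of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRecord:
    model_type: str   # which process the model is used for
    model_id: str     # identifier of the model
    domain: str       # "ALL" is taken to mean every domain
    model_data: str   # placeholder for the model parameters


def find_models(records: List[ModelRecord], model_type: str, domain: str) -> List[ModelRecord]:
    """Return the models of the given type that apply to the given domain."""
    return [r for r in records
            if r.model_type == model_type and r.domain in (domain, "ALL")]


records = [
    ModelRecord("voice recognition model", "SIM#1", "ALL", "data #1"),
    # The rows below are hypothetical; only the first row appears in the text.
    ModelRecord("intention analysis model", "IAM#1", "weather forecast", "data #2"),
    ModelRecord("intention analysis model", "IAM#2", "route guidance", "data #3"),
]
hits = find_models(records, "intention analysis model", "route guidance")
```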

Returning to FIG. 2, the description continues. The error classification model 32 is the data of a model that classifies the errors included in a response, that is, of the error classification model. For example, the storage unit 30 stores the various parameters and the like of the error classification model 32.

The data used for retraining each dialogue model is registered in the learning data database 33. For example, FIG. 4 is a diagram showing an example of the information registered in the learning data database according to the embodiment. In the example shown in FIG. 4, a "learning data ID", a "classification type", and "learning data" are registered in the learning data database 33 in association with each other.

Here, the "learning data ID" is information that identifies the learning data. The "classification type" is information indicating what type of learning data the learning data indicated by the associated "learning data ID" is. The "learning data" is the learning data indicated by the associated "learning data ID", for example, the triple described above.

For example, the following kinds of information are registered in the learning data database 33 as the "classification type": "correct answer", indicating that the learning data is correct answer data; "intention analysis", indicating that the learning data is a triple presumed to involve an intention analysis error; "voice recognition", indicating that the learning data is a triple presumed to involve a voice recognition error; and "insufficient utterance", indicating that the learning data is a triple in which the utterance is presumed to have been insufficient.

In addition, information such as the learning data ID "ID#1", the classification type "correct answer", and the learning data "learning data #1" is registered in the learning data database 33 in association with each other. This information indicates that the learning data indicated by the learning data ID "ID#1" is "learning data #1" and that its classification type is "correct answer". Although the example shown in FIG. 4 uses conceptual values such as "learning data #1", in practice the utterance of the user U, the response generated for that utterance, and the new utterance of the user U in reaction to that response are registered as a triple.

Returning to FIG. 2, the description continues. The control unit 40 is a controller and is realized, for example, by a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information providing device 10, using a RAM or the like as a work area. The control unit 40 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). As shown in FIG. 2, the control unit 40 includes a reception unit 41, a voice recognition unit 42, an intention analysis unit 43, a response generation unit 44, an identification unit 45, and a correction unit 46.

The reception unit 41 accepts utterances. For example, the reception unit 41 receives the voice data of an utterance of the user U from the user terminal 100.

The voice recognition unit 42 executes a voice recognition process that generates the text data of an utterance from the utterance of the user U. For example, when the voice data of an utterance is received, the voice recognition unit 42 reads the voice recognition model from the dialogue model database 31 and uses it to generate the text data of the utterance from the voice data of the utterance.

The intention analysis unit 43 executes an intention analysis process that analyzes the intention of an utterance using the processing result of the voice recognition model. For example, when the text data of an utterance is generated, the intention analysis unit 43 reads the domain classification model from the dialogue model database 31 and uses it to estimate, from the text data of the utterance, the domain to which the utterance belongs. The intention analysis unit 43 then reads the intention analysis model corresponding to the estimated domain from the dialogue model database 31 and uses it to analyze the intention of the utterance from the text data of the utterance. More specifically, the intention analysis unit 43 generates, from the text data of the utterance, a parameter indicating the intention of the user U's utterance.

The response generation unit 44 generates a response to the utterance using the processing result of the intention analysis model. For example, when a parameter indicating the intention of the utterance is generated, the response generation unit 44 reads the response generation model from the dialogue model database 31 and uses it to generate a response from the parameter generated by the intention analysis unit 43. For example, the response generation unit 44 identifies, from the parameter, the type of information the user U wants and the information needed to acquire it, and acquires the corresponding information from the external server 200 or the like. The response generation unit 44 then uses the acquired information to generate the voice data that serves as the response, and transmits the generated voice data to the user terminal 100.
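
The staged flow through the voice recognition unit 42, the intention analysis unit 43, and the response generation unit 44 can be sketched as three chained callables. This is an illustration under assumptions: the function `respond`, the dictionary keys, and the lambda stand-ins are hypothetical, and the real units would call trained models and the external server 200. Keeping each stage's intermediate output is a design choice of this sketch, made because the correction described earlier uses those outputs as incorrect answer data.

```python
from typing import Any, Callable, Dict


def respond(audio: Any,
            recognize: Callable[[Any], str],
            analyze: Callable[[str], Any],
            generate: Callable[[Any], str]) -> Dict[str, Any]:
    """Run the three stages in order and keep every intermediate output."""
    text = recognize(audio)      # voice recognition process
    intent = analyze(text)       # intention analysis process
    response = generate(intent)  # response generation process
    return {"text": text, "intent": intent, "response": response}


# Toy stand-ins for the three models.
trace = respond(
    b"<audio bytes>",
    recognize=lambda audio: "what is the highest mountain",
    analyze=lambda text: {"topic": "mountain", "attribute": "highest"},
    generate=lambda intent: "Everest",
)
```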

The identification unit 45 identifies, among the plurality of processes, the process to be corrected, based on the utterance that the user U makes in reaction to a response generated from an utterance of the user U by executing the plurality of processes in stages. More specifically, after response #1 to utterance #1 of the user U has been output, the identification unit 45 estimates the type of error included in response #1 based on utterance #2 received from the user U, and identifies, among the plurality of processes, the one or more processes corresponding to the estimated type as correction targets.

That is, the identification unit 45 identifies the process to be corrected, among a plurality of processes that each use a different model, based on the utterance that the user U makes in reaction to a response generated by executing those processes in stages. For example, the identification unit 45 identifies the process to be corrected from among the process of performing voice recognition on the utterance, the process of analyzing the intention of the utterance from the voice recognition result, and the process of generating a response to the utterance from the intention analysis result.

For example, upon receiving utterance #2, the identification unit 45 uses the error classification model 32 to estimate whether response #1, which was output before utterance #2, contains an error, and the type of that error. More specifically, the identification unit 45 uses the error classification model 32 to calculate, for each error type, the probability that an error of that type is included. When the probability for a given error type exceeds a predetermined threshold value, the identification unit 45 sets the process corresponding to that type as a correction target. For example, the identification unit 45 treats the response presumed to contain an error, together with the utterances received from the user U before and after that response, as learning data, and registers the learning data in the learning data database 33 in association with the error type whose probability exceeded the predetermined threshold value (that is, the "classification type").
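
The registration step can be pictured with the sketch below, which stores a triple once for each error type whose probability exceeds the threshold. The function name, the dictionary field names, the ID format, the in-memory list standing in for the learning data database 33, and the probabilities are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (utterance, response, follow-up utterance)


def register_learning_data(triple: Triple,
                           error_probs: Dict[str, float],
                           database: List[dict],
                           threshold: float = 0.5) -> List[str]:
    """Store the triple once per error type whose probability exceeds the threshold."""
    types = [t for t, p in error_probs.items() if p > threshold]
    for error_type in types:
        database.append({
            "learning_data_id": f"ID#{len(database) + 1}",
            "classification_type": error_type,
            "learning_data": triple,
        })
    return types


db: List[dict] = []
found = register_learning_data(
    ("utterance #1", "response #1", "utterance #2"),
    {"voice recognition": 0.9, "intention analysis": 0.2},  # made-up probabilities
    db,
)
```

The returned list of error types is also what determines the correction targets, since each type corresponds to one process.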

The identification unit 45 may identify the process to be corrected, among the plurality of processes, using an error classification model 32 that has learned the type of error included in a response from an utterance of the user U, the response to that utterance, and a new utterance of the user U in reaction to that response. When such an error classification model 32 is used, the identification unit 45 estimates the type of error included in the response by, for example, inputting the utterance of the user U, the response to the utterance, and the new utterance of the user U in reaction to that response into the error classification model 32.

For example, FIG. 5 is a diagram showing an example of the process by which the information providing device according to the embodiment classifies errors. For example, when the identification unit 45 uses an error classification model 32 that has learned triples such as (utterance #1, response #1, utterance #2) together with the type of error included in response #1 of each triple, it estimates the type of error included in the response by inputting a triple into the error classification model 32.

For example, the identification unit 45 inputs triple #1, (What is the highest mountain?, Everest, What is the lowest mountain?), into the error classification model 32. In triple #1, the dialogue with the user U is proceeding smoothly, and the response "Everest" is considered to contain no error. In this case, the error classification model 32 calculates a value lower than the predetermined threshold value as the probability that each type of error is included. As a result, the identification unit 45 classifies triple #1 as correct answer data.

The identification unit 45 also inputs triple #2, (What is the highest mountain?, Everest, Thank you), into the error classification model 32. In triple #2 as well, the dialogue with the user U is proceeding smoothly and the response "Everest" is considered to contain no error, so the error classification model 32 calculates a value lower than the predetermined threshold value as the probability that each type of error is included. As a result, the identification unit 45 classifies triple #2 as correct answer data.

The identification unit 45 also inputs triple #3, ("ichiban taka" ["number-one hawk"], "Yes.", "What is the highest mountain?"), and triple #4, ("ichiban taka", "Yes.", "Recognize it properly"), into the error classification model 32. In triple #3, a speech recognition error has occurred in which the utterance "What is the highest mountain?" ("ichiban takai yama wa?") was recognized as "ichiban taka", and the user U is presumed to have restated the original utterance #1 in utterance #2. Triple #4 contains a remark by the user U, such as "Recognize it properly", that suggests a speech recognition error. In such cases, the error classification model 32 calculates, as the confidence that a speech recognition error is included, a value higher than the predetermined threshold. As a result, the identification unit 45 classifies triple #3 and triple #4 as learning data for speech recognition errors (that is, incorrect-answer data).

The identification unit 45 also inputs triple #5, ("What is the highest mountain?", "Skytree", "That's wrong"), into the error classification model 32. In triple #5, an intent analysis error has occurred, such as confusing "mountain" with "building", and the triple contains a remark such as "That's wrong" that suggests an intent analysis error. In such a case, the error classification model 32 calculates, as the confidence that an intent analysis error is included, a value higher than the predetermined threshold. As a result, the identification unit 45 classifies triple #5 as learning data for intent analysis errors.

The identification unit 45 also inputs triple #6, ("What is the highest mountain?", "Everest", "The highest mountain in Japan"), into the error classification model 32. Triple #6 contains no conspicuous error, but in utterance #2 the user U supplies more detailed information than in utterance #1, which suggests that the original utterance was insufficient. In such a case, the error classification model 32 calculates, as the confidence that the utterance was insufficient, a value higher than the predetermined threshold. As a result, the identification unit 45 classifies triple #6 as learning data for insufficient utterances.
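The threshold-based classification of triples described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the label names, the `classify_triple` function, the default threshold of 0.5, and the keyword-based `toy_score_fn` (a stand-in for a trained error classification model) are all assumptions introduced here. The sketch also reports only the single highest-scoring error type, which is a simplification.

```python
from typing import Callable, Dict, Tuple

Triple = Tuple[str, str, str]  # (utterance #1, response #1, utterance #2)

# Hypothetical label names for the error types discussed in the text.
ERROR_TYPES = ("speech_recognition", "intent_analysis", "insufficient_utterance")


def classify_triple(
    triple: Triple,
    score_fn: Callable[[Triple], Dict[str, float]],
    threshold: float = 0.5,  # assumed value; the text only says "predetermined"
) -> str:
    """Return the highest-scoring error type if it exceeds the threshold,
    otherwise 'correct' (the triple is treated as correct-answer data)."""
    scores = score_fn(triple)
    best_type = max(ERROR_TYPES, key=lambda t: scores.get(t, 0.0))
    if scores.get(best_type, 0.0) > threshold:
        return best_type
    return "correct"


def toy_score_fn(triple: Triple) -> Dict[str, float]:
    """Keyword-based stand-in for a trained model (illustration only)."""
    followup = triple[2].lower()
    scores = {t: 0.1 for t in ERROR_TYPES}
    if "recognize" in followup:
        scores["speech_recognition"] = 0.9
    elif "wrong" in followup:
        scores["intent_analysis"] = 0.8
    return scores
```

In practice `score_fn` would wrap a classifier trained on labeled triples; the keyword rules only mimic the behavior described for triples #2, #4, and #5.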

Note that the identification unit 45 may identify the process to be corrected based on the utterance of the user U and the attributes of the user U. For example, the identification unit 45 may estimate the type of error included in the response by using an error classification model 32 that has learned the relationship among the utterance of the user U, the attributes of the user U, and the type of error.

Returning to FIG. 2, the description continues. The correction unit 46 corrects the content of the process identified by the identification unit 45. More specifically, the correction unit 46 corrects the content of the process identified based on the user's utterance in response to the response. For example, the correction unit 46 retrains the model used for the identified process.

For example, at a predetermined timing, the correction unit 46 refers to the learning data database 33 and reads out a pair consisting of learning data and the classification type associated with that learning data. Next, the correction unit 46 reads out, from the dialogue model database 31, the model used for the process corresponding to the classification type that was read. The correction unit 46 then retrains the model it read out, using the learning data it read out.

For example, when the classification type associated with the learning data is "speech recognition", the correction unit 46 reads out the speech recognition model; when the classification type is "intent analysis", it reads out the intent analysis model; and when the classification type is "insufficient utterance", it reads out the response generation model. The correction unit 46 then retrains the model it read out, using the learning data as incorrect-answer data.

Note that when the classification type associated with the learning data is "intent analysis", the correction unit 46 may retrain, among the intent analysis models corresponding to the respective domains, the intent analysis model corresponding to the domain into which utterance #1 included in the learning data was classified. In addition, when the classification type associated with the learning data is "intent analysis", the correction unit 46 may retrain the domain classification model and the intent analysis model at the same time.
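The correspondence between classification types and the models to be retrained can be sketched as a simple lookup. This is only an illustration under assumptions: the key and model names in `TYPE_TO_MODEL` and the `group_training_data` helper are hypothetical, and the actual retraining step is omitted because the text does not specify a training procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (utterance #1, response #1, utterance #2)

# Classification type -> model to retrain, following the correspondence in
# the text. All names here are hypothetical.
TYPE_TO_MODEL = {
    "speech_recognition": "speech_recognition_model",
    "intent_analysis": "intent_analysis_model",
    "insufficient_utterance": "response_generation_model",
}


def group_training_data(
    records: Iterable[Tuple[Triple, str]],
) -> Dict[str, List[Triple]]:
    """Group (triple, classification type) records by the model to retrain.

    Records with an unknown classification type are skipped.
    """
    batches: Dict[str, List[Triple]] = defaultdict(list)
    for triple, classification_type in records:
        model_name = TYPE_TO_MODEL.get(classification_type)
        if model_name is not None:
            batches[model_name].append(triple)
    return dict(batches)
```

Each resulting batch would then be passed as incorrect-answer data to whatever retraining routine the corresponding model provides.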

[3. Example of the flow of processing executed by the information providing device]
Next, an example of the flow of processing executed by the information providing device 10 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the flow of the correction process executed by the information providing device according to the embodiment. For example, the information providing device 10 acquires a triple consisting of a first utterance, a first response to the first utterance, and a second utterance in response to the first response (step S101). Next, the information providing device 10 determines whether the first response of the triple contains an error (step S102).

When the information providing device 10 determines that the first response contains an error (step S102: Yes), it classifies the error according to its content (step S103). For example, the information providing device 10 classifies the error based on the content of the second utterance. The information providing device 10 then treats the triple as incorrect-answer data for correcting the process that corresponds to the classification result (step S104). On the other hand, when the information providing device 10 determines that the response contains no error (step S102: No), it treats the triple as correct-answer data (step S105). The information providing device 10 then corrects the models used for the various processes by using the correct-answer data and the incorrect-answer data (step S106), and ends the process.
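Steps S101 to S105 of the flow above can be sketched as a single pass over collected triples. This is a hedged illustration: `run_correction_pass` and the `classify` callback are hypothetical names, and the callback is assumed to return `None` when no error is detected and an error-type label otherwise. Step S106 (the model correction itself) is left out because the text does not fix a training procedure.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (first utterance, first response, second utterance)


def run_correction_pass(
    triples: Iterable[Triple],
    classify: Callable[[Triple], Optional[str]],
) -> Tuple[List[Triple], Dict[str, List[Triple]]]:
    """Split triples into correct-answer data and per-type incorrect-answer data."""
    correct: List[Triple] = []
    incorrect: Dict[str, List[Triple]] = {}
    for triple in triples:                    # S101: acquire a triple
        error_type = classify(triple)         # S102/S103: detect and classify
        if error_type is None:
            correct.append(triple)            # S105: correct-answer data
        else:
            incorrect.setdefault(error_type, []).append(triple)  # S104
    return correct, incorrect                 # inputs to S106 (not shown)
```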

[4. Modifications]
An example of the correction process performed by the information providing device 10 has been described above. However, the embodiment is not limited to this. Variations of the correction process executed by the information providing device 10 are described below.

[4-1. Estimation of the error type]
In the above description, the information providing device 10 estimates the type of error included in a response from the content of the user U's utterance in response to that response. However, the embodiment is not limited to this. For example, the information providing device 10 may estimate whether the response contains an error, the type of error included in the response, and the like based on the facial expression of the user U when the response is output, the frequency of the user U's utterance in response to the response, and so on.

[4-2. Device configuration]
The databases 31 and 33 registered in the storage unit 30 may be held in an external storage server. The information providing device 10 may also be realized by a dialogue server that interacts with the user U and a correction server that corrects the models used by the dialogue server operating in cooperation. In such a case, the reception unit 41, the speech recognition unit 42, the intent analysis unit 43, and the response generation unit 44 shown in FIG. 2 may be arranged on the dialogue server, and the identification unit 45 and the correction unit 46 may be arranged on the correction server.

The information providing device 10 may also be realized by the following servers operating in cooperation: a front-end server on which the reception unit 41 is arranged, a speech recognition server on which the speech recognition unit 42 is arranged, an intent analysis server on which the intent analysis unit 43 is arranged, a response generation server on which the response generation unit 44 is arranged, an identification server on which the identification unit 45 is arranged, and a correction server on which the correction unit 46 is arranged.

[4-3. Scope of application]
In the example described above, in a dialogue system that executes, in stages, a plurality of processes for generating a response to a user's utterance, the information providing device 10 identifies from the user's reaction what type of error has occurred and corrects the process corresponding to the identified type of error. However, the embodiment is not limited to this. For example, the information providing device 10 may execute the correction process described above for any process that generates an output in response to a user's input by executing a plurality of processes in stages starting from the input.

For example, in a web search, a search result corresponding to a search query is generated by executing a first process that performs morphological analysis and token identification on the search query input by the user U, a second process that performs a web search using the result of the first process, and a third process that sorts the web content obtained by the second process based on the attribute information of the user U and the like. In such a case, the search query that the user U newly inputs after the search result is output can be an indicator of whether the previous search result was correct and, if it was wrong, what kind of error occurred. For example, the similarity or difference between the newly input search query and the previously input search query can indicate what kind of error occurred.

Therefore, the information providing device 10 may identify, from the search query newly input by the user U, the process in which the error occurred among the plurality of processes executed in stages, that is, the process to be corrected, and may correct the identified process. The information providing device 10 may also correct the process based on the search query newly input by the user U and the search query input previously.

That is, the information providing device 10 identifies the process to be corrected among the plurality of processes based on the reaction of the user U (for example, the reaction of the user U estimated from new input information) to the output information generated by executing the plurality of processes in stages from the input information input by the user U. The information providing device 10 then only needs to correct the content of the identified process.
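One way to use the similarity between the previous and the new search query as an error signal is sketched below. The text does not specify a similarity measure; this sketch assumes character-level matching with `difflib.SequenceMatcher` from the Python standard library, and the function names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def query_similarity(previous: str, current: str) -> float:
    """Return a similarity ratio in [0, 1] between two queries (case-insensitive)."""
    return SequenceMatcher(None, previous.lower(), current.lower()).ratio()


def looks_like_reformulation(
    previous: str, current: str, threshold: float = 0.6  # assumed threshold
) -> bool:
    """Treat a similar but non-identical follow-up query as a hint that the
    previous search result did not satisfy the user."""
    similarity = query_similarity(previous, current)
    return threshold <= similarity < 1.0
```

A follow-up such as "highest mountain in japan" after "highest mountain" would be flagged as a reformulation, while an unrelated query would not. Deciding which stage actually caused the error would still require a trained classifier such as the error classification model described earlier.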

[4-4. Others]
Of the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and conversely, all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

Each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above can be combined as appropriate as long as the processing contents do not contradict each other.

[4-5. Program]
The information providing device 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 7. FIG. 7 is a diagram showing an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic unit 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

The arithmetic unit 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic unit 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic unit 1030 for various calculations and various databases are registered, and is realized by a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, which outputs various kinds of information, such as a monitor or a printer, and is realized by a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner, and is realized by, for example, USB.

Note that the input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory. The input device 1020 may also be an external storage medium such as a USB memory.

The network IF 1080 receives data from other devices via the network N and sends it to the arithmetic unit 1030, and also transmits data generated by the arithmetic unit 1030 to other devices via the network N.

The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the information providing device 10, the arithmetic unit 1030 of the computer 1000 realizes the functions of the control unit 40 by executing the program loaded onto the primary storage device 1040.

[5. Effects]
As described above, the information providing device 10 identifies the process to be corrected among a plurality of processes based on the user's reaction to the output information generated by executing the plurality of processes in stages from the input information input by the user U. The information providing device 10 then corrects the content of the identified process. Therefore, when an error occurs in processing in which a plurality of processes are executed in stages, the information providing device 10 can realize efficient correction.

The information providing device 10 also identifies the process to be corrected among the plurality of processes based on the utterance of the user U in response to the response generated by executing the plurality of processes in stages from the utterance of the user U. The information providing device 10 then corrects the content of that process. Therefore, when an error occurs in processing in which a plurality of processes are executed in stages, the information providing device 10 can realize efficient correction.

The information providing device 10 also estimates the type of error included in the output information based on the input information of the user U, and identifies, as the correction target, one or more of the plurality of processes that correspond to the estimated type. The information providing device 10 then corrects the content of the identified process based on the input information of the user U in response to the output information. Therefore, the information providing device 10 can efficiently correct the process in which the error is presumed to have occurred.

The information providing device 10 also identifies the process to be corrected among the plurality of processes based on the input information of the user U in response to the output information generated by executing, in stages, a plurality of processes that each use a different model, and retrains the model used for the identified process. Therefore, the information providing device 10 can efficiently correct the model used for each process.

The information providing device 10 also identifies the process to be corrected based on the input information of the user U and the attributes of the user U. Therefore, the information providing device 10 can improve the accuracy with which the process in which the error occurred is estimated.

The information providing device 10 also identifies the process to be corrected among the plurality of processes by using an error classification model that has learned the types of errors included in output information, using the input information of the user U, the output information for that input information, and the new input information of the user U in response to that output information. Therefore, the information providing device 10 can appropriately estimate the process in which the error occurred.

The information providing device 10 also identifies the process to be corrected among a process of performing speech recognition on the utterance of the user U, a process of analyzing the intent of the utterance from the result of the speech recognition, and a process of generating a response to the utterance from the result of the intent analysis. Therefore, the information providing device 10 can realize efficient correction in response processing in which speech recognition, intent analysis, and response generation are executed in stages.

Some embodiments of the present application have been described above in detail with reference to the drawings, but these are examples, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

The "unit" (section, module, unit) described above can be read as "means", "circuit", or the like. For example, the identification unit can be read as identification means or an identification circuit.

10 Information providing device
20 Communication unit
30 Storage unit
31 Dialogue model database
32 Error classification model
33 Learning data database
40 Control unit
41 Reception unit
42 Speech recognition unit
43 Intent analysis unit
44 Response generation unit
45 Identification unit
46 Correction unit
100 User terminal
200 External server

Claims

A correction device comprising:
an identification unit that identifies the type of error included in output information generated from input information entered by a user, by using an error classification model that has learned the types of errors included in output information using input information entered by a user, output information generated by executing a plurality of processes in stages from that input information, and new input information entered in response to that output information, and that identifies, among the processes, the process corresponding to the identified type as the process to be corrected; and
a correction unit that corrects the content of the process identified by the identification unit by retraining the model used for that process, using as incorrect-answer data the processing result of the process identified by the identification unit at the time the output information containing the error was output.

The correction device according to claim 1, wherein the identification unit identifies the process to be corrected among the plurality of processes based on a new utterance of the user in response to a response generated by executing the plurality of processes in stages from the user's utterance.

The correction device according to claim 1 or 2, wherein the identification unit estimates the type of error included in the output information based on the input information, and identifies, as the correction target, one or more of the plurality of processes that correspond to the estimated type.

The correction device according to any one of claims 1 to 3, wherein the correction unit corrects the content of the process identified by the identification unit, using information based on the user's new input information in response to the output information as correct-answer data.

The correction device according to any one of claims 1 to 4, wherein
the identification unit identifies the process to be corrected among the plurality of processes based on the user's new input information in response to the output information generated by executing, in stages, a plurality of processes that each use a different model, and
the correction unit retrains the model used for the process identified by the identification unit.

The correction device according to any one of claims 1 to 5, wherein the identification unit identifies the process to be corrected based on the input information of the user and the attributes of the user.

The correction device according to any one of claims 1 to 6, wherein the identification unit identifies the process to be corrected among a process of performing speech recognition on the user's utterance, a process of analyzing the intent of the utterance from the result of the speech recognition, and a process of generating a response to the utterance from the result of the intent analysis.

A correction method executed by a correction device, the method comprising:
an identification step of identifying the type of error included in output information generated from input information entered by a user, by using an error classification model that has learned the types of errors included in output information using input information entered by a user, output information generated by executing a plurality of processes in stages from that input information, and new input information entered in response to that output information, and identifying, among the processes, the process corresponding to the identified type as the process to be corrected; and
a correction step of correcting the content of the process identified in the identification step by retraining the model used for that process, using as incorrect-answer data the processing result of the process identified in the identification step at the time the output information containing the error was output.

A correction program for executing:
an identification procedure of identifying the type of error included in output information generated from input information entered by a user, by using an error classification model that has learned the types of errors included in output information using input information entered by a user, output information generated by executing a plurality of processes in stages from that input information, and new input information entered in response to that output information, and identifying, among the processes, the process corresponding to the identified type as the process to be corrected; and
a correction procedure of correcting the content of the process identified in the identification procedure by retraining the model used for that process, using as incorrect-answer data the processing result of the process identified in the identification procedure at the time the output information containing the error was output.