JP2005017603A

JP2005017603A - Method and program for estimating speech recognition rate

Info

Publication number: JP2005017603A
Application number: JP2003181220A
Authority: JP
Inventors: Masayuki Takahashi; 真之高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-06-25
Filing date: 2003-06-25
Publication date: 2005-01-20

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition rate estimation method by which a spoken dialogue system sequentially estimates the expected speech recognition rate at the time of each utterance input.
SOLUTION: In a spoken dialogue system that provides an information service through spoken dialogue between a user and a computer system, the method has the steps of: inputting an acoustic score and a language score obtained by analyzing input speech; inputting the speech recognition result obtained when a dialogue operation is executed; calculating a total score by adding the acoustic score and the language score; obtaining in advance a correlation curve between the total score and the speech recognition rate on the basis of the total scores and the speech recognition results; and estimating, by reference to the correlation curve, the speech recognition rate corresponding to the total score obtained by adding a newly input acoustic score and language score.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a speech recognition rate estimation method and a speech recognition rate estimation program, and more particularly to a method for estimating the speech recognition rate expected at the time of speech input in an information service providing system that has a speech recognition function for speech input by users from telephones, mobile phones, portable information terminals, in-vehicle information terminals, and the like.
[0002]
[Prior art]
If the recognition rate for a given recognition vocabulary size is known, the recognition rate for a vocabulary of n words can be estimated from the empirical rule that the recognition error rate is proportional to the square root of the vocabulary size (see, for example, Patent Document 1). In actual use of a spoken dialogue system, however, the users are an unspecified large population and the noise environment around each user is not constant, so it is extremely difficult to estimate the recognition rate from the vocabulary size alone. Moreover, for conventional spoken dialogue systems only methods for improving the recognition rate have been studied, and no method for actively estimating the recognition rate at the time of utterance input has been found. Thus, as a strategy for completing an information service with the minimum number of dialogue exchanges, speech recognition systems have generally sought to minimize recognition errors by raising the speech recognition rate through some technical measure. Current speech recognition technology is still imperfect, however, and it is very difficult to obtain a consistently high recognition rate across the varied environments of users.
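The empirical rule cited at the start of this paragraph can be illustrated with a short calculation. The sketch below assumes the rule takes the form error(n) = error(n0) × sqrt(n / n0); the function name and the sample numbers are hypothetical and are not taken from Patent Document 1.

```python
import math


def estimate_rate_from_vocab(known_rate, known_vocab, target_vocab):
    """Scale the known error rate by sqrt(target_vocab / known_vocab).

    Rates are percentages. This is an assumed form of the empirical rule,
    shown for illustration only.
    """
    known_error = 100.0 - known_rate
    error = known_error * math.sqrt(target_vocab / known_vocab)
    # Clamp so the result stays a valid percentage.
    return max(0.0, 100.0 - error)


# Hypothetical numbers: 95% at 100 words, extrapolated to 400 words.
print(estimate_rate_from_vocab(95.0, 100, 400))
```

Because the rule depends only on vocabulary size, it cannot reflect differences between speakers or noise conditions, which is the limitation this paragraph points out.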
[0003]
[Patent Document 1]
JP 2002-278589 A
[0004]
[Problems to be solved by the invention]
In a typical spoken dialogue system, the dialogue scenario is designed on the assumption of a certain level of recognition errors. If the actual recognition rate is worse than assumed, the dialogue becomes very long; conversely, if it is better than assumed, the proportion of confirmation responses that turn out to be unnecessary increases. To minimize the expected dialogue length across the utterance conditions of various users, a means has therefore been proposed that variably determines the next confirmation content from the probability distribution of user requests and the expected number of turns (the number of dialogue exchanges) (Patent Document 1).
[0005]
However, to estimate the expected number of turns accurately, the speech recognition rate must be estimated sequentially in accordance with the user's utterance environment, which changes from moment to moment. The apparatus described in Patent Document 1 proposes no such means, so in an actual usage environment the estimation accuracy of the expected number of turns deteriorates and, as a result, the expected dialogue length may not be shortened.
[0006]
The present invention has been made in view of these circumstances, and its object is to provide a speech recognition rate estimation method and a speech recognition rate estimation program that sequentially estimate, at the time of each utterance input, the speech recognition rate expected in a spoken dialogue system.
[0007]
[Means for Solving the Problems]
The invention according to claim 1 is a method for estimating a speech recognition rate in a spoken dialogue system that provides an information service through spoken dialogue between a user and a computer system, the method comprising: a step of inputting an acoustic score and a language score obtained by analyzing input speech; a step of inputting the speech recognition result obtained when a dialogue operation is executed; a step of calculating a total score by adding the acoustic score and the language score; a step of obtaining in advance a correlation curve between the total score and the speech recognition rate based on the total score and the speech recognition result; and a step of estimating, with reference to the correlation curve, the speech recognition rate corresponding to the total score obtained by adding a newly input acoustic score and language score.
[0008]
The invention according to claim 2 is a speech recognition rate estimation program that operates in a spoken dialogue system that provides an information service through spoken dialogue between a user and a computer system, the program causing a computer to execute: a process of inputting an acoustic score and a language score obtained by analyzing input speech; a process of inputting the speech recognition result obtained when a dialogue operation is executed; a process of calculating a total score by adding the acoustic score and the language score; a process of obtaining in advance a correlation curve between the total score and the speech recognition rate based on the total score and the speech recognition result; and a process of estimating, with reference to the correlation curve, the speech recognition rate corresponding to the total score obtained by adding a newly input acoustic score and language score.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
A speech recognition system according to an embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. Reference numeral 1 denotes an acoustic analysis unit that captures utterance speech input directly through a microphone into which the user speaks, or indirectly via a telephone network, a mobile phone network, the Internet, a local area network, or the like, extracts feature parameters, attaches acoustic scores to the group of words in its internal dictionary, and sends out the result. Reference numeral 2 denotes a language analysis unit that attaches, to each word string in its internal dictionary, a language score representing how readily the words connect to one another. Reference numeral 3 denotes a recognition rate estimation unit that uses internally accumulated recognition results to estimate the probability that an utterance is correctly recognized when that utterance obtains a given total score. Reference numeral 4 denotes a search unit that determines the recognition result from, among other things, the sum of the acoustic score and the language score output from the language analysis unit 2, and sends out the result. Reference numeral 5 denotes a dialogue control unit that controls the dialogue operation based on the output from the search unit 4.
[0010]
Next, the operation of the recognition rate estimation unit 3 shown in FIG. 1 will be described with reference to FIG.
First, the recognition rate estimation unit 3 receives an acoustic score and a language score from the language analysis unit 2 (step S1). The recognition rate estimation unit 3 then calculates the total of the received acoustic score and language score (step S2). Next, the recognition rate estimation unit 3 substitutes the calculated total score into the formula of the approximate curve to determine the estimated speech recognition rate (step S3). The recognition rate estimation unit 3 then transmits the estimated speech recognition rate to the dialogue control unit 5 (step S4). Based on this estimated speech recognition rate, the dialogue control unit 5 decides which operation to perform next and controls the dialogue operation accordingly.
[0011]
Meanwhile, the recognition rate estimation unit 3 receives the speech recognition result (either success or failure) from the dialogue control unit 5 (step S5). The recognition rate estimation unit 3 then adds the received recognition result to the recognition success/failure data for the total score range that contains this total score (step S6). It recalculates the speech recognition rate within that total score range and holds it internally (step S7). Finally, the recognition rate estimation unit 3 recalculates the formula of the best-fitting approximate curve based on the internally held data (step S8).
[0012]
Next, the operation of the recognition rate estimation unit 3 will be described using a specific example with reference to FIG. 3. The numerical values of the acoustic score and the language score used here are merely examples. First, the operation for obtaining the approximate curve will be described. The sum of the acoustic score and the language score sent from the language analysis unit 2 is calculated, and the speech recognition result at that time (either recognition success or recognition failure) is acquired from the dialogue control unit 5. The total score is then assigned to one of the predetermined total score ranges (in FIG. 3, 0-99, 100-200, ..., 500-600, and 600-, in increments of 100 points) to identify the range it falls in. For example, a total score of 315 points falls in the range "300-400". Then "1" is added to the number of recognition successes or the number of recognition failures for that range, according to the speech recognition result obtained from the dialogue control unit 5. Repeating this process a predetermined number of times generates the recognition rate table shown in FIG. 3. The recognition rate estimation unit 3 then calculates the recognition rate for each total score range using equation (1) below. This recognition rate is written into the recognition rate table as the recognition rate at the center value of each total score range (here, 50 points, 150 points, 250 points, ..., 550 points, and 650 points). The recognition rate table is held in the recognition rate estimation unit 3.
Recognition rate [%] = (Number of recognition successes / (Number of recognition successes + Number of recognition failures)) × 100 ... (1)
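The table-building procedure and equation (1) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names and the sample log are hypothetical, and the ranges are assumed to be half-open (100 up to but not including 200, and so on), since the text leaves the boundary handling open.

```python
BIN_WIDTH = 100
LAST_BIN = 6  # scores of 600 and above share the last range


def bin_index(total_score):
    """Map a total score to its 100-point range (0: 0-99, ..., 6: 600 and above)."""
    return min(max(int(total_score // BIN_WIDTH), 0), LAST_BIN)


def build_table(samples):
    """Count successes and failures per range from (total_score, ok) pairs."""
    table = [{"success": 0, "failure": 0} for _ in range(LAST_BIN + 1)]
    for total_score, recognized_ok in samples:
        key = "success" if recognized_ok else "failure"
        table[bin_index(total_score)][key] += 1
    return table


def recognition_rate(row):
    """Equation (1): successes / (successes + failures) x 100, or None if empty."""
    n = row["success"] + row["failure"]
    return None if n == 0 else row["success"] / n * 100


# Hypothetical log of (total score, recognition succeeded?) pairs.
log = [(315, True), (340, True), (380, False), (399, True), (42, False), (720, True)]
table = build_table(log)
for i, row in enumerate(table):
    midpoint = i * BIN_WIDTH + BIN_WIDTH // 2  # 50, 150, ..., 650
    print(midpoint, row, recognition_rate(row))
```

Ranges with no samples report `None` rather than 0%, so that a later curve fit can skip them instead of treating missing data as total failure.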
[0013]
Next, the recognition rate estimation unit 3 calculates the curve that best approximates the obtained recognition rate points (33, 52, 64, 82, 88, 92, 96). A well-known method is used to obtain this approximate curve. The obtained approximate curve (correlation curve) is held in the recognition rate estimation unit 3.
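The text leaves the fitting method open. As one possible choice, the sketch below fits a quadratic by ordinary least squares to the seven points of the example, using only the standard library. The choice of a quadratic, the function names, and the scaling of scores to units of 100 points are all assumptions made for illustration.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    b = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def evaluate(coeffs, x):
    """Evaluate the fitted quadratic at x."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x


# Range center values and recognition rates from the worked example.
midpoints = [50, 150, 250, 350, 450, 550, 650]
rates = [33, 52, 64, 82, 88, 92, 96]

# Work in units of 100 points to keep the normal equations well scaled.
coeffs = fit_quadratic([m / 100 for m in midpoints], rates)
print(round(evaluate(coeffs, 315 / 100), 1))
```

With these points the fitted value at a total score of 315 comes out close to the 75% quoted in the example that follows, although a different curve family would give a slightly different figure.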
[0014]
Next, the operation for estimating the speech recognition rate based on the previously obtained approximate curve (correlation curve) will be described. The recognition rate estimation unit 3 adds the acoustic score and the language score newly sent from the language analysis unit 2 to obtain a total score. It then checks this total score against the internally held approximate curve to obtain the estimated speech recognition rate. For example, if the total score is 315 points, an estimated speech recognition rate of 75% is obtained. The recognition rate estimation unit 3 transmits the estimated speech recognition rate obtained from the approximate curve to the dialogue control unit 5. This estimated speech recognition rate is sent to the dialogue control unit 5 every time utterance speech is input, and the dialogue control unit 5 uses it as a parameter for controlling the dialogue scenario so as to minimize the expected number of dialogue turns.
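The lookup step can be illustrated with a simple stand-in for the approximate curve. The sketch below assumes piecewise-linear interpolation between the center values of neighbouring ranges, which is only one possible curve; the names `MIDPOINTS`, `RATES`, and `estimate_rate` are hypothetical.

```python
from bisect import bisect_right

MIDPOINTS = [50, 150, 250, 350, 450, 550, 650]
RATES = [33, 52, 64, 82, 88, 92, 96]  # percentages from the worked example


def estimate_rate(total_score):
    """Piecewise-linear lookup between neighbouring range center values."""
    if total_score <= MIDPOINTS[0]:
        return float(RATES[0])
    if total_score >= MIDPOINTS[-1]:
        return float(RATES[-1])
    hi = bisect_right(MIDPOINTS, total_score)
    lo = hi - 1
    t = (total_score - MIDPOINTS[lo]) / (MIDPOINTS[hi] - MIDPOINTS[lo])
    return RATES[lo] + t * (RATES[hi] - RATES[lo])


print(estimate_rate(315))
```

For a total score of 315 this interpolation gives about 75.7%, in line with the roughly 75% quoted above.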
[0015]
Next, the operation for updating the approximate curve will be described. After estimating the speech recognition rate from the total score and the approximate curve, the recognition rate estimation unit 3 updates the number of recognition successes or the number of recognition failures in the recognition rate table described above, based on the obtained total score and the speech recognition result (success or failure). It then recalculates the recognition rate, obtains a new approximate curve, and holds it internally. Subsequent estimated speech recognition rates are estimated from the newly obtained approximate curve.
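The estimate-then-update cycle can be sketched as a small class. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, the curve is a piecewise-linear stand-in through the non-empty ranges, and the sample outcomes are invented.

```python
class RecognitionRateEstimator:
    BIN_WIDTH = 100
    NUM_BINS = 7  # 0-99, 100-199, ..., 600 and above (assumed half-open ranges)

    def __init__(self):
        self.success = [0] * self.NUM_BINS
        self.failure = [0] * self.NUM_BINS
        self.points = []  # (center value, rate) pairs defining the current curve

    def _bin(self, total):
        return min(max(int(total // self.BIN_WIDTH), 0), self.NUM_BINS - 1)

    def _curve(self, total):
        """Piecewise-linear stand-in for the approximate curve."""
        pts = self.points
        if not pts:
            return None  # no recognition results accumulated yet
        if total <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if total <= x1:
                return y0 + (y1 - y0) * (total - x0) / (x1 - x0)
        return pts[-1][1]

    def estimate(self, acoustic, language):
        """Total the two scores and read the rate off the current curve."""
        total = acoustic + language
        return total, self._curve(total)

    def update(self, total, recognized_ok):
        """Count the outcome, recompute the rates, and rebuild the curve."""
        i = self._bin(total)
        if recognized_ok:
            self.success[i] += 1
        else:
            self.failure[i] += 1
        self.points = [
            (b * self.BIN_WIDTH + self.BIN_WIDTH / 2, 100 * s / (s + f))
            for b, (s, f) in enumerate(zip(self.success, self.failure))
            if s + f > 0
        ]


est = RecognitionRateEstimator()
# Hypothetical history of (total score, recognition succeeded?) outcomes.
for past_total, ok in [(120, False), (130, True), (330, True),
                       (360, True), (370, False), (390, True)]:
    est.update(past_total, ok)

total, rate = est.estimate(200, 115)  # estimate first ...
est.update(total, False)              # ... then fold in the actual outcome
print(total, rate, est.points)
```

Rebuilding the curve after every utterance keeps the estimate in step with the accumulated results, at the cost of recomputing all range rates each time; with only seven ranges that cost is negligible.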
[0016]
The total score is not limited to the simple sum of the acoustic score and the language score. The result of an operation such as multiplying by appropriate coefficients or adding a constant, chosen so that the estimated speech recognition rate becomes optimal, may also be used as the total score.
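Such a generalized total score might look like the following sketch. The parameter names `w_acoustic`, `w_language`, and `offset` are hypothetical, and the text does not say how their values would be chosen; the defaults reproduce the simple sum.

```python
def total_score(acoustic, language, w_acoustic=1.0, w_language=1.0, offset=0.0):
    """Weighted combination of the two scores; defaults give the plain sum."""
    return w_acoustic * acoustic + w_language * language + offset


print(total_score(200, 115))                                # plain sum
print(total_score(200, 115, w_language=2.0, offset=-15.0))  # hypothetical weights
```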
[0017]
As described above, an approximate curve representing the correlation between the total of the scores and the recognition rate is generated from the acoustic scores, language scores, and recognition results produced by the speech recognition system within the spoken dialogue system, so that the expected speech recognition rate can be estimated at the moment utterance speech is input. This makes it possible to minimize the expected number of dialogue turns, that is, to shorten the time a user needs to achieve the purpose of using the service when an information service is provided through spoken dialogue. In addition, the accuracy of the approximate curve representing the correlation between the total score and the recognition rate improves as recognition results accumulate.
[0018]
Note that the speech recognition rate estimation processing may be performed by recording a program for realizing the functions of the processing units in FIG. 1 on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
[0019]
The program may be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the functions described above in combination with a program already recorded in the computer system.
[0020]
[Effects of the invention]
As described above, according to the present invention, a highly accurate speech recognition rate estimate can be made each time a user speaks in a spoken dialogue system. As a result, in a spoken dialogue system that controls the dialogue scenario using the estimated speech recognition rate, the expected number of dialogue turns can be minimized; that is, when an information service is provided through spoken dialogue, the expected length of the dialogue from the start of the service to its completion can be minimized. This has the effect of shortening the time a user needs to achieve the purpose of using the service.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of the speech recognition unit in a spoken dialogue system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the recognition rate estimation unit 3 shown in FIG. 1.
FIG. 3 is a diagram showing a method of calculating the recognition rate estimation curve from total scores and recognition results.
[Explanation of symbols]
1 ... Acoustic analysis unit
2 ... Language analysis unit
3 ... Recognition rate estimation unit
4 ... Search unit
5 ... Dialogue control unit

Claims

A speech recognition rate estimation method for estimating a speech recognition rate in a spoken dialogue system that provides an information service through spoken dialogue between a user and a computer system, the method comprising:
a step of inputting an acoustic score and a language score obtained by analyzing input speech;
a step of inputting the speech recognition result obtained when a dialogue operation is executed;
a step of calculating a total score by adding the acoustic score and the language score;
a step of obtaining in advance a correlation curve between the total score and the speech recognition rate based on the total score and the speech recognition result; and
a step of estimating, with reference to the correlation curve, the speech recognition rate corresponding to the total score obtained by adding a newly input acoustic score and language score.

A speech recognition rate estimation program that operates in a spoken dialogue system that provides an information service through spoken dialogue between a user and a computer system, the program causing a computer to execute:
a process of inputting an acoustic score and a language score obtained by analyzing input speech;
a process of inputting the speech recognition result obtained when a dialogue operation is executed;
a process of calculating a total score by adding the acoustic score and the language score;
a process of obtaining in advance a correlation curve between the total score and the speech recognition rate based on the total score and the speech recognition result; and
a process of estimating, with reference to the correlation curve, the speech recognition rate corresponding to the total score obtained by adding a newly input acoustic score and language score.