JP4565768B2

JP4565768B2 - Voice recognition device

Info

Publication number: JP4565768B2
Application number: JP2001120777A
Authority: JP
Inventors: 修一松本
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2001-04-19
Filing date: 2001-04-19
Publication date: 2010-10-20
Anticipated expiration: 2021-04-19
Also published as: JP2002311991A

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition apparatus, and in particular to a speech recognition apparatus that calculates, for each candidate word, the probability that it is the word input by speech, and recognizes the candidate word with the highest probability as the input word.
[0002]
[Prior Art]
Remote controllers and speech recognition devices are used as input means for car navigation systems, allowing the user to perform most navigation operations, such as setting a destination and searching for nearby facilities. Voice input is particularly effective because, unlike a remote controller, it requires neither looking at the screen nor operating keys.
One speech recognition method takes as the recognition result the candidate word, among the plausible candidate words, that has the highest probability of being the word input by speech (a speech recognition method based on a probability model). However, the recognition rate is strongly affected by factors such as noise in the passenger compartment or audio playback at the time of speech input, differences between speakers, and differences in the preceding and following phonemes (coarticulation). A 100% recognition rate therefore cannot be achieved, and misrecognition occurs.
[0003]
[Problems to Be Solved by the Invention]
In a conventional speech recognition method based on a probability model, the word with the highest probability among the candidate words is input to the car navigation system even when it has been misrecognized. The car navigation system accepts this misrecognized word as a command from the user and, although it notifies the user of the recognized word by screen display or voice guidance, executes the command without confirming whether it is correct. As a result, the navigation system performs an unintended operation, and the user frequently has to restore the previous state or re-enter the input, which impairs usability. Alternatively, the system can ask the user by voice guidance to confirm the recognized word after every voice input. This prevents commands from being executed by mistake, but it increases the number of user input steps (switch or key operations), which impairs operability.
Accordingly, an object of the present invention is to reduce the number of user input steps while still issuing the correct command.
[0004]
[Means for Solving the Problems]
The present invention is a speech recognition device that calculates the probability that each candidate word is the word input by speech and recognizes the candidate word with the highest probability as the input word. The device comprises a speech recognition unit and a voice guide control unit. The speech recognition unit searches for candidate words for the word input by speech, calculates the probability that each candidate word is the input word, and outputs the candidate word with the maximum probability together with that maximum probability. The voice guide control unit compares the maximum probability with a preset threshold (the set probability); when the maximum probability is greater than the set probability, it instructs the controlled device to execute the command corresponding to the recognized word, and it also adjusts the set probability. Specifically, the voice guide control unit increases the set probability when the command executed on the controlled device in response to the instruction is canceled, and decreases the set probability when it is not canceled.
With this arrangement, the number of user input steps (key or switch operations) can be reduced while the correct command is still issued to the navigation system.
[0005]
DETAILED DESCRIPTION OF THE INVENTION
(A) System Configuration
FIG. 1 is a system configuration diagram for the case where commands are input by voice to a navigation system through the speech recognition system of the present invention.
In the speech recognition system 10, the microphone 11 detects the speaker's voice, and the acoustic analysis unit 12 analyzes and converts the speech waveform data input from the microphone to generate time-series data (a vector series), such as a short-time spectrum.
The speech dictionary database 13 stores, for each word ID, the word's character string and its spectral time-series data (speech pattern). The acoustic model storage unit 14 stores the acoustic model used to calculate the probability that each candidate word is the word input by speech; for example, based on the HMM (Hidden Markov Model) method, it represents each word or phoneme as a standard stochastic state transition machine (Markov model). The speech recognition engine 15 (1) searches by pattern matching for a plurality of candidate words similar to the input speech, (2) calculates the probability that each candidate word is the word input by speech and recognizes the candidate word with the maximum probability as the input word, and (3) inputs the recognized word and the maximum probability to the voice guide control unit 16. The voice guide control unit 16 controls the output of the recognized word to the navigation system and the adjustment of the set probability α, as described later.
[0006]
In the navigation system 20, a processor (CPU) 21 performs predetermined navigation control in accordance with instructions from an input device 22 (for example, a remote controller) or the speech recognition system 10. It displays a map of the vehicle's surroundings, a guidance route, an enlarged intersection view, and the like on the display unit 23, and outputs the distance to the next intersection and the direction to take there from the speaker 24. When a word with a high maximum probability has been input from the speech recognition system 10 and the processor 21 has executed the corresponding command, and the return button is then operated, the processor 21 restores the state before the command was executed and notifies the speech recognition system 10 that the return button was operated. When a word with a low maximum probability is input from the speech recognition system 10, the processor 21 displays the word or outputs it by voice to ask the user whether it is correct, and performs processing according to the user's correct/incorrect input. The correct/incorrect input and the return input may be made with buttons provided on the input device, or by displaying a correct/incorrect menu and a return menu on the display unit 23 and having the user select from them.
[0007]
(B) Speech Recognition Method Based on a Probability Model
FIG. 2 is a schematic explanatory diagram of a speech recognition method based on a probability model.
In general, as shown in FIG. 2, a speech recognition system consists of the acoustic analysis unit 12 followed by the speech recognition engine 15, and the speech detection unit (microphone) 11 and the acoustic analysis unit 12 are modeled together as a single acoustic channel. From the word w to be input, the speaker generates and outputs a speech waveform s according to his or her speaking habits. The speech detection unit 11 detects this speech and inputs it to the acoustic analysis unit 12, which analyzes and converts the waveform data, obtains time-series data (a vector series) such as a short-time spectrum, and inputs it to the speech recognition engine 15.
From the input spectral time-series data, the speech recognition engine 15 determines a plurality of candidate words y, estimates the candidate word with the highest probability to be the input word, and outputs the estimate $\hat{w}$. Using Bayes' theorem, $\hat{w}$ is estimated so as to satisfy the following equation.
[0008]
$$P(\hat{w} \mid y) = \max_{w} \frac{P(y \mid w)\,P(w)}{P(y)} \qquad (1)$$
In Eq. (1), $P(y \mid w)$ is the conditional probability of $y$ given the input word $w$, and $P(w)$ is the prior probability that word $w$ is uttered. $P(y)$ does not depend on $w$, so it can be ignored when maximizing over $w$; the conditional probability $P(y \mid w)$ is obtained from the acoustic model, and the prior probability $P(w)$ from the language model. That is, the speech recognition engine 15 (1) searches by pattern matching for a plurality of candidate words $y$ similar to the input speech, and (2) calculates the probability $P(y \mid w)$ for each candidate word $y$, estimates the candidate word with the highest probability to be the input word, and outputs the estimate $\hat{w}$.
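As a rough illustration of the decision rule in Eq. (1), the sketch below picks the candidate that maximizes $P(y \mid w)\,P(w)$, combining the two factors in the log domain to avoid numerical underflow. The function name `map_decision`, the candidate words and all probabilities are hypothetical. Normalizing over the candidate set to approximate $P(y)$ is an assumption of this sketch, not something the text specifies.

```python
import math
from typing import Dict, Tuple


def map_decision(likelihood: Dict[str, float],
                 prior: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate w maximizing P(y|w) * P(w), as in Eq. (1).

    Hypothetical helper. P(y) is approximated by summing P(y|w) P(w) over
    the candidate set only (an assumption of this sketch).
    """
    log_score = {w: math.log(likelihood[w]) + math.log(prior[w]) for w in likelihood}
    best = max(log_score, key=log_score.get)
    # Dividing by P(y) rescales every score by the same constant,
    # so it changes the probabilities but never the argmax.
    evidence = sum(math.exp(s) for s in log_score.values())
    posterior = {w: math.exp(s) / evidence for w, s in log_score.items()}
    return best, posterior


# Hypothetical acoustic likelihoods P(y|w) and language-model priors P(w).
likelihood = {"home": 0.02, "hotel": 0.03}
prior = {"home": 0.7, "hotel": 0.3}
best, posterior = map_decision(likelihood, prior)
print(best, posterior)
```

With these made-up numbers, the acoustic likelihood alone would favor "hotel", but the prior tips the decision to "home"; the division by $P(y)$ only rescales the scores.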
[0009]
One acoustic model for obtaining the conditional probability $P(y \mid w)$ in Eq. (1) is the hidden Markov model (HMM); in the HMM method, each word is represented by a standard stochastic state transition machine (Markov model). Most HMMs used for speech recognition have a left-to-right structure with one initial state and one final state; FIG. 3 shows an example of the most commonly used type, called the Bakis model.
The value $a_{ij}$ attached to each state-transition arc in FIG. 3 is the probability of a transition from state $q_i$ to state $q_j$; with $S$ states, these values form an $S \times S$ matrix. Because speech patterns are irreversible in time, $a_{ij} = 0$ whenever $i > j$, and the transition probabilities out of each state $q_i$ sum to one: $\sum_j a_{ij} = 1$. $b_{ij}(k)$ is the output probability that spectral pattern $k$ is observed (output) on the transition from $q_i$ to $q_j$; $\{b_{ij}(k)\}$ is called the output probability matrix, and for each transition the output probabilities sum to one: $\sum_k b_{ij}(k) = 1$.
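The structural constraints just listed can be checked mechanically. The sketch below is a hypothetical validator that reports violations of the left-to-right rule ($a_{ij} = 0$ for $i > j$) and of the sum-to-one rules for the rows of $A$ and for each output distribution $b_{ij}(\cdot)$. The optional `max_jump` limit is an assumption (the usual Bakis convention of skipping at most one state), and the example matrix is made up, not the values of FIG. 3.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Emissions = Dict[Tuple[int, int], Dict[str, float]]


def check_left_to_right_hmm(A: Matrix, B: Emissions,
                            max_jump: Optional[int] = None,
                            tol: float = 1e-9) -> List[str]:
    """Return the violated constraints of a left-to-right HMM (empty list if none).

    States are 0-based (index 0 is q1). max_jump, if given, limits how far
    ahead a single transition may go; max_jump=2 encodes the usual Bakis
    convention of skipping at most one state (an assumption, not from the text).
    """
    problems: List[str] = []
    n = len(A)
    for i in range(n):
        for j in range(n):
            if j < i and A[i][j] != 0.0:
                problems.append(f"a[{i}][{j}] must be 0 (no backward transition)")
            if max_jump is not None and j - i > max_jump and A[i][j] != 0.0:
                problems.append(f"a[{i}][{j}] skips too many states")
        row_sum = sum(A[i])
        if abs(row_sum - 1.0) > tol:
            problems.append(f"row {i} of A sums to {row_sum}, not 1")
    for (i, j), dist in B.items():
        total = sum(dist.values())
        if abs(total - 1.0) > tol:
            problems.append(f"b[{i}][{j}] sums to {total}, not 1")
    return problems


# Hypothetical 4-state Bakis model (not the values of FIG. 3).
A = [
    [0.3, 0.5, 0.2, 0.0],
    [0.0, 0.4, 0.3, 0.3],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
]
# Uniform output distributions over {a, b} on every allowed transition.
B = {(i, j): {"a": 0.5, "b": 0.5}
     for i in range(4) for j in range(4) if A[i][j] > 0.0}

print(check_left_to_right_hmm(A, B, max_jump=2))
```

Returning a list of messages rather than a single boolean makes it easier to see which constraint a hand-entered model breaks.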
[0010]
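The numerical example in FIG. 3 is deliberately simplified for the explanation that follows: the output symbols (phonemes) are restricted to the two symbols {a, b}, and the output probabilities of a and b are shown in brackets [ ] in the figure. In this example, the transition probability matrix is

[Expression 1: the 4 × 4 transition probability matrix $A = (a_{ij})$; the matrix values are not reproduced in this text]

the initial state probabilities are $\pi_1 = 1$ and $\pi_i = 0$ $(i > 1)$, and the set of final states is $F = \{q_4\}$.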
Let $y = y_1, y_2, \ldots, y_r$ be the observed sequence of output symbols of candidate word $y$; concretely, a time-series pattern of spectra. For each HMM, the probability $P(y \mid M)$ that candidate word $y$ is the word input by speech is calculated (where $M$ is the input word represented by the HMM), and the candidate word giving the maximum probability is selected as the recognition result.
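$P(y \mid M)$ is the sum, over every state path from the initial state to a final state, of the product of $a_{ij}\,b_{ij}(y_t)$ along the path. The sketch below computes it two ways for a model whose outputs are emitted on transitions, as in FIG. 3: by brute-force path enumeration (the approach walked through with FIG. 4) and by the standard forward recursion $\alpha_t(j) = \sum_i \alpha_{t-1}(i)\,a_{ij}\,b_{ij}(y_t)$, which avoids enumerating paths. The model parameters and function names are hypothetical; the actual matrix of FIG. 3 is not reproduced in this text.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical 4-state Bakis model; these are NOT the values of FIG. 3.
# States are 0-based: index 0 is q1, index 3 is q4.
A: List[List[float]] = [
    [0.3, 0.5, 0.2, 0.0],
    [0.0, 0.4, 0.3, 0.3],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
]
# B[(i, j)][k]: probability of emitting symbol k on the transition q_i -> q_j.
B: Dict[Tuple[int, int], Dict[str, float]] = {
    (0, 0): {"a": 0.8, "b": 0.2}, (0, 1): {"a": 0.7, "b": 0.3},
    (0, 2): {"a": 0.5, "b": 0.5}, (1, 1): {"a": 0.6, "b": 0.4},
    (1, 2): {"a": 0.3, "b": 0.7}, (1, 3): {"a": 0.4, "b": 0.6},
    (2, 2): {"a": 0.2, "b": 0.8}, (2, 3): {"a": 0.1, "b": 0.9},
    (3, 3): {"a": 0.5, "b": 0.5},
}
PI = [1.0, 0.0, 0.0, 0.0]  # pi_1 = 1, pi_i = 0 for i > 1
FINAL = {3}                # F = {q4}


def step_prob(i: int, j: int, symbol: str) -> float:
    """a_ij * b_ij(symbol); zero for transitions the model does not allow."""
    return A[i][j] * B.get((i, j), {}).get(symbol, 0.0)


def forward(obs: str) -> float:
    """P(y|M) via the forward recursion, in O(r * S^2) time."""
    n = len(A)
    alpha = list(PI)
    for symbol in obs:
        alpha = [sum(alpha[i] * step_prob(i, j, symbol) for i in range(n))
                 for j in range(n)]
    return sum(alpha[j] for j in FINAL)


def enumerate_paths(obs: str) -> Tuple[float, int]:
    """P(y|M) by explicit path enumeration; also returns the number of nonzero paths."""
    n = len(A)
    total, count = 0.0, 0
    for path in product(range(n), repeat=len(obs) + 1):
        if path[-1] not in FINAL:
            continue
        p = PI[path[0]]
        for t, symbol in enumerate(obs):
            p *= step_prob(path[t], path[t + 1], symbol)
        if p > 0.0:
            total += p
            count += 1
    return total, count


print(forward("abb"), enumerate_paths("abb"))
```

With this topology (no direct jump from $q_1$ to $q_4$), exactly seven paths produce a three-symbol sequence, the same count as in the FIG. 4 walk-through, and both methods agree up to floating-point rounding.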
[0011]
For the example of FIG. 3, consider the probability that the symbol sequence "abb" of candidate word $y$ is output. Each state transition sequence corresponds to a path from the upper-left corner to the lower-right corner of the plane in FIG. 4, in which time runs horizontally and the states are arranged vertically. There are seven such paths:
$q_1 \to q_1 \to q_2 \to q_4$: $P_1 = 0.008640$
$q_1 \to q_1 \to q_3 \to q_4$: $P_2 = 0.006912$
$q_1 \to q_2 \to q_2 \to q_4$: $P_3 = 0.029400$
$q_1 \to q_2 \to q_3 \to q_4$: $P_4 = 0.012600$
$q_1 \to q_2 \to q_4 \to q_4$: $P_5 = 0.075600$
$q_1 \to q_3 \to q_3 \to q_4$: $P_6 = 0.001728$
$q_1 \to q_3 \to q_4 \to q_4$: $P_7 = 0.038880$
Summing these path probabilities, the probability that $y$ = "abb" is the input word is
$$P(abb \mid M) = P_1 + P_2 + P_3 + P_4 + P_5 + P_6 + P_7 = 0.17376$$
Similarly, the probability that the symbol sequence "aab" of candidate word $y'$ is output is found, as shown in FIG. 5, to be
$$P(aab \mid M) = 0.11598$$
Of the two candidates $y$ and $y'$, the candidate word $y$, which has the higher probability, is recognized as the input word.
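A quick arithmetic check of this example: summing the seven path probabilities $P_1$ to $P_7$ listed above reproduces $P(abb \mid M) = 0.17376$, and comparing it with the stated $P(aab \mid M) = 0.11598$ selects "abb". The sketch only re-uses the numbers given in the text; variable names are illustrative.

```python
# Path probabilities P1..P7 for y = "abb", copied from the text.
path_probs = {
    "q1->q1->q2->q4": 0.008640,
    "q1->q1->q3->q4": 0.006912,
    "q1->q2->q2->q4": 0.029400,
    "q1->q2->q3->q4": 0.012600,
    "q1->q2->q4->q4": 0.075600,
    "q1->q3->q3->q4": 0.001728,
    "q1->q3->q4->q4": 0.038880,
}
p_abb = sum(path_probs.values())

# P(aab|M) for y' = "aab" as stated in the text (its paths are in FIG. 5).
p_aab = 0.11598

scores = {"abb": p_abb, "aab": p_aab}
recognized = max(scores, key=scores.get)
print(f"P(abb|M) = {p_abb:.5f}; recognized: {recognized}")
```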
[0012]
(C) Voice Recognition Processing of the Present Invention
FIG. 6 is a flowchart of the voice recognition processing of the present invention; the steps enclosed by the dotted line are executed by the navigation system.
When the user speaks (step 101), the microphone 11 detects the user's voice and inputs it to the acoustic analysis unit 12, which analyzes and converts the input speech waveform data and inputs the resulting spectral time-series data (speech pattern) to the speech recognition engine 15. The speech recognition engine 15 refers to the speech dictionary database 13 and searches for a plurality of candidate words whose speech patterns are similar to the input speech pattern (step 102).
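The text says only that candidates are found by pattern matching the input speech pattern against the spectral time-series patterns stored in the speech dictionary database 13; it does not name a matching method. Dynamic time warping (DTW) is one classic choice, and the sketch below uses it purely as an illustrative assumption: it ranks dictionary words by DTW distance (Euclidean cost per frame) and keeps the best few as candidates. The function names, the word-keyed dictionary, and the toy two-dimensional frames are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one short-time spectral vector


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Dynamic time warping distance with a Euclidean frame cost."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def search_candidates(pattern: Sequence[Frame],
                      dictionary: Dict[str, Sequence[Frame]],
                      n_best: int = 3) -> List[str]:
    """Return the n_best dictionary words whose stored patterns are closest."""
    ranked = sorted(dictionary, key=lambda w: dtw_distance(pattern, dictionary[w]))
    return ranked[:n_best]


# Toy 2-D "spectral" frames; real patterns would be short-time spectra.
dictionary = {
    "home": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "hotel": [[5.0, 5.0], [6.0, 6.0]],
    "station": [[0.0, 1.0], [3.0, 3.0]],
}
pattern = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
print(search_candidates(pattern, dictionary, n_best=2))
```

Here the input repeats one frame of "home", which DTW absorbs by warping the time axis, so "home" matches with distance 0. The candidates found this way would then be scored with the HMM probabilities in step 103.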
Next, the speech recognition engine 15 uses the hidden Markov models (HMMs) held in the acoustic model storage unit 14 to calculate the probability that each candidate word is the input word, recognizes the candidate word with the maximum probability as the input word, and inputs that word and its maximum probability to the voice guide control unit 16 (step 103). The voice guide control unit 16 compares the maximum probability with the set probability α (step 104); if the maximum probability ≥ α, it inputs the recognized word to the navigation system 20 and instructs it to execute the command corresponding to that word. The navigation system then executes the command corresponding to the input word (step 105).
[0013]
If the navigation system performs the control the user instructed by voice, the user does not operate the return button; if it performs a different control, the user operates the return button. Accordingly, the processor 21 checks whether the return button is operated within a predetermined time after the command is executed (step 106). If it is, the processor 21 restores the state before the command was executed (step 107) and notifies the voice guide control unit 16 that the return button was operated. If the return button is not operated in step 106, the processor 21 notifies the voice guide control unit 16 of that instead.
If the return button was not operated, the voice guide control unit 16 decreases the set probability α by a predetermined value Δα (step 108) and waits for the next voice input. Lowering α makes it more likely that the maximum probability reaches α in step 104, so more commands can be executed by voice input alone.
On the other hand, when notified that the return button was operated, the voice guide control unit 16 increases the set probability α by the predetermined value Δα (step 109) and waits for the next voice input. In this way, even if α has become too small through step 108, it is corrected back toward an appropriate value.
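The direct-execution branch (steps 104 to 109) amounts to a threshold with feedback. The class below is a hypothetical sketch of that logic for the voice guide control unit 16. The initial α, the step Δα, and the clamping of α to [0, 1] are assumptions of this sketch: the text gives no concrete numbers and does not mention bounds.

```python
from dataclasses import dataclass


@dataclass
class VoiceGuideController:
    """Hypothetical sketch of the set-probability logic of unit 16, steps 104-109.

    alpha is the set probability; delta is the predetermined step (Δα).
    Default values and the clamping bounds are assumptions, not from the text.
    """
    alpha: float = 0.5
    delta: float = 0.05
    alpha_min: float = 0.0
    alpha_max: float = 1.0

    def should_execute(self, max_prob: float) -> bool:
        # Step 104: execute directly when the maximum probability is >= alpha.
        return max_prob >= self.alpha

    def on_execution_feedback(self, cancelled: bool) -> None:
        if cancelled:
            # Step 109: the return button was pressed, so raise the threshold.
            self.alpha = min(self.alpha_max, self.alpha + self.delta)
        else:
            # Step 108: the command was kept, so lower the threshold.
            self.alpha = max(self.alpha_min, self.alpha - self.delta)


ctrl = VoiceGuideController(alpha=0.5, delta=0.125)
if ctrl.should_execute(0.62):
    ctrl.on_execution_feedback(cancelled=False)  # user kept the result
print(ctrl.alpha)
```

A fixed additive step keeps the update simple; how large Δα should be, and whether α should be bounded, are design choices the text leaves open.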
[0014]
If the maximum probability is smaller than the set probability α in step 104, the voice guide control unit 16 inputs the recognized word to the navigation system 20 and instructs it to display the word or output it by voice. The processor 21 of the navigation system then displays the input word on the display unit 23 or outputs it by voice, and waits for the user to input whether the recognized word is correct (step 110).
When this input is made, the processor 21 checks whether the recognized word was incorrect or correct (step 111). If the user indicates that the recognized word is wrong, the processor 21 notifies the voice guide control unit 16 of the recognition error, and the voice guide control unit 16 increases the set probability α by the predetermined value Δα (step 109) and waits for the next voice input.
In this way, even when the set probability tends to be too small, α can be increased and corrected to an appropriate value.
[0015]
If, on the other hand, the recognized word is correct, the processor 21 executes the command corresponding to the recognized word and notifies the voice guide control unit 16 that the recognized word is correct (step 112). The voice guide control unit 16 then decreases the set probability α by the predetermined value Δα (step 108) and waits for the next voice input. As a result, the decision in step 104 is YES more often thereafter, commands are executed in step 105 more often, and more commands can be executed by voice input alone.
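The whole flow of FIG. 6, including the confirmation branch, can be simulated over a sequence of utterances. In the hypothetical sketch below, each event is a pair (maximum probability, whether the recognized word is actually correct), and the user's reaction is inferred from the second value. Note that both branches move α in the same direction: a correct word lowers it (step 108) and a wrong word raises it (step 109). The initial α, Δα, and the clamping to [0, 1] are assumptions, not values from the text.

```python
from typing import Iterable, List, Tuple


def run_session(events: Iterable[Tuple[float, bool]], alpha: float = 0.5,
                delta: float = 0.05) -> Tuple[float, List[str]]:
    """Simulate the FIG. 6 flow over a sequence of utterances (hypothetical sketch).

    Each event is (max_prob, word_is_correct). word_is_correct stands in for
    the user's reaction: a wrong word executed directly is undone with the
    return button (steps 106-107); a wrong word shown for confirmation is
    rejected (step 111). Initial alpha, delta and clamping are assumptions.
    """
    log: List[str] = []
    for max_prob, correct in events:
        if max_prob >= alpha:                        # step 104: YES
            # Step 105 executes; steps 106-107 undo it if the word was wrong.
            log.append("executed" if correct else "executed, then undone")
        else:                                        # step 104: NO -> step 110
            # Step 112 executes only after the user confirms the word.
            log.append("confirmed, executed" if correct else "rejected")
        # Correct word: lower alpha (step 108); wrong word: raise it (step 109).
        alpha = max(0.0, alpha - delta) if correct else min(1.0, alpha + delta)
    return alpha, log


alpha, log = run_session([(0.9, True), (0.3, True), (0.7, False)],
                         alpha=0.5, delta=0.125)
print(alpha, log)
```

In this run the second utterance falls below the threshold and goes through confirmation, while the third is executed directly and then undone, which raises α again.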
The case where the navigation system is controlled based on the voice recognition result has been described above, but the voice recognition method of the present invention can be applied to the case where voice input control is performed on an arbitrary device.
[0016]
[Effect of the Invention]
As described above, according to the present invention, in a speech recognition method that calculates the probability that each candidate word is the word input by speech and recognizes the candidate word with the maximum probability as the input word, execution of the command corresponding to the recognized word is instructed when the maximum probability is greater than the set probability; the set probability is then decreased if cancellation of the executed command is not instructed, and increased if it is. Therefore, the number of user input steps (key or switch operations) can be reduced, and the input word can be recognized correctly so that the corresponding command is issued.
Further, according to the present invention, when the maximum probability is smaller than the set probability, the recognized word is displayed or output by voice; if the user inputs that the recognition result is wrong, the set probability is increased, and if the user inputs that it is correct, execution of the command corresponding to the recognized word is instructed and the set probability is decreased. This reduces the number of user input steps even further while the input word is still recognized correctly and the corresponding command is issued.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram for the case where commands are input by voice to a navigation system through the voice recognition system of the present invention.
FIG. 2 is a schematic explanatory diagram of a speech recognition method based on a probability model.
FIG. 3 is an example of a Bakis model in an HMM used for speech recognition.
FIG. 4 is an explanatory diagram of probability calculation of a candidate word y = “abb”.
FIG. 5 is an explanatory diagram of probability calculation of a candidate word y ′ = “aab”.
FIG. 6 is a flowchart of the speech recognition processing of the present invention.
[Explanation of symbols]
10: Voice recognition system
11: Microphone
12: Acoustic analysis unit
13: Voice dictionary database
14: Acoustic model storage unit
15: Voice recognition engine
16: Voice guide control unit
20: Navigation system
21: Processor (CPU)

Claims

1. A speech recognition device that calculates the probability that each candidate word is the word input by speech and recognizes the candidate word with the maximum probability as the input word, comprising:
a speech recognition unit that searches for candidate words for the word input by speech, calculates the probability that each candidate word is the input word, and outputs the candidate word with the maximum probability together with that maximum probability; and
a voice guide control unit that compares the maximum probability with a preset set probability and, when the maximum probability is greater than the set probability, instructs a controlled device to execute a command corresponding to the recognized word, the voice guide control unit also controlling changes to the set probability,
wherein the voice guide control unit increases the set probability when the command executed by the controlled device in response to the instruction is canceled, and decreases the set probability when the command is not canceled.

2. The speech recognition device according to claim 1, wherein, when the maximum probability is smaller than the set probability, the voice guide control unit instructs the controlled device to display the recognized word or output it by voice, increases the set probability when an input indicating that the recognition result is wrong is made on the controlled device, and decreases the set probability when the recognition result is correct.