JP2004184716A - Speech recognition apparatus

Info

Publication number: JP2004184716A
Application number: JP2002351961A
Authority: JP
Inventors: Takeshi Ono; 健大野; Daisuke Saito; 大介斎藤
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2002-12-04
Filing date: 2002-12-04
Publication date: 2004-07-02
Anticipated expiration: 2022-12-04
Also published as: JP4178931B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition apparatus with a high speech recognition rate.
SOLUTION: In the initial state, a signal processing part 3 sets a network grammar that includes unknown words as the recognition target. When a user operates a correction switch 14 to correct a recognition result and speaks again, however, the user is clearly aware of the speech contents, so unknown words such as "Well" are less likely to be included. In this case, a network grammar that includes no unknown words is set, which prevents misrecognition caused by the inclusion of an unknown word.
COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声を認識する音声認識装置に関する。
【０００２】
【従来の技術】
【特許文献１】特開平６−２６６３８６号公報
従来、音声認識を行う装置として、たとえば特開平６−２６６３８６号公報に開示されたようなものがある。これは入力音声の時刻に同期して入力音声中に存在するキーワードを検出し、音声認識を行っている。
【０００３】
【発明が解決しようとする課題】
このような上記従来の音声認識装置にあっては、使用者が未知語を発話する可能性が少ない場合においても未知語を含む発話を認識可能としているために、音声中に未知語が含まれていないにもかかわらず、音声中に未知語が含まれていると認識してしまい、音声認識率が低下するといった問題があった。たとえば地名を発話する場合において、使用者が「神奈川県横浜市旭区」と発話し、音声認識装置が未知語を含む音声を認識可能状態である場合、認識結果が「神奈川県横浜市あ瀬谷区」と、未知語である「あ」が含まれた音声であると認識されてしまい、「旭区」であるべき認識結果が「あ瀬谷区」と誤認識されてしまっていた。
【０００４】
そこで本発明はこのような問題点に鑑み、音声認識率の高い音声認識装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
本発明は、認識対象語とその他の未知語が出現する順序関係を規定したネットワーク文法を記憶する記憶部と、該記憶部に記憶されたネットワーク文法を認識対象として設定するネットワーク文法設定手段と、該ネットワーク文法設定手段によって設定された文法にもとづいて、音声信号の認識処理を行う信号処理部とを有する音声認識装置において、認識結果の訂正を指示する誤認識訂正指示部を備え、記憶部は未知語を含まないネットワーク文法を記憶し、ネットワーク文法設定手段は、初期状態では未知語を含むネットワーク文法を認識対象とするが、誤認識訂正指示部から認識結果の訂正指示があった場合には、未知語を含まないネットワーク文法を認識対象として設定するものとした。
【０００６】
【発明の効果】
本発明によれば、音声認識装置は初期状態では未知語を含むネットワーク文法を認識対象とするが、音声の認識結果に対して誤認識訂正指示部から訂正指示があった場合、未知語を含まないネットワーク文法を認識対象として設定する。音声認識装置の使用者が訂正後の発話を行う際には、発話内容を正確に認識しており、発話中に「あー」、「えー」などの未知語が含まれることが少なくなる。よってこのような場合に、未知語を含まないネットワーク文法を認識対象とすることにより、未知語が含まれることに起因する誤認識を防止することができる。
【０００７】
【発明の実施の形態】
次に本発明の実施の形態を実施例により説明する。
以下に示す各実施例は、本発明における音声認識装置を車両のナビゲーションシステムに適用したものである。
図１に、第一の実施例における車両のナビゲーションシステムの全体構成を示す。
図示しないＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ）アンテナによって受信された信号より自車両の位置を演算し、使用者に各種の情報を提示するナビゲーション制御部２が、音声の認識処理を行う信号処理部３に接続される。
【０００８】
信号処理部３はメモリやＣＰＵから構成される。信号処理部３には、音声認識を行う認識対象語とその他の未知語用の音響モデル、および認識対象語とその他の未知語が出現する順序関係を規定したネットワーク文法を記憶している記憶部６と、発話スイッチ１３および訂正スイッチ１４を備えた入力部１２とが接続される。
【０００９】
また信号処理部３には、Ｄ／Ａコンバータ７、出力アンプ８を介してスピーカ９が接続され、信号処理部３から出力されたデジタルの音声信号がＤ／Ａコンバータ７によってアナログの音声信号に変換され、出力アンプ８によって増幅されてスピーカ９から音声として出力される。
信号処理部３には、Ａ／Ｄコンバータ１０を介してマイク１１が接続され、マイク１１から入力されたアナログの音声信号がＡ／Ｄコンバータ１０によってデジタルの音声信号に変換されて信号処理部３に伝達される。
【００１０】
ナビゲーション制御部２は表示部１６およびスピーカ９に接続されており、表示部１６およびスピーカ９を通じて車両のドライバ等に位置情報等を提示する。信号処理部３、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１が構成される。
また、音声認識部１、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０が構成される。
【００１１】
次に図２のフローチャートを用いて、ナビゲーションシステムの音声認識処理の流れについて説明する。
なお本実施例においては、ナビゲーション制御部２に目的地の入力を行うために発話された地名の音声認識処理について説明する。
ステップ１００において、信号処理部３はナビゲーションシステム２０の使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ１０１へ進む。
【００１２】
ステップ１０１において、信号処理部３は記憶部６に記憶されたネットワーク文法を認識対象として設定する。ここでネットワーク文法とは地名の階層構造の文法を指すものとし、図３にその一例を示す。まずはじめに都道府県名を認識対象語として規定し、次に各都道府県に対応する市町村名のように順次地名を規定する。また未知語が挿入される可能性のある各単語の前には、図中ＵＫで示すように未知語（たとえば「あー」、「えー」、「のー」等）を認識対象語として規定する。このようにネットワーク文法として、地名の文中に未知語が出現する順序関係が規定される。
これにより使用者が地名以外の未知語を含む発話、たとえば「えー神奈川県のー横浜市のー旭区」と発話した際にも、未知語を含む地名を認識することができる。
【００１３】
図２のステップ１０２において、信号処理部３はステップ１０１において設定したネットワーク文法にもとづいて最大待受け時間を設定する。この最大待受け時間は、設定したネットワーク文法の最長の地名が発話された際にも、信号処理部３が十分に発話を受理できるように設定される。
【００１４】
ステップ１０３において、信号処理部３は音声取り込み処理を開始した旨を使用者に知らせるために、記憶部６に記憶された告知音声信号をＤ／Ａコンバータ７および出力アンプ８を通じて、スピーカ９から出力する。
【００１５】
音声取り込み開始を知らせる告知音声を聞いた使用者は、認識対象に含まれる単語の発話を行う。なお本実施例において、認識対象は図３に示すような地名とする。
マイク１１から入力された音声信号は、Ａ／Ｄコンバータ１０によってデジタル信号に変換されて信号処理部３に入力される。
【００１６】
発話スイッチ１３が操作されるまでの間、信号処理部３はＡ／Ｄコンバータ１０によって変換された音声のデジタル信号の平均パワーを演算している。発話スイッチ１３が操作された後、演算していた平均パワーに比べてデジタル信号の瞬間パワーが所定値以上大きくなったときに、ステップ１０４において、使用者が発話したと判断して音声の取り込みを開始する。
【００１７】
音声取り込みが開始されると、ステップ１０５において信号処理部３は記憶部６に記憶された認識対象語との一致度演算を開始する。一致度とは取り込まれた音声部分と個々の認識対象語とがどの程度似ているかを指し、さらにこの一致度はスコアとして得られる。本実施例において、スコアの値が大きいほど一致度が高いとする。
なお、このステップの処理を行う間も、並列して信号処理部３による音声取り込みは継続されている。
【００１８】
ステップ１０６において、発話の終端が検出されたかどうかの判断を行う。この終端の検出は、音声のデジタル信号の瞬間パワーが所定値以下の状態が所定時間以上続いた場合に、使用者の発話が終了したと判断するものである。発話の終端を検出した場合はステップ１０７へ進み、終了していない場合はステップ１１１へ進む。
【００１９】
ステップ１１１において、音声取り込み開始後、最大待受け時間を経過したかどうかの判断を行い、経過していない場合はステップ１０４へ戻る。また、最大待受け時間を経過しているときはステップ１０７へ進む。
【００２０】
ステップ１０７において、音声の取り込み処理を終了し、ステップ１０８において、信号処理部３は一致度の最も大きい認識対象語を認識結果として、Ｄ／Ａコンバータ７および出力アンプ８を通じてスピーカ９から出力する。本実施例においては、使用者が発話した「神奈川県横浜市旭区」に対し、信号処理部３は「神奈川県横浜市あ瀬谷区」（「あ」は未知語）と誤認識し、「神奈川県横浜市瀬谷区」をスピーカ９を通して出力したものとする。
【００２１】
ステップ１０９では、ステップ１０８における認識対象語の出力後、信号処理部３は所定時間内に、入力部１２に備えられた訂正スイッチ１４が操作されたかどうかの判断を行う。訂正スイッチ１４の操作があった場合はナビゲーションシステム２０の音声認識結果に対して、使用者が修正要求したと判断してステップ１１２へ進む。
【００２２】
ステップ１１２において、ネットワーク文法の再設定を行う。ここで再設定するネットワーク文法は、図４に示すように未知語を含まないものであり、ステップ１０１において設定したネットワーク文法と同様に地名の階層構造の文法を設定する。ネットワーク文法の再設定後、ステップ１０２へ戻り音声の認識処理を繰り返す。
【００２３】
一方、ステップ１０９において所定時間内に訂正スイッチ１４の操作がない場合は、使用者がナビゲーションシステム２０の認識結果を容認したと判断してステップ１１０へ進み、認識結果に応じた処理を行う。本実施例においては、信号処理部３は認識結果である地名をナビゲーション制御部２へ出力する。ナビゲーション制御部２は認識された地名を目的地として設定し、表示部１６等を通じて使用者に道案内等の情報提示を行う。
なお本実施例において、訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。また本実施例において、図２におけるステップ１０１およびステップ１１２が本発明におけるネットワーク文法設定手段を構成する。
【００２４】
本実施例は以上のように構成され、音声認識装置の認識結果を訂正するため、使用者が訂正スイッチ１４を操作して再度発話を行った場合には、使用者は発話内容を明確に認識しており、「あー」、「えー」などの未知語が含まれることが少ない。よってこのような場合には、未知語を含まないネットワーク文法を認識対象として設定することにより、未知語が含まれることに起因する誤認識を低減することができる。
【００２５】
次に第二の実施例について説明する。
なお本実施例は上記第一の実施例における信号処理部３での処理内容を変更したものである。
図５のフローチャートを用いて、本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
ステップ２００からステップ２０８は上記第一の実施例におけるステップ１００からステップ１０８と同様であり、またステップ２１０からステップ２１１は第一の実施例におけるステップ１１０からステップ１１１と同様であり説明を省略する。
【００２６】
ステップ２０９において、ステップ２０８における認識対象語の出力後、信号処理部３は所定時間内に入力部１２に備えられた訂正スイッチ１４が操作されたかどうかの判断を行う。訂正スイッチ１４の操作があった場合はナビゲーションシステム２０の音声認識結果に対して、使用者が修正要求したと判断してステップ２１２へ進む。訂正スイッチ１４の操作がない場合はステップ２１０へ進む。
【００２７】
ステップ２１２では、信号処理部３は認識結果に未知語が含まれているかどうかを判断し、未知語を含む場合はステップ２１３へ進む。また未知語を含まない場合は、ステップ２０２へ戻り、音声の認識処理を繰り返す。
ステップ２１３では、未知語を含まないネットワーク文法を認識対象として再設定し、ステップ２０２へ戻り音声の認識処理を繰り返す。
なお本実施例において、図５におけるステップ２０１およびステップ２１３が本発明におけるネットワーク文法設定手段を構成する。また訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。
【００２８】
本実施例は以上のように構成され、信号処理部３による音声の認識結果が未知語を含む場合であって、かつその認識結果が誤認識であり訂正スイッチが操作されたあとの発話に対しては、未知語を含まないネットワーク文法を認識対象として設定する。このように未知語を含む認識結果に対して訂正が指示された場合、この誤認識が未知語を含むことに起因する可能性が高い。よってこのような場合には未知語を含まないネットワーク文法を設定することにより、未知語が含まれることに起因する誤認識を低減することができる。
【００２９】
次に第三の実施例について説明する。
図６に本実施例における車両のナビゲーションシステムの全体構成を示す。
音声の認識処理を行う信号処理部３Ａの内部に、ナビゲーションシステムの音声認識処理の使用回数、すなわち音声認識処理の実行回数を記憶する使用回数記憶部２３を有している。
信号処理部３Ａ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ａが構成される。
【００３０】
また音声認識部１Ａ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０Ａが構成される。
なお本実施例において、上記第一の実施例と同じ構成については同じ番号を付して説明を省略する。
【００３１】
次に図７のフローチャートを用いて、本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
ステップ３００において、信号処理部３Ａはナビゲーションシステム２０Ａの使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ３０１へ進む。
【００３２】
ステップ３０１において、信号処理部３Ａは使用回数記憶部２３に記憶された音声認識装置の使用回数が所定値以上かどうかを判断し、所定値未満である場合はステップ３０２へ進み、使用回数が所定値以上である場合はステップ３０３へ進む。
【００３３】
ステップ３０２において、信号処理部３Ａは、記憶部６に記憶された図３に示すような未知語を含むネットワーク文法を認識対象として設定する。
ステップ３０３において、信号処理部３Ａは、記憶部６に記憶された図４に示すような未知語を含まないネットワーク文法を認識対象として設定する。
【００３４】
ステップ３０４では、ステップ３０２またはステップ３０３において設定されたネットワーク文法にもとづいて最大待受け時間を設定する。
ステップ３０５からステップ３１３は上記第一の実施例におけるステップ１０３からステップ１１１と同様であり説明を省略する。
なお本実施例において、図７におけるステップ３０１からステップ３０３が本発明におけるネットワーク文法設定手段を構成する。また訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。
【００３５】
本実施例は以上のように構成され、音声認識処理の使用回数が所定値未満の場合には、未知語を含むネットワーク文法を認識対象とするが、使用回数が所定値以上の場合には、未知語を含まないネットワーク文法を認識対象とする。
使用者のナビゲーションシステムの音声認識処理の使用頻度が多くなってきた場合、使用者の発話中に「あー」、「えー」などの未知語が含まれることが少なくなる。よってこのような場合には、未知語を含まないネットワーク文法を設定することにより未知語が含まれることに起因する誤認識を低減することができる。
【００３６】
次に第四の実施例について説明する。
図８に本実施例における車両のナビゲーションシステムの全体構成を示す。
音声の認識処理を行う信号処理部３Ｂの内部に、車両内の騒音量を計測する騒音計測部２４を有している。
信号処理部３Ｂ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ｂが構成される。
【００３７】
また音声認識部１Ｂ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０Ｂが構成される。
なお本実施例において、上記第一の実施例と同じ構成については同じ番号を付して説明を省略する。
【００３８】
次に図９のフローチャートを用いて、本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
ステップ４００において、信号処理部３Ｂはナビゲーションシステム２０Ｂの使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ４０１へ進む。
【００３９】
ここで信号処理部３Ｂは、騒音計測部２４によって計測される騒音を常時監視しており、使用者が発話スイッチ１３を押す以前の音信号の所定時間あたりの平均パワーを騒音量として算出している。
ステップ４０１において、信号処理部３は騒音計測部２４によって計測された騒音量が所定値以上かどうかを判断し、所定値以上である場合にはステップ４０３において、未知語を含まないネットワーク文法を認識対象として設定する。
【００４０】
一方ステップ４０１において、騒音量が所定値以下である場合には、ステップ４０２において未知語を含むネットワーク文法を認識対象として設定する。
ステップ４０４においては、ステップ４０２またはステップ４０３において設定されたネットワーク文法にもとづいて最大待受け時間を設定する。
ステップ４０５からステップ４１３は、上記第一の実施例におけるステップ１０３からステップ１１１と同様であり説明を省略する。
なお本実施例において、図９におけるステップ４０１からステップ４０３が本発明におけるネットワーク文法設定手段を構成する。また訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。
【００４１】
本実施例は以上のように構成され、騒音計測部２４によって計測された騒音量が所定値以上のときには、未知語を含まないネットワーク文法を認識対象とする。使用者は車両内の騒音が大きい場合には、簡潔に発話した方が音声の認識率が高くなることを使用経験から認知するようになり、使用者の発話中に「あー」、「えー」などの未知語が含まれることが少なくなる。よってこのような場合には、未知語を含まないネットワーク文法を設定することにより未知語が含まれることに起因する誤認識を低減することができる。
【００４２】
次に第五の実施例について説明する。
図１０に本実施例における車両のナビゲーションシステムの全体構成を示す。音声の認識処理を行う信号処理部３Ｃの内部に、発話された文の発話頻度を係数する発話頻度係数部２５を有している。また信号処理部３Ｃは、発話頻度係数部２５によって係数された発話頻度を記憶部６に記憶する。
【００４３】
信号処理部３Ｃ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ｃが構成される。
また音声認識部１Ｃ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０Ｃが構成される。
なお本実施例において、上記第一の実施例と同じ構成については同じ番号を付して説明を省略する。
【００４４】
次に図１１のフローチャートを用いて、本実施例におけるナビゲーションシステムの音声認識処理の流れについて説明する。
ステップ５００において、信号処理部３Ｃはナビゲーションシステム２０Ｃの使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ５０１へ進む。
【００４５】
ステップ５０１において、信号処理部３Ｃは、認識対象語とその他の未知語が出現する順序関係を規定したネットワーク文法を認識対象として設定する。図１２にネットワーク文法の例を示す。図は地名の階層構造の文法を示し、未知語が挿入される可能性のある箇所を図中「ＵＫ」で示している。信号処理部３Ｃは、図１２の上段に示すように未知語を含む文法と、図の下段に示すように未知語を含まない文法とを同時に認識対象としている。
【００４６】
またステップ５０１においてネットワーク文法を設定する際に、記憶部６に記憶された発話頻度が所定値以上の発話文については、未知語を含まないネットワーク文法として設定する。
たとえば、過去に所定回数以上「神奈川県横浜市旭区」が認識結果となる地名の発話があった場合、この地名は図１２のＡに示すように未知語を含まないネットワーク文法として設定し、かつ未知語を含むネットワーク文法の認識対象語からは排除する。
【００４７】
ステップ５０２からステップ５１１は、上記第一の実施例におけるステップ１０２からステップ１１１と同じであり説明を省略する。
なお本実施例において、図１１におけるステップ５０１が本発明におけるネットワーク文法設定手段を構成する。また訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。
【００４８】
本実施例は以上のように構成され、発話頻度係数部２５によって発話頻度が所定値以上係数された発話文については、その発話文を未知語を含まないネットワーク文法として設定し、かつ未知語を含むネットワーク文法からは排除する。使用者は言いなれた発話に対しては、「あー」、「えー」などの未知語を含む発話をすることが少なくなる。よってこのような認識対象語は未知語を含まないネットワーク文法として設定することにより、未知語が含まれることに起因する誤認識を低減することができる。
【００４９】
次に第六の実施例について説明する。
図１３に本実施例における車両のナビゲーションシステムの全体構成を示す。音声の認識処理を行う信号処理部３Ｄの内部に、使用者の地名に対するなじみ度を判断するなじみ度判断部２６を有している。
信号処理部３Ｄ、記憶部６、Ｄ／Ａコンバータ７、出力アンプ８およびＡ／Ｄコンバータ１０より音声認識部１Ｄが構成される。
また音声認識部１Ｄ、ナビゲーション制御部２、表示部１６、スピーカ９、マイク１１および入力部１２よりナビゲーションシステム２０Ｄが構成される。
【００５０】
なじみ度判断部２６が行う地名に対するなじみ度判断は、たとえば使用者がナビゲーションシステム２０Ｄに登録した自宅住所、過去の走行履歴、過去に行ったことがある場所、あるいはこれらの場所の近傍の地名などについては、なじみ度があると判断するものである。
なお本実施例において、上記第一の実施例と同じ構成については同じ番号を付して説明を省略する。
【００５１】
本実施例における音声認識処理は、上記第五の実施例における図１１のフローチャートのステップ５０１をステップ６０１に置き換えたものであり、他のステップについては説明を省略する。
ステップ５００において、信号処理部３Ｄはナビゲーションシステム２０Ｄの使用者によって、発話の開始を指示する入力部１２に設けられた発話スイッチ１３が操作されたかどうかの判断を行う。発話スイッチ１３の操作があった場合にはステップ６０１へ進む。
【００５２】
ステップ６０１において、信号処理部３Ｄは認識対象であるネットワーク文法を設定する際に、なじみ度判断部２６によってなじみ度があると判断された地名については、未知語を含まないネットワーク文法として設定し、かつ未知語を含むネットワーク文法からは排除する。ネットワーク文法の設定後ステップ５０２へ進む。
本実施例において、図１１におけるステップ６０１が本発明におけるネットワーク文法設定手段を構成する。また訂正スイッチ１４が本発明における誤認識訂正指示部を構成する。
【００５３】
本実施例は以上のように構成され、なじみ度判断部２６によってなじみ度があると判断された地名については、使用者は「あー」、「えー」などの未知語を含む発話をすることが少なくなる。よってこのような認識対象語については、未知語を含まないネットワーク文法として設定することにより未知語が含まれることに起因する誤認識を低減することができる。
【図面の簡単な説明】
【図１】本発明における第一の実施例を示す図である。
【図２】第一の実施例における音声認識処理の流れを示す図である。
【図３】未知語を含むネットワーク文法を示す図である。
【図４】未知語を含まないネットワーク文法を示す図である。
【図５】第二の実施例における音声認識処理の流れを示す図である。
【図６】第三の実施例を示す図である。
【図７】第三の実施例における音声認識処理の流れを示す図である。
【図８】第四の実施例を示す図である。
【図９】第四の実施例における音声認識処理の流れを示す図である。
【図１０】第五の実施例を示す図である。
【図１１】第五および第六の実施例における音声認識処理の流れを示す図である。
【図１２】ネットワーク文法を示す図である。
【図１３】第六の実施例を示す図である。
【符号の説明】
１、１Ａ、１Ｂ、１Ｃ，１Ｄ音声認識部
２ナビゲーション制御部
３、３Ａ、３Ｂ、３Ｃ、３Ｄ信号処理部
６記憶部
７Ｄ／Ａコンバータ
８出力アンプ
９スピーカ
１０Ａ／Ｄコンバータ
１１マイク
１２入力部
１３発話スイッチ
１４訂正スイッチ
１６表示部
２０、２０Ａ、２０Ｂ、２０Ｃ、２０Ｄナビゲーションシステム
２３使用回数記憶部
２４騒音計測部
２５発話頻度係数部
２６なじみ度判断部
[0001]
[Technical field of the invention]
The present invention relates to a speech recognition device that recognizes speech.
[0002]
[Prior art]
[Patent Document 1] Japanese Patent Application Laid-Open No. 6-266386
Conventionally, one known apparatus for performing speech recognition is that disclosed in, for example, Japanese Patent Application Laid-Open No. 6-266386. That apparatus detects keywords present in the input speech in synchronization with the time of the input speech and thereby performs speech recognition.
[0003]
[Problems to be solved by the invention]
In the conventional speech recognition device described above, utterances containing unknown words remain recognizable even when the user is unlikely to utter an unknown word. As a result, the device may recognize an unknown word in speech that contains none, which lowers the speech recognition rate. For example, when a place name is uttered, if the user says "Kanagawa-ken Yokohama-shi Asahi-ku" while the device is able to recognize speech containing unknown words, the utterance may be recognized as "Kanagawa-ken Yokohama-shi a Seya-ku", that is, as speech containing the unknown word "a". The recognition result that should be "Asahi-ku" is thus misrecognized as "a Seya-ku".
[0004]
Accordingly, an object of the present invention is to provide a speech recognition device having a high speech recognition rate in view of such a problem.
[0005]
[Means for Solving the Problems]
The present invention provides a speech recognition device having a storage unit that stores a network grammar defining the order in which recognition target words and other, unknown words appear; network grammar setting means that sets the network grammar stored in the storage unit as the recognition target; and a signal processing unit that performs recognition processing on a speech signal based on the grammar set by the network grammar setting means. The device further comprises a misrecognition correction instructing unit for instructing correction of a recognition result, and the storage unit also stores a network grammar that does not include unknown words. In the initial state, the network grammar setting means sets the network grammar including unknown words as the recognition target, but when the misrecognition correction instructing unit issues an instruction to correct a recognition result, it sets the network grammar that does not include unknown words as the recognition target.
[0006]
[Effects of the invention]
According to the present invention, the speech recognition device initially sets a network grammar including unknown words as the recognition target, but when the misrecognition correction instructing unit instructs correction of a speech recognition result, the device sets a network grammar that does not include unknown words as the recognition target. When the user of the speech recognition device speaks again after a correction, the user is accurately aware of what to say, so the utterance is less likely to contain unknown words such as "ah" and "er". In such a case, making a network grammar that does not include unknown words the recognition target prevents misrecognition caused by the inclusion of unknown words.
[0007]
[Embodiments of the invention]
Next, embodiments of the present invention will be described with reference to examples.
In each of the embodiments described below, the speech recognition device according to the present invention is applied to a vehicle navigation system.
FIG. 1 shows the overall configuration of a vehicle navigation system according to the first embodiment.
A navigation control unit 2, which calculates the position of the host vehicle from signals received by a GPS (Global Positioning System) antenna (not shown) and presents various information to the user, is connected to a signal processing unit 3 that performs speech recognition processing.
[0008]
The signal processing unit 3 comprises a memory and a CPU. Connected to the signal processing unit 3 are a storage unit 6, which stores acoustic models for the recognition target words and for other, unknown words together with a network grammar defining the order in which the recognition target words and the unknown words appear, and an input unit 12 provided with an utterance switch 13 and a correction switch 14.
[0009]
A speaker 9 is connected to the signal processing unit 3 via a D/A converter 7 and an output amplifier 8. The digital speech signal output from the signal processing unit 3 is converted into an analog speech signal by the D/A converter 7, amplified by the output amplifier 8, and output from the speaker 9 as sound.
A microphone 11 is connected to the signal processing unit 3 via an A/D converter 10. The analog speech signal input from the microphone 11 is converted into a digital speech signal by the A/D converter 10 and transmitted to the signal processing unit 3.
[0010]
The navigation control unit 2 is connected to the display unit 16 and the speaker 9, and presents position information and the like to the vehicle driver or other occupants through the display unit 16 and the speaker 9. The speech recognition unit 1 is composed of the signal processing unit 3, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
The navigation system 20 is composed of the speech recognition unit 1, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
[0011]
Next, the flow of the speech recognition processing of the navigation system will be described with reference to the flowchart of FIG. 2.
In this embodiment, a speech recognition process of a place name spoken for inputting a destination to the navigation control unit 2 will be described.
In step 100, the signal processing unit 3 determines whether the user of the navigation system 20 has operated the utterance switch 13 provided on the input unit 12 for instructing the start of utterance. When the utterance switch 13 is operated, the process proceeds to step 101.
[0012]
In step 101, the signal processing unit 3 sets the network grammar stored in the storage unit 6 as the recognition target. Here, the network grammar is a grammar with the hierarchical structure of place names, and FIG. 3 shows an example. Prefecture names are first defined as recognition target words, and place names are then defined level by level, such as the municipality names belonging to each prefecture. In addition, before each word where an unknown word may be inserted, unknown words (for example "ah", "er", "no") are defined as recognition target words, as indicated by UK in the figure. In this way, the network grammar defines the order in which unknown words appear within a place-name sentence.
Thus, even when the user's utterance includes unknown words in addition to the place name, for example "Er, Kanagawa-ken no Yokohama-shi no Asahi-ku", the place name can be recognized despite the unknown words.
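As a rough illustration of the difference between the grammars of FIG. 3 and FIG. 4, the following sketch enumerates the word sequences accepted by a tiny place-name hierarchy, with or without an optional unknown-word slot before each word. The data, the `UK` marker string, and the function name `expand_grammar` are assumptions made for illustration; the publication does not specify a data structure.

```python
from itertools import product

UK = "<UK>"  # hypothetical marker for an unknown-word slot

# Toy hierarchy: prefecture -> city -> wards (illustrative data only).
PLACES = {
    "Kanagawa-ken": {
        "Yokohama-shi": ["Asahi-ku", "Seya-ku"],
    },
}

def expand_grammar(places, allow_unknown):
    """Enumerate every word sequence the grammar accepts.

    With allow_unknown=True, an optional UK slot may precede each word
    (in the spirit of FIG. 3); with False, no UK slot exists (FIG. 4).
    """
    sequences = []
    for pref, cities in places.items():
        for city, wards in cities.items():
            for ward in wards:
                words = [pref, city, ward]
                if not allow_unknown:
                    sequences.append(tuple(words))
                    continue
                # Each word may or may not be preceded by a UK slot.
                for flags in product([False, True], repeat=len(words)):
                    seq = []
                    for word, has_uk in zip(words, flags):
                        if has_uk:
                            seq.append(UK)
                        seq.append(word)
                    sequences.append(tuple(seq))
    return sequences
```

In this toy model, the sequence `("Kanagawa-ken", "Yokohama-shi", UK, "Seya-ku")` exists only when `allow_unknown` is true, which corresponds to the kind of path that allows "Asahi-ku" to be misrecognized as "a Seya-ku".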
[0013]
In step 102 of FIG. 2, the signal processing unit 3 sets the maximum standby time based on the network grammar set in step 101. The maximum standby time is set so that the signal processing unit 3 can sufficiently receive the utterance even when the longest place name of the set network grammar is uttered.
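The publication does not state how the maximum standby time is derived from the grammar. A minimal sketch, assuming the expected duration of a phrase grows linearly with its length in characters and that a fixed safety margin is added, might look as follows. The function name and both parameters are hypothetical.

```python
def max_standby_seconds(grammar_phrases, seconds_per_char=0.15, margin_seconds=1.0):
    """Return a standby time long enough for the longest phrase in the grammar.

    seconds_per_char and margin_seconds are assumed tuning values,
    not figures taken from the publication.
    """
    if not grammar_phrases:
        raise ValueError("grammar must contain at least one phrase")
    longest = max(len(phrase) for phrase in grammar_phrases)
    return longest * seconds_per_char + margin_seconds
```

Because the value depends on the longest phrase, a grammar that allows unknown-word slots would naturally yield a longer standby time than one that does not.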
[0014]
In step 103, to notify the user that speech capture processing has started, the signal processing unit 3 outputs the notification speech signal stored in the storage unit 6 from the speaker 9 through the D/A converter 7 and the output amplifier 8.
[0015]
The user who hears the notification sound announcing the start of speech capture utters words included in the recognition target. In this embodiment, the recognition target is place names as shown in FIG. 3.
The speech signal input from the microphone 11 is converted into a digital signal by the A/D converter 10 and input to the signal processing unit 3.
[0016]
Until the utterance switch 13 is operated, the signal processing unit 3 calculates the average power of the digital speech signal converted by the A/D converter 10. After the utterance switch 13 is operated, when the instantaneous power of the digital signal exceeds the calculated average power by a predetermined value or more, it is determined in step 104 that the user has started speaking, and speech capture starts.
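The start-of-speech decision in step 104 can be sketched as a comparison between per-frame power and the background average measured before the switch was pressed. This is only an illustration: the frame-based approximation of "instantaneous power" and the names `mean_power` and `detect_speech_start` are assumptions, not details given in the publication.

```python
def mean_power(samples):
    """Average power (mean of squared amplitudes) of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def detect_speech_start(frames, background_power, threshold):
    """Return the index of the first frame whose power exceeds the
    background average power by at least `threshold`, or None."""
    for index, frame in enumerate(frames):
        if mean_power(frame) - background_power >= threshold:
            return index
    return None
```

Here `background_power` would be computed with `mean_power` over the samples received before the utterance switch 13 is operated, and `frames` are short blocks received afterwards.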
[0017]
When the voice capture is started, the signal processing unit 3 starts calculating the degree of coincidence with the recognition target word stored in the storage unit 6 in step 105. The degree of coincidence indicates how similar the captured voice part is to the individual recognition target words, and the degree of coincidence is obtained as a score. In this embodiment, it is assumed that the higher the score value, the higher the matching degree.
It should be noted that during the processing of this step, the voice capturing by the signal processing unit 3 is continued in parallel.
[0018]
In step 106, it is determined whether the end of the utterance has been detected. The end is detected by judging that the user's utterance has finished when the instantaneous power of the digital speech signal remains at or below a predetermined value for a predetermined time or longer. If the end of the utterance has been detected, the process proceeds to step 107; otherwise, the process proceeds to step 111.
[0019]
In step 111, it is determined whether or not the maximum standby time has elapsed after the start of voice capture. If not, the process returns to step 104. If the maximum standby time has elapsed, the process proceeds to step 107.
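Steps 106 and 111 together give two ways for capture to stop: a sufficiently long run of low-power frames, or expiry of the maximum standby time. The following sketch expresses both conditions in units of frames; the function name, the frame-count parameters, and the returned labels are assumptions made for illustration.

```python
def find_capture_end(frame_powers, silence_power, min_silence_frames, max_frames):
    """Scan frame powers and report why capture would stop.

    Returns (frame_index, reason), where reason is "end_of_utterance"
    (step 106 to step 107), "timeout" (step 111 to step 107), or
    (None, "still_capturing") if neither condition has been met yet.
    """
    silent_run = 0
    for index, power in enumerate(frame_powers):
        # Count consecutive frames at or below the silence level.
        silent_run = silent_run + 1 if power <= silence_power else 0
        if silent_run >= min_silence_frames:
            return index, "end_of_utterance"
        if index + 1 >= max_frames:
            return index, "timeout"
    return None, "still_capturing"
```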
[0020]
In step 107, the speech capture processing ends. In step 108, the signal processing unit 3 outputs the recognition target word with the highest degree of coincidence as the recognition result from the speaker 9 through the D/A converter 7 and the output amplifier 8. In this embodiment, it is assumed that, for the user's utterance "Kanagawa-ken Yokohama-shi Asahi-ku", the signal processing unit 3 misrecognizes the speech as "Kanagawa-ken Yokohama-shi a Seya-ku" (where "a" is an unknown word) and outputs "Kanagawa-ken Yokohama-shi Seya-ku" through the speaker 9.
[0021]
In step 109, after outputting the recognition target word in step 108, the signal processing unit 3 determines whether the correction switch 14 provided in the input unit 12 has been operated within a predetermined time. If the correction switch 14 has been operated, it is determined that the user has made a correction request to the voice recognition result of the navigation system 20 and the process proceeds to step 112.
[0022]
In step 112, the network grammar is set again. The network grammar set here does not include unknown words, as shown in FIG. 4, and, like the network grammar set in step 101, it is a grammar with the hierarchical structure of place names. After the network grammar has been set again, the process returns to step 102 to repeat the speech recognition processing.
[0023]
On the other hand, if there is no operation of the correction switch 14 within the predetermined time in step 109, it is determined that the user has accepted the recognition result of the navigation system 20, and the process proceeds to step 110, where processing according to the recognition result is performed. In the present embodiment, the signal processing unit 3 outputs a place name as a recognition result to the navigation control unit 2. The navigation control unit 2 sets the recognized place name as the destination, and presents information such as road guidance to the user through the display unit 16 or the like.
In this embodiment, the correction switch 14 constitutes the misrecognition correction instructing unit of the present invention, and steps 101 and 112 in FIG. 2 constitute the network grammar setting means of the present invention.
[0024]
This embodiment is configured as described above. When the user operates the correction switch 14 and speaks again in order to correct a recognition result of the speech recognition device, the user is clearly aware of what to say, so the utterance rarely includes unknown words such as "ah" and "er". In such a case, setting a network grammar that does not include unknown words as the recognition target reduces misrecognition caused by the inclusion of unknown words.
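The control flow of the first embodiment (steps 101 to 112 of FIG. 2) can be summarized in a short sketch. The recognizer and the correction-switch check are passed in as callables because the publication does not define a programming interface; the names `run_dialogue`, `recognize`, and `correction_pressed` are hypothetical.

```python
WITH_UNKNOWN = "grammar_with_unknown_words"        # as in FIG. 3
WITHOUT_UNKNOWN = "grammar_without_unknown_words"  # as in FIG. 4

def run_dialogue(recognize, correction_pressed):
    """Repeat recognition until the user accepts a result.

    recognize(grammar) -> result          stands in for steps 102 to 108
    correction_pressed(result) -> bool    stands in for step 109
    """
    grammar = WITH_UNKNOWN                  # step 101: initial state
    while True:
        result = recognize(grammar)
        if not correction_pressed(result):
            return result                   # step 110: result accepted
        grammar = WITHOUT_UNKNOWN           # step 112: drop unknown words
```

Once a correction has been requested, every later attempt in the same dialogue uses the grammar without unknown words, since the loop never switches back.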
[0025]
Next, a second embodiment will be described.
This embodiment is a modification of the processing of the signal processing unit 3 in the first embodiment.
The flow of the speech recognition processing of the navigation system in this embodiment will be described with reference to the flowchart of FIG. 5.
Steps 200 to 208 are the same as steps 100 to 108 in the first embodiment, and steps 210 to 211 are the same as steps 110 to 111 in the first embodiment, and will not be described.
[0026]
In step 209, after the recognition target word is output in step 208, the signal processing unit 3 determines whether the correction switch 14 provided in the input unit 12 has been operated within a predetermined time. If the correction switch 14 has been operated, it is determined that the user has made a correction request with respect to the speech recognition result of the navigation system 20, and the process proceeds to step 212. If there is no operation of the correction switch 14, the process proceeds to step 210.
[0027]
In step 212, the signal processing section 3 determines whether or not the recognition result includes an unknown word. If the recognition result includes an unknown word, the process proceeds to step 213. If no unknown word is included, the process returns to step 202 to repeat the speech recognition process.
In step 213, a network grammar that does not include an unknown word is reset as a recognition target, and the process returns to step 202 to repeat the speech recognition process.
In this embodiment, steps 201 and 213 in FIG. 5 constitute the network grammar setting means of the present invention, and the correction switch 14 constitutes the misrecognition correction instructing unit of the present invention.
[0028]
This embodiment is configured as described above. When the recognition result produced by the signal processing unit 3 includes an unknown word, and that result is a misrecognition so that the correction switch is operated, a network grammar that does not include unknown words is set as the recognition target for the subsequent utterance. When correction is instructed for a recognition result that includes an unknown word, the misrecognition is likely to have been caused by the inclusion of the unknown word. In such a case, setting a network grammar that does not include unknown words reduces misrecognition caused by the inclusion of unknown words.
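The additional check of the second embodiment (steps 212 and 213 of FIG. 5) amounts to switching grammars only when the rejected result contained an unknown word. A minimal sketch follows, assuming the recognition result is available as a list of words in which unknown words are represented by a marker string; the marker, the grammar labels, and the function name are hypothetical.

```python
def next_grammar(current_grammar, result_words, unknown_marker="<UK>"):
    """Choose the grammar for the retry after a correction request.

    Step 212: check whether the rejected result contained an unknown word.
    Step 213: if so, switch to the grammar without unknown words;
    otherwise keep the current grammar and simply retry.
    """
    if unknown_marker in result_words:
        return "without_unknown"
    return current_grammar
```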
[0029]
Next, a third embodiment will be described.
FIG. 6 shows the overall configuration of the vehicle navigation system in this embodiment.
The signal processing unit 3A, which performs the speech recognition processing, includes a use count storage unit 23 that stores the number of times the speech recognition processing of the navigation system has been used, that is, the number of times the speech recognition processing has been executed.
The speech recognition unit 1A is composed of the signal processing unit 3A, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
[0030]
The navigation system 20A is composed of the speech recognition unit 1A, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
In this embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0031]
Next, the flow of the speech recognition processing of the navigation system in this embodiment will be described with reference to the flowchart of FIG. 7.
In step 300, the signal processing unit 3A determines whether the user of the navigation system 20A has operated the utterance switch 13 provided on the input unit 12 for instructing the start of utterance. When the utterance switch 13 has been operated, the process proceeds to step 301.
[0032]
In step 301, the signal processing unit 3A determines whether the number of uses of the speech recognition device stored in the use count storage unit 23 is equal to or greater than a predetermined value. If it is less than the predetermined value, the process proceeds to step 302; if it is equal to or greater than the predetermined value, the process proceeds to step 303.
[0033]
In step 302, the signal processing unit 3A sets, as the recognition target, the network grammar including unknown words that is stored in the storage unit 6, as shown in FIG. 3.
In step 303, the signal processing unit 3A sets, as the recognition target, the network grammar not including unknown words that is stored in the storage unit 6, as shown in FIG. 4.
[0034]
In step 304, the maximum standby time is set based on the network grammar set in step 302 or step 303.
Steps 305 to 313 are the same as steps 103 to 111 in the first embodiment, and a description thereof will be omitted.
In this embodiment, steps 301 to 303 in FIG. 7 constitute the network grammar setting means of the present invention, and the correction switch 14 constitutes the misrecognition correction instructing unit of the present invention.
[0035]
This embodiment is configured as described above. When the number of uses of the speech recognition processing is less than a predetermined value, a network grammar including unknown words is set as the recognition target; when the number of uses is equal to or greater than the predetermined value, a network grammar that does not include unknown words is set as the recognition target.
As the user makes more frequent use of the speech recognition processing of the navigation system, the user's utterances include fewer unknown words such as "ah" and "er". In such a case, setting a network grammar that does not include unknown words reduces misrecognition caused by the inclusion of unknown words.
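The selection in steps 301 to 303 is a single threshold comparison on the stored use count. A minimal sketch follows; the function name, the grammar labels, and the threshold value used in any call are assumptions, since the publication only refers to "a predetermined value".

```python
def select_grammar_by_use_count(use_count, threshold):
    """Steps 301 to 303: choose a grammar from the stored use count.

    Below the threshold, unknown words are still allowed (step 302);
    at or above it, they are dropped from the grammar (step 303).
    """
    if use_count >= threshold:
        return "without_unknown"
    return "with_unknown"
```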
[0036]
Next, a fourth embodiment will be described.
FIG. 8 shows the overall configuration of the vehicle navigation system in this embodiment.
A noise measurement unit 24 that measures the amount of noise in the vehicle is provided inside the signal processing unit 3B that performs voice recognition processing.
The speech recognition unit 1B is composed of the signal processing unit 3B, the storage unit 6, the D/A converter 7, the output amplifier 8, and the A/D converter 10.
[0037]
The navigation system 20B is composed of the speech recognition unit 1B, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
In this embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0038]
Next, the flow of the speech recognition processing of the navigation system in this embodiment will be described with reference to the flowchart of FIG. 9.
In step 400, the signal processing unit 3B determines whether or not the user of the navigation system 20B has operated the utterance switch 13 provided on the input unit 12 for instructing the start of utterance. When the utterance switch 13 is operated, the process proceeds to step 401.
[0039]
Here, the signal processing unit 3B constantly monitors the noise measured by the noise measurement unit 24 and calculates, as the noise amount, the average power per predetermined time of the sound signal before the user presses the utterance switch 13.
In step 401, the signal processing unit 3B determines whether the noise amount measured by the noise measurement unit 24 is equal to or greater than a predetermined value. If it is, a network grammar that does not include unknown words is set as the recognition target in step 403.
[0040]
On the other hand, if the noise amount is smaller than the predetermined value in step 401, a network grammar including unknown words is set as the recognition target in step 402.
In step 404, the maximum standby time is set based on the network grammar set in step 402 or 403.
Steps 405 to 413 are the same as steps 103 to 111 in the first embodiment, and a description thereof will be omitted.
In this embodiment, steps 401 to 403 in FIG. 9 constitute the network grammar setting means in the present invention. Further, the correction switch 14 constitutes the erroneous recognition correction instruction unit in the present invention.
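The noise-dependent selection in steps 401 to 403 can be illustrated with a minimal sketch. The names `Grammar`, `average_power`, `select_grammar` and `noise_threshold` are hypothetical and do not appear in the specification, and the average power is assumed here to be the mean squared amplitude of the samples captured before the utterance switch is pressed.

```python
from enum import Enum
from typing import Sequence


class Grammar(Enum):
    WITH_UNKNOWN_WORDS = "with_unknown_words"        # set in step 402
    WITHOUT_UNKNOWN_WORDS = "without_unknown_words"  # set in step 403


def average_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the samples (assumed definition of average power)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_grammar(pre_utterance_samples: Sequence[float],
                   noise_threshold: float) -> Grammar:
    """Step 401: compare the noise amount with the predetermined value."""
    noise_amount = average_power(pre_utterance_samples)
    if noise_amount >= noise_threshold:
        # Noisy cabin: the user is expected to speak succinctly.
        return Grammar.WITHOUT_UNKNOWN_WORDS
    return Grammar.WITH_UNKNOWN_WORDS
```

In this sketch a noise amount exactly equal to the threshold selects the grammar without unknown words, matching the "equal to or greater than" condition of step 401.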
[0041]
The present embodiment is configured as described above. When the noise amount measured by the noise measurement unit 24 is equal to or greater than a predetermined value, a network grammar that does not include unknown words is set as the recognition target. When the noise in the vehicle is high, the user knows from experience that the recognition rate of an utterance is higher when speaking succinctly, so the user's utterances contain fewer unknown words such as “ah” and “er”. Therefore, in such a case, setting a network grammar that does not include unknown words reduces erroneous recognition caused by including unknown words.
[0042]
Next, a fifth embodiment will be described.
FIG. 10 shows the overall configuration of the vehicle navigation system in this embodiment. The signal processing unit 3C that performs the voice recognition processing includes an utterance frequency coefficient unit 25 that counts the utterance frequency of each uttered sentence. The signal processing unit 3C stores the utterance frequency counted by the utterance frequency coefficient unit 25 in the storage unit 6.
[0043]
The voice recognition unit 1C is composed of the signal processing unit 3C, the storage unit 6, the D / A converter 7, the output amplifier 8, and the A / D converter 10.
Further, a navigation system 20C includes the voice recognition unit 1C, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
In this embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0044]
Next, the flow of the voice recognition processing of the navigation system in the present embodiment will be described using the flowchart of FIG. 11.
In step 500, the signal processing unit 3C determines whether or not the user of the navigation system 20C has operated the utterance switch 13 provided on the input unit 12 for instructing the start of utterance. If the utterance switch 13 has been operated, the process proceeds to step 501.
[0045]
In step 501, the signal processing unit 3C sets, as the recognition target, a network grammar that defines the order in which recognition target words and other unknown words appear. FIG. 12 shows an example of the network grammar. The figure shows a grammar for the hierarchical structure of place names, and positions where an unknown word may be inserted are indicated by “UK”. The signal processing unit 3C simultaneously sets as recognition targets a grammar including unknown words, as shown in the upper part of FIG. 12, and a grammar not including unknown words, as shown in the lower part of FIG. 12.
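As a rough illustration of the two kinds of grammar, the following sketch lists the token sequences accepted for one hierarchical place name. The function `expand` and the placement of an optional “UK” slot before each level are assumptions made for illustration; the actual insertion positions are those marked in FIG. 12.

```python
from itertools import product
from typing import List, Tuple

UK = "UK"  # placeholder for an unknown word such as "ah" or "er"


def expand(levels: Tuple[str, ...], allow_unknown: bool) -> List[Tuple[str, ...]]:
    """List every token sequence the grammar accepts for one place name.

    With allow_unknown=True an optional UK slot precedes each level
    (an assumed placement, not taken from the figure).
    """
    if not allow_unknown:
        return [levels]
    sequences = []
    # Each flag decides whether the UK slot before that level is filled.
    for flags in product((False, True), repeat=len(levels)):
        sequence = []
        for level, has_unknown in zip(levels, flags):
            if has_unknown:
                sequence.append(UK)
            sequence.append(level)
        sequences.append(tuple(sequence))
    return sequences
```

With three levels, the grammar without unknown words accepts a single sequence, while the grammar with unknown words accepts eight under this assumption. The larger search space is what gives a filler such as “a” the chance to be matched where none was spoken.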
[0046]
When the network grammar is set in step 501, an utterance sentence whose utterance frequency stored in the storage unit 6 is equal to or greater than a predetermined value is set in the network grammar that does not include unknown words.
For example, if the place name “Asahi-ku, Yokohama-shi, Kanagawa-ken” has been obtained as a recognition result a predetermined number of times or more in the past, this place name is set in the network grammar that does not include unknown words, as shown in FIG. 12, and is excluded from the recognition target words of the network grammar including unknown words.
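The frequency-based assignment described above can be sketched as follows. The function `partition_by_frequency` and the parameter `min_count` are hypothetical names, and the stored utterance frequencies are assumed to be simple per-sentence counts.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def partition_by_frequency(vocabulary: Iterable[str],
                           utterance_counts: Counter,
                           min_count: int) -> Tuple[Set[str], Set[str]]:
    """Split recognition targets into (with unknown words, without unknown words).

    A sentence recognized at least min_count times in the past goes only into
    the grammar without unknown words and is left out of the other grammar.
    """
    with_unknown: Set[str] = set()
    without_unknown: Set[str] = set()
    for sentence in vocabulary:
        # Counter returns 0 for sentences that were never recognized.
        if utterance_counts[sentence] >= min_count:
            without_unknown.add(sentence)
        else:
            with_unknown.add(sentence)
    return with_unknown, without_unknown
```

Each sentence ends up in exactly one of the two sets, which reflects the exclusion from the grammar including unknown words described in this paragraph.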
[0047]
Steps 502 to 511 are the same as steps 102 to 111 in the first embodiment, and a description thereof will be omitted.
In this embodiment, step 501 in FIG. 11 constitutes the network grammar setting means in the present invention. Further, the correction switch 14 constitutes the erroneous recognition correction instruction unit in the present invention.
[0048]
The present embodiment is configured as described above. An utterance sentence that the utterance frequency coefficient unit 25 has counted a predetermined number of times or more is set in the network grammar that does not include unknown words and is excluded from the network grammar including unknown words. The user is less likely to include unknown words such as "ah" and "er" when uttering a sentence that has been uttered frequently. Therefore, setting such a recognition target word in the network grammar that does not include unknown words reduces erroneous recognition caused by including unknown words.
[0049]
Next, a sixth embodiment will be described.
FIG. 13 shows the overall configuration of the vehicle navigation system in this embodiment. The signal processing unit 3D that performs the voice recognition processing includes a familiarity determination unit 26 that determines the user's familiarity with a place name.
The voice recognition unit 1D is configured by the signal processing unit 3D, the storage unit 6, the D / A converter 7, the output amplifier 8, and the A / D converter 10.
Further, a navigation system 20D includes the voice recognition unit 1D, the navigation control unit 2, the display unit 16, the speaker 9, the microphone 11, and the input unit 12.
[0050]
The familiarity determination unit 26 determines, for example, that the user is familiar with the home address registered by the user in the navigation system 20D, places in the past driving history, places the user has visited in the past, and place names near these places.
In this embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
[0051]
The voice recognition processing in the present embodiment is obtained by replacing step 501 in the flowchart of FIG. 11 in the fifth embodiment with step 601, and the description of the other steps will be omitted.
In step 500, the signal processing unit 3D determines whether the user of the navigation system 20D has operated the utterance switch 13 provided in the input unit 12 for instructing the start of utterance. When the utterance switch 13 is operated, the process proceeds to step 601.
[0052]
In step 601, when setting the network grammar to be recognized, the signal processing unit 3D sets a place name determined to be familiar by the familiarity determination unit 26 in the network grammar that does not include unknown words, and excludes it from the network grammar including unknown words. After setting the network grammar, the process proceeds to step 502.
In this embodiment, step 601 in FIG. 11 constitutes the network grammar setting means in the present invention. Further, the correction switch 14 constitutes the erroneous recognition correction instruction unit in the present invention.
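The familiarity determination and the assignment in step 601 can be sketched as below. The functions `familiar_places` and `split_place_names` and the `nearby` lookup table are hypothetical; the specification does not state how nearness between places is computed, so it is modeled here as a precomputed mapping.

```python
from typing import Dict, Iterable, Set, Tuple


def familiar_places(home: str,
                    visited: Iterable[str],
                    nearby: Dict[str, Set[str]]) -> Set[str]:
    """Home address, previously visited places, and places near any of them."""
    known = {home, *visited}
    familiar = set(known)
    for place in known:
        # nearby is an assumed precomputed table of neighboring place names.
        familiar |= nearby.get(place, set())
    return familiar


def split_place_names(place_names: Iterable[str],
                      familiar: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (grammar with unknown words, grammar without unknown words)."""
    names = set(place_names)
    without_unknown = names & familiar  # familiar names: no filler expected
    with_unknown = names - familiar     # all other names keep the UK slots
    return with_unknown, without_unknown
```

As in the fifth embodiment, a familiar place name appears only in the grammar without unknown words.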
[0053]
The present embodiment is configured as described above. For a place name determined to be familiar by the familiarity determination unit 26, the user is less likely to make an utterance including an unknown word such as “ah” or “er”. Therefore, setting such a recognition target word in the network grammar that does not include unknown words reduces erroneous recognition caused by including unknown words.
[Brief description of the drawings]
FIG. 1 is a diagram showing a first embodiment of the present invention.
FIG. 2 is a diagram showing a flow of a voice recognition process in the first embodiment.
FIG. 3 is a diagram illustrating a network grammar including an unknown word.
FIG. 4 is a diagram illustrating a network grammar that does not include unknown words.
FIG. 5 is a diagram showing a flow of a voice recognition process in the second embodiment.
FIG. 6 is a diagram showing a third embodiment.
FIG. 7 is a diagram illustrating a flow of a voice recognition process in the third embodiment.
FIG. 8 is a diagram showing a fourth embodiment.
FIG. 9 is a diagram showing a flow of a voice recognition process in the fourth embodiment.
FIG. 10 is a diagram showing a fifth embodiment.
FIG. 11 is a diagram showing a flow of a voice recognition process in the fifth and sixth embodiments.
FIG. 12 is a diagram showing a network grammar.
FIG. 13 is a view showing a sixth embodiment.
[Explanation of symbols]
1, 1A, 1B, 1C, 1D Voice recognition unit
2 Navigation control unit
3, 3A, 3B, 3C, 3D Signal processing unit
6 Storage unit
7 D / A converter
8 Output amplifier
9 Speaker
10 A / D converter
11 Microphone
12 Input unit
13 Utterance switch
14 Correction switch
16 Display unit
20, 20A, 20B, 20C, 20D Navigation system
23 Use frequency storage unit
24 Noise measurement unit
25 Utterance frequency coefficient unit
26 Familiarity determination unit

Claims

A speech recognition apparatus having a storage unit that stores a network grammar defining the order in which recognition target words and other unknown words appear,
network grammar setting means for setting the network grammar stored in the storage unit as a recognition target, and
a signal processing unit that performs recognition processing of a speech signal based on the grammar set by the network grammar setting means,
the speech recognition apparatus comprising an erroneous recognition correction instruction unit that instructs correction of a recognition result, wherein
the storage unit stores a network grammar that does not include unknown words, and
the network grammar setting means sets a network grammar including unknown words as the recognition target in an initial state, but sets the network grammar that does not include unknown words as the recognition target when an instruction to correct the recognition result is given from the erroneous recognition correction instruction unit.

The speech recognition apparatus according to claim 1, wherein the network grammar setting means sets the network grammar that does not include unknown words as the recognition target when the speech recognition result includes an unknown word and an instruction to correct the recognition result is given thereafter.

A speech recognition apparatus having a storage unit that stores a network grammar defining the order in which recognition target words and other unknown words appear,
network grammar setting means for setting the network grammar stored in the storage unit as a recognition target, and
a signal processing unit that performs recognition processing of a speech signal based on the grammar set by the network grammar setting means,
the speech recognition apparatus comprising a use count storage unit that stores the number of times the speech recognition apparatus has been used, wherein
the storage unit stores a network grammar that does not include unknown words, and
the network grammar setting means sets a network grammar including unknown words as the recognition target in an initial state, but sets the network grammar that does not include unknown words as the recognition target when the use count stored in the use count storage unit becomes equal to or greater than a predetermined value.

A speech recognition apparatus having a storage unit that stores a network grammar defining the order in which recognition target words and other unknown words appear,
network grammar setting means for setting the network grammar stored in the storage unit as a recognition target, and
a signal processing unit that performs recognition processing of a speech signal based on the grammar set by the network grammar setting means,
the speech recognition apparatus comprising a noise measurement unit that measures a noise amount, wherein
the storage unit stores a network grammar that does not include unknown words, and
the network grammar setting means sets a network grammar including unknown words as the recognition target in an initial state, but sets the network grammar that does not include unknown words as the recognition target when the noise amount measured by the noise measurement unit becomes equal to or greater than a predetermined value.

A speech recognition apparatus having a storage unit that stores a network grammar defining the order in which recognition target words and other unknown words appear,
network grammar setting means for setting the network grammar stored in the storage unit as a recognition target, and
a signal processing unit that performs recognition processing of a speech signal based on the grammar set by the network grammar setting means,
the speech recognition apparatus comprising an utterance frequency coefficient unit that counts the utterance frequency of uttered sentences, wherein
the network grammar setting means sets an utterance sentence counted a predetermined number of times or more by the utterance frequency coefficient unit in a network grammar that does not include unknown words.

A speech recognition apparatus having a storage unit that stores a network grammar defining the order in which recognition target words and other unknown words appear,
network grammar setting means for setting the network grammar stored in the storage unit as a recognition target, and
a signal processing unit that performs recognition processing of a speech signal based on the grammar set by the network grammar setting means,
the speech recognition apparatus comprising a familiarity determination unit that determines the user's familiarity with a place name, wherein
the network grammar setting means sets a place name determined to be highly familiar by the familiarity determination unit in a network grammar that does not include unknown words.