JP2004133162A

JP2004133162A - Display device and name display method therefor

Info

Publication number: JP2004133162A
Application number: JP2002297209A
Authority: JP
Inventors: Fumio Seto; 瀬戸　史生; Masayuki Watabe; 渡部　眞幸; Okihiko Nakayama; 中山　沖彦
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2002-10-10
Filing date: 2002-10-10
Publication date: 2004-04-30

Abstract

PROBLEM TO BE SOLVED: To provide a display device that makes speech input operations smooth, and a name display method therefor.
SOLUTION: One or more speech recognition object words, which are objects of speech recognition, are displayed on a screen, and a prescribed process is executed when one of them is uttered accurately. When a speech recognition section 31 recognizes that the user's utterance of a speech recognition object word was incorrect, or when no utterance by the user is recognized within a prescribed time, an indication processing section 33 produces display contents in which at least one of the speech recognition object words displayed on the screen is shown in KANA (Japanese syllabary). A display 40 displays the produced display contents.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識の対象となる音声認識対象語が正確に発話されることにより所定の処理を実行する表示装置及びその名称表示方法に関する。
【０００２】
【従来の技術】
従来における表示装置のひとつであるナビゲーション装置として、例えば、目的地の読み方が不明である場合、目的地の周辺地名及びその周辺地名から目的地までの距離等をユーザに音声入力させて目的地の位置を特定し、この位置に存在する目的地の名称をユーザに音声案内して、当初読み方が不明であった目的地をユーザに指定させるものが知られている（例えば特許文献１参照）。
【０００３】
【特許文献１】
特開２０００−３３７９１２号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、従来における表示装置（ナビゲーション装置）では、目的地等の読み方が不明である場合、目的地等を指定するまでの操作が複雑であり煩わしいものであった。特に、雨天の走行時など運転に集中している場合には、その操作が非常に煩わしいものになってしまう。
【０００５】
本発明はこのような従来の課題を解決するためになされたものであり、その目的とするところは、音声による入力操作の円滑化を図ることが可能な表示装置及びその名称表示方法を提供することにある。
【０００６】
【課題を解決するための手段】
上記目的を達成するため、本発明では、音声認識の対象となる音声認識対象語を画面上に１又は複数表示し、音声認識対象語が正確に発話された場合には、目的地の設定や画面切替などの所定の処理を実行し、ユーザによる音声認識対象語の発話が不正確であったと認識された場合やユーザによる音声認識対象語の発話が所定時間認識されなかった場合には、画面上に表示される１又は複数の音声認識対象語のうち少なくとも１以上の音声認識対象語を仮名表記にして表示させることを特徴としている。
【０００７】
【発明の効果】
本発明によれば、ユーザの発話状態を検知し、画面上に表示される少なくとも１以上の音声認識対象を仮名表記にして表示するので、ユーザは、漢字等の表記では読むことができなかった音声認識対象語の正確な読みを知ることとなり、複雑な操作を要することなく音声認識対象語を簡易に指定することが可能となる。従って、音声による入力操作の円滑化を図ることができる。
【０００８】
【発明の実施の形態】
以下、本発明の好適な実施形態を図面に基づいて説明する。
【０００９】
図１は、本発明の第１実施形態に係る表示装置の構成図である。同図に示すように、表示装置１は、車両に搭載され、音声認識の対象となる音声認識対象語を画面上に１又は複数表示し、音声認識対象語が正確に発話されることにより所定の処理を実行するものであって、好適には、音声認識のナビゲーション装置などに適用される。以下、表示装置１を音声認識のナビゲーション装置に適用した例について、説明する。
【００１０】
この表示装置１は、音声を入力する音声入力部１０と、各種データを保有するデータベース２０と、音声入力部１０からの音声信号を入力すると共に、生成された画像データ等を出力するナビゲーション部３０と、ナビゲーション部３０からの画像データに基づき画像を表示するディスプレイ（表示手段）４０と、ナビゲーション部３０により生成された音声データに基づき音声を出力するスピーカ５０とを備えている。
【００１１】
より詳しく説明すると、データベース２０は、ディスプレイ４０に表示され音声入力の対象となる音声認識対象語、及びその音声認識対象語の誤った読みである誤読名称を記憶する認識辞書部２１と、道路地図や建物等の位置のデータ及び音声認識対象の漢字データや仮名データ等を記憶する地図データ部２２とを具備している。
【００１２】
また、ナビゲーション部３０は、ユーザが発話した音声を認識する音声認識部（音声認識手段）３１と、音声認識部３１にてユーザによる音声認識対象語の発話が不正確であったと認識された場合又は音声認識部３１にてユーザによる発話が所定時間認識されなかった場合に、ディスプレイ４０の画面上に表示される１又は複数の音声認識対象語のうち少なくとも１以上の音声認識対象語を選択する判定部（選択手段）３２と、判定部３２により選択された音声認識対象語を仮名表記にして表示内容を生成する提示処理部（表示内容生成手段）３３とを具備している。
【００１３】
なお、上記音声認識部３１は、認識辞書部２１が記録する音声認識対象語とユーザが発話した音声とを比較して尤度を算出し、尤度が第１の所定値（例えば８０％）以上となった音声認識対象語が発話されたものと認識するように構成されている。
【００１４】
図２は、本実施形態の表示装置１の動作を示すフローチャートである。同図に示すように、まず、音声認識部３１は、ユーザによって音声が入力されたか否かを判断する（ＳＴ１００）。ユーザによって音声が入力されなかったと判断した場合（ＳＴ１００：ＮＯ）、ナビゲーション部３０は、音声認識部３１にてユーザによる発話が所定時間認識されなかったか否かを判断する（ＳＴ１０１）。
【００１５】
音声認識部３１にてユーザによる発話が所定時間以内に認識されたと判断した場合（ＳＴ１０１：ＮＯ）、ナビゲーション部３０は終了動作が行われたか否かを判断する（ＳＴ１０２）。すなわち、ナビゲーション部３０は、例えばイグニッションスイッチがオフされたか否かを判断する。
【００１６】
終了動作が行われていないと判断した場合（ＳＴ１０２：ＮＯ）、処理はステップＳＴ１００に戻る。終了動作が行われたと判断した場合（ＳＴ１０２：ＹＥＳ）、処理は終了する。
【００１７】
ところで、音声認識部３１にてユーザによる発話が所定時間認識されなかったと判断した場合（ＳＴ１０１：ＹＥＳ）、ナビゲーション部３０は、第１又は第２の提示処理を実行する（ＳＴ１１０）。そして、第１又は第２の提示処理の実行後、処理はステップＳＴ１０２に移行する。
【００１８】
ここで、第１の提示処理とは、提示処理部３３が、画面上に表示される１又は複数の音声認識対象語のすべてを仮名表記にして表示内容を生成する処理である。そして、第１の提示処理が実行されると、ディスプレイ４０には、次のような画像が表示される。
【００１９】
図３は、第１の提示処理が実行される場合にディスプレイ４０に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【００２０】
図３（ａ）に示すように、第１の提示処理が実行される前の表示画像は、音声認識対象語である「田浦大作町」や「逸見駅」等が漢字による一般表記４１で表示されている。そして、第１の提示処理が実行されると、図３（ｂ）に示すように、音声認識対象語である「田浦大作町」や「逸見駅」等は、すべて平仮名による仮名表記４２で表示される。このように、第１の提示処理が実行されると、画面上に表示されるすべての音声認識対象語が漢字などの一般表記４１から平仮名などの仮名表記４２に変更されて表示される。
【００２１】
また、第２の提示処理とは、判定部３２が、画面上に表示される１又は複数の音声認識対象語のうち予め難読であるとして登録された音声認識対象語を選択し、提示処理部３３が、選択された音声認識対象語を仮名表記４２にして表示内容を生成する処理である。そして、第２の提示処理が実行されると、ディスプレイ４０には、次のような画像が表示される。
【００２２】
図４は、第２の提示処理が実行される場合にディスプレイ４０に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【００２３】
図４（ａ）に示すように、第２の提示処理が実行される前の表示画像は、音声認識対象語である「汐入駅」や「逸見駅」等が漢字による一般表記４１で表示されている。そして、第２の提示処理が実行されると、図４（ｂ）に示すように、音声認識対象語のうち難読である「汐入駅」や「逸見駅」等は、すべて平仮名による仮名表記４２で表示される。このように、第２の提示処理が実行されると、画面上に表示される音声認識対象語のうち難読であるものが、漢字などの一般表記４１から平仮名などの仮名表記４２に変更されて表示される。
【００２４】
なお、この難読であるか否かは、データベース２０に予め登録されているデータを参照することによって、判定部３２により判断される。
【００２５】
再び、図２を参照して説明する。ユーザによって音声が入力されたと判断した場合（ＳＴ１００：ＹＥＳ）、音声認識部３１は、入力された音声と音声認識辞書部２１に記録されている音声データとがマッチングしたか否かを判断する（ＳＴ１２０）。すなわち、音声認識部３１は、画面上に表示される音声認識対象語それぞれと発話された音声との尤度のいずれかが第１の所定値以上となったか否かを判断する。
【００２６】
音声認識辞書部２１に記録されている音声データと入力された音声とがマッチングしなかったと判断した場合（ＳＴ１２０：ＮＯ）、すなわち、尤度がすべて第１の所定値を下回った場合、音声認識部３１は、第１の所定値よりも小さく設定される第２の所定値（例えば５０％）以上の尤度となった音声認識対象語が存在するか否かを判断する（ＳＴ１２１）。
【００２７】
第２の所定値以上の尤度となった音声認識対象語が存在しないと判断した場合（ＳＴ１２１：ＮＯ）、処理はステップＳＴ１０２に移行する。第２の所定値以上の尤度の音声認識対象語が存在すると判断した場合（ＳＴ１２１：ＹＥＳ）、ナビゲーション部３０は、第３の提示処理を実行する（ＳＴ１３０）。そして、第３の提示処理の実行後、処理はステップＳＴ１０２に移行する。
【００２８】
ここで、第３の提示処理とは、判定部３２が、画面上に表示される１又は複数の音声認識対象語のうち第２の所定値以上の尤度となった音声認識対象語を選択し、提示処理部３３が、選択された音声認識対象語を仮名表記４２にして表示内容を生成する処理である。そして、第３の提示処理が実行されると、ディスプレイ４０には、次のような画像が表示される。
【００２９】
図５は、第３の提示処理が実行される場合にディスプレイ４０に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。また、図６は、音声認識対象語の尤度の一例を示す説明図である。
【００３０】
図５（ａ）に示すように、第３の提示処理が実行される前の表示画像は、音声認識対象語である「田浦大作町」等が漢字による一般表記４１で表示されている。ここで、ユーザが「田浦大作町（たうらだいさくちょう）」を指定するつもりで「たうらたいさくちょう」と発話したとする。このとき、尤度は、例えば図６に示すように「田浦大作町」が「６５％」、「田浦泉町」が「３０％」、「田浦駅」が「２０％」となる。
【００３１】
そして、第２の所定値が「５０％」に設定されている場合、第２の所定値以上となる音声認識対象語は、「田浦大作町」だけである。この場合、第３の提示処理が実行されると、図５（ｂ）に示すように、第２の所定値以上となった音声認識対象語である「田浦大作町」は、平仮名による仮名表記４２にされて表示される。このように、第３の提示処理が実行されると、画面上に表示される音声認識対象語のうち第２の所定値以上となったものが漢字などの一般表記４１から平仮名などの仮名表記４２に変更されて表示されることとなる。
【００３２】
なお、ここでの説明では、仮名表記４２にされた音声認識対象語は１つであったが、特に１つに限らず、第２の所定値以上となった音声認識対象語が２つ以上存在する場合には、当然２つ以上の音声認識対象語が仮名表記４２にされることとなる。
【００３３】
再度、図２を参照して説明する。音声認識辞書部２１に記録されている音声データと入力された音声とがマッチングしたと判断した場合（ＳＴ１２０：ＹＥＳ）、音声認識部３１は、音声認識対象語の誤った読みである誤読名称とマッチングしたか否かを判断する（ＳＴ１４０）。
【００３４】
音声認識対象語の誤った読みである誤読名称とマッチングしなかったと判断した場合（ＳＴ１４０：ＮＯ）、すなわち、音声認識対象語が正確に発話されたと判断した場合、通常の表示内容が提示される（ＳＴ１４１）。すなわち、図３（ａ）や図４（ａ）に示すように、音声認識対象語が仮名表記４２でなく一般表記４１として表示される。通常表示後、処理は、ステップＳＴ１０２に移行する。
【００３５】
一方、音声認識対象語の誤った読みである誤読名称とマッチングしたと判断した場合（ＳＴ１４０：ＹＥＳ）、ナビゲーション部３０は、第４の提示処理を実行する（ＳＴ１５０）。そして、第４の提示処理の実行後、処理はステップＳＴ１０２に移行する。
【００３６】
ここで、第４の提示処理とは、判定部３２が、誤読名称の発話があった音声認識対象語を選択し、提示処理部３３が、画面上に表示される１又は複数の音声認識対象語のうち判定部３２に選択された音声認識対象語を仮名表記４２にして表示内容を生成する処理である。そして、第４の提示処理が実行されると、ディスプレイ４０には、次のような画像が表示される。
【００３７】
図７は、第４の提示処理が実行される場合にディスプレイ４０に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。図７（ａ）に示すように、第４の提示処理が実行される前の表示画像は、音声認識対象語である「逸見駅」等が漢字表記で表示されている。
【００３８】
ここで、ユーザが「逸見駅（へみえき）」を指定するつもりで「いつみえき」と発話したとする。このとき、音声認識部３１は、「逸見駅（へみえき）」の誤読名称である「いつみえき」という読みを記憶しており、「逸見駅」の誤読名称が発話されたと判断する。そして、第４の提示処理が実行されると、図７（ｂ）に示すように、音声認識対象語である「逸見駅」は、平仮名による仮名表記４２で表示される。このように、第４の提示処理が実行されると、画面上に表示される１又は複数の音声認識対象語のうち誤読名称によって発話された音声認識対象語が漢字などの一般表記４１から平仮名などの仮名表記４２に変更されて表示される。
【００３９】
このようにして、本実施形態に係る表示装置１及びその名称表示方法では、ユーザの発話状態を検知し、画面上に表示される少なくとも１以上の音声認識対象を仮名表記４２にして表示するので、ユーザは、漢字等の表記では読むことができなかった音声認識対象語の正確な読みを知ることとなり、複雑な操作を要することなく音声認識対象語を指定することが可能となる。従って、音声による入力操作の円滑化を図ることができる（請求項１，１３の効果）。
【００４０】
また、音声認識部３１にてユーザによる音声認識対象語が所定時間認識されなかった場合には、画面上に表示される音声認識対象語のうち難読であるものを仮名表記４２にして表示するので、画面上の音声認識対象語のすべてを仮名表記４２にする場合に比して、画面全体の文字数の増加を抑制することとなり、文字数増加による視認性の低下を抑制することができる（請求項２の効果）。
【００４１】
また、尤度が第２の所定値以上となった音声認識対象語を仮名表記４２にして表示するので、音声認識対象語が難読であるか否かにかかわらず、ユーザが正確に発話できなければ、ユーザが指定しようとした可能性が高い音声認識対象語について正確な読みを提示することが可能となっており、ユーザに対する音声入力の支援を柔軟に行うことができる（請求項３の効果）。
【００４２】
また、誤読名称が発話された音声認識対象語を仮名表記４２にして表示するので、誤った読みが発話された音声認識対象語だけが仮名表記４２とされることとなる。このため、ユーザが指定しようとした確率が極めて高い音声認識対象語だけを仮名表記４２にすることとなり、画面全体の文字数の増加を最小限に抑え、視認性の低下をより効率よく抑制することができる（請求項４の効果）。
【００４３】
次に本発明の第２実施形態を説明する。第２の実施形態に係る表示装置２は、第１の実施形態に係る表示装置１とほぼ同様であるが、以下の点で異なっている。
【００４４】
すなわち、第１実施形態に係る表示装置１では、第１〜第４の提示処理において、漢字などの一般表記４１を仮名表記４２にして表示していたが、第２実施形態に係る表示装置２では、漢字などの一般表記４１に記号を付して表示するように構成されている。つまり、第２実施形態では、提示処理部３３が音声認識対象語に数字やアルファベットなどの記号を付して記号付名称とし、それをディスプレイ４０が表示するようになっている。
【００４５】
また、音声認識部３１は、記号が付された音声認識対象語がディスプレイ４０の画面上に表示されている間、記号の発話を認識するようになっている。記号が発話されると、音声認識部３１は記号が付されている音声認識対象語が発話されたと認識し、表示装置２は目的地の設定や画面切替など所定の処理を実行する。
【００４６】
このように、本実施形態に係る表示装置２及びその名称表示方法では、ユーザは、漢字等の表記では読むことができなかった音声認識対象語を容易に発話することが可能となり、複雑な操作を要することなく音声認識対象語を指定することが可能となる。従って、音声による入力操作の円滑化を図ることができる（請求項５，１４の効果）。
【００４７】
また、音声認識部３１にてユーザによる音声認識対象語が所定時間認識されなかった場合には、画面上に表示される音声認識対象語のうち難読であるものに記号を付して表示するので、画面上の音声認識対象語のすべてに記号を付す場合に比して、画面全体の文字数の増加を抑制することとなる。従って、文字数増加による視認性の低下を抑制することができる（請求項６の効果）。
【００４８】
また、尤度が第２の所定値以上となった音声認識対象語に記号を付して表示するので、音声認識対象語が難読であるか否かにかかわらず、ユーザが正確に発話できなければ、ユーザが指定しようとした可能性が高い音声認識対象語に記号を付すことになり、ユーザに対する音声入力の支援を柔軟に行うことができる（請求項７の効果）。
【００４９】
また、誤読名称が発話された音声認識対象語に記号を付して表示するので、誤った読みが発話された音声認識対象語だけに記号が付されることとなる。このため、ユーザが指定しようとした確率が極めて高い音声認識対象語だけに記号を付すこととなる。従って、画面全体の文字数の増加を最小限に抑え、視認性の低下をより効率よく抑制することができる（請求項８の効果）。
【００５０】
次に、本発明の第３実施形態について説明する。図８は、本発明の第３実施形態に係る表示装置の構成図である。同図に示すように、第３実施形態に係る表示装置３は、第１実施形態の表示装置１に加え、ＧＰＳ衛星からの電波を受信すると共に、車両の現在位置の緯度及び経度、並びに現在時刻等の情報を出力するＧＰＳ受信機１１と、車体の角度変化を知るためのジャイロセンサ１２と、車両の走行速度及び距離に比例した数のパルス信号を出力する車速センサ１３と、信号をナビゲーション部３０に無線にて送出するリモコン１４と、地域毎におけるユーザの親和度を記憶する親和度データベース（記憶手段）６０とを備えている。
【００５１】
また、同図に示すように、ナビゲーション部３０は、判定部３２に代えて、所定の操作等に基づいて親和度の更新などを行う親和度登録部３４を有している。また、ナビゲーション部３０の提示処理部３３は、判定部３２が選択した音声認識対象語を仮名表記４２にする代わりに、親和度データベース６０に記憶されている親和度に基づいて、音声認識対象語を仮名表記４２にするか否かを判断するようになっている。
【００５２】
ここで、親和度とは、ユーザが各地域の地名をどれだけ知っているかを示す指標であって、車両の走行履歴や予めユーザによって登録された登録内容やユーザの操作履歴に基づいて求められるものである。
【００５３】
図９は、本実施形態の表示装置３の動作を示すフローチャートである。同図に示すように、まず、音声認識部３１は、音声認識対象語がユーザによって発話されたか否かを判断する（ＳＴ２００）。ユーザによって音声認識対象語が発話されなかったと判断した場合（ＳＴ２００：ＮＯ）、ナビゲーション部３０は、現在、自車両が走行中であるか否かを判断する（ＳＴ２０１）。
【００５４】
自車両が走行中でないと判断した場合（ＳＴ２０１：ＮＯ）、ナビゲーション部３０は、ポイントが登録中であるか否かを判断する（ＳＴ２０２）。ポイントが登録中でないと判断した場合（ＳＴ２０２：ＮＯ）、ナビゲーション部３０は終了動作が行われたか否かを判断する（ＳＴ２０３）。すなわち、ナビゲーション部３０は、イグニッションスイッチ等がオフされたか否かを判断する。
【００５５】
終了動作が行われていないと判断した場合（ＳＴ２０３：ＮＯ）、処理はステップＳＴ２００に戻る。終了動作が行われたと判断した場合（ＳＴ２０３：ＹＥＳ）、処理は終了する。
【００５６】
ところで、ポイントが登録中であると判断した場合（ＳＴ２０２：ＹＥＳ）、ナビゲーション部３０は、登録されたポイントの位置情報をデータベース２０から取得し（ＳＴ２１０）、親和度登録部３４は、取得した位置情報を登録内容として親和度データベース６０に登録する（ＳＴ２１１）。これにより、親和度データベース６０は、登録された位置について記録されている親和度を更新して記録することとなる。その後、処理はステップＳＴ２０３に移行する。
【００５７】
また、自車両が走行中であると判断した場合（ＳＴ２０１：ＹＥＳ）、ナビゲーション部３０は、ＧＰＳ受信機１１からの緯度や経度の情報及び時刻情報に基づいて、現在位置と時刻とを走行履歴として取得する（ＳＴ２２０）。そして、親和度登録部３４は親和度の更新を行う（ＳＴ２１１）。これにより、親和度データベース６０は親和度を更新する。その後、処理はステップＳＴ２０３に移行する。
【００５８】
なお、表示装置３は、ジャイロセンサ１２や車速センサ１３を備えているため、ナビゲーション部３０は、ＧＰＳ受信機１１からの信号によることなく、ジャイロセンサ１２や車速センサ１３からの信号に基づいて位置情報を求めるようにしてもよい。
【００５９】
また、音声認識対象語がユーザによって発話されたと判断した場合（ＳＴ２００：ＹＥＳ）、提示処理部３３は、親和度データベース６０から親和度の情報を取得する。そして、音声認識部３１にて認識された音声が表示を指定する地域について、親和度と予め記憶されている所定値とを比較する。
【００６０】
比較後、提示処理部３３は、比較結果に基づいて表示内容を生成し、ディスプレイ４０は、生成された表示内容を表示する（ＳＴ２３１）。表示後、ナビゲーション部３０は、ディスプレイ４０に表示された表示位置の情報を操作履歴として取得する（ＳＴ２３２）。そして、親和度登録部３４は親和度の更新を行う（ＳＴ２１１）。これにより、親和度データベース６０は親和度を更新する。その後、処理はステップＳＴ２０３に移行する。
【００６１】
以下、ステップＳＴ２３１で表示される表示画像について、図１０を参照して説明する。図１０は、親和度に基づいて表示される画像を示す説明図であり、（ａ）は表示内容切替前の画像の一例を示しており、（ｂ）は表示内容切替後の画像の一例を示しており、（ｃ）は表示内容切替後の画像の他の例を示している。
【００６２】
図１０（ａ）に示すように、表示内容切替前の画像には、音声認識対象語である「横浜市」や「横須賀市」等が漢字による一般表記４１で表示されている。また、領域７０は、ユーザが過去に行ったことがあったり（走行履歴）、予めユーザに登録されていたり（登録内容）、操作により表示されたことがあったり（操作履歴）する横浜市を含む地域であり、親和度が所定値以上となっている。
【００６３】
次に、ユーザが「横浜市」を指定したとする。このとき、提示処理部３３は、図１０（ｂ）に示すように、領域７０に含まれる音声認識対象語（「青葉区」等）を一般表記４１にし、領域７０に含まれない音声認識対象語（「たまく」等）を平仮名による仮名表記４２にして横浜市及びその周辺の詳細地図を表示する。
【００６４】
一方、ユーザが「横須賀市」を指定した場合、表示画像は図１０（ｃ）に示すようになる。すなわち、横須賀市は領域７０に含まれていないので、音声認識対象語である「山中町」等は、すべて「やまなかちょう」などの平仮名表記で表示される。
【００６５】
すなわち、提示処理部３３は、親和度が所定値以上の地域について漢字表記などの一般表記４１とし、親和度が所定値を下回る地域について一般表記４１を仮名表記４２にして表示内容を生成している。
【００６６】
このようにして、本実施形態に係る表示装置３及びその名称表示方法では、音声が表示を指定する地域について、記憶されたユーザの親和度が所定値を下回る場合に、指定された地域内に表示される１又は複数の音声認識対象語のうち少なくとも１以上の音声認識対象語を仮名表記４２にして表示するので、ユーザは、初めて訪れたり画面表示したりした地域について、音声認識対象語の読み方に迷うことがなくなり、複雑な操作を要することなく音声認識対象語を指定することが可能となる。従って、音声による入力操作の円滑化を図ることができる（請求項５，１４の効果）。
【００６７】
また、親和度は、走行履歴や登録内容や操作履歴に基づいて求められる。すなわち、親和度はユーザの行動や使用状態に応じて変化することとなる。このため、各地域の親和度はユーザ毎に設定されることとなり、各ユーザに対して仮名表記４２が適切に提供されることとなる。従って、ユーザに対して柔軟に音声入力の支援を行うことができる（請求項６，７，８）。
【００６８】
なお、本発明は上記実施形態に限られるものではない。例えば、第３実施形態では、走行履歴や操作履歴は記憶されたままとされているが、走行履歴や操作履歴を取得してから所定日数経過すると、親和度登録部３４がこれらの履歴を削除するようにしてもよい。また、第３実施形態では、親和度と所定値とを比較し、親和度が所定値を下回る場合に仮名表記４２として表示するようにしているが、親和度を所定値と比較することなく、単に走行履歴や登録内容や操作履歴のうちいずれかが親和度データベース６０に記憶されている場合に音声認識対象語を漢字などの一般表記４１にし、いずれも記憶されていない場合に音声認識対象語を平仮名などの仮名表記４２にしてもよい。
【００６９】
また、第１〜第３実施形態では、音声認識対象語として各市区町名や駅名を挙げているが、音声認識対象語は、ランドマークや建築物等の名称であってもよい。また、音声認識対象語を仮名表記４２とする際、すべて平仮名としているが、仮名表記４２は、平仮名でなく片仮名による表記であってもよい。
【００７０】
さらに、第１〜第３実施形態では、提示処理部３３は、仮名表記４２にて表示を行う場合、常に仮名を表示する必要はなく、時分割で一般表記４１と仮名表記４２とを切り替えるようにしてもよい。また、提示処理部３３は、仮名表記４２として「へみ駅」などのように、区市町村や駅等の明らかに読むことができる漢字を仮名にせず、他の部分だけを仮名にして、ディスプレイ４０に表示させるようにしてもよい。
【図面の簡単な説明】
【図１】本発明の第１実施形態に係る表示装置の構成図である。
【図２】第１実施形態の表示装置の動作を示すフローチャートである。
【図３】第１の提示処理が実行される場合に表示手段に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【図４】第２の提示処理が実行される場合に表示手段に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【図５】第３の提示処理が実行される場合に表示手段に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【図６】音声認識対象語の尤度の一例を示す説明図である。
【図７】第４の提示処理が実行される場合に表示手段に表示される画像を示す説明図であり、（ａ）は実行前の表示画像の一例を示しており、（ｂ）は実行後の表示画像の一例を示している。
【図８】本発明の第３実施形態に係る表示装置の構成図である。
【図９】第３実施形態の表示装置の動作を示すフローチャートである。
【図１０】親和度に基づいて表示される画像を示す説明図であり、（ａ）は表示内容切替前の画像の一例を示しており、（ｂ）は表示内容切替後の画像の一例を示しており、（ｃ）は表示内容切替後の画像の他の例を示している。
【符号の説明】
３１　音声認識部（音声認識手段）
３２　判定部（選択手段）
３３　提示処理部（表示内容生成手段）
４０　ディスプレイ（表示手段）
４２　仮名表記
６０　親和度データベース（記憶手段）
[0001]
[Technical Field of the Invention]
The present invention relates to a display device that executes a predetermined process when a speech recognition target word, that is, a word subject to speech recognition, is uttered correctly, and to a name display method for the device.
[0002]
[Prior art]
As one conventional display device, a navigation device is known in which, when the user does not know how to read the name of a destination, the user inputs by voice a place name near the destination and the distance from that place to the destination. The device identifies the position of the destination from these inputs, announces by voice the name of the destination at that position, and thereby lets the user designate a destination whose reading was initially unknown (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
JP 2000-337912 A
[0004]
[Problems to be solved by the invention]
However, with the conventional display device (navigation device), when the user does not know how to read a destination or the like, the operation required to designate it is complicated and troublesome. The operation becomes especially burdensome when the user must concentrate on driving, for example when driving in the rain.
[0005]
The present invention has been made to solve this conventional problem, and its object is to provide a display device, and a name display method therefor, that make voice input operations smoother.
[0006]
[Means for Solving the Problems]
To achieve the above object, the present invention displays one or more speech recognition target words on a screen and, when a speech recognition target word is uttered correctly, executes a predetermined process such as setting a destination or switching screens. When the user's utterance of a speech recognition target word is recognized as inaccurate, or when no utterance of a speech recognition target word by the user is recognized within a predetermined time, at least one of the one or more speech recognition target words displayed on the screen is displayed in kana notation.
[0007]
[Effects of the Invention]
According to the present invention, the user's utterance state is detected and at least one of the speech recognition target words displayed on the screen is displayed in kana notation. The user thus learns the correct reading of a target word that could not be read from its kanji (or other) notation, and can easily designate that word without a complicated operation. Voice input operations are therefore made smoother.
[0008]
[Embodiments of the Invention]
Hereinafter, a preferred embodiment of the present invention will be described with reference to the drawings.
[0009]
FIG. 1 is a configuration diagram of a display device according to the first embodiment of the present invention. As shown in FIG. 1, the display device 1 is mounted on a vehicle, displays one or more speech recognition target words on a screen, and executes a predetermined process when a speech recognition target word is uttered correctly. It is preferably applied to a voice-recognition navigation device or the like. The following describes an example in which the display device 1 is applied to such a navigation device.
[0010]
The display device 1 includes a voice input unit 10 for inputting speech; a database 20 holding various data; a navigation unit 30 that receives the voice signal from the voice input unit 10 and outputs generated image data and the like; a display (display means) 40 that displays an image based on the image data from the navigation unit 30; and a speaker 50 that outputs sound based on the sound data generated by the navigation unit 30.
[0011]
More specifically, the database 20 includes a recognition dictionary unit 21, which stores the speech recognition target words displayed on the display 40 as voice input targets together with their misread names (erroneous readings of those words), and a map data section 22, which stores road maps, position data for buildings and the like, and the kanji and kana data of the speech recognition targets.
[0012]
The navigation unit 30 includes a speech recognition unit (speech recognition means) 31 that recognizes speech uttered by the user; a determination unit (selection means) 32 that selects at least one of the one or more speech recognition target words displayed on the screen of the display 40 when the speech recognition unit 31 recognizes that the user's utterance of a speech recognition target word was inaccurate, or when the speech recognition unit 31 recognizes no utterance from the user within a predetermined time; and a presentation processing unit (display content generation means) 33 that generates display content by converting the word selected by the determination unit 32 into kana notation.
[0013]
The speech recognition unit 31 compares each speech recognition target word recorded in the recognition dictionary unit 21 with the speech uttered by the user to calculate a likelihood, and recognizes a target word whose likelihood is equal to or greater than a first predetermined value (for example, 80%) as having been uttered.
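The first-threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes likelihoods are fractions between 0 and 1 and that, if several words pass, the highest-scoring one is taken (the text does not say how multiple passing words are resolved). The function name `recognize` and its signature are hypothetical.

```python
from typing import Dict, Optional

FIRST_THRESHOLD = 0.80  # example value from the text (80%)


def recognize(likelihoods: Dict[str, float],
              threshold: float = FIRST_THRESHOLD) -> Optional[str]:
    """Return the target word treated as uttered, or None.

    `likelihoods` maps each speech recognition target word to its
    likelihood against the user's speech. Picking the maximum when several
    words pass the threshold is an assumption, not stated in the text.
    """
    if not likelihoods:
        return None
    best_word, best_score = max(likelihoods.items(), key=lambda kv: kv[1])
    return best_word if best_score >= threshold else None


if __name__ == "__main__":
    print(recognize({"逸見駅": 0.91, "汐入駅": 0.40}))  # 逸見駅
    print(recognize({"田浦大作町": 0.65, "田浦泉町": 0.30}))  # None
```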
[0014]
FIG. 2 is a flowchart illustrating the operation of the display device 1 of this embodiment. As shown in the figure, the speech recognition unit 31 first determines whether the user has input speech (ST100). When it is determined that no speech has been input (ST100: NO), the navigation unit 30 determines whether the speech recognition unit 31 has gone a predetermined time without recognizing an utterance from the user (ST101).
[0015]
When it is determined that an utterance by the user was recognized by the speech recognition unit 31 within the predetermined time (ST101: NO), the navigation unit 30 determines whether an end operation has been performed (ST102), for example whether the ignition switch has been turned off.
[0016]
If it is determined that the end operation has not been performed (ST102: NO), the process returns to step ST100. When it is determined that the end operation has been performed (ST102: YES), the process ends.
[0017]
On the other hand, when it is determined that the speech recognition unit 31 has not recognized an utterance from the user for the predetermined time (ST101: YES), the navigation unit 30 executes the first or second presentation process (ST110). After the first or second presentation process, the process proceeds to step ST102.
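The ST100 to ST110 loop of FIG. 2 can be simulated as below. This is a sketch under stated assumptions: time is sampled as discrete ticks, the 10-second timeout is a hypothetical value (the text only says "a predetermined time"), and restarting the timer after a kana presentation is an assumption. The `Tick` class and `run_loop` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tick:
    t: float                    # seconds since the screen was shown
    speech: Optional[str]       # recognized utterance, or None when silent
    ignition_off: bool = False  # end operation checked at ST102


def run_loop(ticks: List[Tick], timeout: float = 10.0) -> List[str]:
    """Simulate ST100-ST110 over sampled ticks; return the actions taken.

    `timeout` is a hypothetical value for the "predetermined time".
    """
    actions: List[str] = []
    last_activity = 0.0
    for tick in ticks:
        if tick.speech is not None:                # ST100: YES
            # Handled by ST120 onward; here it is only logged.
            actions.append(f"recognize:{tick.speech}")
            last_activity = tick.t
        elif tick.t - last_activity >= timeout:    # ST101: YES
            actions.append("kana_presentation")    # ST110 (first or second)
            last_activity = tick.t                 # restart timer (assumption)
        if tick.ignition_off:                      # ST102: YES -> end
            break
    return actions
```

For example, silent ticks at 2 s and 11 s followed by the utterance 逸見駅 at 12 s yield one kana presentation (at 11 s) and then the recognition step.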
[0018]
Here, the first presentation process is a process in which the presentation processing unit 33 converts all of the one or more speech recognition target words displayed on the screen into kana notation to generate the display content. When the first presentation process is executed, the display 40 shows an image such as the following.
[0019]
FIGS. 3A and 3B are explanatory diagrams showing images displayed on the display 40 when the first presentation process is executed: FIG. 3A shows an example of the display image before execution, and FIG. 3B shows an example of the display image after execution.
[0020]
As shown in FIG. 3A, before the first presentation process is executed, speech recognition target words such as “Taura Daisakucho” and “Hemi Station” are displayed in general notation 41 using kanji. When the first presentation process is executed, as shown in FIG. 3B, all speech recognition target words, including “Taura Daisakucho” and “Hemi Station”, are displayed in kana notation 42 using hiragana. In other words, the first presentation process changes every speech recognition target word on the screen from general notation 41, such as kanji, to kana notation 42, such as hiragana.
[0021]
In the second presentation process, the determination unit 32 selects, from the one or more speech recognition target words displayed on the screen, those registered in advance as difficult to read, and the presentation processing unit 33 converts the selected words into kana notation 42 to generate the display content. When the second presentation process is executed, the display 40 shows an image such as the following.
[0022]
FIGS. 4A and 4B are explanatory diagrams showing images displayed on the display 40 when the second presentation process is executed: FIG. 4A shows an example of the display image before execution, and FIG. 4B shows an example of the display image after execution.
[0023]
As shown in FIG. 4A, before the second presentation process is executed, speech recognition target words such as “Shioiri Station” and “Hemi Station” are displayed in general notation 41 using kanji. When the second presentation process is executed, as shown in FIG. 4B, the difficult-to-read target words, such as “Shioiri Station” and “Hemi Station”, are displayed in kana notation 42 using hiragana. In other words, the second presentation process changes only the difficult-to-read target words on the screen from general notation 41, such as kanji, to kana notation 42, such as hiragana.
[0024]
Whether a word is difficult to read is decided by the determination unit 32 with reference to data registered in advance in the database 20.
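The first and second presentation processes differ only in which words are converted. The sketch below assumes a lookup table from kanji name to kana reading, standing in for the kana data in map data section 22, and a set of words registered as difficult to read. Both tables (`KANA`, `DIFFICULT`) and the function names are hypothetical and illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical reading table standing in for the kana data in map data section 22.
KANA: Dict[str, str] = {
    "田浦大作町": "たうらだいさくちょう",
    "逸見駅": "へみえき",
    "汐入駅": "しおいりえき",
    "田浦駅": "たうらえき",
}

# Hypothetical set of words registered in advance as difficult to read.
DIFFICULT: Set[str] = {"逸見駅", "汐入駅"}


def first_presentation(words: List[str],
                       kana: Dict[str, str] = KANA) -> List[str]:
    """Convert every displayed word to kana (words with no reading stay as-is)."""
    return [kana.get(w, w) for w in words]


def second_presentation(words: List[str],
                        kana: Dict[str, str] = KANA,
                        difficult: Set[str] = DIFFICULT) -> List[str]:
    """Convert only the words registered as difficult to read."""
    return [kana.get(w, w) if w in difficult else w for w in words]
```

Converting only the registered words keeps the screen shorter than converting everything, which is the visibility trade-off described for the second presentation process.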
[0025]
Returning to FIG. 2: when it is determined that the user has input speech (ST100: YES), the speech recognition unit 31 determines whether the input speech matches the speech data recorded in the recognition dictionary unit 21 (ST120), that is, whether the likelihood between the uttered speech and any of the speech recognition target words displayed on the screen is equal to or greater than the first predetermined value.
[0026]
When it is determined that the input speech does not match the speech data recorded in the recognition dictionary unit 21 (ST120: NO), that is, when every likelihood is below the first predetermined value, the speech recognition unit 31 determines whether any speech recognition target word has a likelihood equal to or greater than a second predetermined value (for example, 50%) that is set lower than the first predetermined value (ST121).
[0027]
If it is determined that there is no speech recognition target word having a likelihood equal to or greater than the second predetermined value (ST121: NO), the process proceeds to step ST102. When it is determined that there is a speech recognition target word having a likelihood greater than or equal to the second predetermined value (ST121: YES), navigation unit 30 executes a third presentation process (ST130). After executing the third presentation process, the process proceeds to step ST102.
[0028]
Here, in the third presentation process, the determination unit 32 selects, from the one or more speech recognition target words displayed on the screen, those whose likelihood is equal to or greater than the second predetermined value, and the presentation processing unit 33 converts the selected words into kana notation 42 to generate the display content. When the third presentation process is executed, the display 40 shows an image such as the following.
[0029]
FIGS. 5A and 5B are explanatory diagrams showing images displayed on the display 40 when the third presentation process is executed: FIG. 5A shows an example of the display image before execution, and FIG. 5B shows an example of the display image after execution. FIG. 6 is an explanatory diagram showing an example of the likelihoods of speech recognition target words.
[0030]
As shown in FIG. 5A, before the third presentation process is executed, speech recognition target words such as “Taura Daisakucho” are displayed in general notation 41 using kanji. Suppose the user intends to designate “Taura Daisakucho” (たうらだいさくちょう) but utters “Taura Taisakucho” (たうらたいさくちょう). The likelihoods are then, for example, 65% for “Taura Daisakucho”, 30% for “Taura Izumicho”, and 20% for “Taura Station”, as shown in FIG. 6.
[0031]
When the second predetermined value is set to 50%, the only speech recognition target word at or above it is “Taura Daisakucho”. In this case, when the third presentation process is executed, “Taura Daisakucho” is displayed in kana notation 42 using hiragana, as shown in FIG. 5B. In other words, the third presentation process changes the target words on the screen whose likelihood is at or above the second predetermined value from general notation 41, such as kanji, to kana notation 42, such as hiragana.
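The FIG. 6 example can be reproduced with a small Python sketch of the third presentation process. It assumes likelihoods are given as fractions and uses a hypothetical one-entry reading table; `third_presentation` and `KANA` are illustrative names, not taken from the patent.

```python
from typing import Dict, List

SECOND_THRESHOLD = 0.50  # example value from the text (50%)

# Hypothetical reading table; only the entry needed for the example is listed.
KANA: Dict[str, str] = {"田浦大作町": "たうらだいさくちょう"}


def third_presentation(words: List[str],
                       likelihoods: Dict[str, float],
                       kana: Dict[str, str] = KANA,
                       threshold: float = SECOND_THRESHOLD) -> List[str]:
    """Convert to kana every displayed word whose likelihood >= threshold.

    A word that passes but has no reading in `kana` is left unchanged.
    """
    return [kana.get(w, w) if likelihoods.get(w, 0.0) >= threshold else w
            for w in words]


if __name__ == "__main__":
    fig6 = {"田浦大作町": 0.65, "田浦泉町": 0.30, "田浦駅": 0.20}
    print(third_presentation(["田浦大作町", "田浦泉町", "田浦駅"], fig6))
    # ['たうらだいさくちょう', '田浦泉町', '田浦駅']
```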
[0032]
In this example only one speech recognition target word is converted to kana notation 42, but the number is not limited to one: if two or more target words reach the second predetermined value, all of them are converted to kana notation 42.
[0033]
Returning again to FIG. 2: when it is determined that the input speech matches speech data recorded in the recognition dictionary unit 21 (ST120: YES), the speech recognition unit 31 determines whether the match was with a misread name, that is, an erroneous reading of a speech recognition target word (ST140).
[0034]
When it is determined that the match was not with a misread name (ST140: NO), that is, when the speech recognition target word is judged to have been uttered correctly, the normal display content is presented (ST141): as in FIG. 3A and FIG. 4A, the speech recognition target words are displayed in general notation 41 rather than kana notation 42. After the normal display, the process proceeds to step ST102.
[0035]
On the other hand, when it is determined that the input speech matched a misread name of a speech recognition target word (ST140: YES), the navigation unit 30 executes a fourth presentation process (ST150). After the fourth presentation process, the process proceeds to step ST102.
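The branching of ST120, ST121 and ST140 can be summarized as a single classification function. This sketch assumes each dictionary entry is keyed by (target word, whether the entry is a stored misreading), that the best-scoring entry decides ST120 and ST140, and that words passing ST121 are returned in sorted order. The function `classify` and the entry layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (target word, True if this dictionary entry is a stored misreading)
Entry = Tuple[str, bool]


def classify(scores: Dict[Entry, float],
             first: float = 0.80,
             second: float = 0.50) -> Tuple[str, List[str]]:
    """Return the branch taken in FIG. 2 and the target words it concerns."""
    if not scores:
        return ("none", [])
    (word, is_misread), best = max(scores.items(), key=lambda kv: kv[1])
    if best >= first:                                   # ST120: YES
        # ST140: misread name -> fourth presentation, else normal display
        return ("fourth", [word]) if is_misread else ("normal", [word])
    near = sorted({w for (w, _), s in scores.items() if s >= second})  # ST121
    return ("third", near) if near else ("none", [])
```

The "none" branch corresponds to returning to ST102 with the display unchanged.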
[0036]
Here, in the fourth presentation process, the determination unit 32 selects the speech recognition target word whose misread name was uttered, and the presentation processing unit 33 converts that word, among the one or more target words displayed on the screen, into kana notation 42 to generate the display content. When the fourth presentation process is executed, the display 40 shows an image such as the following.
[0037]
FIGS. 7A and 7B are explanatory diagrams showing images displayed on the display 40 when the fourth presentation process is executed: FIG. 7A shows an example of the display image before execution, and FIG. 7B shows an example of the display image after execution. As shown in FIG. 7A, before the fourth presentation process is executed, speech recognition target words such as “Hemi Station” are displayed in kanji.
[0038]
Suppose the user intends to designate “Hemi Station” (へみえき) but utters “Itsumi Eki” (いつみえき). The speech recognition unit 31 has stored “Itsumi Eki” as a misread name of “Hemi Station” and therefore determines that the misread name of “Hemi Station” was uttered. When the fourth presentation process is executed, “Hemi Station” is displayed in kana notation 42 using hiragana, as shown in FIG. 7B. In other words, the fourth presentation process changes the target word whose misread name was uttered, among the one or more target words displayed on the screen, from general notation 41, such as kanji, to kana notation 42, such as hiragana.
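The Hemi Station example can be sketched as a lookup from misread reading to target word, followed by converting only that word. The tables `MISREADINGS` and `KANA` are hypothetical stand-ins for the misread names in the recognition dictionary unit 21 and the kana data in map data section 22, and the recognizer is assumed to return the heard reading as a kana string.

```python
from typing import Dict, List

# Hypothetical misread-name table: heard (wrong) reading -> target word.
MISREADINGS: Dict[str, str] = {"いつみえき": "逸見駅"}

# Hypothetical reading table for the displayed words.
KANA: Dict[str, str] = {"逸見駅": "へみえき", "汐入駅": "しおいりえき"}


def fourth_presentation(words: List[str],
                        heard_reading: str,
                        misreadings: Dict[str, str] = MISREADINGS,
                        kana: Dict[str, str] = KANA) -> List[str]:
    """Convert only the word whose misread name was heard; else no change."""
    target = misreadings.get(heard_reading)
    if target is None:
        return list(words)
    return [kana.get(w, w) if w == target else w for w in words]
```

Because only the misread word is converted, the other words on the screen (here 汐入駅) keep their kanji notation.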
[0039]
In this way, the display device 1 and its name display method according to this embodiment detect the user's utterance state and display at least one of the speech recognition target words on the screen in kana notation 42. The user thus learns the correct reading of a target word that could not be read from its kanji (or other) notation and can designate the word without a complicated operation, which makes voice input operations smoother (effects of claims 1 and 13).
[0040]
Further, when the speech recognition unit 31 does not recognize an utterance of a speech recognition target word by the user for the predetermined time, only the difficult-to-read target words on the screen are displayed in kana notation 42. Compared with converting every target word on the screen to kana notation 42, this limits the increase in the number of characters on the screen and hence the loss of visibility that more characters would cause (effect of claim 2).
[0041]
Further, because target words whose likelihood is equal to or greater than the second predetermined value are displayed in kana notation 42, whenever the user fails to utter a word correctly the correct reading can be presented for the words the user most likely intended, regardless of whether those words are registered as difficult to read. Voice input can thus be supported flexibly (effect of claim 3).
[0042]
Further, because the speech recognition target word whose misread name was uttered is displayed in kana notation 42, only that word is converted. Since only a word that the user is very likely trying to specify is changed to kana notation 42, the increase in the number of characters on the screen is minimized and the loss of visibility is suppressed more efficiently (effect of claim 4).
[0043]
Next, a second embodiment of the present invention will be described. The display device 2 according to the second embodiment is substantially the same as the display device 1 according to the first embodiment, but differs in the following points.
[0044]
That is, whereas the display device 1 of the first embodiment displays the general notation 41, such as kanji, as kana notation 42 in the first to fourth presentation processes, the display device 2 of the second embodiment displays the general notation 41 with a symbol attached. Specifically, in the second embodiment the presentation processing unit 33 attaches a symbol such as a number or a letter of the alphabet to the speech recognition target word to form a symbol-added name, and the display 40 displays it.
[0045]
Further, the voice recognition unit 31 recognizes the utterance of the symbol while the voice recognition target word to which the symbol is attached is displayed on the screen of the display 40. When the symbol is uttered, the voice recognition unit 31 recognizes that the voice recognition target word to which the symbol is attached has been uttered, and the display device 2 executes a predetermined process such as setting a destination or switching a screen.
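The symbol mechanism of the second embodiment amounts to a lookup table from spoken symbols to target words. In this Python sketch, the use of sequential numbers, the display format, and the assumption that the recognizer already returns a spoken symbol as its text form (for example "2") are all hypothetical.

```python
def attach_symbols(words):
    """Attach number symbols 1, 2, ... to target words; return the lookup and display text."""
    by_symbol = {str(n): w for n, w in enumerate(words, start=1)}
    display = [f"{sym} {w}" for sym, w in by_symbol.items()]
    return by_symbol, display


def resolve(recognized, by_symbol):
    """Map a recognized utterance to a target word.

    Assumes the recognizer normalizes a spoken symbol to its text form.
    Either the symbol or the word itself is accepted; anything else gives None.
    """
    if recognized in by_symbol:
        return by_symbol[recognized]
    if recognized in by_symbol.values():
        return recognized
    return None


by_symbol, display = attach_symbols(["逸見駅", "横須賀駅"])
print(display)                   # ['1 逸見駅', '2 横須賀駅']
print(resolve("2", by_symbol))   # 横須賀駅
```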
[0046]
As described above, in the display device 2 and the name display method according to the present embodiment, the user can easily utter a speech recognition target word that could not be read from its kanji or other notation, and can specify that word without a complicated operation. Voice input is therefore made smoother (effects of claims 5 and 14).
[0047]
Further, when no utterance by the user is recognized by the speech recognition unit 31 for a predetermined time, a symbol is attached only to those speech recognition target words on the screen that are difficult to read. Compared with attaching symbols to every speech recognition target word on the screen, this limits the increase in the number of characters on the screen as a whole and thus limits the resulting loss of visibility (effect of claim 6).
[0048]
In addition, because a symbol is attached to speech recognition target words whose likelihood is equal to or greater than the second predetermined value, a symbol is provided for words that the user is likely trying to specify but cannot utter accurately, regardless of whether those words are registered as difficult to read. Voice input can thus be supported flexibly (effect of claim 7).
[0049]
Further, because a symbol is attached to the speech recognition target word whose misread name was uttered, only that word receives a symbol. Since a symbol is attached only to a word that the user is extremely likely trying to specify, the increase in the number of characters on the screen is minimized and the loss of visibility is suppressed more efficiently (effect of claim 8).
[0050]
Next, a third embodiment of the present invention will be described. FIG. 8 is a configuration diagram of a display device according to the third embodiment of the present invention. As shown in the figure, in addition to the components of the display device 1 of the first embodiment, the display device 3 according to the third embodiment includes a GPS receiver 11 that receives radio waves from GPS satellites and outputs information such as the latitude and longitude of the vehicle's current position and the current time, a gyro sensor 12 for detecting changes in the angle of the vehicle body, a vehicle speed sensor 13 that outputs a number of pulses proportional to the vehicle's traveling speed and distance, a remote controller 14 for wirelessly transmitting data to the navigation unit 30, and an affinity database (storage means) 60 that stores the user's affinity for each region.
[0051]
As shown in the figure, the navigation unit 30 has, instead of the determination unit 32, an affinity registration unit 34 that updates the affinity based on predetermined operations and the like. In addition, instead of converting the speech recognition target words selected by the determination unit 32 to kana notation 42, the presentation processing unit 33 of the navigation unit 30 decides which speech recognition target words to display in kana notation 42 based on the affinity stored in the affinity database 60.
[0052]
Here, the affinity is an index of how familiar the user is with the place names of each region; it is obtained from the travel history of the vehicle, the contents registered in advance by the user, and the user's operation history.
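The publication does not say how the three sources are combined into one affinity value. As one possible reading, the following Python sketch treats affinity as a weighted count of how often each region appears in the travel history, the registered contents, and the operation history; the weights and region names are hypothetical.

```python
from collections import Counter

# Hypothetical weights: the publication does not specify how the sources are combined.
WEIGHTS = {"travel": 1.0, "registered": 5.0, "operation": 0.5}


def compute_affinity(travel_regions, registered_regions, operation_regions):
    """Return {region: affinity} as a weighted count of region occurrences."""
    affinity = Counter()
    for region in travel_regions:
        affinity[region] += WEIGHTS["travel"]
    for region in registered_regions:
        affinity[region] += WEIGHTS["registered"]
    for region in operation_regions:
        affinity[region] += WEIGHTS["operation"]
    return dict(affinity)


print(compute_affinity(["Yokohama", "Yokohama"], ["Yokohama"], ["Yokosuka"]))
# {'Yokohama': 7.0, 'Yokosuka': 0.5}
```

Weighting registered points more heavily reflects an assumption that an explicitly registered location signals stronger familiarity than a single drive through it.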
[0053]
FIG. 9 is a flowchart illustrating the operation of the display device 3 of the present embodiment. As shown in the figure, the speech recognition unit 31 first determines whether the user has uttered a speech recognition target word (ST200). When it is determined that no speech recognition target word has been uttered (ST200: NO), the navigation unit 30 determines whether the host vehicle is currently traveling (ST201).
[0054]
When it is determined that the vehicle is not traveling (ST201: NO), the navigation unit 30 determines whether or not a point is being registered (ST202). When it is determined that the point is not being registered (ST202: NO), the navigation unit 30 determines whether an end operation has been performed (ST203). That is, the navigation unit 30 determines whether the ignition switch or the like is turned off.
[0055]
If it is determined that the end operation has not been performed (ST203: NO), the process returns to step ST200. If it is determined that the end operation has been performed (ST203: YES), the process ends.
[0056]
When it is determined that a point is being registered (ST202: YES), the navigation unit 30 acquires the position information of the point being registered from the database 20 (ST210), and the affinity registration unit 34 registers the acquired position information in the affinity database 60 as registered contents (ST211). The affinity database 60 thereby updates the affinity recorded for the registered position. The process then proceeds to step ST203.
[0057]
When it is determined that the host vehicle is traveling (ST201: YES), the navigation unit 30 records the current position and time based on the latitude, longitude, and time information from the GPS receiver 11, thereby recording the travel history (ST220). The affinity registration unit 34 then updates the affinity (ST211), and the affinity database 60 stores the updated affinity. The process then proceeds to step ST203.
[0058]
Since the display device 3 also includes the gyro sensor 12 and the vehicle speed sensor 13, the navigation unit 30 may obtain the current position from the signals of the gyro sensor 12 and the vehicle speed sensor 13 instead of relying on the signal from the GPS receiver 11.
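Obtaining position from a gyro and a speed-pulse sensor is conventionally done by dead reckoning. The publication does not describe the computation, so the following Python sketch shows only the textbook planar form: it assumes a local flat x/y frame in metres, a known distance per speed pulse, and a gyro that reports the heading change per sample. Sensor fusion and conversion back to latitude and longitude are omitted.

```python
import math


def dead_reckon(x_m, y_m, heading_rad, pulses, metres_per_pulse, yaw_change_rad):
    """Advance a planar position estimate by one sensor sample.

    The gyro supplies the heading change and the vehicle speed sensor supplies
    a pulse count proportional to the distance travelled.
    """
    heading = heading_rad + yaw_change_rad
    distance = pulses * metres_per_pulse
    return (x_m + distance * math.cos(heading),
            y_m + distance * math.sin(heading),
            heading)


x, y, h = dead_reckon(0.0, 0.0, 0.0, pulses=100, metres_per_pulse=0.5, yaw_change_rad=0.0)
print(round(x, 6), round(y, 6))  # 50.0 0.0
```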
[0059]
When it is determined that a speech recognition target word has been uttered by the user (ST200: YES), the presentation processing unit 33 acquires the affinity information from the affinity database 60. For the region that the speech recognized by the voice recognition unit 31 designates for display, it then compares the affinity with a predetermined value stored in advance.
[0060]
After the comparison, the presentation processing unit 33 generates display content based on the comparison result, and the display 40 displays it (ST231). The navigation unit 30 then acquires information on the displayed position as operation history (ST232). The affinity registration unit 34 then updates the affinity (ST211), and the affinity database 60 stores the updated affinity. The process then proceeds to step ST203.
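The branches of FIG. 9 described above can be traced with a small event-driven sketch. Here a list of `(kind, region)` events stands in for the repeated polling of ST200 to ST203, and an in-memory dictionary stands in for the affinity database 60; the event names, affinity increments, and threshold are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical affinity increments and threshold; the publication gives no values.
DRIVE_GAIN, REGISTER_GAIN, OPERATION_GAIN, THRESHOLD = 1.0, 5.0, 0.5, 3.0


def run(events):
    """Walk a list of (kind, region) events through the FIG. 9 branches."""
    affinity = defaultdict(float)   # stands in for affinity database 60
    shown = []                      # (region, notation) pairs sent to display 40
    for kind, region in events:
        if kind == "utterance":                      # ST200: YES
            notation = "general" if affinity[region] >= THRESHOLD else "kana"
            shown.append((region, notation))         # ST231: compare and display
            affinity[region] += OPERATION_GAIN       # ST232 -> ST211: operation history
        elif kind == "driving":                      # ST201: YES -> ST220 -> ST211
            affinity[region] += DRIVE_GAIN
        elif kind == "registering":                  # ST202: YES -> ST210 -> ST211
            affinity[region] += REGISTER_GAIN
        elif kind == "end":                          # ST203: YES (e.g. ignition off)
            break
    return shown, dict(affinity)


events = [("utterance", "Yokosuka"), ("registering", "Yokosuka"),
          ("utterance", "Yokosuka"), ("end", None)]
shown, affinity = run(events)
print(shown)  # [('Yokosuka', 'kana'), ('Yokosuka', 'general')]
```

The first request for an unfamiliar region yields kana notation; once the user registers a point there, the same request yields general notation.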
[0061]
The images displayed in step ST231 are described below with reference to FIG. 10. FIGS. 10A to 10C are explanatory diagrams showing images displayed based on the affinity: FIG. 10A shows an example of the image before the display content is switched, FIG. 10B shows an example of the image after switching, and FIG. 10C shows another example of the image after switching.
[0062]
As shown in FIG. 10A, in the image before the display content is switched, speech recognition target words such as "Yokohama City" and "Yokosuka City" are displayed in general notation 41 using kanji. Area 70 covers Yokohama City, a region that the user has driven through in the past (travel history), registered in advance (registered contents), or displayed by operation (operation history), and whose affinity is therefore equal to or higher than the predetermined value.
[0063]
Next, suppose the user designates "Yokohama City". As shown in FIG. 10B, the presentation processing unit 33 keeps the speech recognition target words inside area 70 (such as "Aoba Ward") in general notation 41, changes the speech recognition target words outside area 70 (such as "tamaku") to kana notation 42 in hiragana, and a detailed map of Yokohama and its surroundings is displayed.
[0064]
On the other hand, when the user designates "Yokosuka City", the display image is as shown in FIG. 10C. Because Yokosuka City is not included in area 70, all of the speech recognition target words, such as "Yamanakacho", are displayed in hiragana.
[0065]
In short, the presentation processing unit 33 generates display content that keeps general notation 41, such as kanji, for regions whose affinity is equal to or higher than the predetermined value and changes general notation 41 to kana notation 42 for regions whose affinity is below it.
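The rule just stated can be sketched as a per-word choice driven by the affinity of the word's region. In this Python sketch the `(general, kana, region)` triples, the assignment of 多摩区 to a "Kawasaki" region, and the threshold value are hypothetical sample data.

```python
def build_display(words, affinity, threshold):
    """words: (general notation, kana notation, region) triples on screen.

    Words in regions whose affinity reaches the threshold keep general notation;
    the rest are shown in kana notation. Unknown regions count as affinity 0.
    """
    return [
        general if affinity.get(region, 0.0) >= threshold else kana
        for general, kana, region in words
    ]


words = [("青葉区", "あおばく", "Yokohama"), ("多摩区", "たまく", "Kawasaki")]
print(build_display(words, {"Yokohama": 7.0}, threshold=3.0))  # ['青葉区', 'たまく']
```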
[0066]
In this manner, in the display device 3 and the name display method according to the present embodiment, when the stored affinity of the user for the region designated for display by speech is below the predetermined value, at least one of the speech recognition target words displayed within the designated region is displayed in kana notation 42. Even in a region the user is visiting or displaying for the first time, the user can therefore specify a speech recognition target word without being confused about its reading and without a complicated operation. Voice input is therefore made smoother (effects of claims 9 and 15).
[0067]
The affinity is obtained from the travel history, the registered contents, and the operation history, so it changes according to the user's behavior and usage. The affinity of each region is therefore set per user, kana notation 42 is provided appropriately for each user, and voice input can be supported flexibly (effects of claims 10 to 12).
[0068]
Note that the present invention is not limited to the above embodiments. For example, in the third embodiment the travel history and the operation history are kept stored; however, the affinity registration unit 34 may delete the travel history and the operation history once a predetermined number of days has elapsed since they were acquired. Further, in the third embodiment the affinity is compared with a predetermined value and words are displayed in kana notation 42 when the affinity is below that value. Instead, without comparing the affinity with a predetermined value, a speech recognition target word may be shown in general notation 41, such as kanji, if any of the travel history, the registered contents, or the operation history is stored in the affinity database 60, and in kana notation 42, such as hiragana, if none is stored.
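Both variations just described can be sketched in a few lines of Python. The `(date, region)` record format, the 30-day retention period, and the set-based history sources are hypothetical; the publication only speaks of "a predetermined number of days" and of whether any history is stored.

```python
from datetime import date, timedelta


def prune_history(records, today, keep_days):
    """Drop (date, region) records older than keep_days (hypothetical retention period)."""
    cutoff = today - timedelta(days=keep_days)
    return [(d, region) for d, region in records if d >= cutoff]


def notation_by_presence(region, travel, registered, operations):
    """Threshold-free variant: general notation if the region appears in any history."""
    known = any(region in source for source in (travel, registered, operations))
    return "general" if known else "kana"


history = [(date(2002, 1, 1), "Yokosuka"), (date(2002, 10, 1), "Yokohama")]
print(prune_history(history, today=date(2002, 10, 10), keep_days=30))
# [(datetime.date(2002, 10, 1), 'Yokohama')]
```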
[0069]
In the first to third embodiments, names of municipalities (cities, wards, towns, and villages) and station names are given as speech recognition target words, but the words may also be names of landmarks or buildings. In addition, although hiragana is used throughout when speech recognition target words are shown in kana notation 42, katakana may be used instead.
[0070]
Furthermore, in the first to third embodiments, the presentation processing unit 33 need not display kana notation 42 continuously; it may alternate between general notation 41 and kana notation 42 by time sharing. The presentation processing unit 33 may also leave in kanji the parts that can be read unambiguously, such as "ward", "city", or "station", and convert only the remaining part to kana notation 42, as in "Hemi Station", displaying the resulting name on the display 40.
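The two display options above, time-shared alternation and partial kana conversion, can be sketched as follows. The suffix list (駅, 区, 市), the 2-second period, and the function names are hypothetical choices based on the ward, city, and station examples in the text.

```python
# Suffixes treated as unambiguous (assumption based on the ward/city/station examples).
PLAIN_SUFFIXES = ("駅", "区", "市")


def partial_kana(general, stem_kana):
    """Keep a clearly readable suffix in kanji and write only the stem in kana.

    stem_kana is the reading of the part before the suffix; if no listed
    suffix is present, it is returned unchanged as the whole display name.
    """
    for suffix in PLAIN_SUFFIXES:
        if general.endswith(suffix):
            return stem_kana + suffix
    return stem_kana


def time_shared(general, kana, t_seconds, period_s=2.0):
    """Alternate between general and kana notation every period_s seconds."""
    return general if int(t_seconds // period_s) % 2 == 0 else kana


print(partial_kana("逸見駅", "へみ"))            # へみ駅
print(time_shared("逸見駅", "へみえき", 0.5))    # 逸見駅
print(time_shared("逸見駅", "へみえき", 2.5))    # へみえき
```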
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a display device according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating an operation of the display device of the first embodiment.
FIGS. 3A and 3B are explanatory diagrams showing images displayed on the display unit when the first presentation process is executed; FIG. 3A shows an example of the display image before execution, and FIG. 3B shows an example of the display image after execution.
FIGS. 4A and 4B are explanatory diagrams showing images displayed on the display unit when the second presentation process is executed; FIG. 4A shows an example of the display image before execution, and FIG. 4B shows an example of the display image after execution.
FIGS. 5A and 5B are explanatory diagrams showing images displayed on the display unit when the third presentation process is executed; FIG. 5A shows an example of the display image before execution, and FIG. 5B shows an example of the display image after execution.
FIG. 6 is an explanatory diagram showing an example of the likelihood of a speech recognition target word.
FIGS. 7A and 7B are explanatory diagrams showing images displayed on the display unit when the fourth presentation process is executed; FIG. 7A shows an example of the display image before execution, and FIG. 7B shows an example of the display image after execution.
FIG. 8 is a configuration diagram of a display device according to a third embodiment of the present invention.
FIG. 9 is a flowchart illustrating an operation of the display device according to the third embodiment.
FIGS. 10A to 10C are explanatory diagrams showing images displayed based on affinity; FIG. 10A shows an example of the image before the display content is switched, FIG. 10B shows an example of the image after switching, and FIG. 10C shows another example of the image after switching.
[Explanation of symbols]
31 Voice recognition unit (voice recognition means)
32 Judgment unit (selection means)
33 Presentation processing unit (display content generation means)
40 Display (display means)
42 Kana notation
60 Affinity database (storage means)

Claims

1. A display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the display device comprising:
voice recognition means for recognizing speech uttered by a user;
display content generation means for generating display content in which at least one of the one or more speech recognition target words displayed on the screen is written in kana notation, when the voice recognition means recognizes that the user's utterance of a speech recognition target word was incorrect or when the voice recognition means recognizes no utterance by the user for a predetermined time; and
display means for displaying the display content generated by the display content generation means.

2. The display device according to claim 1, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to be written in kana notation by the display content generation means,
wherein, when the voice recognition means recognizes no utterance by the user for a predetermined time, the selection means selects, from the one or more speech recognition target words, a speech recognition target word registered in advance as difficult to read.

3. The display device according to claim 1, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to be written in kana notation by the display content generation means,
wherein the voice recognition means recognizes that the utterance of a speech recognition target word was incorrect when the likelihoods between the uttered speech and each of the one or more speech recognition target words displayed on the screen all fall below a first predetermined value, and
when the voice recognition means has recognized the utterance as incorrect because all of the likelihoods fell below the first predetermined value, the selection means selects a speech recognition target word whose likelihood is equal to or greater than a second predetermined value set smaller than the first predetermined value.

4. The display device according to claim 1, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to be written in kana notation by the display content generation means,
wherein the voice recognition means recognizes that the utterance of a speech recognition target word was incorrect when a misread name, which is an incorrect reading of the speech recognition target word registered in advance, is uttered, and
when the voice recognition means has recognized the utterance as incorrect because the misread name was uttered, the selection means selects the speech recognition target word whose misread name was uttered.

5. A display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the display device comprising:
voice recognition means for recognizing speech uttered by a user;
display content generation means for generating display content in which a symbol is attached to at least one of the one or more speech recognition target words displayed on the screen, when the voice recognition means recognizes that the user's utterance of a speech recognition target word was incorrect or when the voice recognition means recognizes no utterance by the user for a predetermined time; and
display means for displaying the display content generated by the display content generation means,
wherein, when the symbol is uttered, the voice recognition means recognizes that the speech recognition target word replaced by the symbol alone or bearing the symbol has been uttered.

6. The display device according to claim 5, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to which the symbol is to be attached by the display content generation means,
wherein, when the voice recognition means recognizes no utterance by the user for a predetermined time, the selection means selects, from the one or more speech recognition target words, a speech recognition target word registered in advance as difficult to read.

7. The display device according to claim 5, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to which the symbol is to be attached by the display content generation means,
wherein the voice recognition means recognizes that the utterance of a speech recognition target word was incorrect when the likelihoods between the uttered speech and each of the one or more speech recognition target words displayed on the screen all fall below a first predetermined value, and
when the voice recognition means has recognized the utterance as incorrect because all of the likelihoods fell below the first predetermined value, the selection means selects a speech recognition target word whose likelihood is equal to or greater than a second predetermined value set smaller than the first predetermined value.

8. The display device according to claim 5, further comprising selection means for selecting, from the one or more speech recognition target words displayed on the screen, the speech recognition target word to which the symbol is to be attached by the display content generation means,
wherein the voice recognition means recognizes that the utterance of a speech recognition target word was incorrect when a misread name, which is an incorrect reading of the speech recognition target word registered in advance, is uttered, and
when the voice recognition means has recognized the utterance as incorrect because the misread name was uttered, the selection means selects the speech recognition target word whose misread name was uttered.

9. A display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the display device comprising:
voice recognition means for recognizing speech uttered by a user;
storage means for storing the user's affinity for each region;
display content generation means for generating display content in which at least one of the one or more speech recognition target words displayed within a region designated for display by the speech recognized by the voice recognition means is written in kana notation, when the user's affinity for that region stored in the storage means is below a predetermined value; and
display means for displaying the display content generated by the display content generation means.

10. The display device according to claim 9, wherein the user's affinity for each region is obtained based on a travel history of the vehicle.

11. The display device according to claim 9 or 10, wherein the user's affinity for each region is obtained based on contents registered in advance by the user.

12. The display device according to any one of claims 9 to 11, wherein the user's affinity for each region is obtained based on the user's operation history.

13. A name display method for a display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the method comprising:
a first step of recognizing speech uttered by a user;
a second step of generating display content in which at least one of the one or more speech recognition target words displayed on the screen is written in kana notation, when it is recognized in the first step that the user's utterance of a speech recognition target word was incorrect or when no utterance by the user is recognized in the first step for a predetermined time; and
a third step of displaying the display content generated in the second step.

14. A name display method for a display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the method comprising:
a first step of recognizing speech uttered by a user;
a second step of generating display content in which a symbol is attached to at least one of the one or more speech recognition target words displayed on the screen, when it is recognized in the first step that the user's utterance of a speech recognition target word was incorrect or when no utterance by the user is recognized by the voice recognition means for a predetermined time; and
a third step of displaying the display content generated in the second step,
wherein, in subsequent executions of the first step, utterance of the symbol is recognized as utterance of the speech recognition target word to which the symbol is attached.

15. A name display method for a display device that is mounted on a vehicle, displays on a screen one or more speech recognition target words to be subjected to speech recognition, and executes a predetermined process when a speech recognition target word is uttered correctly, the method comprising:
a first step of recognizing speech uttered by a user;
a second step of generating display content in which at least one of the one or more speech recognition target words displayed within a region designated for display by the speech recognized in the first step is written in kana notation, when the stored affinity of the user for that region is below a predetermined value;
a third step of displaying the display content generated in the second step; and
a fourth step of updating and storing the user's affinity for the region displayed in the third step,
wherein the user's affinity updated and stored in the fourth step is used for comparison with the predetermined value in subsequent executions of the second step.