JP3916947B2

JP3916947B2 - Display device with voice recognition function

Info

Publication number: JP3916947B2
Application number: JP2001387701A
Authority: JP
Inventors: 貴史續木; 良宏小島
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2001-12-20
Filing date: 2001-12-20
Publication date: 2007-05-23
Anticipated expiration: 2021-12-20
Also published as: JP2003186496A

Description

【０００１】
【発明の属する技術分野】
本発明は、ハイパーテキスト表示装置やＷＷＷブラウザ装置等に用いられる音声認識機能付き表示装置に関するものである。
【０００２】
【従来の技術】
特開平１０−２２２３４２号公報には、音声認識の対象語及びそれに結びつく処理を、ハイパーテキスト中で指定することを可能とし、柔軟にハイパーテキスト表示装置を音声制御することが可能なハイパーテキスト音声制御方法及び装置が開示されている。
【０００３】
【発明が解決しようとする課題】
表示手段に表示される同一文書内の他の部分や他文書にリンクされる領域をホットスポットと呼ぶ。従来の音声認識機能を有するハイパーテキスト表示装置やＷＷＷブラウザ等では、ハイパーテキストやＨＴＭＬ言語などの記述言語が記述されたファイルが入力されると、この入力されたファイルを解析し、ホットスポットに含まれる文字列の全てを、認識対象語として音声認識を行うようになっている。このため、一つのファイル内に存在するホットスポットが多いと、認識対象語が増加し、音声認識の精度が低下したり、使用するメモリ容量が増大するといった問題があった。
【０００４】
また、ホットスポットが表示領域の境界上に存在する場合、ユーザは、ホットスポットに含まれる文字列を全て読むことができないため、音声認識の精度が低下するという問題があった。
【０００５】
本発明は、このような従来の問題点に鑑みてなされたものであって、表示手段に表示されたホットスポットに含まれる文字列のみを認識対象語とすることにより、認識対象語を減らし、使用するメモリ容量を減少させ、音声認識の精度を向上できる音声認識機能付き表示装置を提供することを目的とする。
【０００６】
【課題を解決するための手段】
本願の請求項１の発明は、ハイパーテキスト又はＨＴＭＬ言語を含む記述言語で記載されたファイルを格納するファイル記憶手段と、リンク先が入力されると、入力されたリンク先のファイルを前記ファイル記憶手段から読み出すファイル入力手段と、入力される情報を表示する表示手段と、前記ファイル入力手段からファイルが入力されると、入力ファイルに記載された記述言語を解析し、この解析結果に基づき入力ファイルを前記表示手段に表示すると共に、前記表示手段に表示される同一文書内の他の部分又は他文書にリンクされる領域（以下、ホットスポットという）に含まれる文字列と前記ホットスポットに対応したリンク先とを出力し、さらに前記ホットスポットが前記表示手段の表示領域と非表示領域にまたがって表示される場合に、ホットスポットに含まれる文字列から表示領域に表示される文字列のみを抽出し、抽出された文字列と抽出元のホットスポットに対応したリンク先を出力する記述言語解析手段と、リンク先と前記ホットスポットに含まれる文字列とを対応付けて記憶するリンクテーブルと、前記記述言語解析手段からリンク先と前記ホットスポットに含まれる文字列とが入力されると、入力されたリンク先とホットスポットに含まれる文字列とを前記リンクテーブルに蓄積するリンクテーブル作成手段と、認識対象語を記憶する認識辞書記憶手段と、前記リンクテーブルから前記ホットスポットに含まれる文字列を読み出し、読み出したホットスポットに含まれる文字列を前記認識辞書記憶手段に蓄積する認識辞書作成手段と、ユーザが音声入力すると、前記認識辞書記憶手段に蓄積されている文字列を認識対象語として音声認識を行い、認識結果を出力する音声認識手段と、前記音声認識手段から認識結果が与えられると、前記リンクテーブルを参照し、前記認識結果に対するリンク先を読み出し、読み出したリンク先を出力するリンク先取得手段と、を具備することを特徴とするものである。
【０００７】
本願の請求項２の発明は、ハイパーテキスト又はＨＴＭＬ言語を含む記述言語で記載されたファイルを格納するファイル記憶手段と、リンク先が入力されると、入力されたリンク先のファイルを前記ファイル記憶手段から読み出すファイル入力手段と、入力される情報を表示する表示手段と、前記ファイル入力手段からファイルが入力されると、入力ファイルに記述された記述言語を解析し、解析結果と前記入力ファイルを出力するファイル解析手段と、前記ファイル解析手段から解析結果とファイルとが入力されると、前記解析結果に基づき前記ファイルを前記表示手段に表示し、前記表示手段に表示されたホットスポットにおいて、前記ホットスポットに含まれる文字列から前記表示手段に表示された文字列のみを抽出し、抽出された文字列と前記ホットスポットに対応したリンク先を出力するホットスポット表示解析手段と、前記ホットスポット表示解析手段から文字列と前記文字列に対応したリンク先とが入力されると、入力文字列の意味を解析し、前記入力文字列から意味のある文字列のみを抽出し、抽出した文字列と入力されたリンク先とを出力する意味解析手段と、リンク先と前記ホットスポットに含まれる文字列とを対応付けて記憶するリンクテーブルと、前記意味解析手段からリンク先と前記ホットスポットに含まれる文字列とが入力されると、入力されたリンク先とホットスポットに含まれる文字列とを前記リンクテーブルに蓄積するリンクテーブル作成手段と、認識対象語を記憶する認識辞書記憶手段と、前記リンクテーブルから前記ホットスポットに含まれる文字列を読み出し、読み出したホットスポットに含まれる文字列を前記認識辞書記憶手段に蓄積する認識辞書作成手段と、ユーザが音声入力すると、前記認識辞書記憶手段に蓄積されている文字列を認識対象語として音声認識を行い、認識結果を出力する音声認識手段と、前記音声認識手段から認識結果が与えられると、前記リンクテーブルを参照し、前記認識結果に対するリンク先を読み出し、読み出したリンク先を出力するリンク先取得手段と、を具備することを特徴とするものである。
【０００８】
本願の請求項３の発明は、請求項１または２の音声認識機能付き表示装置において、前記音声認識手段は、話者によって発話された音声を入力し、音声信号を出力するマイクロホンと、音声認識に用いる各音節の標準モデルを蓄積する音節モデル記憶手段と、前記音節モデル記憶手段を用いて、前記認識辞書記憶手段に記憶されている認識対象語の音声認識用モデルを作成する音声認識用モデル作成手段と、前記音声認識用モデル作成手段から音声認識用モデルが入力され、前記マイクロホンから音声信号が入力されると、入力音声信号を前記音声認識用モデルを用いて音声認識し、認識結果であるテキスト情報を出力する認識手段と、を有することを特徴とするものである。
【００１０】
【発明の実施の形態】
（実施の形態１）
本発明の実施の形態１による音声認識機能付き表示装置について、図面を参照しながら説明する。図１は本発明の実施の形態１による音声認識機能付き表示装置の構成図である。この音声認識機能付き表示装置は、ファイル記憶手段１０１、ファイル入力手段１０２、表示手段１０３、記述言語解析手段１０４Ａ、リンクテーブル１０５、リンクテーブル作成手段１０６、認識辞書記憶手段１０７、認識辞書作成手段１０８、音節モデル記憶手段１０９、音声認識用モデル作成手段１１０、マイクロホン１１１、認識手段１１２、リンク先取得手段１１３を含んで構成される。
【００１１】
ファイル記憶手段１０１は、ハイパーテキストやＨＴＭＬ言語などの記述言語で記述されたファイルを格納するものである。ファイル入力手段１０２は、リンク先が入力されると、入力されたリンク先のファイルをファイル記憶手段１０１から読み出し、この読み出したファイルを出力するものである。
【００１２】
表示手段１０３は、ＣＲＴディスプレイ、液晶ディスプレイ（ＬＣＤ）、プラズマディスプレイ（ＰＤＰ）等で構成され、入力される情報を表示するものである。記述言語解析手段１０４Ａは、ファイル入力手段１０２を介してファイルが入力されると、入力ファイルに記述された記述言語を解析し、この解析結果に基づき入力ファイルを表示手段１０３に表示すると共に、表示手段１０３に表示されるホットスポットに含まれる文字列と、このホットスポットに対応したリンク先とを出力するものである。
【００１３】
リンクテーブル１０５は、リンク先とホットスポットに含まれる文字列とを対応付けて記憶するものである。リンクテーブル作成手段１０６は、リンク先とホットスポットに含まれる文字列が入力されると、入力されたリンク先とホットスポットに含まれる文字列とをリンクテーブル１０５に蓄積するものである。
【００１４】
認識辞書記憶手段１０７は、認識対象語の文字列と、単語を構成する音節情報とを対にして記憶するものである。日本語における音節は、所謂５０音に、濁音、半濁音、拗音を加えたものである。子音をＣとし、母音をＶとし、半母音をＹとすると、日本語の音節はＣＶ、ＣＹＶの形式をとることが多い。いずれにしても日本語の音節情報は、仮名、片仮名、又はローマ字で表現できる。また、発音記号を用いると、全ての言語における単語の音節を表現できる。
【００１５】
認識辞書作成手段１０８は、リンクテーブル１０５からホットスポットに含まれる文字列を読み出し、この読み出したホットスポットに含まれる文字列を音節情報に変換し、この変換された音節情報と読み出したホットスポットに含まれる文字列とを対にして、認識辞書記憶手段１０７に蓄積するものである。
【００１６】
音節モデル記憶手段１０９は、音声認識に用いる各音節の標準モデルを蓄積するものである。音声認識用モデル作成手段１１０は、認識辞書記憶手段１０７から認識対象語の文字列とこの文字列に対応した音節情報とを全て読み出し、この音節情報に基づいて音節モデル記憶手段１０９を参照して音声認識用モデルを作成し、この作成された音声認識用モデルと、この音声認識用モデルに対応した認識対象語の文字列とを出力するものである。
【００１７】
マイクロホン１１１は、話者によって発話された音声を入力し、音声信号を出力するものである。認識手段１１２は、音声認識用モデル作成手段１１０から音声認識用モデルとこの音声認識用モデルに対応した認識対象語の文字列とが入力され、更にマイクロホン１１１から音声信号が入力されると、入力された音声信号と入力された音声認識用モデルとの比較照合を行い、最も照合度合いが大きい音声認識用モデルに対応した認識対象語の文字列を認識結果として出力するものである。
【００１８】
ここで音節モデル記憶手段１０９、音声認識用モデル作成手段１１０、マイクロホン１１１、認識手段１１２は、ユーザが音声を入力すると、認識辞書記憶手段１０７に蓄積されている文字列を認識対象語として音声認識を行い、認識結果を出力する音声認識手段の機能を構成している。
【００１９】
リンク先取得手段１１３は、認識手段１１２から認識結果が入力されると、リンクテーブル１０５を参照し、入力された認識結果に対するリンク先を読み出し、この読み出したリンク先をファイル入力手段１０２に出力するものである。
【００２０】
このように構成された本実施の形態による音声認識機能付き表示装置の動作例について説明する。図１において、ファイル記憶手段１０１は、ハイパーテキストやＨＴＭＬ言語などの記述言語で記述されたファイルを格納している。具体的な一例として、ファイル記憶手段１０１は、ＨＴＭＬ言語で記述されたファイル「osaka.html，event.html，…」を格納しているとする。ここで、ファイル「osaka.html」の記述内容の一例を図２に示す。
【００２１】
ファイル入力手段１０２は、リンク先が入力されると、この入力されたリンク先のファイルをファイル記憶手段１０１から読み出し、読み出したファイルを記述言語解析手段１０４Ａに出力する。上記の例では、ファイル入力手段１０２は、リンク先が入力されると、入力されたリンク先のファイルをファイル記憶手段１０１から読み出す。一例として、ファイル「osaka.html」をファイル記憶手段１０１から読み出し、このファイル「osaka.html」を記述言語解析手段１０４Ａに出力する。
【００２２】
記述言語解析手段１０４Ａは、ファイル入力手段１０２からファイルが入力されると、入力されたファイルに記述された記述言語を解析し、この解析結果に基づき入力されたファイルを表示手段１０３に表示する。更に記述言語解析手段１０４Ａは、表示手段１０３に表示されるホットスポットに含まれる文字列と、このホットスポットに対応したリンク先とを、リンクテーブル作成手段１０６に出力する。上記の例では、記述言語解析手段１０４Ａは、ファイル入力手段１０２からファイル「osaka.html」が入力されると、このファイル「osaka.html」に記述されたＨＴＭＬ言語を解析し、この解析結果に基づき入力されたファイルを表示手段１０３に表示する。このときの表示手段１０３の表示例を図３に示す。更に、記述言語解析手段１０４Ａは表示手段１０３に表示されるホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」と、このホットスポットに対応したリンク先「event.html，tenki.html，kankou.html 」とをリンクテーブル作成手段１０６に出力する。
【００２３】
リンクテーブル作成手段１０６は、記述言語解析手段１０４Ａからリンク先とホットスポットに含まれる文字列とが入力されると、入力されたリンク先とホットスポットに含まれる文字列とをリンクテーブル１０５に蓄積する。上記の例では、リンクテーブル作成手段１０６は、記述言語解析手段１０４Ａからリンク先「event.html，tenki.html，kankou.html 」と、ホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」が入力されると、入力されたリンク先「event.html，tenki.html，kankou.html 」と、ホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」とをリンクテーブル１０５に蓄積する。このときのリンクテーブル１０５の一例を図４に示す。
【００２４】
認識辞書作成手段１０８は、リンクテーブル１０５からホットスポットに含まれる文字列を読み出し、読み出したホットスポットに含まれる文字列を音節情報に変換し、変換された音節情報と、読み出したホットスポットに含まれる文字列とを対にして認識辞書記憶手段１０７に蓄積する。上記の例では、認識辞書作成手段１０８は、リンクテーブル１０５からホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」を読み出す。ここで音節を平仮名とすると、この読み出したホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」を、音節情報「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」に変換する。そして、この変換された音節情報「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」と、読み出したホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所」とを対にして、認識辞書記憶手段１０７に蓄積する。このときの認識辞書記憶手段１０７の一例を図５に示す。
【００２５】
音節モデル記憶手段１０９は、音声認識に用いる各音節の標準モデルを蓄積している。上記の例では、音節モデル記憶手段１０９は、音声認識に用いる各平仮名の標準モデルを蓄積している。
【００２６】
音声認識用モデル作成手段１１０は、認識辞書記憶手段１０７から認識対象語の文字列とこの文字列に対応した音節情報とを全て読み出し、この音節情報に基づいて音節モデル記憶手段１０９を参照して音声認識用モデルを作成する。そして音声認識用モデル作成手段１１０は、作成された音声認識用モデルと、この音声認識用モデルに対応した認識対象語の文字列とを認識手段１１２に出力する。上記の例では、音声認識用モデル作成手段１１０は、認識辞書記憶手段１０７から認識対象語の文字列「今週のおすすめイベント，天気，観光名所」と、この文字列に対応した音節情報「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」とを全て読み出し、この音節情報に基づいて音節モデル記憶手段１０９を参照して音声認識用モデル「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」を作成する。そして作成された音声認識用モデル「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」と、この音声認識用モデルに対応した認識対象語の文字列「今週のおすすめイベント，天気，観光名所」とを認識手段１１２に出力する。
【００２７】
認識手段１１２は、音声認識用モデル作成手段１１０から音声認識用モデルとこの音声認識用モデルに対応した認識対象語の文字列とが入力され、更に、マイクロホン１１１から音声信号が入力されると、入力された音声信号と入力された音声認識用モデルとの比較照合を行う。そして認識手段１１２は、最も照合度合いが大きい音声認識用モデルに対応した認識対象語の文字列を、認識結果としてリンク先取得手段１１３に出力する。上記の例では、認識手段１１２は、音声認識用モデル作成手段１１０から音声認識用モデル「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」と、この音声認識用モデルに対応した認識対象語の文字列「今週のおすすめイベント，天気，観光名所」とが入力され、更に、マイクロホン１１１から音声信号「てんき」が入力されると、この入力された音声信号「てんき」と、入力された音声認識用モデル「こんしゅうのおすすめいべんと，てんき，かんこうめいしょ」との比較照合を行う。ここでは、比較照合の結果、最も照合度合いが大きい音声認識用モデルを音声認識用モデル「てんき」であるとすると、認識手段１１２はこの音声認識用モデル「てんき」に対応した認識対象語の文字列「天気」を認識結果としてリンク先取得手段１１３に出力する。
【００２８】
リンク先取得手段１１３は、認識手段１１２から認識結果が入力されると、リンクテーブル１０５を参照し、入力された認識結果に対するリンク先を読み出し、この読み出したリンク先をファイル入力手段１０２に出力する。上記の例では、リンク先取得手段１１３は、認識手段１１２から認識結果の文字列「天気」が入力されると、リンクテーブル１０５を参照し、入力された認識結果の文字列「天気」に対するリンク先「tenki.html」を読み出し、この読み出したリンク先「tenki.html」をファイル入力手段１０２に出力する。
【００２９】
なお、上記の実施例では、ホットスポットに含まれる文字列を音声認識の対象語として説明したが、この例に限定されることなく、ホットスポットに番号や記号等を対応付け、この番号や記号等を表示手段に表示し、この表示手段に表示された番号や記号等を音声認識の対象語とするようにしてもよい。
【００３０】
本実施の形態によれば、表示手段に表示されたホットスポットに含まれる文字列のみを音声認識の対象語とするので、音声認識の対象語を減らすことができ、使用するメモリ容量を減少させ、音声認識の精度を向上させることができる。
【００３１】
（実施の形態２）
次に本発明の実施の形態２による音声認識機能付き表示装置について、図面を参照しながら説明する。前述した実施の形態１では、表示手段１０３に表示されたホットスポットに含まれる文字列を音声認識の対象語とした。しかし、本実施の形態の音声認識機能付き表示装置では、表示手段の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、この抽出された文字列を音声認識の対象語とすることを特徴とする。
【００３２】
図６は実施の形態２による音声認識機能付き表示装置の構成図である。ここで、実施の形態１と同一符号のブロックは同じ動作を行うものとし、それらの詳細な説明は省略する。本実施の形態の音声認識機能付き表示装置は、ファイル記憶手段１０１、ファイル入力手段１０２、表示手段１０３、リンクテーブル１０５、リンクテーブル作成手段１０６、認識辞書記憶手段１０７、認識辞書作成手段１０８、音節モデル記憶手段１０９、音声認識用モデル作成手段１１０、マイクロホン１１１、認識手段１１２、リンク先取得手段１１３、ファイル解析手段２０１、ホットスポット表示解析手段２０２を含んで構成される。
【００３３】
ファイル解析手段２０１は、ファイル入力手段１０２からファイルが入力されると、この入力されたファイルの記述言語を解析し、この解析結果と入力されたファイルとを出力するものである。
【００３４】
ホットスポット表示解析手段２０２は、ファイル解析手段２０１から解析結果とファイルとが入力されると、入力された解析結果に基づき入力されたファイルを表示手段１０３に表示すると共に、表示手段１０３に表示されたホットスポットにおいて、ホットスポットに含まれる文字列で表示手段１０３に表示された文字列のみを抽出し、この抽出された文字列とこのホットスポットに対応したリンク先を出力するものである。ここでファイル解析手段２０１及びホットスポット表示解析手段２０２は記述言語解析手段１０４Ｂを構成している。
【００３５】
このように構成された本実施の形態による音声認識機能付き表示装置の動作例について説明する。図６において、ファイル記憶手段１０１は、ハイパーテキストやＨＴＭＬ言語などの記述言語で記述されたファイルを格納する。具体的な一例として、ファイル記憶手段１０１は、ＨＴＭＬ言語で記述されたファイル「osaka.html，event.html，…」を格納しているとする。ここで、ファイル「osaka.html」の記述内容は図２に示すものと同一である。
【００３６】
ファイル入力手段１０２は、リンク先が入力されると、この入力されたリンク先のファイルをファイル記憶手段１０１から読み出し、このファイルをファイル解析手段２０１に出力する。上記の例では、ファイル入力手段１０２は、リンク先が入力されると、ファイル「osaka.html」をファイル記憶手段１０１から読み出し、このファイル「osaka.html」をファイル解析手段２０１に出力する。
【００３７】
ファイル解析手段２０１は、ファイル入力手段１０２からファイルが入力されると、この入力されたファイルに記述された記述言語を解析し、この解析結果と入力されたファイルとをホットスポット表示解析手段２０２に出力する。上記の例では、ファイル解析手段２０１は、ファイル入力手段１０２からファイル「osaka.html」が入力されると、このファイル「osaka.html」に記述されたＨＴＭＬ言語を解析し、この解析結果とファイル「osaka.html」をホットスポット表示解析手段２０２に出力する。
【００３８】
ホットスポット表示解析手段２０２は、ファイル解析手段２０１から解析結果とファイルとが入力されると、入力された解析結果に基づき入力されたファイルを表示手段１０３に表示する。更にホットスポット表示解析手段２０２は、表示手段１０３に表示されたホットスポットにおいて、ホットスポットに含まれる文字列で表示手段１０３に表示された文字列のみを抽出し、この抽出された文字列と、このホットスポットに対応したリンク先とをリンクテーブル作成手段１０６に出力する。
【００３９】
上記の例では、ホットスポット表示解析手段２０２は、ファイル解析手段２０１から解析結果とファイル「osaka.html」とが入力されると、入力された解析結果に基づき、ファイル「osaka.html」を表示手段１０３に表示する。このときの表示手段１０３の表示例は図３と同一である。そして、ユーザが図３のように表示されている画面において、下方向にスクロールを行うと、表示手段１０３の表示は図７のようになる。
【００４０】
図７においては、ホットスポットに含まれる文字列「今週のおすすめイベント」が表示領域の境界上に位置するため、一部の文字列「イベント」のみが表示される。このとき、ホットスポット表示解析手段２０２は、表示手段１０３に表示されたホットスポットにおいて、ホットスポットに含まれる文字列「今週のおすすめイベント，天気，観光名所，ナイトスポット」において、表示手段１０３に表示された文字列「イベント，天気，観光名所，ナイトスポット」のみを抽出する。そしてホットスポット表示解析手段２０２は、この抽出された文字列「イベント，天気，観光名所，ナイトスポット」と、このホットスポットに対応したリンク先「event.html，tenki.html，kankou.html ，night.html」とをリンクテーブル作成手段１０６に出力する。以降の動作は実施の形態１と同じであるので、それらの詳細な説明は省略する。
【００４１】
本実施の形態によれば、表示手段１０３の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、この抽出された文字列を音声認識の対象語とする。このため、ユーザは表示手段１０３の表示領域の境界上に位置するホットスポットを指定する場合、ホットスポットに含まれる文字列において、表示されている文字列のみを読み上げるだけで、読み上げた文字列のホットスポットに対応したリンク先のファイルを、表示手段１０３に表示させることができる。
【００４２】
（実施の形態３）
次に本発明の実施の形態３による音声認識機能付き表示装置について、図面を参照しながら説明する。前述した実施の形態２では、表示手段１０３の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、この抽出された文字列を音声認識対象語とした。しかし、本実施の形態の音声認識機能付き表示装置では、表示手段１０３の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、更に抽出された文字列の意味を解析し、この意味解析によって抽出される意味のある文字列を音声認識の対象語とすることを特徴とする。
【００４３】
図８は実施の形態３による音声認識機能付き表示装置の構成図である。ここで、実施の形態１、実施の形態２と同一符号のブロックは同じ動作を行うものとし、それらの詳細な説明は省略する。本実施の形態の音声認識機能付き表示装置は、ファイル記憶手段１０１、ファイル入力手段１０２、表示手段１０３、リンクテーブル１０５、リンクテーブル作成手段１０６、認識辞書記憶手段１０７、認識辞書作成手段１０８、音節モデル記憶手段１０９、音声認識用モデル作成手段１１０、マイクロホン１１１、認識手段１１２、リンク先取得手段１１３、ファイル解析手段２０１、ホットスポット表示解析手段２０２、意味解析手段３０１を含んで構成される。ここでファイル解析手段２０１、ホットスポット表示解析手段２０２、意味解析手段３０１は記述言語解析手段１０４Ｃを構成している。
【００４４】
意味解析手段３０１は、ホットスポット表示解析手段２０２から文字列とこの文字列に対応したリンク先とが入力されると、入力された文字列を意味解析し、入力された文字列から意味のある文字列のみを抽出し、この抽出した意味のある文字列と入力されたリンク先とを出力するものである。
【００４５】
このように構成された本実施の形態による音声認識機能付き表示装置の動作例について説明する。図８において、ファイル記憶手段１０１は、ハイパーテキストやＨＴＭＬ言語などの記述言語で記述されたファイルを格納している。具体的な一例として、ファイル記憶手段１０１は、ＨＴＭＬ言語で記述されたファイル「osaka.html，event.html，…」を格納しているとする。ここで、ファイル「osaka.html」の記述内容の一例は図２に示すものと同一である。
【００４６】
ファイル入力手段１０２は、リンク先が入力されると、この入力されたリンク先のファイルをファイル記憶手段１０１から読み出し、読み出したファイルをファイル解析手段２０１に出力する。上記の例では、ファイル入力手段１０２は、リンク先が入力されると、ファイル「osaka.html」をファイル記憶手段１０１から読み出し、このファイル「osaka.html」をファイル解析手段２０１に出力する。
【００４７】
ファイル解析手段２０１は、ファイル入力手段１０２からファイルが入力されると、入力されたファイルに記述された記述言語を解析し、この解析結果と入力されたファイルとをホットスポット表示解析手段２０２に出力する。上記の例では、ファイル解析手段２０１は、ファイル入力手段１０２からファイル「osaka.html」が入力されると、このファイル「osaka.html」に記述されたＨＴＭＬ言語を解析し、この解析結果とファイル「osaka.html」とをホットスポット表示解析手段２０２に出力する。
【００４８】
ホットスポット表示解析手段２０２は、ファイル解析手段２０１から解析結果とファイルとが入力されると、入力された解析結果に基づき入力されたファイルを表示手段１０３に表示する。更にホットスポット表示解析手段２０２は、表示手段１０３の表示領域の境界上に位置するホットスポットにおいては、ホットスポットに含まれる文字列で表示手段１０３に表示された文字列のみを抽出する。そして、抽出された文字列とこのホットスポットに対応したリンク先とを意味解析手段３０１に出力し、表示手段１０３に表示された他のホットスポットにおいては、ホットスポットに含まれる文字列とこのホットスポットに対応したリンク先とをリンクテーブル作成手段１０６に出力する。
【００４９】
上記の例では、ホットスポット表示解析手段２０２は、ファイル解析手段２０１から解析結果とファイル「osaka.html」が入力されると、入力された解析結果に基づき、ファイル「osaka.html」を表示手段１０３に表示する。このときの表示手段１０３の表示例を図９に示す。そして、ユーザが図９のように表示されている画面において、下方向にスクロールを行うと、表示手段１０３の表示は図１０のようになる。
【００５０】
図１０においては、ホットスポットに含まれる文字列「今週のおすすめイベント」が表示領域の境界上に位置するため、一部の文字列「めイベント」のみが表示される。このとき、ホットスポット表示解析手段２０２は、表示手段１０３の表示領域の境界線上に位置するホットスポットにおいては、ホットスポットに含まれる文字列「今週のおすすめイベント」で表示手段１０３に表示された文字列「めイベント」のみを抽出する。そしてホットスポット表示解析手段２０２は、この抽出された文字列「めイベント」とこのホットスポットに対応したリンク先「event.html」とを意味解析手段３０１に出力する。更にホットスポット表示解析手段２０２は、表示手段１０３に表示された他のホットスポットにおいては、ホットスポットに含まれる文字列「天気，観光名所，ナイトスポット」とこのホットスポットに対応したリンク先「tenki.html，kankou.html，night.html」とをリンクテーブル作成手段１０６に出力する。
【００５１】
意味解析手段３０１は、ホットスポット表示解析手段２０２から文字列とこの文字列に対応したリンク先が入力されると、入力された文字列を意味解析し、入力された文字列から意味のある文字列のみを抽出する。そして意味解析手段３０１は、抽出した文字列と入力されたリンク先とをリンクテーブル作成手段１０６に出力する。上記の例では、意味解析手段３０１は、ホットスポット表示解析手段２０２から文字列「めイベント」と、この文字列に対応したリンク先「event.html」とが入力されると、入力された文字列「めイベント」を意味解析し、入力された文字列から意味のある文字列「イベント」を抽出する。そして、抽出した文字列「イベント」とこの文字列に対応したリンク先「event.html」とをリンクテーブル作成手段１０６に出力する。以降の動作は実施の形態１と同じであるので、それらの詳細な説明は省略する。
【００５２】
なお、上記の実施の形態では、表示手段１０３の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示手段１０３に表示されている文字列を抽出し、この抽出された文字列とこのホットスポットに対応したリンク先のみを意味解析手段に出力した。しかし、表示手段１０３に表示されるホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、この抽出された全文字列とこのホットスポットに対応した全リンク先を意味解析手段３０１に出力するようにしてもよい。
【００５３】
本実施の形態によれば、表示手段１０３の表示領域の境界上に位置するホットスポットに含まれる文字列から、表示されている文字列のみを抽出し、更に、抽出された文字列から意味のある文字列を抽出し、意味のある文字列を音声認識の対象語とするようにしている。このため、ユーザは、表示手段１０３の表示領域の境界上に位置するホットスポットを指定する場合、ホットスポットに含まれる文字列において、表示されている文字列から、意味のある文字列だけを読み上げるだけで、読み上げた文字列のホットスポットに対応したリンク先のファイルを表示手段１０３に表示させることができる。
【００５４】
【発明の効果】
以上のように、本発明の音声認識機能付き表示装置によれば、表示手段に表示されたホットスポットに含まれる文字列のみを、音声認識の対象語とするため、メモリ容量も低減でき、音声認識の精度が向上する。
【００５５】
また本発明の音声認識機能付き表示装置によれば、表示手段の表示領域の境界上に位置するホットスポットに含まれる文字列に対して、表示されている文字列のみを抽出し、この抽出された文字列を音声認識の対象語とするため、メモリ容量も低減でき、表示領域に表示される文字列のみを読み上げるだけで優れた音声認識機能が得られる。
【００５６】
また本発明の音声認識機能付き表示装置によれば、表示手段に表示されるホットスポットに含まれる文字列に対して、表示されている文字列のみを抽出し、更に、この抽出された文字列から意味のある文字列を抽出し、この意味のある文字列を音声認識の対象語としている。このためメモリ容量も低減でき、表示領域に表示される意味のある文字列を読み上げるだけで、優れた音声認識機能が得られる。
【００５７】
このような音声認識機能付き表示装置を用いると、優れたハイパーテキスト表示装置やＷＷＷブラウザ等を実現できる。
【図面の簡単な説明】
【図１】本発明の実施の形態１における音声認識機能付き表示装置の構成図である。
【図２】音声認識機能付き表示装置に読み込まれるファイルの記述例である。
【図３】音声認識機能付き表示装置の動作（その１）を示す表示例である。
【図４】音声認識機能付き表示装置に用いられるリンクテーブルの内容例を示す説明図である。
【図５】音声認識機能付き表示装置に用いられる認識辞書記憶手段の内容例を示す説明図である。
【図６】本発明の実施の形態２における音声認識機能付き表示装置の構成図である。
【図７】音声認識機能付き表示装置の動作（その２）を示す表示例である。
【図８】本発明の実施の形態３における音声認識機能付き表示装置の構成図である。
【図９】音声認識機能付き表示装置の動作（その３）を示す表示例である。
【図１０】音声認識機能付き表示装置の動作（その４）を示す表示例である。
【符号の説明】
１０１ファイル記憶手段
１０２ファイル入力手段
１０３表示手段
１０４Ａ，１０４Ｂ，１０４Ｃ記述言語解析手段
１０５リンクテーブル
１０６リンクテーブル作成手段
１０７認識辞書記憶手段
１０８認識辞書作成手段
１０９音節モデル記憶手段
１１０音声認識用モデル作成手段
１１１マイクロホン
１１２認識手段
１１３リンク先取得手段
２０１ファイル解析手段
２０２ホットスポット表示解析手段
３０１意味解析手段

[0001]
[Technical field to which the invention belongs]
The present invention relates to a display device with a voice recognition function used for a hypertext display device, a WWW browser device, and the like.
[0002]
[Prior art]
Japanese Patent Application Laid-Open No. 10-222342 discloses a hypertext voice control method and apparatus in which the recognition target words, and the processing associated with them, can be specified within the hypertext itself, so that a hypertext display device can be controlled flexibly by voice.
[0003]
[Problems to be solved by the invention]
A region displayed on the display means that is linked to another part of the same document or to another document is called a hot spot. In conventional hypertext display devices and WWW browsers with a voice recognition function, when a file written in a description language such as hypertext or HTML is input, the file is analyzed and every character string contained in a hot spot is used as a recognition target word for speech recognition. Consequently, when a single file contains many hot spots, the number of recognition target words grows, which lowers the accuracy of speech recognition and increases the memory capacity used.
[0004]
In addition, when the hot spot exists on the boundary of the display area, the user cannot read all the character strings included in the hot spot, so that there is a problem that the accuracy of voice recognition is lowered.
[0005]
The present invention has been made in view of these conventional problems. Its object is to provide a display device with a voice recognition function that uses only the character strings contained in hot spots displayed on the display means as recognition target words, thereby reducing the number of recognition target words, reducing the memory capacity used, and improving the accuracy of speech recognition.
[0006]
[Means for Solving the Problems]
The invention of claim 1 of the present application comprises: file storage means for storing files written in a description language including hypertext or HTML; file input means for, when a link destination is input, reading the file at the input link destination from the file storage means; display means for displaying input information; description language analysis means for, when a file is input from the file input means, analyzing the description language written in the input file, displaying the input file on the display means based on the analysis result, and outputting the character string contained in each region displayed on the display means that is linked to another part of the same document or to another document (hereinafter referred to as a hot spot) together with the link destination corresponding to that hot spot, and further, when a hot spot is displayed straddling the display area and the non-display area of the display means, extracting from the character string contained in the hot spot only the character string displayed in the display area and outputting the extracted character string together with the link destination corresponding to the hot spot from which it was extracted; a link table for storing link destinations and the character strings contained in hot spots in association with each other; link table creation means for, when a link destination and a character string contained in a hot spot are input from the description language analysis means, storing the input link destination and character string in the link table; recognition dictionary storage means for storing recognition target words; recognition dictionary creation means for reading the character strings contained in the hot spots from the link table and storing them in the recognition dictionary storage means; speech recognition means for, when the user inputs speech, performing speech recognition using the character strings stored in the recognition dictionary storage means as recognition target words and outputting a recognition result; and link destination acquisition means for, when a recognition result is given from the speech recognition means, referring to the link table, reading the link destination for the recognition result, and outputting the read link destination.
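The extraction step for a hot spot that straddles the display boundary can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Glyph` type, the row-based layout model, and the assumption that the hot spot text wraps onto two rendered rows are all hypothetical. The example string and the visible part after scrolling ("イベント") follow the example given in Embodiment 2 of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One rendered character of a hot spot (hypothetical layout model)."""
    char: str
    row: int  # index of the rendered text row the character sits on


def visible_text(glyphs, first_row, last_row):
    """Keep only the characters whose row lies inside the display area."""
    return "".join(g.char for g in glyphs if first_row <= g.row <= last_row)


# Assumed layout: the hot spot wraps, with "イベント" on the second row.
glyphs = [Glyph(c, 0) for c in "今週のおすすめ"] + [Glyph(c, 1) for c in "イベント"]

before_scroll = visible_text(glyphs, 0, 9)   # rows 0..9 are on screen
after_scroll = visible_text(glyphs, 1, 10)   # scrolled down by one row
print(before_scroll)
print(after_scroll)
```

Only the string returned for the current scroll position would be registered as a recognition target word, paired with the unchanged link destination of the hot spot.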
[0007]
The invention of claim 2 of the present application comprises: file storage means for storing files written in a description language including hypertext or HTML; file input means for, when a link destination is input, reading the file at the input link destination from the file storage means; display means for displaying input information; file analysis means for, when a file is input from the file input means, analyzing the description language written in the input file and outputting the analysis result and the input file; hot spot display analysis means for, when the analysis result and the file are input from the file analysis means, displaying the file on the display means based on the analysis result, extracting, for each hot spot displayed on the display means, only the character string displayed on the display means from the character string contained in the hot spot, and outputting the extracted character string together with the link destination corresponding to the hot spot; semantic analysis means for, when a character string and the link destination corresponding to it are input from the hot spot display analysis means, analyzing the meaning of the input character string, extracting only a meaningful character string from it, and outputting the extracted character string together with the input link destination; a link table for storing link destinations and the character strings contained in hot spots in association with each other; link table creation means for, when a link destination and a character string contained in a hot spot are input from the semantic analysis means, storing the input link destination and character string in the link table; recognition dictionary storage means for storing recognition target words; recognition dictionary creation means for reading the character strings contained in the hot spots from the link table and storing them in the recognition dictionary storage means; speech recognition means for, when the user inputs speech, performing speech recognition using the character strings stored in the recognition dictionary storage means as recognition target words and outputting a recognition result; and link destination acquisition means for, when a recognition result is given from the speech recognition means, referring to the link table, reading the link destination for the recognition result, and outputting the read link destination.
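The specification does not say how the semantic analysis means decides which part of a truncated string is meaningful. One possible reading, sketched here purely as an assumption, is a lookup of the longest substring found in a word list. The `LEXICON` contents and the function name are hypothetical; the input "めイベント" and the expected result "イベント" are the example from Embodiment 3.

```python
def meaningful_part(fragment, lexicon):
    """Return the longest substring of `fragment` found in `lexicon`.

    Returns an empty string when no substring is a known word.
    """
    best = ""
    for start in range(len(fragment)):
        for end in range(start + 1, len(fragment) + 1):
            piece = fragment[start:end]
            if piece in lexicon and len(piece) > len(best):
                best = piece
    return best


# Hypothetical word list; a real system might use a morphological analyzer.
LEXICON = {"イベント", "天気", "観光名所"}

word = meaningful_part("めイベント", LEXICON)
print(word)
```

A fragment with no known word yields an empty string, in which case a caller could simply skip registering that hot spot.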
[0008]
The invention of claim 3 of the present application is the display device with a voice recognition function of claim 1 or 2, wherein the speech recognition means comprises: a microphone that receives speech uttered by a speaker and outputs a speech signal; syllable model storage means for storing a standard model of each syllable used for speech recognition; speech recognition model creation means for creating, using the syllable model storage means, speech recognition models of the recognition target words stored in the recognition dictionary storage means; and recognition means for, when the speech recognition models are input from the speech recognition model creation means and a speech signal is input from the microphone, recognizing the input speech signal using the speech recognition models and outputting text information as the recognition result.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
A display device with a voice recognition function according to Embodiment 1 of the present invention will be described with reference to the drawings. FIG. 1 is a configuration diagram of the display device with a voice recognition function according to Embodiment 1. This display device comprises file storage means 101, file input means 102, display means 103, description language analysis means 104A, a link table 105, link table creation means 106, recognition dictionary storage means 107, recognition dictionary creation means 108, syllable model storage means 109, speech recognition model creation means 110, a microphone 111, recognition means 112, and link destination acquisition means 113.
[0011]
The file storage means 101 stores files written in a description language such as hypertext or HTML. When a link destination is input, the file input means 102 reads the file at that link destination from the file storage means 101 and outputs the read file.
[0012]
The display means 103 is a CRT display, a liquid crystal display (LCD), a plasma display (PDP), or the like, and displays input information. When a file is input via the file input means 102, the description language analysis means 104A analyzes the description language written in the input file, displays the input file on the display means 103 based on the analysis result, and outputs the character strings contained in the hot spots displayed on the display means 103 together with the link destinations corresponding to those hot spots.
[0013]
The link table 105 stores link destinations and the character strings contained in hot spots in association with each other. When a link destination and a character string contained in a hot spot are input, the link table creation means 106 stores them in the link table 105.
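As a rough sketch of the link table and its creation step, a plain mapping from hot spot text to link destination is enough. The class and method names below are illustrative assumptions; the three entries are the ones used in the operation example of this specification.

```python
class LinkTable:
    """Associates hot spot character strings with link destinations."""

    def __init__(self):
        self._destinations = {}

    def store(self, text, destination):
        """Role of the link table creation means: add one pair."""
        self._destinations[text] = destination

    def texts(self):
        """Hot spot strings, later read by the recognition dictionary step."""
        return list(self._destinations)

    def destination_for(self, text):
        """Role of the link destination acquisition means: look up a result."""
        return self._destinations.get(text)


table = LinkTable()
for text, destination in [
    ("今週のおすすめイベント", "event.html"),
    ("天気", "tenki.html"),
    ("観光名所", "kankou.html"),
]:
    table.store(text, destination)

print(table.destination_for("天気"))
```

Keying the table by the displayed string keeps the later lookup from recognition result to link destination a single dictionary access.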
[0014]
The recognition dictionary storage means 107 stores the character string of each recognition target word paired with the syllable information that makes up the word. Japanese syllables are the so-called fifty sounds (gojūon) plus the voiced sounds (dakuon), semi-voiced sounds (handakuon), and contracted sounds (yōon). Denoting a consonant by C, a vowel by V, and a semivowel by Y, Japanese syllables often take the form CV or CYV. In any case, Japanese syllable information can be expressed in kana, katakana, or romaji. With phonetic symbols, the syllables of words in any language can be expressed.
[0015]
The recognition dictionary creation means 108 reads the character strings contained in the hot spots from the link table 105, converts each read character string into syllable information, and stores the converted syllable information and the read character string as a pair in the recognition dictionary storage means 107.
[0016]
The syllable model storage means 109 stores a standard model of each syllable used for speech recognition. The speech recognition model creation means 110 reads from the recognition dictionary storage means 107 all the character strings of the recognition target words and the syllable information corresponding to them, creates speech recognition models by referring to the syllable model storage means 109 based on the syllable information, and outputs the created speech recognition models together with the character strings of the corresponding recognition target words.
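One way to picture the model creation step is to split a hiragana reading into syllables and concatenate the stored model of each syllable. The sketch below makes several assumptions that are not in the specification: small ゃ, ゅ, ょ are attached to the preceding kana to form contracted sounds, every other kana (including ん) is treated as its own unit, and the per-syllable models are placeholder strings rather than acoustic models.

```python
SMALL_KANA = set("ゃゅょ")  # small kana that form contracted sounds (yōon)


def split_syllables(reading):
    """Split a hiragana reading into syllable units."""
    syllables = []
    for ch in reading:
        if ch in SMALL_KANA and syllables:
            syllables[-1] += ch  # e.g. "し" + "ょ" -> "しょ"
        else:
            syllables.append(ch)
    return syllables


def build_word_model(reading, syllable_models):
    """Concatenate the stored model of each syllable into a word model."""
    return [syllable_models[s] for s in split_syllables(reading)]


syllables = split_syllables("かんこうめいしょ")
# Placeholder models; a real store would hold one acoustic model per syllable.
syllable_models = {s: f"model<{s}>" for s in syllables}
word_model = build_word_model("かんこうめいしょ", syllable_models)
print(syllables)
```

Because word models are assembled from a fixed syllable inventory, new recognition target words can be added whenever the displayed page changes without training word-specific models.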
[0017]
The microphone 111 receives speech uttered by a speaker and outputs a speech signal. The recognition means 112 receives the speech recognition models and the character strings of the corresponding recognition target words from the speech recognition model creation means 110. When a speech signal is then input from the microphone 111, the recognition means 112 compares and matches the input speech signal against the input speech recognition models, and outputs as the recognition result the character string of the recognition target word corresponding to the speech recognition model with the highest degree of matching.
[0018]
Here, the syllable model storage means 109, the speech recognition model creation means 110, the microphone 111, and the recognition means 112 together provide the function of the speech recognition means, which, when the user inputs speech, performs speech recognition using the character strings stored in the recognition dictionary storage means 107 as recognition target words and outputs a recognition result.
[0019]
When a recognition result is input from the recognition means 112, the link destination acquisition means 113 refers to the link table 105, reads the link destination for the input recognition result, and outputs the read link destination to the file input means 102.
[0020]
An operation example of the display device with a voice recognition function according to this embodiment, configured as described above, will now be described. In FIG. 1, the file storage means 101 stores files written in a description language such as hypertext or HTML. As a specific example, assume that the file storage means 101 stores the HTML files “osaka.html, event.html, ...”. An example of the contents of the file “osaka.html” is shown in FIG. 2.
[0021]
When a link destination is input, the file input means 102 reads the file at the input link destination from the file storage means 101 and outputs the read file to the description language analysis means 104A. In the above example, the file input means 102 reads, for instance, the file “osaka.html” from the file storage means 101 and outputs it to the description language analysis means 104A.
[0022]
When a file is input from the file input means 102, the description language analysis means 104A analyzes the description language written in the input file and displays the input file on the display means 103 based on the analysis result. The description language analysis means 104A also outputs the character strings contained in the hot spots displayed on the display means 103, together with the link destinations corresponding to those hot spots, to the link table creation means 106. In the above example, when the file “osaka.html” is input from the file input means 102, the description language analysis means 104A analyzes the HTML written in the file and displays the file on the display means 103 based on the analysis result. A display example of the display means 103 at this time is shown in FIG. 3. The description language analysis means 104A then outputs the hot spot character strings “今週のおすすめイベント, 天気, 観光名所” (“this week's recommended events”, “weather”, “tourist attractions”) and the corresponding link destinations “event.html, tenki.html, kankou.html” to the link table creation means 106.
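The analysis step that pulls hot spot strings and link destinations out of an HTML file can be sketched with Python's standard `html.parser` module. The contents of FIG. 2 are not reproduced in this text, so the `SAMPLE` markup below is a hypothetical stand-in built from the three hot spots named in the example. The sketch also ignores which hot spots are actually on screen.

```python
from html.parser import HTMLParser


class HotspotParser(HTMLParser):
    """Collects (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.hotspots = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.hotspots.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical stand-in for the file "osaka.html".
SAMPLE = (
    '<html><body>'
    '<a href="event.html">今週のおすすめイベント</a>'
    '<a href="tenki.html">天気</a>'
    '<a href="kankou.html">観光名所</a>'
    '</body></html>'
)

parser = HotspotParser()
parser.feed(SAMPLE)
parser.close()
hotspots = parser.hotspots
print(hotspots)
```

Each resulting pair corresponds to one row that the link table creation means would store.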
[0023]
When link destinations and the character strings contained in hot spots are input from the description language analysis means 104A, the link table creation means 106 stores them in the link table 105. In the above example, when the link destinations “event.html, tenki.html, kankou.html” and the hot spot character strings “今週のおすすめイベント, 天気, 観光名所” are input from the description language analysis means 104A, the link table creation means 106 stores these link destinations and character strings in the link table 105. An example of the link table 105 at this time is shown in FIG. 4.
[0024]
The recognition dictionary creation means 108 reads the character strings contained in the hot spots from the link table 105, converts each into syllable information, and stores the converted syllable information and the read character string as a pair in the recognition dictionary storage means 107. In the above example, the recognition dictionary creation means 108 reads the hot spot character strings “今週のおすすめイベント, 天気, 観光名所” (“this week's recommended events”, “weather”, “tourist attractions”) from the link table 105. If syllables are written in hiragana, these character strings are converted into the syllable information “こんしゅうのおすすめいべんと, てんき, かんこうめいしょ”. The converted syllable information is then paired with the corresponding character strings and stored in the recognition dictionary storage means 107. An example of the recognition dictionary storage means 107 at this time is shown in FIG. 5.
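The dictionary-building step in this example can be sketched as below. Converting kanji text to a hiragana reading normally needs a morphological analyzer, which the specification does not describe, so the `READINGS` lookup table is a hypothetical substitute. The strings and readings are the ones given in the example.

```python
# Hypothetical reading table standing in for a text-to-syllable converter.
READINGS = {
    "今週のおすすめイベント": "こんしゅうのおすすめいべんと",
    "天気": "てんき",
    "観光名所": "かんこうめいしょ",
}


def build_recognition_dictionary(hotspot_texts, readings):
    """Pair each hot spot string with its syllable information.

    Strings with no known reading are skipped rather than guessed.
    """
    dictionary = []
    for text in hotspot_texts:
        reading = readings.get(text)
        if reading is not None:
            dictionary.append((text, reading))
    return dictionary


dictionary = build_recognition_dictionary(
    ["今週のおすすめイベント", "天気", "観光名所"], READINGS
)
print(dictionary)
```

The resulting list of pairs plays the role of the contents of the recognition dictionary storage means 107.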
[0025]
The syllable model storage unit 109 stores a standard model of each syllable used for speech recognition. In the above example, the syllable model storage unit 109 stores a standard model of each hiragana used for speech recognition.
[0026]
The speech recognition model creation means 110 reads all the character strings of the recognition target words and the syllable information corresponding to each character string from the recognition dictionary storage means 107, and creates speech recognition models by referring to the syllable model storage means 109 based on the syllable information. The speech recognition model creation means 110 then outputs the created speech recognition models and the character strings of the corresponding recognition target words to the recognition means 112. In the above example, the speech recognition model creation means 110 reads the character strings “This week's recommended event, weather, tourist attraction” and the corresponding syllable information “Konshuu no osusume ibento, tenki, kankou meisho” from the recognition dictionary storage means 107, refers to the syllable model storage means 109 based on this syllable information, and creates the speech recognition models “Konshuu no osusume ibento, tenki, kankou meisho”. It then outputs these speech recognition models and the character strings “This week's recommended event, weather, tourist attraction” of the corresponding recognition target words to the recognition means 112.
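One way to picture the model creation step is as a concatenation of per-syllable standard models. The sketch below assumes this; the placeholder strings in `SYLLABLE_MODELS` stand in for real acoustic models, whose form the specification does not give, and all names are hypothetical.

```python
# Hypothetical stand-in for the syllable model storage means 109:
# each syllable maps to a placeholder for its standard model.
SYLLABLE_MODELS = {
    "te": "<model te>",
    "n": "<model n>",
    "ki": "<model ki>",
}


def build_word_model(syllables, syllable_models=SYLLABLE_MODELS):
    """Concatenate the standard models of the syllables of one target word."""
    return [syllable_models[s] for s in syllables]


def build_recognition_models(dictionary, syllable_models=SYLLABLE_MODELS):
    """dictionary: list of (syllable_list, character_string) pairs.

    Returns (word_model, character_string) pairs, which is what the
    description says is handed to the recognition means 112.
    """
    return [(build_word_model(syllables, syllable_models), text)
            for syllables, text in dictionary]


models = build_recognition_models([(["te", "n", "ki"], "weather")])
```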
[0027]
The recognition means 112 receives the speech recognition models and the character strings of the corresponding recognition target words from the speech recognition model creation means 110, receives a speech signal from the microphone 111, and collates the input speech signal against the input speech recognition models. The recognition means 112 then outputs the character string of the recognition target word corresponding to the speech recognition model with the highest degree of matching to the link destination acquisition means 113 as the recognition result. In the above example, when the speech recognition models “Konshuu no osusume ibento, tenki, kankou meisho” and the character strings “This week's recommended event, weather, tourist attraction” of the corresponding recognition target words are input from the speech recognition model creation means 110, and the speech signal “tenki” is input from the microphone 111, the recognition means 112 collates the speech signal “tenki” against the speech recognition models “Konshuu no osusume ibento, tenki, kankou meisho”. If the collation finds that the speech recognition model “tenki” has the highest degree of matching, the recognition means 112 outputs the character string “weather” of the recognition target word corresponding to this model to the link destination acquisition means 113 as the recognition result.
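The selection of the best-matching model can be sketched as below. Note the substitution: real collation compares an acoustic signal with acoustic models, whereas this sketch compares romanized strings using `difflib.SequenceMatcher` purely as a stand-in similarity score. The function name and data layout are assumptions.

```python
from difflib import SequenceMatcher


def recognize(utterance, models):
    """Return the character string whose model best matches the utterance.

    models: list of (syllable_information, character_string) pairs.
    String similarity replaces acoustic matching in this illustration.
    """
    best_text, best_score = None, -1.0
    for syllables, text in models:
        score = SequenceMatcher(None, utterance, syllables).ratio()
        if score > best_score:
            best_text, best_score = text, score
    return best_text


models = [
    ("konshuu no osusume ibento", "This week's recommended event"),
    ("tenki", "weather"),
    ("kankou meisho", "tourist attraction"),
]
result = recognize("tenki", models)
```

Because fewer hot spots mean fewer candidate models, restricting the list to displayed hot spots shortens this loop and leaves fewer similar-sounding candidates to confuse.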
[0028]
When the recognition result is input from the recognition means 112, the link destination acquisition means 113 refers to the link table 105, reads the link destination for the input recognition result, and outputs the read link destination to the file input means 102. In the above example, when the recognition result character string “weather” is input from the recognition means 112, the link destination acquisition means 113 refers to the link table 105, reads the link destination “tenki.html” for the character string “weather”, and outputs the link destination “tenki.html” to the file input means 102.
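The lookup performed here can be illustrated in a few lines. The dictionary-based table and the name `get_link_destination` are assumptions carried over for illustration; returning `None` for an unknown result is likewise a choice made for this sketch, since the specification does not discuss that case.

```python
def get_link_destination(recognition_result, link_table):
    """Look up the link destination for a recognized character string.

    Returns None when the string is not in the table (an assumption;
    the source does not describe this case).
    """
    return link_table.get(recognition_result)


link_table = {
    "This week's recommended event": "event.html",
    "weather": "tenki.html",
    "tourist attraction": "kankou.html",
}
destination = get_link_destination("weather", link_table)
```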
[0029]
In the above embodiment, the character string included in the hot spot has been described as the target word for speech recognition. However, the present invention is not limited to this example. Numbers, symbols, or the like may be displayed on the display means, and the numbers, symbols, or the like displayed on the display means may be used as the target words for speech recognition.
[0030]
According to the present embodiment, only the character strings included in the hot spots displayed on the display means are target words for speech recognition. The number of target words is therefore reduced, the memory capacity used is reduced, and the accuracy of speech recognition is improved.
[0031]
(Embodiment 2)
Next, a display device with a voice recognition function according to a second embodiment of the present invention will be described with reference to the drawings. In the first embodiment described above, the character string included in a hot spot displayed on the display means 103 is the target word for speech recognition. In the display device with a speech recognition function of the present embodiment, by contrast, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means, and the extracted character string is used as the target word for speech recognition.
[0032]
FIG. 6 is a configuration diagram of a display device with a voice recognition function according to the second embodiment. Here, blocks having the same reference numerals as those in Embodiment 1 perform the same operations, and detailed descriptions thereof are omitted. The display device with a speech recognition function of the present embodiment comprises a file storage means 101, a file input means 102, a display means 103, a link table 105, a link table creation means 106, a recognition dictionary storage means 107, a recognition dictionary creation means 108, a syllable model storage means 109, a speech recognition model creation means 110, a microphone 111, a recognition means 112, a link destination acquisition means 113, a file analysis means 201, and a hot spot display analysis means 202.
[0033]
When a file is input from the file input unit 102, the file analysis unit 201 analyzes the description language of the input file and outputs the analysis result and the input file.
[0034]
When the analysis result and the file are input from the file analysis means 201, the hot spot display analysis means 202 displays the input file on the display means 103 based on the input analysis result. In addition, for each hot spot displayed on the display means 103, it extracts only the character string actually displayed on the display means 103 from the character string included in the hot spot, and outputs the extracted character string and the link destination corresponding to the hot spot. Here, the file analysis means 201 and the hot spot display analysis means 202 constitute a description language analysis means 104B.
[0035]
An operation example of the display device with a voice recognition function according to this embodiment configured as described above will be described. In FIG. 6, the file storage means 101 stores files described in a description language such as hypertext or the HTML language. As a specific example, it is assumed that the file storage means 101 stores files “osaka.html, event.html,...” described in the HTML language. Here, the description content of the file “osaka.html” is the same as that shown in FIG. 2.
[0036]
When the link destination is input, the file input unit 102 reads the input link destination file from the file storage unit 101 and outputs the file to the file analysis unit 201. In the above example, when the link destination is input, the file input unit 102 reads the file “osaka.html” from the file storage unit 101 and outputs this file “osaka.html” to the file analysis unit 201.
[0037]
When the file is input from the file input means 102, the file analysis means 201 analyzes the description language described in the input file, and outputs the analysis result and the input file to the hot spot display analysis means 202. In the above example, when the file “osaka.html” is input from the file input means 102, the file analysis means 201 analyzes the HTML language described in the file “osaka.html”, and outputs the analysis result and the file “osaka.html” to the hot spot display analysis means 202.
[0038]
When the analysis result and the file are input from the file analysis means 201, the hot spot display analysis means 202 displays the input file on the display means 103 based on the input analysis result. Further, for each hot spot displayed on the display means 103, the hot spot display analysis means 202 extracts only the character string displayed on the display means 103 from the character string included in the hot spot, and outputs the extracted character string and the link destination corresponding to the hot spot to the link table creation means 106.
[0039]
In the above example, when the analysis result and the file “osaka.html” are input from the file analysis means 201, the hot spot display analysis means 202 displays the file “osaka.html” on the display means 103 based on the input analysis result. The display example of the display means 103 at this time is the same as that in FIG. 3. When the user scrolls the screen displayed as shown in FIG. 3 downward, the display on the display means 103 becomes as shown in FIG. 7.
[0040]
In FIG. 7, the character string “This week's recommended event” included in a hot spot is located on the boundary of the display area, so only part of it, “event”, is displayed. At this time, from the character strings “This week's recommended event, weather, tourist attraction, night spot” included in the hot spots displayed on the display means 103, the hot spot display analysis means 202 extracts only the character strings actually displayed on the display means 103, namely “event, weather, tourist attraction, night spot”. The hot spot display analysis means 202 then outputs the extracted character strings “event, weather, tourist attraction, night spot” and the link destinations “event.html, tenki.html, kankou.html, night.html” corresponding to the hot spots to the link table creation means 106. Since the subsequent operation is the same as that of the first embodiment, detailed description thereof will be omitted.
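The extraction of only the displayed part of each hot spot can be sketched as follows. The layout model is an assumption made for illustration: each hot spot is represented as a list of line fragments with a vertical position, and a fragment counts as displayed only if its whole line lies inside the visible area. The specification does not describe how the displayed part is determined, and the coordinates below are invented.

```python
def visible_text(fragments, view_top, view_bottom, line_height):
    """Join the fragments of one hot spot that lie fully inside the view.

    fragments: list of (y_top, text) pairs, one per rendered line
    (a hypothetical layout model).
    """
    shown = [text for y_top, text in fragments
             if y_top >= view_top and y_top + line_height <= view_bottom]
    return "".join(shown)


def visible_hot_spots(hot_spots, view_top, view_bottom, line_height=20):
    """Return (displayed string, link destination) for each visible hot spot."""
    result = []
    for fragments, link_destination in hot_spots:
        text = visible_text(fragments, view_top, view_bottom, line_height)
        if text:  # hot spots with nothing displayed are dropped
            result.append((text, link_destination))
    return result


# Illustrative layout: the first hot spot wraps over two lines and the
# first of them has been scrolled out of the display area.
hot_spots = [
    ([(0, "This week's recommended "), (20, "event")], "event.html"),
    ([(40, "weather")], "tenki.html"),
    ([(60, "tourist attraction")], "kankou.html"),
    ([(80, "night spot")], "night.html"),
]
displayed = visible_hot_spots(hot_spots, view_top=20, view_bottom=100)
```

With these values, the first hot spot contributes only “event”, while the other three contribute their full strings, mirroring the FIG. 7 example.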
[0041]
According to the present embodiment, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means 103, and the extracted character string is used as the target word for speech recognition. Therefore, when the user designates a hot spot located on the boundary of the display area of the display means 103, the user only needs to read aloud the displayed part of the character string included in that hot spot, and the link destination file corresponding to the hot spot can be displayed on the display means 103.
[0042]
(Embodiment 3)
Next, a display device with a voice recognition function according to a third embodiment of the present invention will be described with reference to the drawings. In the above-described second embodiment, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means 103, and the extracted character string is used as the target word for speech recognition. In the display device with a speech recognition function of the present embodiment, by contrast, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means 103, the meaning of the extracted character string is then analyzed, and the meaningful character string obtained by this semantic analysis is used as the target word for speech recognition.
[0043]
FIG. 8 is a configuration diagram of a display device with a voice recognition function according to the third embodiment. Here, blocks having the same reference numerals as those in the first and second embodiments perform the same operations, and detailed descriptions thereof are omitted. The display device with a voice recognition function of the present embodiment comprises a file storage means 101, a file input means 102, a display means 103, a link table 105, a link table creation means 106, a recognition dictionary storage means 107, a recognition dictionary creation means 108, a syllable model storage means 109, a speech recognition model creation means 110, a microphone 111, a recognition means 112, a link destination acquisition means 113, a file analysis means 201, a hot spot display analysis means 202, and a semantic analysis means 301. Here, the file analysis means 201, the hot spot display analysis means 202, and the semantic analysis means 301 constitute a description language analysis means 104C.
[0044]
When a character string and the link destination corresponding to the character string are input from the hot spot display analysis means 202, the semantic analysis means 301 performs semantic analysis on the input character string, extracts only the meaningful character string from it, and outputs the extracted meaningful character string and the input link destination.
[0045]
An operation example of the display device with a voice recognition function according to this embodiment configured as described above will be described. In FIG. 8, the file storage means 101 stores files described in a description language such as hypertext or the HTML language. As a specific example, it is assumed that the file storage means 101 stores files “osaka.html, event.html,...” described in the HTML language. Here, an example of the description content of the file “osaka.html” is the same as that shown in FIG. 2.
[0046]
When the link destination is input, the file input unit 102 reads the input link destination file from the file storage unit 101, and outputs the read file to the file analysis unit 201. In the above example, when the link destination is input, the file input unit 102 reads the file “osaka.html” from the file storage unit 101 and outputs this file “osaka.html” to the file analysis unit 201.
[0047]
When the file is input from the file input means 102, the file analysis means 201 analyzes the description language described in the input file, and outputs the analysis result and the input file to the hot spot display analysis means 202. In the above example, when the file “osaka.html” is input from the file input means 102, the file analysis means 201 analyzes the HTML language described in the file “osaka.html”, and outputs the analysis result and the file “osaka.html” to the hot spot display analysis means 202.
[0048]
When the analysis result and the file are input from the file analysis means 201, the hot spot display analysis means 202 displays the input file on the display means 103 based on the input analysis result. Further, for a hot spot located on the boundary of the display area of the display means 103, the hot spot display analysis means 202 extracts only the character string displayed on the display means 103 from the character string included in the hot spot, and outputs the extracted character string and the link destination corresponding to the hot spot to the semantic analysis means 301. For the other hot spots displayed on the display means 103, the character string included in each hot spot and the link destination corresponding to that hot spot are output to the link table creation means 106.
[0049]
In the above example, when the analysis result and the file “osaka.html” are input from the file analysis means 201, the hot spot display analysis means 202 displays the file “osaka.html” on the display means 103 based on the input analysis result. A display example of the display means 103 at this time is shown in FIG. 9. When the user scrolls the screen displayed as shown in FIG. 9 downward, the display on the display means 103 becomes as shown in FIG. 10.
[0050]
In FIG. 10, the character string “This week's recommended event” included in a hot spot is located on the boundary of the display area, so only part of it, “Me event”, is displayed; that is, the displayed fragment begins partway through a word. At this time, for the hot spot located on the boundary of the display area of the display means 103, the hot spot display analysis means 202 extracts only the character string “Me event” displayed on the display means 103 from the character string “This week's recommended event” included in the hot spot. The hot spot display analysis means 202 then outputs the extracted character string “Me event” and the link destination “event.html” corresponding to the hot spot to the semantic analysis means 301. For the other hot spots displayed on the display means 103, the hot spot display analysis means 202 outputs the character strings “weather, tourist attraction, night spot” included in the hot spots and the corresponding link destinations “tenki.html, kankou.html, night.html” to the link table creation means 106.
[0051]
When a character string and the link destination corresponding to the character string are input from the hot spot display analysis means 202, the semantic analysis means 301 performs semantic analysis on the input character string and extracts only the meaningful character string from it. The semantic analysis means 301 then outputs the extracted character string and the input link destination to the link table creation means 106. In the above example, when the character string “Me event” and the link destination “event.html” corresponding to the character string are input from the hot spot display analysis means 202, the semantic analysis means 301 performs semantic analysis on the input character string “Me event” and extracts the meaningful character string “event” from it. The extracted character string “event” and the link destination “event.html” corresponding to this character string are then output to the link table creation means 106. Since the subsequent operation is the same as that of the first embodiment, detailed description thereof will be omitted.
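The specification does not say how the semantic analysis is carried out. As one hedged illustration, the sketch below keeps only the words of the displayed fragment that appear in a hypothetical vocabulary `VOCABULARY`. It relies on whitespace-separated English words; text without word spacing, such as Japanese, would need a word segmentation step that is not shown. The fragment “ded event” is an invented English analogue of the truncated string in the example.

```python
# Hypothetical vocabulary of meaningful words; a stand-in for whatever
# knowledge the semantic analysis means 301 actually uses.
VOCABULARY = {
    "recommended", "event", "weather",
    "tourist", "attraction", "night", "spot",
}


def extract_meaningful(fragment, vocabulary=VOCABULARY):
    """Keep only the words of a displayed fragment found in the vocabulary."""
    words = [w for w in fragment.split() if w.lower() in vocabulary]
    return " ".join(words)


# "ded event": the tail of "recommended event" after the start was cut off.
meaningful = extract_meaningful("ded event")
```

The truncated piece “ded” is discarded because it is not a word, so only “event” would be registered as a recognition target.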
[0052]
In the above-described embodiment, only for a hot spot located on the boundary of the display area of the display means 103 is the displayed character string extracted from the character string included in the hot spot and output, together with the link destination corresponding to the hot spot, to the semantic analysis means 301. However, the displayed character strings may instead be extracted from the character strings included in all the hot spots displayed on the display means 103, and all the extracted character strings and all the link destinations corresponding to the hot spots may be output to the semantic analysis means 301.
[0053]
According to the present embodiment, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means 103, a meaningful character string is further extracted from the extracted character string, and this meaningful character string is used as the target word for speech recognition. Therefore, when the user designates a hot spot located on the boundary of the display area of the display means 103, the user only needs to read aloud the meaningful part of the displayed character string included in that hot spot, and the link destination file corresponding to the hot spot can be displayed on the display means 103.
[0054]
[Effects of the invention]
As described above, according to the display device with a voice recognition function of the present invention, only the character strings included in the hot spots displayed on the display means are target words for voice recognition, so that the memory capacity used can be reduced and the accuracy of voice recognition is improved.
[0055]
Further, according to the display device with a voice recognition function of the present invention, only the displayed character string is extracted from the character string included in a hot spot located on the boundary of the display area of the display means, and the extracted character string is used as the target word for voice recognition. Therefore, the memory capacity used can be reduced, and an excellent voice recognition function can be obtained simply by reading aloud only the character string displayed in the display area.
[0056]
Further, according to the display device with a voice recognition function of the present invention, only the displayed character string is extracted from the character string included in a hot spot displayed on the display means, a meaningful character string is further extracted from the extracted character string, and this meaningful character string is used as the target word for voice recognition. Therefore, the memory capacity used can be reduced, and an excellent voice recognition function can be obtained simply by reading aloud a meaningful character string displayed in the display area.
[0057]
When such a display device with a voice recognition function is used, an excellent hypertext display device, a WWW browser, or the like can be realized.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a display device with a voice recognition function according to a first embodiment of the present invention.
FIG. 2 is a description example of a file read into a display device with a voice recognition function.
FIG. 3 is a display example showing an operation (part 1) of the display device with a voice recognition function.
FIG. 4 is an explanatory diagram showing an example of the contents of a link table used in a display device with a voice recognition function.
FIG. 5 is an explanatory diagram showing an example of contents of a recognition dictionary storage unit used in a display device with a voice recognition function.
FIG. 6 is a configuration diagram of a display device with a voice recognition function in Embodiment 2 of the present invention.
FIG. 7 is a display example showing an operation (part 2) of the display device with a voice recognition function.
FIG. 8 is a configuration diagram of a display device with a voice recognition function according to a third embodiment of the present invention.
FIG. 9 is a display example showing an operation (part 3) of the display device with a voice recognition function.
FIG. 10 is a display example showing an operation (part 4) of the display device with a voice recognition function.
[Explanation of symbols]
101 File storage means
102 File input means
103 Display means
104A, 104B, 104C Description language analysis means
105 Link table
106 Link table creation means
107 Recognition dictionary storage means
108 Recognition dictionary creation means
109 Syllable model storage means
110 Voice recognition model creation means
111 Microphone
112 Recognition means
113 Link destination acquisition means
201 File analysis means
202 Hot spot display analysis means
301 Semantic analysis means

Claims

A display device with a voice recognition function, comprising:
file storage means for storing files described in a description language including hypertext or the HTML language;
file input means for reading, when a link destination is input, the file of the input link destination from the file storage means;
display means for displaying input information;
description language analysis means for, when a file is input from the file input means, analyzing the description language described in the input file, displaying the input file on the display means based on the analysis result, and outputting a character string included in an area displayed on the display means that is linked to another part of the same document or to another document (hereinafter referred to as a hot spot) together with the link destination corresponding to the hot spot, and further, when the hot spot is displayed straddling the display area and the non-display area of the display means, extracting only the character string displayed in the display area from the character string included in the hot spot, and outputting the extracted character string and the link destination corresponding to the hot spot from which it was extracted;
a link table for storing link destinations and the character strings included in the hot spots in association with each other;
link table creation means for storing, when a link destination and a character string included in the hot spot are input from the description language analysis means, the input link destination and the input character string in the link table;
recognition dictionary storage means for storing recognition target words;
recognition dictionary creation means for reading the character strings included in the hot spots from the link table and storing the read character strings in the recognition dictionary storage means;
voice recognition means for performing, when a user inputs voice, voice recognition using the character strings stored in the recognition dictionary storage means as recognition target words, and outputting a recognition result; and
link destination acquisition means for, when the recognition result is given from the voice recognition means, referring to the link table, reading the link destination for the recognition result, and outputting the read link destination.

A display device with a voice recognition function, comprising:
file storage means for storing files described in a description language including hypertext or the HTML language;
file input means for reading, when a link destination is input, the file of the input link destination from the file storage means;
display means for displaying input information;
file analysis means for, when a file is input from the file input means, analyzing the description language described in the input file and outputting the analysis result and the input file;
hot spot display analysis means for, when the analysis result and the file are input from the file analysis means, displaying the file on the display means based on the analysis result, extracting, for a hot spot displayed on the display means, only the character string displayed on the display means from the character string included in the hot spot, and outputting the extracted character string and the link destination corresponding to the hot spot;
semantic analysis means for, when a character string and the link destination corresponding to the character string are input from the hot spot display analysis means, analyzing the meaning of the input character string, extracting only a meaningful character string from the input character string, and outputting the extracted character string and the input link destination;
a link table for storing link destinations and the character strings included in the hot spots in association with each other;
link table creation means for storing, when a link destination and a character string included in the hot spot are input from the semantic analysis means, the input link destination and the input character string in the link table;
recognition dictionary storage means for storing recognition target words;
recognition dictionary creation means for reading the character strings included in the hot spots from the link table and storing the read character strings in the recognition dictionary storage means;
voice recognition means for performing, when a user inputs voice, voice recognition using the character strings stored in the recognition dictionary storage means as recognition target words, and outputting a recognition result; and
link destination acquisition means for, when the recognition result is given from the voice recognition means, referring to the link table, reading the link destination for the recognition result, and outputting the read link destination.

The display device with a voice recognition function according to claim 1 or 2, wherein the voice recognition means comprises:
a microphone that receives voice uttered by a speaker and outputs a voice signal;
syllable model storage means for storing a standard model of each syllable used for voice recognition;
voice recognition model creation means for creating, using the syllable model storage means, voice recognition models of the recognition target words stored in the recognition dictionary storage means; and
recognition means for, when a voice recognition model is input from the voice recognition model creation means and a voice signal is input from the microphone, performing voice recognition on the input voice signal using the voice recognition model, and outputting text information as the recognition result.