JPH1049195A

JPH1049195A - Voice recognition device

Info

Publication number: JPH1049195A
Application number: JP8225907A
Authority: JP
Inventors: Koji Hori; 孝二堀
Original assignee: Equos Research Co Ltd
Current assignee: Equos Research Co Ltd
Priority date: 1996-08-07
Filing date: 1996-08-07
Publication date: 1998-02-20

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that recognizes speech efficiently by appropriately classifying the contents of its speech dictionary. SOLUTION: In speech recognition, the vowel at the beginning of a word and the vowel at the end of a word are relatively little affected by surrounding sounds and are therefore easy to recognize. Exploiting this, the word dictionary is divided into a plurality of individual dictionaries according to the combination of the initial and final vowels of each word to be recognized. A pattern matching section 165 and a discriminating section 166 first perform preliminary recognition of the initial and final vowels. This recognition is based on the similarity between the word-initial and word-final portions of the features extracted by a feature extracting section 162 and the standard patterns in a vowel dictionary 163a. Pattern matching of the input speech is then performed preferentially against the individual dictionary for the recognized vowel combination.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a speech recognition device, and more particularly to a speech recognition device used, for example, as an input device for a vehicle navigation system.

[0002]

Description of the Related Art: Speech recognition devices that recognize human speech as words have been put to practical use in many fields. For example, they serve as input devices that let a user give voice instructions to various pieces of equipment in a factory from a remote location. Their use as voice input devices for entering destinations, commands, and similar information into automobile navigation systems is also being considered. To identify input speech, such a device generally includes a speech recognition dictionary built by analyzing the frequency distribution of the speech to be recognized in advance. Features of the speech, such as time-series information on the spectrum or fundamental frequency, are extracted, and their patterns are stored in association with the corresponding words.

[0003]

When speech to be recognized is input, the frequency pattern of the input speech is compared, by pattern matching, with the pattern of each word stored in the speech recognition dictionary, and a similarity is calculated for each word. The word with the highest similarity (the word whose pattern is closest) is then recognized as the input speech and output. In other words, the input speech is identified by finding the word pattern that most closely resembles the frequency-distribution pattern of the input.
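The conventional "compare against every stored word, keep the best" step can be illustrated with a short sketch. Each pattern is assumed to be a sequence of feature vectors (for example, one short-time spectrum per frame). Similarity is assumed to be derived from a dynamic time warping (DTW) distance. The specification does not name a matching algorithm, so DTW and the names `dtw_distance`, `similarity`, and `recognize` are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

Frame = Sequence[float]
Pattern = Sequence[Frame]

def frame_distance(a: Frame, b: Frame) -> float:
    # Euclidean distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(p: Pattern, q: Pattern) -> float:
    # Classic DTW: cost[i][j] is the best cumulative cost of aligning p[:i] with q[:j].
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by n + m so that long and short words are scored comparably.
    return cost[n][m] / (n + m)

def similarity(p: Pattern, q: Pattern) -> float:
    # Map distance to (0, 1]; identical patterns give 1.0.
    return 1.0 / (1.0 + dtw_distance(p, q))

def recognize(features: Pattern, dictionary: Dict[str, Pattern]) -> Tuple[str, float]:
    # Score every stored word and return the one with the highest similarity.
    scores = {word: similarity(features, ref) for word, ref in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

dictionary = {"yes": [[1.0, 0.0], [0.0, 1.0]], "no": [[0.0, 0.0], [0.0, 0.0]]}
print(recognize([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], dictionary))  # ('yes', 1.0)
```

The DTW alignment lets a word spoken slightly slower or faster than its stored pattern still match exactly. Here the input's repeated first frame is absorbed. The cost of this exhaustive loop grows with dictionary size, which is the issue the following paragraphs address.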

[0004]

Because of the time required for matching, the speech recognition dictionary used in a speech recognition device normally contains about 1,000 words. When more than 1,000 words must be recognized, several dictionaries must be prepared, each holding one group of words. The application program must then switch among these dictionaries during matching, and how to perform this switching becomes a problem.

[0005]

One known application of speech recognition to an in-vehicle navigation system is the speech recognition device for in-vehicle information processing described in Japanese Patent Application Laid-Open No. 7-64480. This device recognizes an input word by comparing it with vocabulary registered in a speech dictionary, such as the place names and facility names shown on the navigation map. It is intended to recognize spoken input efficiently and quickly even when the registered vocabulary is large, and to reduce the probability of misrecognition caused by similar words. To this end, the registered contents of the speech dictionary are grouped by region. The dictionary groups used to recognize an input word are then prioritized according to their distance from the vehicle's current position, as determined by the navigation system.
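For comparison with the present invention, the prior-art prioritization can be sketched as sorting region-specific dictionary groups by distance from the vehicle. The cited publication is only summarized in the text, so the following details are assumptions for illustration: the single representative coordinate per group, the great-circle (haversine) distance, and the names `RegionGroup` and `prioritize_groups`.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_KM = 6371.0

@dataclass
class RegionGroup:
    name: str
    lat: float  # representative latitude of the region, degrees (assumed)
    lon: float  # representative longitude of the region, degrees (assumed)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two latitude/longitude points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def prioritize_groups(groups: List[RegionGroup], lat: float, lon: float) -> List[RegionGroup]:
    # Nearest region first: its dictionary is tried first, then the next nearest, and so on.
    return sorted(groups, key=lambda g: haversine_km(lat, lon, g.lat, g.lon))

groups = [
    RegionGroup("Sapporo", 43.06, 141.35),
    RegionGroup("Osaka", 34.69, 135.50),
    RegionGroup("Tokyo", 35.68, 139.77),
]
print([g.name for g in prioritize_groups(groups, 35.44, 139.64)])  # ['Tokyo', 'Osaka', 'Sapporo']
```

With this ordering, a destination in a distant region is reached only after every nearer group has been tried.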

[0006]

Problems to Be Solved by the Invention: However, the device described in that publication uses the current position as the index for prioritizing the speech dictionaries. As a result, the farther the spoken destination is from the current position, the more often the speech dictionary must be switched. For a destination covering a large area, such as one represented by a place name, only a few switches may be needed. A destination that appears only on detailed maps such as city street maps (for example, a shop or a private residence) is different. The dictionaries must then be switched one after another, from those with little map detail to those with high detail, so the search actually takes longer.

[0007]

An object of the present invention is to provide a speech recognition device that recognizes speech efficiently by appropriately classifying the contents of its speech dictionary.

[0008]

Means for Solving the Problems: In the invention according to claim 1, the above object is achieved by providing a speech recognition device with the following elements:
- a word dictionary having a vowel dictionary that stores standard patterns of vowels, and individual dictionaries that store standard patterns of a plurality of words to be recognized in a manner that allows at least one of each word's initial vowel and final vowel to be distinguished;
- speech input means for inputting speech;
- feature extraction means for extracting features of the speech input through the speech input means;
- vowel similarity calculation means for calculating the similarity between at least one of the word-initial portion and the word-final portion of the features of the input speech extracted by the feature extraction means and the standard pattern of each vowel stored in the vowel dictionary;
- vowel recognition means for recognizing, from the similarities calculated by the vowel similarity calculation means, at least one of the initial vowel and the final vowel of the speech input through the speech input means;
- word dictionary selection means for selecting, from the word dictionary, the standard patterns of words corresponding to the vowel recognized by the vowel recognition means;
- word similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selection means; and
- determination means for identifying the input speech from the similarities calculated by the word similarity calculation means.

In the invention according to claim 2, the above object is achieved by providing a speech recognition device with the following elements:
- a word dictionary having a vowel dictionary that stores standard patterns of vowels, and individual dictionaries that store standard patterns of a plurality of words to be recognized in a manner that allows at least one of each word's initial vowel and final vowel to be distinguished;
- speech input means for inputting speech;
- feature extraction means for extracting features of the speech input through the speech input means;
- vowel similarity calculation means for calculating the similarity between at least one of the word-initial portion and the word-final portion of the features of the input speech extracted by the feature extraction means and the standard pattern of each vowel stored in the vowel dictionary;
- word similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selection means;
- weighting means for weighting the similarity of each word calculated by the word similarity calculation means according to the vowel similarities calculated by the vowel similarity calculation means; and
- determination means for identifying the input speech from the similarities after weighting by the weighting means.
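Claim 2 weights the word similarities by vowel similarity instead of relying only on hard dictionary selection. The claim does not say how the weighting is computed. The sketch below takes one possible reading: each word's similarity receives a bonus proportional to how well that word's own initial and final vowels matched the input. The additive form, the weight `alpha`, and the names `weighted_scores` and `decide` are assumptions.

```python
from typing import Dict, Tuple

def weighted_scores(
    word_sim: Dict[str, float],
    word_vowels: Dict[str, Tuple[str, str]],
    initial_sim: Dict[str, float],
    final_sim: Dict[str, float],
    alpha: float = 0.5,  # assumed weight; not specified in the claim
) -> Dict[str, float]:
    """Weight each word's similarity by how well its vowels match the input.

    word_sim:    word -> similarity from word-level matching
    word_vowels: word -> (initial vowel, final vowel) of that word
    initial_sim: vowel -> similarity of the input's word-initial portion
    final_sim:   vowel -> similarity of the input's word-final portion
    """
    weighted = {}
    for word, sim in word_sim.items():
        v_init, v_final = word_vowels[word]
        bonus = initial_sim.get(v_init, 0.0) + final_sim.get(v_final, 0.0)
        weighted[word] = sim + alpha * bonus
    return weighted

def decide(weighted: Dict[str, float]) -> str:
    # The determination step picks the word with the highest weighted score.
    return max(weighted, key=weighted.get)

w = weighted_scores(
    {"ほせい": 0.70, "ごるふ": 0.72},
    {"ほせい": ("お", "い"), "ごるふ": ("お", "う")},
    {"お": 0.9},
    {"い": 0.8, "う": 0.3},
)
print(decide(w))  # ほせい
```

In the example, "ごるふ" has the slightly higher raw word similarity. However, the input's final portion clearly resembles "い", so the weighting favors "ほせい".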

[0009]

Embodiments of the Invention: An embodiment of the speech recognition device of the present invention is described in detail below with reference to FIGS. 1 to 3.

(1) Overview of the embodiment. The speech recognition device of this embodiment relies on the observation that two vowels are relatively little affected by other sounds and are therefore easy to recognize: the vowel at the beginning of a word (hereinafter the "initial vowel") and the vowel at the end of a word (hereinafter the "final vowel"). Using this property, the word dictionary is grouped (classified) into a plurality of individual dictionaries according to the combination of each target word's initial and final vowels. Then, as preliminary recognition, the initial and final vowels of the input speech are recognized by pattern matching against the standard pattern of each vowel. Pattern matching to recognize the input speech is then performed preferentially against the individual dictionary for the combination of the two recognized vowels.

[0010]

(2) Details of the embodiment. FIG. 1 shows the system configuration when the speech recognition device according to an embodiment of the present invention is applied to a navigation system. This navigation system includes an arithmetic unit 10. Two units are connected to the arithmetic unit 10. The first is a display unit 11, which includes a display 11a that functions as a touch panel and operation switches 11b arranged around the display 11a. The second is a switch input management unit 12, which manages input from the touch panel and the switches 11b of the display unit 11.

[0011]

The switches 11b include a switch for calling up the navigation menu screen, a switch for adjusting the air conditioner, switches for operating the audio system, and others. Pressing one of these switches displays the corresponding menu screen on the display 11a.

[0012]

A current position measurement unit 13, a speed sensor 14, a map information storage unit 15, the speech recognition unit 16 of this embodiment, and an audio output unit 17 are also connected to the arithmetic unit 10. The current position measurement unit 13 determines where the vehicle is currently traveling or stopped by detecting coordinate data in latitude and longitude. Four devices are connected to the current position measurement unit 13: a GPS (Global Positioning System) receiver 21, which measures the vehicle's position using satellites; a beacon receiver 20, which receives position information from beacons installed along the road; a direction sensor 22; and a distance sensor 23. The current position measurement unit 13 uses the information from these devices to measure the vehicle's current position.

[0013]

The direction sensor 22 may be, for example, any of the following:
- a geomagnetic sensor that determines the vehicle's heading by detecting terrestrial magnetism;
- a gyroscope, such as a gas-rate gyro or a fiber-optic gyro, that detects the vehicle's angular velocity and integrates it to obtain the heading;
- wheel sensors on the left and right wheels, which detect turning of the vehicle from the difference between their output pulses (that is, the difference in distance traveled) and compute the resulting change in heading.

For the distance sensor 23, various methods can be used, such as detecting and counting wheel revolutions, or detecting acceleration and integrating it twice. The GPS receiver 21 and the beacon receiver 20 can each determine the position on their own. Where neither can receive signals, the current position is determined by dead reckoning using both the direction sensor 22 and the distance sensor 23.
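The dead-reckoning fallback can be sketched as a single update step that combines a heading change with a travelled distance. The sketch rests on several assumptions: a flat local x/y frame in metres, a heading in radians measured counter-clockwise from the x axis, a known track width between the left and right wheels for the wheel-sensor case, and a midpoint-heading update. The names are hypothetical, and this is not presented as the embodiment's actual computation.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float        # metres along the local x axis (assumed frame)
    y: float        # metres along the local y axis (assumed frame)
    heading: float  # radians, counter-clockwise from +x

def heading_change_from_wheels(d_left: float, d_right: float, track_width: float) -> float:
    # Differential wheel odometry: a larger right-wheel distance means a left turn.
    return (d_right - d_left) / track_width

def dead_reckon(pose: Pose, distance: float, d_heading: float) -> Pose:
    # Advance along the average heading over the step (midpoint approximation).
    mid = pose.heading + d_heading / 2.0
    return Pose(
        x=pose.x + distance * math.cos(mid),
        y=pose.y + distance * math.sin(mid),
        heading=pose.heading + d_heading,
    )

start = Pose(0.0, 0.0, 0.0)
turn = heading_change_from_wheels(0.9, 1.1, 1.5)  # slight left turn
print(dead_reckon(start, 1.0, turn))
```

Repeating this step with each new pair of direction and distance readings keeps an estimated position until GPS or beacon signals become available again. Errors accumulate over time, which is why the absolute sources are preferred whenever they can be received.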

[0014]

The map information storage unit 15 consists of a large-capacity storage medium, such as a CD-ROM, and its drive. It stores the various data needed for route search and route guidance, such as road data for searching for a route to the destination and map data for displaying the route found on the display 11a. The map information storage unit 15 also stores destination data. These data consist of the names of buildings and locations that can be set as destinations, such as public facilities, gas stations, and parks, together with coordinate data (latitude and longitude) indicating their positions. A microphone 24, through which speech is input, is connected to the speech recognition unit 16. The audio output unit 17 includes three components: an audio output IC 26 that outputs speech as an electrical signal, a D/A converter 27 that converts the digital output of the audio output IC 26 into an analog signal, and an amplifier 28 that amplifies the converted analog signal. A speaker 29 is connected to the output of the amplifier 28.

[0015]

The arithmetic unit 10 includes a CPU (central processing unit), ROM (read-only memory), RAM (random-access memory), and so on. Its functional units are realized by the CPU executing programs stored in the ROM or in an external storage device, using the RAM as a working area. Specifically, the arithmetic unit 10 includes the following units:
- a map data reading unit 31 connected to the speed sensor 14 and the map information storage unit 15;
- a map drawing unit 32;
- a map management unit 33 that manages the map data reading unit 31 and the map drawing unit 32;
- a screen management unit 34 connected to the map drawing unit 32 and the display unit 11;
- an input management unit 35 connected to the switch input management unit 12 and the speech recognition unit 16;
- an audio output management unit 36 connected to the audio output IC 26 of the audio output unit 17;
- a communication management unit 38; and
- an overall management unit 37 that manages the map management unit 33, the screen management unit 34, the input management unit 35, the audio output management unit 36, and the communication management unit 38.

Communication devices (not shown) such as a car telephone, a PHS, or a mobile phone can be connected to the communication management unit 38. The communication management unit 38 manages communications of various kinds: ordinary telephone calls, multimedia communications such as facsimile and personal-computer communication, and communication via ATIS.

[0016]

FIG. 2 is a block diagram showing the configuration of the speech recognition unit 16 in FIG. 1. As shown in the figure, the speech recognition unit 16 includes a preprocessing unit 161, a feature extraction unit 162, a word dictionary 163, a pattern matching unit 165, and a determination unit 166. The preprocessing unit 161 converts the speech signal input from the microphone 24 into a digital signal. It then applies preprocessing to the A/D-converted signal, such as speech-segment detection, pre-emphasis (high-frequency boosting), and noise removal. The feature extraction unit 162 extracts the features of the speech from the preprocessed signal, and these features serve as the word pattern of the word. The features used are, for example, time-series information on the spectrum or cepstrum obtained by fast Fourier transform (FFT). The feature extraction unit 162 extracts the features of the input speech using any of various analysis methods, such as a multichannel bandpass filter bank or linear predictive analysis.
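Two of the preprocessing steps named above, pre-emphasis and speech-segment detection, can be sketched in a few lines. The filter coefficient 0.97, the frame length, and the energy threshold are common textbook choices assumed here, not values given in the specification. The energy-based endpoint detector is likewise an assumed technique. The FFT/cepstrum feature extraction that follows is omitted.

```python
from typing import List, Optional, Tuple

def pre_emphasis(signal: List[float], coeff: float = 0.97) -> List[float]:
    # First-order high-pass filter y[n] = x[n] - coeff * x[n-1]; boosts high frequencies.
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]

def frame_energies(signal: List[float], frame_len: int) -> List[float]:
    # Mean squared amplitude of each complete, non-overlapping frame.
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def detect_speech_segment(
    signal: List[float], frame_len: int, threshold: float
) -> Optional[Tuple[int, int]]:
    # Return the (first, last) frame indices whose energy exceeds the threshold,
    # or None if no frame does.
    active = [i for i, e in enumerate(frame_energies(signal, frame_len)) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]

signal = [0.0] * 4 + [1.0, -1.0] * 4 + [0.0] * 4  # silence, "speech", silence
print(detect_speech_segment(signal, 4, 0.5))  # (1, 2)
```

A real front end would use overlapping, windowed frames and an adaptive noise floor. The fixed threshold here only keeps the idea visible.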

[0017]

The word dictionary 163 stores standard patterns for every word subject to speech recognition and for each vowel. In this specification, the six sounds あ (a), い (i), う (u), え (e), お (o), and ん (n) are called vowels. These standard patterns are for speaker-independent recognition. The features of each word are extracted by the same analysis method that the feature extraction unit 162 uses and are stored as its standard pattern. The words subject to recognition include the selection keys displayed on the screen of the touch panel 11a, the functions of the switches 11b, and the destination names stored in the map information storage unit 15 that can be set as destinations.

[0018]

FIG. 3 conceptually shows an example of the contents of the word dictionary 163. As shown in FIG. 3, the word dictionary 163 consists of two parts. The first is a vowel dictionary 163a, which stores the vowel standard patterns. The second is a plurality of individual dictionaries 163b, which store the standard patterns of the words to be recognized, one dictionary per category. Each individual dictionary holds no more than 1,000 words. As shown in FIG. 3, the individual dictionaries 163b classify the words to be recognized by the combination of each word's initial and final vowels. This yields the あ-あ (a-a) word dictionary, the あ-い (a-i) word dictionary, the あ-う (a-u) word dictionary, and so on through the お-ん (o-n) word dictionary.

As mentioned above, the stored words include selection keys displayed on the touch panel 11a. Examples are "ほせい" (hosei, "correction") in the お-い (o-i) word dictionary, "ごるふ" (gorufu, "golf") in the お-う (o-u) word dictionary, and "ゆうえんち" (yūenchi, "amusement park") in the う-い (u-i) word dictionary. The stored words also include destination names, such as "たまてっく" (Tamatekku) in the あ-う (a-u) word dictionary, "としまえん" (Toshimaen) in the お-ん (o-n) word dictionary, and "きよみずでら" (Kiyomizudera) in the い-あ (i-a) word dictionary. In practice, each individual dictionary stores the standard pattern of each word together with code information consisting of a code string corresponding to that word. Each word uses the same code information as the corresponding destination name stored in the map information storage unit 15 and the corresponding input from the touch panel 11a and similar devices.
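The classification of FIG. 3 can be sketched as follows. The sketch assumes that each word's hiragana reading is available when the dictionary is built, and it derives the initial and final vowels from that reading. The kana table, the handling of small glide kana and the long-vowel mark, and the function names are assumptions made for illustration. Following the specification, ん is treated as a sixth vowel. The dictionary keys mirror the names used above, such as "おい" for the お-い word dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical hiragana-to-vowel table; "ん" is treated as a sixth vowel.
_ROWS = {
    "あ": "あかさたなはまやらわがざだばぱぁゃ",
    "い": "いきしちにひみりぎじぢびぴぃ",
    "う": "うくすつぬふむゆるぐずづぶぷぅゅ",
    "え": "えけせてねへめれげぜでべぺぇ",
    "お": "おこそとのほもよろをごぞどぼぽぉょ",
    "ん": "ん",
}
VOWEL_OF: Dict[str, str] = {kana: v for v, row in _ROWS.items() for kana in row}
SMALL_KANA = set("ぁぃぅぇぉゃゅょ")

def initial_vowel(reading: str) -> str:
    # In a contracted sound such as "きょ", the small kana carries the vowel.
    if len(reading) > 1 and reading[1] in SMALL_KANA:
        return VOWEL_OF[reading[1]]
    return VOWEL_OF[reading[0]]

def final_vowel(reading: str) -> str:
    # A trailing long-vowel mark "ー" repeats the vowel of the kana before it.
    return VOWEL_OF[reading.rstrip("ー")[-1]]

def build_individual_dictionaries(readings: Iterable[str]) -> Dict[str, List[str]]:
    # Key each word by its (initial, final) vowel pair, e.g. "おい".
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in readings:
        groups[initial_vowel(word) + final_vowel(word)].append(word)
    return dict(groups)

words = ["ほせい", "ごるふ", "ゆうえんち", "たまてっく", "としまえん", "きよみずでら"]
print(build_individual_dictionaries(words))
```

Run on the six example words from the text, the sketch places each one in the dictionary the specification names for it: おい, おう, うい, あう, おん, and いあ. With six vowels there are at most 36 individual dictionaries.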

[0019]

The pattern matching unit 165 compares the word pattern (features) extracted by the feature extraction unit 162 with the standard patterns stored in the word dictionary 163 and calculates their similarity. It performs two kinds of pattern matching (comparison). The first is matching for vowel recognition, which selects the individual dictionary to be used. The second is matching to recognize the input speech, using that selected dictionary. The vowel-recognition matching is performed first. It calculates the similarity between the word-initial and word-final portions of the input word pattern and each vowel standard pattern, and passes the results to the determination unit 166. The input-speech matching follows. Based on the vowel recognition result supplied by the determination unit 166, it selects the individual dictionary for the combination of the input speech's initial and final vowels. It then calculates the similarity between each standard pattern in that dictionary and the word pattern extracted by the feature extraction unit 162, and passes the results to the determination unit 166.

[0020]

Based on the comparison results from the pattern matching unit 165, the determination unit 166 performs two tasks. The first is vowel recognition, which identifies the initial and final vowels of the input speech. The second is recognition of the input speech itself. In that step, it recognizes the content of the speech input through the microphone 24 and passes the code information for the recognized content (the recognized word) to the input management unit 35 of the arithmetic unit 10.

[0021]

Next, the speech recognition operation of the device configured as above is described. When speech to be recognized is input from the microphone 24 to the speech recognition unit 16, the preprocessing unit 161 first converts the analog signal of the input speech into a digital signal. It then performs preprocessing such as speech-segment detection, pre-emphasis, and noise removal, and passes the resulting speech signal to the feature extraction unit 162.

[0022]

The feature extraction unit 162 analyzes the supplied speech signal to extract the features of the input speech. It passes these features to the pattern matching unit 165 as the word pattern of the input speech. The pattern matching unit 165 first extracts the word-initial and word-final portions of the input word pattern. It matches each portion against every vowel standard pattern in the vowel dictionary 163a of the word dictionary 163, calculates the respective similarities, and passes them to the determination unit 166. Various methods can be used to extract the word-initial and word-final portions. For example, the word pattern can be taken over intervals corresponding to a predetermined time at the beginning and at the end of the input speech. This approach relies on the fact that speakers normally speak at a roughly constant rate when using speech recognition. The vowels can therefore be recognized by extracting a word pattern over a fixed range, regardless of the word or the speaker.

[0023]

The pattern matching unit 165 supplies the similarity of each vowel to the word-initial portion. From these, the determination unit 166 takes the vowel with the highest similarity as the initial vowel. It likewise takes the vowel with the highest similarity to the word-final portion as the final vowel. The determination unit 166 then passes the resulting combination of initial and final vowels to the pattern matching unit 165.
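The two vowel-recognition steps above (cutting fixed-length head and tail segments, then choosing the most similar vowel for each) can be sketched together. Several details are assumptions, since the specification only says that a predetermined time at each end is used. The fixed frame count `edge_frames` stands in for that predetermined time. Each vowel standard pattern is represented by a single feature vector. Similarity is the cosine between that vector and the segment's averaged frame. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]

def mean_frame(frames: Sequence[Frame]) -> List[float]:
    # Average the feature vectors of a segment into one vector.
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vowel_similarities(segment: Sequence[Frame], vowel_dict: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Similarity of one word-edge segment to every vowel standard pattern.
    seg = mean_frame(segment)
    return {v: cosine(seg, ref) for v, ref in vowel_dict.items()}

def recognize_edge_vowels(
    word_pattern: Sequence[Frame],
    vowel_dict: Dict[str, Sequence[float]],
    edge_frames: int = 5,
) -> Tuple[Tuple[str, str], Dict[str, float], Dict[str, float]]:
    # Fixed-length head and tail segments (speaking rate assumed roughly constant).
    init_sims = vowel_similarities(word_pattern[:edge_frames], vowel_dict)
    final_sims = vowel_similarities(word_pattern[-edge_frames:], vowel_dict)
    # The highest-similarity vowel at each end is taken as the recognized vowel.
    best_init = max(init_sims, key=init_sims.get)
    best_final = max(final_sims, key=final_sims.get)
    return (best_init, best_final), init_sims, final_sims

vowels = {"あ": [1.0, 0.0, 0.0], "い": [0.0, 1.0, 0.0], "う": [0.0, 0.0, 1.0]}
pattern = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 0.0, 1.0]] * 4 + [[0.0, 1.0, 0.0]] * 3
pair, _, _ = recognize_edge_vowels(pattern, vowels, edge_frames=3)
print(pair)  # ('あ', 'い')
```

The full similarity tables are returned along with the chosen pair because the retry procedure described later needs the second-ranked vowels as well.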

[0024]

Based on the result from the determination unit 166, the pattern matching unit 165 selects the individual dictionary for the corresponding vowel combination from among the individual dictionaries 163b of the word dictionary 163. It then matches the input word pattern, already supplied by the feature extraction unit 162, against each standard pattern in the selected individual dictionary. It calculates the similarity for each word and passes the results to the determination unit 166. The determination unit 166 sorts the words in descending order of similarity. From the words whose similarity exceeds a predetermined threshold, it takes the top N. The word with the highest similarity becomes the recognized word for the input speech, and the remaining words become next candidates in descending order of similarity. The code information for each of these words is passed to the input management unit 35 of the arithmetic unit 10.
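The determination step just described amounts to a thresholded top-N selection. A minimal sketch follows; the threshold, N, the strict "exceeds" comparison, and the name `select_candidates` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def select_candidates(scores: Dict[str, float], threshold: float, n: int) -> List[Tuple[str, float]]:
    # Keep only words whose similarity exceeds the threshold, best first.
    passing = [(w, s) for w, s in scores.items() if s > threshold]
    passing.sort(key=lambda ws: ws[1], reverse=True)
    # The first entry is the recognized word; the rest are next candidates.
    return passing[:n]

scores = {"word_a": 0.91, "word_b": 0.62, "word_c": 0.80, "word_d": 0.40}
print(select_candidates(scores, 0.6, 2))  # [('word_a', 0.91), ('word_c', 0.8)]
```

If the returned list has fewer than N entries, the current individual dictionary did not yield enough candidates. The retry procedure of the next paragraphs handles that case.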

[0025]

If fewer than N of the words supplied by the pattern matching unit 165 have a similarity above the predetermined threshold, the determination unit 166 supplies the pattern matching unit 165 with another combination of initial and final vowels. It then requests pattern matching again against another individual dictionary. The other combination used here pairs the initial vowel with the second-highest similarity with the final vowel with the highest similarity. The final vowel is kept and only the initial vowel is changed because final vowels are generally recognized at a higher rate than initial vowels. A misrecognition is therefore more likely to have occurred in the word-initial portion.

【００２６】パターンマッチング部１６５は、この再度
のパターンマッチング要求があると、供給された他の母
音の組み合わせの個別辞書に切り換えて、再度パターン
マッチングを行い、各単語についての類似度を判定部１
６６に供給する。判定部１６６は、既に供給されている
類似度と、再度のパターンマッチングで新たに供給され
た類似度の中から、しきい値を越える上位Ｎ個を選択し
てそのコード情報を入力管理部３５に供給する。これに
よってもしきい値を越える単語がＮ個無い場合、判定部
１６６は、しきい値を越える単語がＮ個以上になるま
で、母音の他の組み合わせをパターンマッチング部１６
５に供給して、再度パターンマッチングを要求する。３
度目のパターンマッチングを要求する場合、類似度が最
も高い語頭母音と、類似度が２番目に高い語尾母音の組
み合わせとし、４度目のパターンマッチングを要求する
場合、類似度が共に２番目の語頭母音と語尾母音の組み
合わせとする。また、５度目以降のパターンマッチング
を要求する場合には、既に要求した組み合わせを除き、
語頭母音の類似度と語尾母音の類似度の合計値が大きい
組み合わせの順に、再度のパターンマッチングを要求す
る。なお、５度目以降ではなく、２度目以降のパターン
マッチングの要求を行う時点で、語頭母音の類似度と語
尾母音の類似度の合計値が大きい組み合わせの順に、再
度のパターンマッチングを要求するようにしてもよい。When this repeated pattern matching is requested, the pattern matching unit 165 switches to the individual dictionary for the newly supplied vowel combination, performs pattern matching again, and supplies the similarity for each word to the determination unit 166. From the similarities already supplied and those newly supplied by the repeated matching, the determination unit 166 selects the top N words exceeding the threshold and supplies their code information to the input management unit 35. If there are still fewer than N words exceeding the threshold, the determination unit 166 keeps supplying other vowel combinations to the pattern matching unit 165 and requesting pattern matching again until N or more words exceed the threshold. The third request uses the initial vowel with the highest similarity and the final vowel with the second-highest similarity; the fourth request uses the initial and final vowels that both have the second-highest similarity. From the fifth request onward, the remaining combinations (excluding those already requested) are requested in descending order of the sum of the initial-vowel similarity and the final-vowel similarity. Alternatively, this sum-based order may be used from the second request onward instead of from the fifth.
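The request order of [0025] and [0026], together with the accumulate-and-retry loop, can be sketched as below. Assumptions: vowel similarities arrive as dicts mapping each vowel to a float, each individual dictionary is a list of romanized words keyed by an (initial, final) pair, and `match` is a hypothetical stand-in for pattern matching a word's standard pattern against the input features. The names `combination_order` and `recognize` are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Combo = Tuple[str, str]  # (initial vowel, final vowel)


def combination_order(initial: Dict[str, float],
                      final: Dict[str, float]) -> List[Combo]:
    """Order in which individual dictionaries are requested (needs >= 2 vowels each)."""
    i_rank = sorted(initial, key=initial.get, reverse=True)
    f_rank = sorted(final, key=final.get, reverse=True)
    # Requests 1-4: best/best, 2nd initial, 2nd final, both 2nd.
    first_four = [(i_rank[0], f_rank[0]), (i_rank[1], f_rank[0]),
                  (i_rank[0], f_rank[1]), (i_rank[1], f_rank[1])]
    # Requests 5+: remaining pairs by descending sum of vowel similarities.
    rest = [c for c in product(i_rank, f_rank) if c not in first_four]
    rest.sort(key=lambda c: initial[c[0]] + final[c[1]], reverse=True)
    return first_four + rest


def recognize(initial: Dict[str, float], final: Dict[str, float],
              dictionaries: Dict[Combo, List[str]],
              match: Callable[[str], float],
              threshold: float, n: int) -> List[Tuple[str, float]]:
    """Match dictionaries in request order until n words exceed the threshold."""
    scores: Dict[str, float] = {}
    ranked: List[Tuple[str, float]] = []
    for combo in combination_order(initial, final):
        # Switch to this combination's dictionary; keep earlier similarities.
        for word in dictionaries.get(combo, []):
            scores[word] = match(word)
        ranked = sorted(((w, s) for w, s in scores.items() if s > threshold),
                        key=lambda ws: ws[1], reverse=True)
        if len(ranked) >= n:
            break
    # If every combination was tried, fewer than n words may remain.
    return ranked[:n]


if __name__ == "__main__":
    initial = {"a": 0.9, "i": 0.7, "u": 0.2, "e": 0.1, "o": 0.05}
    final = {"u": 0.8, "e": 0.5, "a": 0.3, "i": 0.2, "o": 0.1}
    dictionaries = {("a", "u"): ["maru", "batsu", "sanseidou"],
                    ("i", "u"): ["ginkou", "chitentouroku"],
                    ("a", "e"): ["akabane"]}
    # Hypothetical word similarities: the spoken word was "ginkou", but its
    # initial vowel was misrecognized as "a".
    fake = {"maru": 0.4, "batsu": 0.3, "sanseidou": 0.2,
            "ginkou": 0.85, "chitentouroku": 0.6, "akabane": 0.7}
    print(recognize(initial, final, dictionaries, fake.get, 0.5, 2))
    # -> [('ginkou', 0.85), ('chitentouroku', 0.6)]
```

In the example the first request (a-u) yields nothing above the threshold, and the second request (i-u), which swaps only the initial vowel as [0025] prescribes, recovers the intended word.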

【００２７】演算部１０では、認識単語に対応するコー
ド情報が入力管理部に供給されると、全体管理部３７
が、音声出力管理部３６を介してスピーカ２９から音声
によるアンサーバックを行うことで、認識音声の確認を
行う。または、供給されたコード情報に対応するＮ個の
単語の所定数をディスプレイ１１ａに表示し、ユーザに
選択してもらうことで認識音声を特定する。In the arithmetic unit 10, when the code information corresponding to the recognized word is supplied to the input management unit, the overall management unit 37 confirms the recognized speech by giving a spoken answerback from the speaker 29 via the voice output management unit 36. Alternatively, a predetermined number of the N words corresponding to the supplied code information are shown on the display 11a, and the user selects one of them, which identifies the recognized speech.

【００２８】以上説明したように本実施形態によれば、
音声認識を行う場合、単語の語頭や語尾の母音は他の音
の影響を比較的受けにくいため、これを認識することが
容易であることに着目し、音声認識の対象となる各単語
を、その単語の語頭と語尾の母音の組み合わせによる複
数の個別辞書に分類している。そして、入力音声の語頭
と語尾の母音について、各母音の標準パターンとのパタ
ーンマッチングにより母音認識を行い、両母音の組み合
わせの個別辞書から優先的にパターンマッチングを行う
ようにしたので、音声辞書の選択と切り換えを適切に行
うとともに、認識時間を短縮することができる。ま
た、個別辞書の適切な分類と、適切な選択が行われるた
め、認識率を向上させることができる。As described above, this embodiment exploits the observation that, in speech recognition, the vowels at the beginning and end of a word are relatively unaffected by other sounds and are therefore easy to recognize. Each word to be recognized is classified into one of several individual dictionaries according to the combination of its initial and final vowels. The initial and final vowels of the input voice are recognized by pattern matching against the standard pattern of each vowel, and pattern matching is performed first against the individual dictionary for that vowel combination. As a result, voice dictionaries are selected and switched appropriately and the recognition time is shortened. In addition, because the individual dictionaries are appropriately classified and selected, the recognition rate is improved.

【００２９】以上説明した実施形態では、本発明の好適
な実施形態の内の１実施形態について説明したもので、
本発明は特許請求の範囲に記載した発明の範囲において
種々の変形が可能である。例えば、説明した実施形態で
は、類似度が最も高い語頭母音と語尾母音の組み合わせ
に対応する個別辞書から順次パターンマッチングを行
い、所定のしきい値を越える類似度の単語がＮ個以上に
なった時点で他の個別辞書に対するパターンマッチング
を終了する構成としたが、本発明では他に、すべての個
別辞書に対するパターンマッチングを行うようにしても
良い。The embodiment described above is only one of the preferred embodiments of the present invention, and the invention can be modified in various ways within the scope of the claims. For example, in the described embodiment, pattern matching proceeds dictionary by dictionary, starting from the individual dictionary for the combination of the initial and final vowels with the highest similarity, and matching against further individual dictionaries stops once N or more words have a similarity above the predetermined threshold. Alternatively, pattern matching may be performed against all of the individual dictionaries.

【００３０】この場合、各個別辞書の単語に対するパタ
ーンマッチングの結果得られる類似度に対して、語頭母
音と語尾母音の各組み合わせについての類似度の合計値
に応じた重みづけをしても良い。例えば、母音認識にお
いて、語頭母音と語尾母音の類似度の合計値が、大きい
順に、「あう」、「いう」、「あえ」、…、であったと
する。この場合、あう単語辞書内の単語「まる」、「ば
つ」、「さんせいどう」等に対する類似度に最も大きな
重み付けをする。そして、いう単語辞書内の、「ぎんこ
う」、「ちてんとうろく」等に対する類似度に２番目に
大きな重み付けをし、あえ単語辞書内の単語「あかば
ね」等に３番目に大きな重み付けをする。なお、この重
み付けをどの範囲まで（語頭母音と語尾母音の類似度の
合計値が何番目まで）重み付けをするかについては、任
意に選択することができる。また、重み付けとして、所
定の値を加算するのか、または、所定係数を乗算するの
かについて、および、加算値、乗算値についても任意に
選択することができる。In this case, the similarity obtained by pattern matching for the words in each individual dictionary may be weighted according to the sum of the initial-vowel and final-vowel similarities for the corresponding combination. For example, suppose that vowel recognition yields sums that, in descending order, rank the combinations "a-u" (あう), "i-u" (いう), "a-e" (あえ), and so on. The largest weight is then given to the similarities of words in the a-u dictionary such as "maru" (まる), "batsu" (ばつ), and "sanseidou" (さんせいどう); the second-largest weight to words in the i-u dictionary such as "ginkou" (ぎんこう) and "chitentouroku" (ちてんとうろく); and the third-largest weight to words in the a-e dictionary such as "akabane" (あかばね). How far down the ranking the weighting extends (that is, down to which rank of the vowel-similarity sum) can be chosen freely. Likewise, whether the weighting adds a predetermined value or multiplies by a predetermined coefficient, and the value added or multiplied, can be chosen freely.
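A minimal sketch of this weighting, assuming every individual dictionary has already been matched as in [0029] and each word's raw similarity is known together with its (initial, final) combination. All vowel pairs are ranked by the sum of the recognized initial- and final-vowel similarities, and the k-th ranked pair receives `bonuses[k]`, either added or multiplied. The function name, the `bonuses` list, and the `mode` strings are illustrative assumptions; the patent leaves the range and the values open.

```python
from itertools import product
from typing import Dict, List, Tuple

Combo = Tuple[str, str]  # (initial vowel, final vowel)


def weighted_scores(word_scores: Dict[str, Tuple[Combo, float]],
                    initial: Dict[str, float],
                    final: Dict[str, float],
                    bonuses: List[float],
                    mode: str = "add") -> Dict[str, float]:
    """Weight each word's similarity by the rank of its vowel combination.

    bonuses[k] applies to the (k+1)-th ranked combination; lower-ranked
    combinations are left unweighted, which models choosing how far down
    the ranking the weighting extends.
    """
    if mode not in ("add", "mul"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Rank all (initial, final) pairs by the sum of their vowel similarities.
    ranked = sorted(product(initial, final),
                    key=lambda c: initial[c[0]] + final[c[1]], reverse=True)
    rank = {combo: k for k, combo in enumerate(ranked)}
    weighted: Dict[str, float] = {}
    for word, (combo, score) in word_scores.items():
        k = rank[combo]
        if k >= len(bonuses):
            weighted[word] = score  # outside the weighted range
        elif mode == "add":
            weighted[word] = score + bonuses[k]
        else:
            weighted[word] = score * bonuses[k]
    return weighted


if __name__ == "__main__":
    initial = {"a": 0.9, "i": 0.7, "u": 0.2, "e": 0.1, "o": 0.05}
    final = {"u": 0.8, "e": 0.5, "a": 0.3, "i": 0.2, "o": 0.1}
    # These sums rank a-u first, i-u second and a-e third, as in the example.
    words = {"maru": (("a", "u"), 0.50), "ginkou": (("i", "u"), 0.60),
             "akabane": (("a", "e"), 0.55), "osaka": (("o", "a"), 0.70)}
    print(weighted_scores(words, initial, final, bonuses=[0.3, 0.2, 0.1]))
```

Leaving words outside the top `len(bonuses)` combinations unweighted is one simple way to express the freely chosen range; a decaying weight over all ranks would be an equally valid reading.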

【００３１】また、以上説明した実施形態では、パター
ンマッチングを行う回路またはチップ等が１つである場
合を前提に説明したが、本発明では、複数配置するよう
にしても良い。例えば、母音辞書１６３ａと、分類した
個別辞書１６３ｂのそれぞれに対して、専用のパターン
マッチング用の回路又はチップ等を配置するようにして
も良い。この場合には、上記したように、入力音声の語
頭母音と語尾母音の類似度合計に応じた重み付けをす
る。このように、パターンマッチングを行う回路又はチ
ップ等を複数配置する構成とすることで、入力音声を高
速で認識すると共に、高い認識率を得ることができる。The embodiment above assumes a single circuit or chip for pattern matching, but several may be provided. For example, a dedicated pattern-matching circuit or chip may be provided for the vowel dictionary 163a and for each of the classified individual dictionaries 163b. In this case, as described above, the similarities are weighted according to the sum of the initial- and final-vowel similarities of the input voice. Providing multiple pattern-matching circuits or chips in this way allows the input voice to be recognized quickly while achieving a high recognition rate.
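The arrangement of one matcher per individual dictionary running at the same time can be imitated in software with a thread pool, each worker standing in for one dedicated circuit or chip. This is a hedged sketch, not a description of the hardware: the vowel similarities are taken as already computed (the vowel dictionary 163a's own matcher is not modeled), `match` is a hypothetical per-word similarity function, and multiplying each score by its combination's vowel-similarity sum is only one assumed reading of "weighting according to the total similarity".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Combo = Tuple[str, str]  # (initial vowel, final vowel)


def parallel_recognize(dictionaries: Dict[Combo, List[str]],
                       match: Callable[[str], float],
                       initial: Dict[str, float],
                       final: Dict[str, float]) -> List[Tuple[str, float]]:
    """Match every individual dictionary concurrently, then rank all words."""

    def match_dictionary(item: Tuple[Combo, List[str]]) -> List[Tuple[str, float]]:
        combo, words = item
        # Assumed weighting rule: scale by the vowel-similarity sum.
        weight = initial[combo[0]] + final[combo[1]]
        return [(w, match(w) * weight) for w in words]

    # One worker per dictionary stands in for one dedicated matching chip.
    with ThreadPoolExecutor(max_workers=max(1, len(dictionaries))) as pool:
        chunks = pool.map(match_dictionary, dictionaries.items())
        merged = [pair for chunk in chunks for pair in chunk]
    return sorted(merged, key=lambda ws: ws[1], reverse=True)


if __name__ == "__main__":
    initial = {"a": 0.9, "i": 0.7}
    final = {"u": 0.8, "e": 0.5}
    dictionaries = {("a", "u"): ["maru", "batsu"],
                    ("i", "u"): ["ginkou", "chitentouroku"],
                    ("a", "e"): ["akabane"]}
    fake = {"maru": 0.4, "batsu": 0.3, "ginkou": 0.85,
            "chitentouroku": 0.6, "akabane": 0.7}
    print([w for w, _ in parallel_recognize(dictionaries, fake.get, initial, final)])
```

The threads model the structure rather than the speed-up: in CPython the global interpreter lock serializes CPU-bound workers, whereas separate circuits or chips run truly concurrently.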

【００３２】また、説明した実施形態では、図３に示す
ように、語頭母音と語尾母音の組み合わせに応じた個別
辞書に各単語を分類したが、個別辞書による分類をする
ことなく、各単語の標準パターンデータとコード情報に
加えて、その単語の語頭母音と語尾母音の情報を格納す
るようにしても良い。この場合、パターンマッチング部
１６５では、パターンマッチングを行う前に、単語辞書
の全単語の中から、判定部１６６から供給される、語頭
母音と語尾母音の組み合わせに対応する単語をセレクト
し、その後にパターンマッチングを行う。In the embodiment described above, each word is classified, as shown in FIG. 3, into an individual dictionary according to the combination of its initial and final vowels. Instead of classifying words into individual dictionaries, the initial and final vowels of each word may be stored alongside its standard pattern data and code information. In this case, before performing pattern matching, the pattern matching unit 165 selects, from all the words in the word dictionary, the words corresponding to the combination of initial and final vowels supplied from the determination unit 166, and then performs pattern matching on them.
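Storing the vowel information with each word, instead of in separate dictionaries, might look like the following sketch. The `WordEntry` fields, the placeholder tuple used for the standard pattern, the code numbers, and the function name are assumptions for illustration; the filter reproduces only the selection step that precedes pattern matching.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordEntry:
    word: str                   # romanized reading, for readability only
    code: int                   # code information sent on recognition
    pattern: Tuple[float, ...]  # placeholder for the standard pattern data
    initial_vowel: str
    final_vowel: str


def select_entries(dictionary: List[WordEntry],
                   initial: str, final: str) -> List[WordEntry]:
    """Return the entries whose stored vowels match the requested pair;
    pattern matching is then run only on these entries."""
    return [e for e in dictionary
            if e.initial_vowel == initial and e.final_vowel == final]


if __name__ == "__main__":
    flat = [WordEntry("maru", 1, (0.1, 0.2), "a", "u"),
            WordEntry("ginkou", 2, (0.3, 0.1), "i", "u"),
            WordEntry("akabane", 3, (0.2, 0.2), "a", "e"),
            WordEntry("batsu", 4, (0.4, 0.0), "a", "u")]
    print([e.word for e in select_entries(flat, "a", "u")])  # ['maru', 'batsu']
```

This keeps a single flat table at the cost of scanning every word per request; an index keyed by the vowel pair would restore the direct lookup that separate dictionaries provide.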

【００３３】以上説明した実施形態では、個別単語辞書
１６３ｂを、認識単語の語頭母音と語尾母音の組み合わ
せにより分類することとしたが、本発明では、語頭母音
のみによる分類、または、語尾母音のみによる分類とし
ても良い。語頭母音または語尾母音のみによる分類は、
例えば、認識対象となる単語数が少ない場合に有効であ
る。In the embodiment described above, the individual word dictionaries 163b are classified by the combination of the initial and final vowels of the words to be recognized. Alternatively, they may be classified by the initial vowel alone or by the final vowel alone. Classification by only the initial vowel or only the final vowel is effective, for example, when the number of words to be recognized is small.
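Building the individual dictionaries from word readings, keyed by both vowels or by a single vowel, can be sketched as follows. It assumes Hepburn-romanized readings and takes the first and last vowel letters as the initial and final vowels. This reproduces the groupings in [0030] (for example, "sanseidou" falls under a-u and "akabane" under a-e), but the source does not say how a word ending in the moraic "n" (ん) is classified; this sketch simply uses the preceding vowel. `vowel_key` and `classify` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

VOWELS = "aiueo"


def vowel_key(romaji: str, mode: str = "both") -> Tuple[str, ...]:
    """Dictionary key for a word: (initial, final), (initial,), or (final,)."""
    vowels = [ch for ch in romaji.lower() if ch in VOWELS]
    if not vowels:
        raise ValueError(f"no vowel in {romaji!r}")
    initial, final = vowels[0], vowels[-1]
    if mode == "both":
        return (initial, final)
    if mode == "initial":
        return (initial,)
    if mode == "final":
        return (final,)
    raise ValueError(f"unknown mode: {mode!r}")


def classify(words: List[str], mode: str = "both") -> Dict[Tuple[str, ...], List[str]]:
    """Group words into individual dictionaries by their vowel key."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for word in words:
        groups[vowel_key(word, mode)].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = ["maru", "batsu", "sanseidou", "ginkou", "chitentouroku", "akabane"]
    print(classify(words))            # three dictionaries: a-u, i-u, a-e
    print(classify(words, "final"))   # two dictionaries: u, e
```

With a small vocabulary the single-vowel modes give fewer, larger dictionaries, which is the trade-off [0033] points to.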

【００３４】また、以上説明した実施形態では、音声認
識装置の全機能をナビゲーション装置に適用したが、本
発明では、音声認識装置の一部、又は全部をナビゲーシ
ョン装置外の他の装置に配置するようにしても良い。他
の装置としては、車両に対して目的地までの走行経路等
に関する情報を通信によって提供する、情報提供局とす
ることが望ましい。情報提供局には、少なくとも母音辞
書１６３ａと個別辞書１６３ｂを有する単語辞書１６３
と、パターンマッチング部１６５と、判定部１６６を配
置しておくが、前処理部１６１、特徴抽出部１６２を含
めた音声認識部１６全体を情報提供局に配置しておくこ
とが、ナビゲーション装置側の装置構成を少なくするう
えで好ましい。音声認識部１６全体を情報提供局に配置
した場合、目的地等の音声をナビゲーション装置から入
力し、これを通信管理部３８を介して自動車電話等から
情報提供局に送信する。情報提供局では、受信した音声
に対して、前処理、特徴抽出、パターンマッチング、お
よび、判定を行い、類似度が上位Ｎ個の認識単語に対す
るコード情報を、通信によってナビゲーション装置に送
信する。情報提供局によるパターンマッチング処理と判
定処理については、前記実施形態で説明した方法でも、
その変形例で説明したいずれの方法でも良い。ナビゲー
ション装置では、通信管理部３８を介してこのコード情
報を受信し、目的地の設定等を行う。なお、情報提供局
では、音声認識により得られた目的地に基づいて、その
目的地までの経路探索を行い、探索経路の情報をナビゲ
ーション装置に送信するようにしても良い。In the embodiment described above, all functions of the speech recognition device are implemented in the navigation device. Alternatively, part or all of the speech recognition device may be placed in another device outside the navigation device, preferably an information providing station that supplies vehicles, by communication, with information such as the travel route to a destination. The information providing station is provided with at least the word dictionary 163 (which has the vowel dictionary 163a and the individual dictionaries 163b), the pattern matching unit 165, and the determination unit 166; placing the entire speech recognition unit 16, including the preprocessing unit 161 and the feature extraction unit 162, at the station is preferable because it reduces the equipment needed on the navigation device side. When the entire speech recognition unit 16 is at the station, speech such as a destination is input to the navigation device and transmitted to the station from a car telephone or the like via the communication management unit 38. The station performs preprocessing, feature extraction, pattern matching, and determination on the received speech, and transmits the code information for the N recognized words with the highest similarities to the navigation device by communication. The pattern matching and determination at the station may use the method of the embodiment above or any of the modifications described. The navigation device receives this code information via the communication management unit 38 and uses it to set the destination and so on. The station may also search for a route to the destination obtained by speech recognition and transmit information on the searched route to the navigation device.
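The division of labor when the whole speech recognition unit 16 sits at the information providing station can be outlined as two small classes. This is an architectural sketch only: the transport (car telephone via the communication management unit 38) is modeled as a plain function call, the recognition pipeline is injected as a callable, and adopting the first returned code as the destination is an assumption, since the device could equally confirm it by answerback or on-screen selection as in [0027]. All class names, method names, and code values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecognitionResult:
    codes: List[int] = field(default_factory=list)  # top-N code information, best first


class InformationStation:
    """Hosts preprocessing, feature extraction, matching and determination."""

    def __init__(self, pipeline: Callable[[bytes], List[int]]):
        self._pipeline = pipeline  # hypothetical stand-in for speech recognition unit 16

    def handle(self, audio: bytes) -> RecognitionResult:
        return RecognitionResult(codes=self._pipeline(audio))


class NavigationDevice:
    """Only captures speech, forwards it, and acts on the returned codes."""

    def __init__(self, send: Callable[[bytes], RecognitionResult]):
        self._send = send  # models the link via communication management unit 38
        self.destination_code: Optional[int] = None

    def on_speech(self, audio: bytes) -> None:
        result = self._send(audio)
        if result.codes:
            # Assumption: adopt the best candidate without user confirmation.
            self.destination_code = result.codes[0]


if __name__ == "__main__":
    station = InformationStation(pipeline=lambda audio: [1203, 1207])
    device = NavigationDevice(send=station.handle)
    device.on_speech(b"\x00\x01")  # placeholder audio bytes
    print(device.destination_code)  # 1203
```

Injecting the pipeline and the transport as callables keeps the device side free of dictionaries and matchers, which is the equipment reduction the paragraph aims for.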

【００３５】[0035]

【発明の効果】本発明によれば、音声辞書の内容を適切
に分類したので、効率的に音声を認識することができ
る。According to the present invention, since the contents of the speech dictionary are appropriately classified, speech can be recognized efficiently.

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明に係る、音声認識装置をナビゲーション
装置に適用した場合の構成図である。FIG. 1 is a configuration diagram when a voice recognition device according to the present invention is applied to a navigation device.

【図２】音声認識部の構成図である。FIG. 2 is a configuration diagram of a voice recognition unit.

【図３】単語辞書の内容の一例を概念的に表した説明図
である。FIG. 3 is an explanatory diagram conceptually showing an example of the contents of a word dictionary.

【符号の説明】[Explanation of symbols]

１０演算部１１表示部１１ａディスプレイ１３現在位置測定部１５地図情報記憶部１６音声認識部１６１前処理部１６２特徴抽出部１６３単語辞書１６３ａ母音辞書１６３ｂ個別辞書１６５パターンマッチング部１６６判定部１７音声出力部２４マイク３３地図管理部３４画面管理部３５入力管理部３７全体管理部３８通信管理部 Reference Signs List 10 arithmetic unit 11 display unit 11a display 13 current position measurement unit 15 map information storage unit 16 speech recognition unit 161 preprocessing unit 162 feature extraction unit 163 word dictionary 163a vowel dictionary 163b individual dictionary 165 pattern matching unit 166 determination unit 17 audio output unit 24 microphone 33 map management unit 34 screen management unit 35 input management unit 37 overall management unit 38 communication management unit

Claims

【特許請求の範囲】[Claims]

【請求項１】母音の標準パターンを格納した母音辞書
と、認識対象となる複数の単語の標準パターンを、その
単語の語頭の母音と語尾の母音の少なくとも一方の区別
が可能な状態に格納した個別辞書とを有する単語辞書
と、音声を入力する音声入力手段と、この音声入力手段から入力された音声についての特徴を
抽出する特徴抽出手段と、この特徴抽出手段で抽出された入力音声についての特徴
の語頭部分と語尾部分の少なくとも一方と、前記母音辞
書に格納された各母音の標準パターンとの類似度を算出
する母音類似度算出手段と、この母音類似度算出手段により算出された類似度から、
音声入力手段から入力された音声についての語頭の母音
と語尾の母音の少なくとも一方を認識する母音認識手段
と、この母音認識手段で認識された、母音に応じた単語の標
準パターンを前記単語辞書から選択する単語辞書選択手
段と、前記特徴抽出手段で抽出された特徴と、前記単語辞書選
択手段で選択された標準パターンとの類似度を算出する
単語類似度算出手段と、この単語類似度算出手段で算出された類似度から、入力
された音声を判定する判定手段と、を具備することを特徴とする音声認識装置。1. A speech recognition device comprising: a word dictionary having a vowel dictionary that stores standard patterns of vowels and an individual dictionary that stores standard patterns of a plurality of words to be recognized such that at least one of the initial vowel and the final vowel of each word can be distinguished; voice input means for inputting voice; feature extraction means for extracting features of the voice input from the voice input means; vowel similarity calculation means for calculating the similarity between at least one of the initial portion and the final portion of the features of the input voice extracted by the feature extraction means and the standard pattern of each vowel stored in the vowel dictionary; vowel recognition means for recognizing, from the similarity calculated by the vowel similarity calculation means, at least one of the initial vowel and the final vowel of the voice input from the voice input means; word dictionary selection means for selecting, from the word dictionary, the standard patterns of words corresponding to the vowel recognized by the vowel recognition means; word similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selection means; and determination means for determining the input voice from the similarity calculated by the word similarity calculation means.

【請求項２】母音の標準パターンを格納した母音辞書
と、認識対象となる複数の単語の標準パターンを、その
単語の語頭の母音と語尾の母音の少なくとも一方の区別
が可能な状態に格納した個別辞書とを有する単語辞書
と、音声を入力する音声入力手段と、この音声入力手段から入力された音声についての特徴を
抽出する特徴抽出手段と、この特徴抽出手段で抽出された入力音声についての特徴
の語頭部分と語尾部分の少なくとも一方と、前記母音辞
書に格納された各母音の標準パターンとの類似度を算出
する母音類似度算出手段と、前記特徴抽出手段で抽出された特徴と、前記単語辞書選
択手段で選択された標準パターンとの類似度を算出する
単語類似度算出手段と、この単語類似度算出手段で算出された各単語の類似度
に、前記母音類似度算出手段で算出された母音類似度に
応じた重み付けを行う重み付け手段と、この重み付け手段で、重み付けした後の類似度から、入
力された音声を判定する判定手段と、を具備することを
特徴とする音声認識装置。2. A speech recognition device comprising: a word dictionary having a vowel dictionary that stores standard patterns of vowels and an individual dictionary that stores standard patterns of a plurality of words to be recognized such that at least one of the initial vowel and the final vowel of each word can be distinguished; voice input means for inputting voice; feature extraction means for extracting features of the voice input from the voice input means; vowel similarity calculation means for calculating the similarity between at least one of the initial portion and the final portion of the features of the input voice extracted by the feature extraction means and the standard pattern of each vowel stored in the vowel dictionary; word similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selection means; weighting means for weighting the similarity of each word calculated by the word similarity calculation means according to the vowel similarity calculated by the vowel similarity calculation means; and determination means for determining the input voice from the similarity weighted by the weighting means.