JP2009251470A

JP2009251470A - In-vehicle information system

Info

Publication number: JP2009251470A
Application number: JP2008101885A
Authority: JP
Inventors: Yusuke Oku; 雄介奥; Takamitsu Suzuki; 孝光鈴木; Takuji Yamada; 卓司山田
Original assignee: Denso Corp; Toyota Motor Corp
Current assignee: Denso Corp; Toyota Motor Corp
Priority date: 2008-04-09
Filing date: 2008-04-09
Publication date: 2009-10-29
Anticipated expiration: 2028-04-09
Also published as: JP4938719B2

Abstract

PROBLEM TO BE SOLVED: To provide an in-vehicle information system that improves speech recognition accuracy by learning the speech characteristics of each user.

SOLUTION: The in-vehicle information system includes: first input means for inputting commands manually; second input means for inputting voice; speech recognition means for recognizing the voice input to the second input means; information processing means for outputting predetermined information based on the content of a command input to the first input means or on the semantic content of the voice recognized by the speech recognition means; and learning means that, when a command is input to the first input means within a predetermined time after voice is input to the second input means, learns the recognition method of the speech recognition means based on data representing the content of the command and voice data representing the input voice.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an in-vehicle information system that improves speech recognition accuracy by learning the characteristics of a user's utterances.

Description of the Related Art

Conventionally, in-vehicle navigation devices have used various techniques to improve the recognition accuracy of voice input.

For example, a navigation device has been proposed in which, if misrecognition persists even after speech recognition has been retried several times, the recognition template is replaced with another template and speech recognition is executed again (see, for example, Patent Document 1).

Patent Document 1: JP 2002-108386 A

However, in conventional navigation devices such as the one described above, the templates used for speech recognition are designed for an unspecified large number of users, so swapping in another template can improve accuracy only to a limited extent.

Summary of the Invention

Accordingly, an object of the present invention is to provide an in-vehicle information system that improves speech recognition accuracy by learning the characteristics of each individual user's utterances.

An in-vehicle information system according to one aspect of the present invention includes: first input means for inputting a command by manual operation; second input means for inputting voice; speech recognition means for recognizing the voice input to the second input means; information processing means for outputting predetermined information based on the content of a command input to the first input means or on the semantic content of the voice recognized by the speech recognition means; and learning means that, when a command is input to the first input means within a predetermined time after voice is input to the second input means, learns the recognition method used by the speech recognition means based on data representing the content of that command and voice data representing the voice input to the second input means.

The learning means may learn the recognition method of the speech recognition means by learning the characteristics of the voice input to the second input means, based on the degree of agreement between the data representing the content of the command input to the first input means and the voice data.

The system may further include a speech recognition dictionary that stores reference values for evaluating voice characteristics and threshold values for identifying the semantic content of voice data. In that case, the speech recognition means recognizes the semantic content of the voice input to the second input means by comparing, against the threshold values, the similarity between the evaluation values of that voice's characteristics and the reference values stored in the dictionary; and the learning means learns the recognition method of the speech recognition means by changing the threshold values based on the degree of agreement between the data representing the content of the command input to the first input means and the voice data.

The voice characteristics may be at least one of a voiceprint, an accent, a pitch, and the utterance content.

The first input means may be a touch-panel display means or a remote operation device.

The information processing means may be included in the arithmetic processing means of a navigation device.

According to the present invention, an in-vehicle information system with improved speech recognition accuracy can be provided by learning the characteristics of each user's utterances.

An embodiment to which the in-vehicle information system of the present invention is applied is described below.

FIG. 1 shows the configuration of the in-vehicle information system of this embodiment. The in-vehicle information system is a navigation device 10, and the learning function described below is implemented by a navigation ECU (Electronic Control Unit) 11.

In addition to the navigation ECU 11, the navigation device 10 includes a touch panel 12, a current position detection unit 13, a direction detection unit 14, a route search unit 15, a map database 16, a microphone 17, a speech recognition dictionary 18, a receiving unit 19, and a remote operation device 20.

The navigation ECU 11 is built around a microcomputer consisting of a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), and other components connected via a bus (not shown).

The touch panel 12 displays an electronic map and the position of the host vehicle, together with input switches for invoking predetermined functions of the navigation device 10 (for example, route search). The touch panel 12 is an input means for entering commands by manual operation. It may be, for example, a liquid crystal monitor combined with a matrix switch; it only needs to be operable by the user pressing the input switches displayed on the panel.

The current position detection unit 13 is a GPS receiver that calculates the vehicle's current position, traveling speed, and the like using the GPS (Global Positioning System) satellite navigation system.

The direction detection unit 14 is a gyrocompass.

The route search unit 15 searches the candidate routes for the optimum route by weighting the search conditions specified by the user.

The map database 16 only needs to be able to store the electronic map required by the navigation device; it is implemented, for example, on a hard disk.

The microphone 17 is the input means for voice. Input voice is converted into voice data and fed to the navigation ECU 11, which recognizes the semantic content of the voice data using the speech recognition function described later.

The speech recognition dictionary 18 is used by the navigation ECU 11 when it executes the speech recognition function. It is a database, implemented for example on a hard disk, that stores grouped records in which three items are associated with one another: reference values against which the voice data of various utterances are evaluated, threshold values used in the decision that identifies the voice data, and semantic content data representing the meaning of the utterance. The thresholds used in this decision are described later.

The receiving unit 19 receives commands transmitted from the remote operation device 20: it receives commands entered manually on the remote operation device 20 and passes them to the navigation ECU 11.

The navigation device 10 lets the user enter commands manually through the touch panel 12 or the remote operation device 20, and by voice through the microphone 17. It only needs to be able to provide the user with the desired guidance information by deriving position data representing the host vehicle's position on the electronic map from the map data and from the information obtained by the units described above (13, 14, 15).

In this embodiment, in addition to a guidance function that provides the user with the desired guidance information, the navigation ECU 11 has a learning function that learns the recognition method of the speech recognition function based on the characteristics of the user's utterances. This learning function is described with reference to FIG. 2.

FIG. 2 is a block diagram showing the learning function for the speech recognition method in the in-vehicle information system of this embodiment.

The navigation ECU 11 includes an input management unit 21, an input analysis control unit 22, a manual input analysis unit 23, a speech recognition unit 24, and a speech recognition learning unit 25. FIG. 2 shows only the blocks of the navigation ECU 11 needed for the speech recognition function and for learning the recognition method; the navigation ECU 11 also contains other blocks that implement the guidance function.

The input management unit 21 manages commands entered through the touch panel 12, the microphone 17, or the remote operation device 20, and passes each command to the input analysis control unit 22.

The input analysis control unit 22 analyzes incoming commands and routes each command received from the input management unit 21 to either the manual input analysis unit 23 or the speech recognition unit 24. Commands from the touch panel 12 and the remote operation device 20 go to the manual input analysis unit 23; commands from the microphone 17 are converted into voice data and sent to the speech recognition unit 24.

When a command arrives from the microphone 17 via the input management unit 21, the input analysis control unit 22 also starts counting the elapsed time. If a command then arrives from the touch panel 12 or the remote operation device 20 via the input management unit 21 within a predetermined time, the unit sends a signal indicating this to the speech recognition learning unit 25. The predetermined time is set to, for example, 5 seconds.
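
The timer behaviour of the input analysis control unit 22 can be sketched as a small state machine. This is a hypothetical model, not the ECU's actual code: the event labels, the `InputEvent` type, and the choice to clear the timer after the first manual input (one learning signal per utterance) are assumptions; only the 5-second window comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Predetermined time from the embodiment ("for example, 5 seconds").
LEARNING_WINDOW_S = 5.0


@dataclass
class InputEvent:
    source: str       # hypothetical labels: "mic", "touch_panel" or "remote"
    timestamp: float  # seconds on a monotonic clock


class InputAnalysisControl:
    """Hypothetical model of the timer logic in input analysis control unit 22."""

    def __init__(self, window_s: float = LEARNING_WINDOW_S) -> None:
        self.window_s = window_s
        self.last_voice_time: Optional[float] = None

    def on_input(self, event: InputEvent) -> bool:
        """Return True when a learning signal should be sent to unit 25."""
        if event.source == "mic":
            # A voice command (re)starts the elapsed-time count.
            self.last_voice_time = event.timestamp
            return False
        if event.source in ("touch_panel", "remote") and self.last_voice_time is not None:
            elapsed = event.timestamp - self.last_voice_time
            # Assumption: one learning signal per utterance, so the timer is cleared.
            self.last_voice_time = None
            return 0.0 <= elapsed <= self.window_s
        return False
```

For example, a voice input at t = 10.0 s followed by a touch-panel press at t = 13.2 s makes `on_input` return `True` for the press, while a remote-control press 6 s after the voice input returns `False`.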

The manual input analysis unit 23 analyzes commands from the touch panel 12 and the remote operation device 20; these are commands for operating the navigation device 10. For example, when the user presses the “Current location” button on the touch panel 12 or the remote operation device 20 to enter the current location, the resulting command selects the current-location input mode. The manual input analysis unit 23 passes such commands to the speech recognition learning unit 25 as needed.

The speech recognition unit 24 uses the speech recognition dictionary 18 to recognize the semantic content of the voice data sent from the input analysis control unit 22. Recognition works by evaluating several features (referred to as authenticators) contained in the voice data: the voiceprint, the accent, the pitch, and the utterance content. Of these, the voiceprint, accent, and pitch represent the characteristics of how the user speaks.

Each evaluation determines the similarity between an evaluation value of the voice data (the voiceprint, accent, pitch, and utterance content evaluation values) and the corresponding reference parameter (the voiceprint parameter Pa, accent parameter Pb, pitch parameter Pc, and utterance content parameter Pd). Data representing the values (reference values) of the parameters Pa to Pd are stored in the speech recognition dictionary 18.

The voiceprint and the utterance content are evaluated, for example, by scoring the words and phonemes contained in the voice data with a hidden Markov model (HMM). The voiceprint parameter Pa and the utterance content parameter Pd serve as the evaluation criteria.
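
The text only says that an HMM is used "for example". As a rough illustration of HMM-based scoring, the sketch below computes the likelihood of a discrete observation sequence with the standard forward algorithm. The discrete symbols, the toy probabilities, and the function name are assumptions; how the ECU turns such scores into voiceprint and utterance content evaluation values is not described in the source.

```python
from typing import Sequence


def forward_likelihood(
    obs: Sequence[int],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> float:
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    start[j]    : probability of starting in state j
    trans[i][j] : probability of moving from state i to state j
    emit[j][k]  : probability of emitting symbol k in state j
    """
    n_states = len(start)
    # alpha[j] = P(o_1..o_t, state at time t = j)
    alpha = [start[j] * emit[j][obs[0]] for j in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state model (hypothetical numbers) scoring a three-symbol sequence.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood([0, 1, 1], start, trans, emit))
```

In a word recognizer one such model would typically exist per word, and a higher likelihood indicates a closer match; whether the patented system is organized this way is not stated.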

The accent is evaluated by matching on the distribution of voice pitch represented by the voice data, with the accent parameter Pb as the evaluation criterion.

The pitch is evaluated by matching on the voice pitch represented by the voice data, with the pitch parameter Pc as the evaluation criterion.
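
The text does not say how voice pitch is extracted. One common textbook method, swapped in here purely for illustration, is to pick the lag with the highest autocorrelation within a plausible voice range. The function name and the 60 to 400 Hz search range are assumptions. The accent evaluation described above would look at the distribution of such estimates over successive frames, which is not shown.

```python
import math
from typing import Sequence


def estimate_pitch_hz(
    samples: Sequence[float],
    sample_rate: int,
    f_min: float = 60.0,
    f_max: float = 400.0,
) -> float:
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags whose frequency lies in [f_min, f_max] are searched.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


# Synthetic 200 Hz tone, 50 ms at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(estimate_pitch_hz(tone, fs))
```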

Similarity is judged by checking whether the similarity between each evaluation value (voiceprint, accent, pitch, and utterance content) and the value (reference value) of the corresponding parameter (Pa, Pb, Pc, and Pd) is greater than or equal to the corresponding threshold: the voiceprint threshold Ta, accent threshold Tb, pitch threshold Tc, or utterance content threshold Td. The voice data is judged similar only if every similarity is at or above its threshold; if any similarity falls below its threshold, it is judged dissimilar.

FIG. 3 shows the data structure of the speech recognition dictionary 18 in this embodiment. Each word in the dictionary is assigned an identification ID, and each identification ID is stored together with its voiceprint threshold Ta, accent threshold Tb, pitch threshold Tc, utterance content threshold Td, and semantic content ID.

The data in FIG. 3 is only part of the contents of the speech recognition dictionary 18, which in practice holds data for many words. Although not shown in FIG. 3, each identification ID also has stored data representing the values (reference values) of the parameters Pa, Pb, Pc, and Pd that serve as the average evaluation criteria for that word.

For example, the word with identification ID “0001” has a voiceprint threshold Ta of 0.8, an accent threshold Tb of 0.9, a pitch threshold Tc of 0.7, and an utterance content threshold Td of 0.75, and the meaning of a word that satisfies these conditions is given by the semantic content ID “M00001”.

Assuming that the semantic content data for semantic content ID “M00001” represents “current location”, recognition proceeds as follows.

The speech recognition unit 24 computes the similarity between the evaluation values of the voice data (voiceprint, accent, pitch, and utterance content evaluation values) and the parameters Pa, Pb, Pc, and Pd.

If the computed voiceprint, accent, pitch, and utterance content similarities are all at or above Ta, Tb, Tc, and Td respectively, the speech recognition unit 24 judges that the voice data is similar to the word with identification ID “0001”.

The speech recognition unit 24 then reads from the speech recognition dictionary 18 the semantic content specified by the semantic content ID “M00001” associated with identification ID “0001”. In this way, the meaning of the voice data is recognized.

Concretely, when the user says “genzaichi” (current location) into the microphone 17, the evaluation values of the voice data (voiceprint, accent, pitch, and utterance content) are matched against the parameters Pa, Pb, Pc, and Pd of the various identification IDs, and evaluating the similarities produces a hit on identification ID “0001”.

The speech recognition unit 24 therefore determines that the meaning of the input voice data is represented by semantic content ID “M00001” and displays the words “current location” on the touch panel 12. In this way, the speech recognition unit 24 recognizes the voice data as meaning “current location”.
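
The matching rule and the FIG. 3 example can be combined into a small lookup sketch. The field names and types are assumptions, and so is the similarity function: the text says similarity is compared with the thresholds Ta to Td but never defines how similarity is computed, so `default_similarity` is a hypothetical placeholder. Returning the first entry that passes is also an assumption, because the text does not say how several hits would be resolved. The reference values of `ENTRY_0001` are placeholders; only its thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Order: voiceprint (Pa/Ta), accent (Pb/Tb), pitch (Pc/Tc), utterance content (Pd/Td).
FEATURES = ("voiceprint", "accent", "pitch", "content")


@dataclass
class DictionaryEntry:
    word_id: str                 # identification ID, e.g. "0001"
    reference: Dict[str, float]  # reference values of Pa..Pd
    threshold: Dict[str, float]  # thresholds Ta..Td
    meaning_id: str              # semantic content ID, e.g. "M00001"


def default_similarity(evaluation: float, reference: float) -> float:
    """Hypothetical similarity in (0, 1]; the source does not define the metric."""
    return 1.0 / (1.0 + abs(evaluation - reference))


def recognize(
    evaluation: Dict[str, float],
    dictionary: List[DictionaryEntry],
    similarity: Callable[[float, float], float] = default_similarity,
) -> Optional[str]:
    """Return the meaning ID of the first entry whose similarities all reach
    their thresholds, or None if no entry matches."""
    for entry in dictionary:
        sims = {f: similarity(evaluation[f], entry.reference[f]) for f in FEATURES}
        if all(sims[f] >= entry.threshold[f] for f in FEATURES):
            return entry.meaning_id
    return None


# Thresholds from the FIG. 3 example; reference values are placeholders.
ENTRY_0001 = DictionaryEntry(
    word_id="0001",
    reference={f: 1.0 for f in FEATURES},
    threshold={"voiceprint": 0.8, "accent": 0.9, "pitch": 0.7, "content": 0.75},
    meaning_id="M00001",
)
MEANINGS = {"M00001": "current location"}
```

With these placeholders, `recognize({f: 1.0 for f in FEATURES}, [ENTRY_0001])` returns `"M00001"`. Raising only the accent evaluation value to 1.2 gives an accent similarity of about 0.83, below Tb = 0.9, so the call returns `None`, which is the failure case the following paragraphs deal with.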

If, however, at least one of the voiceprint, accent, pitch, or utterance content similarities falls short of Ta, Tb, Tc, or Td respectively, no word similar to the voice data is found in the speech recognition dictionary 18, and the user can be expected to enter the command on the touch panel 12 or the remote operation device 20 instead. When such manual input occurs, the speech recognition learning unit 25 performs learning as follows.

When the speech recognition learning unit 25 receives from the input analysis control unit 22 a signal indicating that input was made on the touch panel 12 or the remote operation device 20 within the predetermined time after voice input to the microphone 17, it learns the recognition method based on the semantic content recognized by the speech recognition unit 24 and the content of the command entered on the touch panel 12 or the remote operation device 20.

Such a signal, indicating manual input on the touch panel 12 or the remote operation device 20 within the predetermined time (5 seconds) after voice input to the microphone 17, is taken to mean that the user operated the touch panel 12 or the remote operation device 20 because speech recognition did not succeed.

To improve recognition accuracy by learning the characteristics of the user's voice, the navigation device 10 of this embodiment therefore changes the voiceprint threshold Ta, accent threshold Tb, pitch threshold Tc, or utterance content threshold Td to match the user's voice, based on the similarity between the semantic content recognized by the speech recognition unit 24 and the content of the command entered on the touch panel 12 or the remote operation device 20.

Changing a threshold to match the user's voice in this way changes the evaluation criterion used in speech recognition, so utterances by this user that were not recognized correctly before the change come to be recognized correctly, and recognition accuracy improves.

As a concrete example, consider the case where the user says “genzaichi” into the microphone 17 and recognition fails because the evaluation value representing the accent of the voice data is not judged similar to the accent parameter Pb of the word “current location”.

In this case, if the user presses the “Current location” button on the touch panel 12 within 5 seconds of the voice input, the speech recognition learning unit 25 reads from the speech recognition dictionary 18 the thresholds Ta, Tb, Tc, and Td of the word represented by the command entered on the touch panel 12.

The speech recognition learning unit 25 compares the thresholds it has read (Ta to Td) with the similarities and identifies which similarity fell below its threshold.
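
This comparison can be sketched as follows, assuming that the per-feature similarities computed for the failed utterance against the entry named by the manual command (here “current location”) are still available. The feature names and the sample similarity values are hypothetical; the thresholds are the FIG. 3 values.

```python
from typing import Dict, List

FEATURES = ("voiceprint", "accent", "pitch", "content")  # Ta, Tb, Tc, Td


def failing_features(similarity: Dict[str, float], threshold: Dict[str, float]) -> List[str]:
    """Return the features whose similarity fell below the stored threshold."""
    return [f for f in FEATURES if similarity[f] < threshold[f]]


thresholds = {"voiceprint": 0.8, "accent": 0.9, "pitch": 0.7, "content": 0.75}
# Hypothetical similarities for the failed "genzaichi" utterance.
similarities = {"voiceprint": 0.85, "accent": 0.82, "pitch": 0.75, "content": 0.8}
print(failing_features(similarities, thresholds))  # ['accent']
```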

Here, the voiceprint, pitch, and utterance content similarities were judged to be at or above Ta, Tc, and Td respectively, but the accent similarity was below the accent threshold Tb, so the value of Tb is changed.

Let Tb(after) be the accent threshold after the change, Tb(before) the accent threshold before the change, and ΔTb the difference between Tb(before) and the accent similarity. Then Tb(after) is given by

Tb(after) = Tb(before) − ΔTb

where ΔTb = Tb(before) − K, and K is the accent similarity obtained when speech recognition failed.

That is, ΔTb is subtracted from Tb(before). Substituting ΔTb gives Tb(after) = Tb(before) − (Tb(before) − K) = K, so the new accent threshold is set equal to K, the accent similarity from the failed recognition. This is the learning of the speech recognition method.
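
The update rule can be written as a short function that applies it to every threshold the utterance missed, which also covers the Ta, Tc, and Td cases mentioned below. The function name and the sample similarities are hypothetical; the arithmetic follows the formula above.

```python
from typing import Dict


def learn_thresholds(similarity: Dict[str, float], threshold: Dict[str, float]) -> Dict[str, float]:
    """Return new thresholds: each threshold T that the similarity K missed
    is lowered by dT = T - K, so T(after) = T(before) - dT = K."""
    updated = dict(threshold)  # leave the caller's dictionary untouched
    for feature, t_before in threshold.items():
        k = similarity[feature]
        if k < t_before:
            delta = t_before - k                 # ΔT = T(before) - K
            updated[feature] = t_before - delta  # T(after); equals K up to rounding
    return updated


thresholds = {"voiceprint": 0.8, "accent": 0.9, "pitch": 0.7, "content": 0.75}
# Hypothetical similarities for the failed utterance (only the accent misses).
similarities = {"voiceprint": 0.85, "accent": 0.82, "pitch": 0.75, "content": 0.8}
print(learn_thresholds(similarities, thresholds))
```

Because the new threshold equals K, a repeat utterance whose accent similarity is at least K now passes the `>=` test. The text does not mention any lower bound on how far a threshold may drop, so none is applied here.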

As a result, the next time the user says “genzaichi” into the microphone 17 (provided its accent similarity again reaches K), the voiceprint, accent, pitch, and utterance content evaluation values are all judged similar to the parameters Pa, Pb, Pc, and Pd, and the speech recognition unit 24 correctly recognizes the meaning of the user's voice data as “current location”.

The same applies when any of the voiceprint, pitch, or utterance content evaluation values is judged dissimilar to Pa, Pc, or Pd: the speech recognition learning unit 25 changes the corresponding threshold (Ta, Tb, Tc, or Td), so that speech recognition succeeds from the next time on.

FIG. 4 shows the procedure of the learning process for the speech recognition method in the in-vehicle information system of this embodiment. The process shown in FIG. 4 is executed by the navigation ECU 11.

When the navigation device 10 is powered on, the navigation ECU 11 starts the learning procedure for the recognition method in the in-vehicle information system of this embodiment (START).

The navigation ECU 11 converts the voice input to the microphone 17 into voice data (step S1). This step is performed by the navigation ECU 11 acting as the input analysis control unit 22: the voice signal output by the microphone 17 is digitized to obtain the voice data.

The navigation ECU 11 performs speech recognition on the voice data (step S2). This step is performed by the navigation ECU 11 acting as the speech recognition unit 24, which uses the speech recognition dictionary 18 to recognize the semantic content of the voice data sent from the input analysis control unit 22.

Specifically, it computes the evaluation values of the voice data (the voiceprint, accent, pitch, and utterance content evaluation values).

Next, the navigation ECU 11 analyzes the evaluation results for the voice features (step S3). This step is performed by the navigation ECU 11 acting as the speech recognition unit 24. Specifically, it uses the evaluation values computed in step S2 (voiceprint, accent, pitch, and utterance content) to search the data stored in the speech recognition dictionary 18. If it finds data for which the voiceprint parameter Pa, accent parameter Pb, pitch parameter Pc, and utterance content parameter Pd are all similar, it extracts from the speech recognition dictionary 18 the wording of the semantic content corresponding to the semantic content ID associated with that data's identification ID.
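The matching in step S3 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the per-feature similarities between the utterance's evaluation values and each entry's parameters (Pa to Pd) have already been computed, since the similarity measure itself is not specified here, and that a feature counts as similar when its similarity is at least the entry's threshold (Ta to Td), consistent with step S8's wording that a similarity "falls below" its threshold. `DictEntry` and `match_entry` are illustrative names, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass

# Feature order: voiceprint (Ta), accent (Tb), pitch (Tc), utterance content (Td).
FEATURES = ("voiceprint", "accent", "pitch", "content")


@dataclass
class DictEntry:
    """Hypothetical stand-in for one entry of speech recognition dictionary 18."""
    entry_id: int                 # identification ID
    meaning: str                  # wording of the associated semantic content
    thresholds: dict[str, float]  # Ta..Td, keyed by feature name


def match_entry(
    similarities: dict[int, dict[str, float]],
    entries: list[DictEntry],
) -> str | None:
    """Return the wording of the first entry whose four similarities all reach their thresholds.

    similarities[entry_id][feature] is the precomputed similarity between the
    utterance's evaluation value and that entry's parameter (Pa..Pd).
    """
    for entry in entries:
        sims = similarities[entry.entry_id]
        if all(sims[f] >= entry.thresholds[f] for f in FEATURES):
            return entry.meaning
    return None  # no entry is similar on all four features
```

Requiring all four features to pass mirrors the "all similar" condition in step S3; a single low similarity, such as an unusual accent, is enough to make recognition fail.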

Next, the navigation ECU 11 displays the wording extracted in step S3 on the touch panel 12 (step S4) so that the user can confirm it.

The navigation ECU 11 then monitors for manual operations on the touch panel 12 or the remote control device 20 (step S5), because if speech recognition was not performed correctly, the user is likely to operate the touch panel 12 or the remote control device 20 manually.

Next, the navigation ECU 11 determines whether a manual operation was input to the touch panel 12 or the remote control device 20 within 5 seconds after the voice was input to the microphone 17 (step S6). This check is made because, if speech recognition was not performed correctly, the recognition method needs to be learned.
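The time-window check of step S6 reduces to comparing timestamps. The sketch below is a hedged illustration that assumes both events are timestamped in seconds on a common clock (for example values from Python's `time.monotonic()`) and treats the 5-second boundary as inclusive; the function name is hypothetical.

```python
from __future__ import annotations

WINDOW_SECONDS = 5.0  # window used in the embodiment


def manual_input_within_window(
    voice_time: float,
    manual_times: list[float],
    window: float = WINDOW_SECONDS,
) -> bool:
    """Step S6: True if any manual operation occurred within `window` seconds after the voice input.

    Operations made before the voice input do not count. The inclusive
    upper boundary is an assumption.
    """
    return any(0.0 <= t - voice_time <= window for t in manual_times)
```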

If the navigation ECU 11 determines that a manual operation was input to the touch panel 12 or the remote control device 20 within 5 seconds after the voice was input to the microphone 17, it analyzes that manual operation (step S7). For example, if the "current location" button displayed on the touch panel 12 is pressed, the navigation ECU 11 determines that the "current location" button was pressed.

The navigation ECU 11 reads from the speech recognition dictionary 18 the thresholds (Ta to Td) required to identify the word represented by the operation analyzed in step S7, and compares them with the similarities calculated for the voice data acquired in step S1. This identifies which similarity (voiceprint, accent, pitch, or utterance content) falls below its threshold (step S8).
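Step S8 can be sketched as a per-feature comparison. This assumes, for illustration only, that the thresholds read for the manually selected word and the similarities stored for the step S1 voice data are both held as dicts keyed by feature name; the function name is hypothetical.

```python
from __future__ import annotations

# Feature order: voiceprint (Ta), accent (Tb), pitch (Tc), utterance content (Td).
FEATURES = ("voiceprint", "accent", "pitch", "content")


def features_below_threshold(
    similarities: dict[str, float],
    thresholds: dict[str, float],
) -> list[str]:
    """Step S8: list the features whose similarity falls below the word's threshold.

    `thresholds` are Ta..Td for the word identified by the manual operation;
    `similarities` were calculated for the voice data acquired in step S1.
    """
    return [f for f in FEATURES if similarities[f] < thresholds[f]]
```

For the "genzaichi" case described next, this would return only the accent feature.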

In this example the similarity found in step S8 to fall below its threshold is the accent similarity, so the navigation ECU 11 subtracts ΔTb from the accent threshold Tb (before change) (step S9). As a result, the accent threshold Tb becomes equal to the accent similarity (K) identified in step S8. Here, ΔTb is the difference between Tb (before change) and K, the accent similarity obtained when speech recognition was not performed correctly.
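The update in step S9 can be written out as follows, with $K$ the accent similarity identified in step S8 for the failed utterance:

$$
\Delta T_b = T_b^{\text{before}} - K, \qquad
T_b^{\text{after}} = T_b^{\text{before}} - \Delta T_b = K .
$$

With illustrative values that are not taken from the embodiment, $T_b^{\text{before}} = 0.80$ and $K = 0.62$ give $\Delta T_b = 0.18$ and $T_b^{\text{after}} = 0.62$. Assuming a feature is judged similar when its similarity is at least the threshold, the same utterance now satisfies $K \ge T_b^{\text{after}}$.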

For example, suppose the utterance "genzaichi" is input to the microphone 17 and recognition fails because the evaluation value representing the accent of the voice data is judged not similar to the accent parameter Pb of the word "現在地" (genzaichi, "current location"). This process changes the accent threshold Tb so that the accent is judged similar at the next recognition.

The navigation ECU 11 registers the accent threshold Tb (after change) obtained in step S9 in the speech recognition dictionary 18 (step S10).

As a result, the next time the user says "genzaichi" into the microphone 17, the voiceprint, accent, pitch, and utterance content evaluation values are judged similar to the voiceprint parameter Pa, accent parameter Pb, pitch parameter Pc, and utterance content parameter Pd, respectively, so the semantic content of the user's voice data is correctly recognized as "current location".

If, in step S6, it is determined that no manual operation was input to the touch panel 12 or the remote control device 20 within 5 seconds after the voice was input to the microphone 17, the navigation ECU 11 executes processing of the navigation device 10 using the command specified by the wording of the semantic content extracted from the speech recognition dictionary 18 in step S3 (step S11).

In this case, the wording meaning "current location" has been recognized correctly, so the navigation ECU 11 sends the command specified by that wording to the appropriate functional unit within the navigation ECU 11.
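The branch at step S6 and the two paths that follow it (learning in steps S7 to S10, or executing the recognized command in step S11) can be combined into one hedged sketch. It is a simplified, hypothetical model rather than the embodiment's implementation: thresholds are kept in an in-memory dict keyed by word in place of the speech recognition dictionary 18, similarities are assumed to be precomputed per word, the 5-second window is treated as inclusive, and every failing threshold (not only Tb) is lowered to its similarity, generalizing step S9 to Ta to Td as the text suggests. Returning the manually entered command as the one acted on is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

FEATURES = ("voiceprint", "accent", "pitch", "content")


@dataclass
class LearningRecognizer:
    """Hypothetical model of steps S6 to S11."""
    thresholds: dict[str, dict[str, float]]  # word -> {feature: threshold (Ta..Td)}
    window: float = 5.0                      # seconds, as in step S6

    def handle(
        self,
        recognized: str | None,
        sims_by_word: dict[str, dict[str, float]],
        voice_time: float,
        manual_command: str | None = None,
        manual_time: float | None = None,
    ) -> str | None:
        """Return the command acted on for one utterance, learning thresholds if needed."""
        manual_in_window = (
            manual_command is not None
            and manual_time is not None
            and 0.0 <= manual_time - voice_time <= self.window
        )
        if manual_in_window:
            # Steps S7 to S10: compare the manual command's thresholds with the
            # similarities of the stored voice data and lower each failing one.
            word_thresholds = self.thresholds[manual_command]
            sims = sims_by_word[manual_command]
            for f in FEATURES:
                if sims[f] < word_thresholds[f]:
                    # T_after = T_before - (T_before - K) = K
                    word_thresholds[f] = sims[f]
            return manual_command
        # Step S11: no manual correction, so act on the recognized wording.
        return recognized
```

After one corrected "genzaichi" utterance, the lowered accent threshold equals the stored similarity, so the same similarities pass on the next attempt; thresholds that were already met are left unchanged.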

As described above, the navigation device 10 of the present embodiment aims to improve speech recognition accuracy by learning the features of the user's voice. Based on whether the semantic content recognized by the speech recognition unit 24 agrees with the content of the command input to the touch panel 12 or the remote control device 20, it changes the voiceprint threshold Ta, accent threshold Tb, pitch threshold Tc, or utterance content threshold Td to suit the features of the user's voice. Utterances that were not correctly recognized before the change are therefore recognized correctly afterward, improving speech recognition accuracy.

The above describes a configuration in which the navigation ECU 11 executes the learning process for the speech recognition method. However, the control device that realizes the in-vehicle information system of the present embodiment may instead be a dedicated ECU separate from the navigation ECU. Such a dedicated ECU may be located either inside or outside the navigation device 10.

Likewise, the above describes the touch panel 12, which combines a liquid crystal monitor with a matrix switch. The first input means is not limited to this touch panel 12; any touch-panel display means that the user operates by pressing input switches displayed on the panel may be used.

Further, the above describes the in-vehicle information system as the navigation device 10, with a learning function added to its speech recognition function. The in-vehicle information system is not limited to the navigation device 10: where other in-vehicle devices, such as audio systems or air conditioners, have a speech recognition function, the in-vehicle information system of the present embodiment can be applied to them.

The in-vehicle information system of an exemplary embodiment of the present invention has been described above. The present invention is not limited to the specifically disclosed embodiment, and various modifications and changes are possible without departing from the scope of the claims.

FIG. 1 is a diagram showing the configuration of the in-vehicle information system of the present embodiment.
FIG. 2 is a block diagram showing the learning function for the speech recognition method in the in-vehicle information system of the present embodiment.
FIG. 3 is a diagram showing the data structure of the speech recognition dictionary 18 in the in-vehicle information system of the present embodiment.
FIG. 4 is a diagram showing the processing procedure of the learning process for the speech recognition method in the in-vehicle information system of the present embodiment.

Explanation of symbols

10 Navigation device
11 Navigation ECU
12 Touch panel
13 Current position detection unit
14 Direction detection unit
15 Route search unit
16 Map database
17 Microphone
18 Speech recognition dictionary
19 Receiving unit
20 Remote control device
21 Force management unit
22 Input analysis control unit
23 Manual input analysis unit
24 Speech recognition unit
25 Speech recognition learning unit

Claims

1. An in-vehicle information system comprising:
first input means for inputting a command by manual operation;
second input means for inputting voice;
speech recognition means for recognizing the voice input to the second input means;
information processing means for outputting predetermined information based on the content of the command input to the first input means or the semantic content of the voice recognized by the speech recognition means; and
learning means for learning, when a command is input to the first input means within a predetermined time after voice is input to the second input means, the recognition method of the speech recognition means based on data representing the content of the command and voice data representing the voice input to the second input means.

2. The in-vehicle information system according to claim 1, wherein the learning means learns the recognition method of the speech recognition means by learning the features of the voice input to the second input means based on the degree of agreement between the data representing the content of the command input to the first input means and the voice data.

3. The in-vehicle information system according to claim 2, further comprising a speech recognition dictionary that stores reference values for evaluating voice features and thresholds for identifying the semantic content of voice data, wherein
the speech recognition means is configured to recognize the semantic content of the voice input to the second input means by comparing, with the threshold, the similarity between an evaluation value of a feature of that voice and the reference value stored in the speech recognition dictionary, and
the learning means learns the recognition method of the speech recognition means by changing the threshold based on the degree of agreement between the data representing the content of the command input to the first input means and the voice data.

4. The in-vehicle information system according to claim 2 or 3, wherein the voice features are at least one of voiceprint, accent, pitch, and utterance content.

5. The in-vehicle information system according to any one of claims 1 to 4, wherein the first input means is a touch-panel display means or a remote control device.

6. The in-vehicle information system according to any one of claims 1 to 5, wherein the information processing means is included in the arithmetic processing means of a navigation device.