JP4784156B2

JP4784156B2 - Speech synthesizer for performing voice guidance by a plurality of characters, speech synthesis method, program thereof, and information recording medium on which the program is recorded

Info

Publication number: JP4784156B2
Application number: JP2005158574A
Authority: JP
Inventors: 実新川; 篤鶴見; 邦博須賀; さゆり柚木崎
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-05-31
Filing date: 2005-05-31
Publication date: 2011-10-05
Anticipated expiration: 2025-05-31
Also published as: JP2006337432A

Abstract

PROBLEM TO BE SOLVED: To provide voice guidance that is easy to hear and not easily missed by changing the speaking character for each message.

SOLUTION: A speech synthesizer includes a sound piece data storage means that stores sound piece data of a plurality of characters with an identifier for each character. The identifiers are matched against a character type code included in text data, the sound piece data corresponding to the text data is detected, a sound piece editing means generates synthesized speech data from the detected sound piece data, and a voice output means outputs it as speech.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a speech synthesizer that provides voice guidance using a plurality of character voices, a speech synthesis method, a program therefor, and an information recording medium on which the program is recorded.

Navigation devices are conventionally known to provide various kinds of information by video and audio. Most provide only the minimum necessary guidance in this way, but in recent years devices with more elaborate video displays as a means of conveying information have also been developed.

For example, Patent Document 1 discloses a navigation device that, on determining that the vehicle has crossed an administrative boundary, changes the current-location mark shown on the display device to a character corresponding to the new administrative division. Patent Document 2 discloses a navigation device in which the facial expression, attitude, and spoken wording of a character shown on the display screen change according to the traveling state of the vehicle. Patent Document 3 discloses a navigation device that detects the traveling state of the vehicle, automatically determines whether the driver is obeying traffic rules, and changes the facial expression and appearance of a character shown on the screen according to the result.
Patent Document 1: JP 2002-39771 A
Patent Document 2: JP H11-259271 A
Patent Document 3: JP 2002-213986 A

However, it is dangerous to look closely at the display screen while driving a vehicle, so visual information acquisition has its limits no matter how elaborate the on-screen presentation. In this respect, the speech synthesizer installed in a navigation device or the like can speak voice guidance, but the voice it produces is a mechanical sound or a single synthesized voice and is therefore monotonous. In particular, navigation devices have recently become multifunctional: they send and receive e-mail and read its contents aloud, and they speak warning messages triggered by various sensors. Even when messages with a purpose other than route guidance are spoken, the user's attention to the message is reduced, so the message may be missed or be hard to catch.

An object of the present invention is to provide, in a speech synthesizer installed in a navigation device or the like, voice guidance that is not missed and is easy to hear by changing the speaking character for each message.

To solve the above problem, the invention according to claim 1 is a speech synthesizer that outputs voice guidance corresponding to text data indicating guidance content, comprising:
sound piece data storage means for storing sound piece data of a plurality of characters with an identifier attached for each character;
sound piece data detecting means for detecting, from the sound piece data storage means, sound piece data corresponding to the text data;
search specifying means for specifying, when the sound piece data detecting means detects sound piece data, a search for sound piece data having the identifier corresponding to a character type code included in the text data;
sound piece editing means for generating synthesized speech data from the sound piece data detected by the sound piece data detecting means;
voice output means for outputting the synthesized speech data generated by the sound piece editing means as speech;
priority comparison means for comparing, when interrupt synthesis processing of other text data different from the text data is to be performed while speech synthesis of the text data is in progress, the priority of the text data being synthesized with that of the other text data; and
processing status recording means for temporarily recording the processing status of the text data whose speech synthesis was in progress before the interrupt synthesis processing of the other text data,
wherein a time code representing the length of time for which the text data is temporarily recorded in the processing status recording means is further attached to each item of text data,
the processing status recording means stores the time at which the lower-priority text data was temporarily recorded,
when the comparison by the priority comparison means shows that the other text data subject to interrupt synthesis processing has the higher priority, speech synthesis of the other text data is performed and, after it finishes, speech synthesis of the text data that was being processed before the interrupt synthesis processing is performed based on the temporary record in the processing status recording means, and
when the lower-priority text data is to be re-synthesized, re-synthesis is not performed if the interval between the time recorded by the processing status recording means and the current time exceeds the value of the time code.

The invention according to claim 2 is the speech synthesizer according to claim 1, further comprising character selection and instruction means that, on receiving a trigger signal indicating a speech synthesis request, selects a character based on a table associating trigger signals with characters and includes the character type code corresponding to the selected character in the text data.

The invention according to claim 3 is the speech synthesizer according to claim 1 or 2, further comprising:
segment data storage means for storing segment data of characters corresponding to the characters of the sound piece data, with an identifier attached for each character; and
acoustic processing means for generating, when the sound piece data detecting means detects sound piece data corresponding to the text data and the sound piece data storage means holds no corresponding sound piece data, speech data for the phonogram string that lacks sound piece data, based on the segment data,
wherein the sound piece editing means synthesizes speech data from the sound piece data detected in the sound piece storage means and the speech data generated by the acoustic processing means.

The invention according to claim 4 is a speech synthesis method for synthesizing and outputting voice guidance corresponding to text data indicating guidance content by means of sound piece data storage means for storing sound piece data of a plurality of characters with an identifier attached for each character, and sound piece data detecting means for detecting, from the sound piece data storage means, sound piece data corresponding to the text data, the method comprising:
a step of specifying, when the sound piece data detecting means detects sound piece data, a search for sound piece data having the identifier corresponding to a character type code included in the text data;
a step of generating synthesized speech data from the sound piece data detected by the sound piece data detecting means;
a step of outputting the generated synthesized speech data as speech;
a step of comparing, when interrupt synthesis processing of other text data different from the text data is to be performed while speech synthesis of the text data is in progress, the priority of the text data being synthesized with that of the other text data;
a step of temporarily recording the processing status of the text data whose speech synthesis was in progress before the interrupt synthesis processing of the other text data;
a step of storing the time at which the lower-priority text data was temporarily recorded;
a step of performing, when the comparison of priorities shows that the other text data subject to interrupt synthesis processing has the higher priority, speech synthesis of the other text data and, after it finishes, speech synthesis of the text data that was being processed before the interrupt synthesis processing, based on the temporary record; and
a step of not re-synthesizing the lower-priority text data when, at the time of its re-synthesis, the interval between the recorded time and the current time exceeds the value of a time code that is attached to each item of text data and represents the length of time for which it is temporarily recorded.

The invention according to claim 5 is the speech synthesis method according to claim 4, further comprising a step of, on receiving a trigger signal indicating a speech synthesis request, selecting a character based on a table associating trigger signals with characters and including the character type code corresponding to the selected character in the text data.

The invention according to claim 6 is the speech synthesis method according to claim 4 or 5, comprising:
a step of storing segment data of characters corresponding to the characters of the sound piece data, with an identifier attached for each character; and
a step of generating, when the sound piece data detecting means detects sound piece data corresponding to the text data and the sound piece data storage means holds no corresponding sound piece data, speech data for the phonogram string that lacks sound piece data, based on the segment data,
wherein, when synthesized speech data is generated from the sound piece data detected by the sound piece data detecting means, speech data is synthesized from the sound piece data detected in the sound piece storage means and the speech data generated by acoustic processing means.

The invention according to claim 7 is a program that causes a computer that outputs voice guidance corresponding to text data indicating guidance content to realize:
a sound piece data storage function of storing sound piece data of a plurality of characters with an identifier attached for each character;
a sound piece data detection function of detecting, from the sound piece data storage function, sound piece data corresponding to the text data;
a search specifying function of specifying, when sound piece data is detected by the sound piece data detection function, a search for sound piece data having the identifier corresponding to a character type code included in the text data;
a sound piece editing function of generating synthesized speech data from the sound piece data detected by the sound piece data detection function;
a voice output function of outputting the synthesized speech data generated by the sound piece editing function as speech;
a priority comparison function of comparing, when interrupt synthesis processing of other text data different from the text data is to be performed while speech synthesis of the text data is in progress, the priority of the text data being synthesized with that of the other text data;
a processing status recording function of temporarily recording the processing status of the text data whose speech synthesis was in progress before the interrupt synthesis processing of the other text data;
a function of storing, by the processing status recording means, the time at which the lower-priority text data was temporarily recorded;
a function of performing, when the comparison by the priority comparison means shows that the other text data subject to interrupt synthesis processing has the higher priority, speech synthesis of the other text data and, after it finishes, speech synthesis of the text data that was being processed before the interrupt synthesis processing, based on the temporary record of the processing status recording function; and
a function of not re-synthesizing the lower-priority text data when, at the time of its re-synthesis, the interval between the time recorded by the processing status recording function and the current time exceeds the value of a time code that is attached to each item of text data and represents the length of time for which it is temporarily recorded in the processing status recording function.

The invention according to claim 8 is a computer-readable information recording medium storing the program according to claim 7.

According to the present invention, changing the speaking character for each message makes it possible to provide voice guidance that is not missed and is easy to hear. The difference in characters also makes it possible to differentiate the guidance content.

In the technical field of navigation systems, devices tend to become multifunctional, with e-mail functions, video and audio output functions, and the like added on top of navigation guidance. The text data these functions output for speech synthesis is highly varied. Because the present invention enables speech synthesis with a plurality of characters, a different character can be set for each kind of guidance output by these functions, so that the user receives the information in different voices.

Next, the best mode for carrying out the present invention will be described with reference to the drawings. This embodiment describes an example in which the speech synthesizer is installed in a car navigation device.

FIG. 1 is a schematic diagram showing the configuration of a car navigation device 1 to which the present invention is applied. The car navigation device 1 comprises a control unit 2, a navigation unit 3, a speech synthesis unit 6, a communication unit 4, a sensor unit 5, an input unit 7, a display unit 8, and a voice output unit 9.

The control unit 2 comprises a CPU 10, a ROM (Read Only Memory) 11, and a RAM (Random Access Memory) 12. It loads the operation program and various application programs stored in advance in the ROM 11 into the RAM 12, which serves as a work area, and performs overall control of the car navigation device 1. It also performs various processes based on the signals output from the navigation unit 3, communication unit 4, sensor unit 5, and input unit 7 described later. In conjunction with these processes, it reads text data for speech output from the ROM 11 and sends it to the speech synthesis unit 6.

A character type code is attached to the text data. The character type code specifies the type of voice that expresses the content of the text data, that is, the character that speaks it. The code corresponds to the storage areas M1, S1, S2, Sn, MS1, SS2, and SSn that hold the sound piece data and segment data in the sound piece database and the segment database. Character type codes are stored in advance in the ROM 11 and are associated in a table with the various signals output from the navigation unit 3, communication unit 4, sensor unit 5, and input unit 7. On receiving one of these signals, the CPU 10 outputs the text data and the character type code to the speech synthesis unit 6.
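
The table lookup described above can be sketched roughly as follows. This is only an illustrative Python sketch: the signal names, the `TextData` and `select_character` names, and the pairing of each signal with a code are assumptions made for the example. Only the labels M1, S1, and S2 are borrowed from the storage areas named in the description.

```python
from dataclasses import dataclass

# Hypothetical table: source of the trigger signal -> character type code.
# The codes reuse the storage-area labels M1, S1, S2 purely for illustration.
TRIGGER_TO_CHARACTER = {
    "navigation": "M1",
    "email": "S1",
    "sensor": "S2",
}


@dataclass(frozen=True)
class TextData:
    text: str
    character_code: str  # tells the synthesizer which character should speak


def select_character(trigger: str, text: str) -> TextData:
    """Attach the character type code associated with the trigger signal."""
    code = TRIGGER_TO_CHARACTER[trigger]  # KeyError for an unregistered trigger
    return TextData(text=text, character_code=code)


message = select_character("email", "You have one new message.")
print(message.character_code)
```

Keeping the association in a single table means a character can be reassigned to a different function without touching the synthesis code.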

The RAM 12 also provides a log function that records the progress of processing in the speech synthesis unit 6. If a synthesis request for other text data arises while text data is being synthesized as described later, the log stores the progress of synthesis for the earlier text data or for the other text data. For example, suppose an interrupt request for other text data arrives while the speech synthesis unit 6 is in the middle of synthesis, and the current synthesis is suspended to give the interrupt priority. The progress of the text data that was being synthesized up to that point is then recorded in the log, and after the interrupt processing finishes, speech synthesis is resumed based on this record. The text data also carries a processing standby time code related to this function.
The interrupt synthesis processing and the processing standby time code are described later.
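
A minimal Python sketch of the suspend-and-resume behaviour, combined with the expiry rule stated in claim 1, might look like this. The class and field names are hypothetical. The sketch also assumes that a larger number means a higher priority, that the time code is in seconds, and that progress is tracked as a character index; the text specifies none of these details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    text: str
    priority: int         # assumption: larger number = higher priority
    standby_limit: float  # processing standby time code, assumed to be seconds


@dataclass
class LogEntry:
    message: Message
    position: int         # assumed progress marker: index of the next character
    suspended_at: float   # time at which the entry was written to the log


class SynthesisLog:
    """Stand-in for the log that holds the suspended message."""

    def __init__(self) -> None:
        self.entry: Optional[LogEntry] = None

    def interrupt(self, current: Message, position: int,
                  incoming: Message, now: float) -> Message:
        """Return the message that should be synthesized next."""
        if incoming.priority > current.priority:
            # Suspend the current message and remember where it stopped.
            self.entry = LogEntry(current, position, now)
            return incoming
        # The interrupt does not win; handling of the incoming message
        # in that case is outside this sketch.
        return current

    def resume(self, now: float) -> Optional[LogEntry]:
        """Return the suspended entry, or None if it waited too long."""
        entry, self.entry = self.entry, None
        if entry is None:
            return None
        if now - entry.suspended_at > entry.message.standby_limit:
            return None  # stale guidance: do not re-synthesize
        return entry


log = SynthesisLog()
route = Message("Turn right in 300 meters.", priority=1, standby_limit=5.0)
alert = Message("Fuel is low.", priority=2, standby_limit=30.0)
speaking = log.interrupt(route, position=10, incoming=alert, now=100.0)
resumed = log.resume(now=103.0)  # 3 s later, within the 5 s limit
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the expiry check easy to test.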

The navigation unit 3 receives, at a GPS signal receiving unit 13, GPS (Global Positioning System) signals transmitted from at least three satellites such as NAVSTAR (NAVigation System using Timing And Ranging) satellites. From the position coordinates indicated by the received signals it calculates the current position of the car navigation device 1 (that is, of the vehicle or the like carrying it). It associates the coordinate data of the detected current position with map data stored in a memory (not shown) and supplies the result to the control unit 2. The control unit 2 supplies this data to the display unit 8, which displays the current position.

The communication unit 4 performs various kinds of data communication via wireless communication means. It receives road traffic information such as VICS (Vehicle Information and Communication System) from FM beacons, radio beacons, or optical beacons and supplies it to the navigation unit 3. The ROM 11 of the control unit 2 stores an application for sending and receiving e-mail, and the communication unit 4 also sends and receives the e-mail data.
Applicable wireless communication means include the standards and communication schemes of mobile phones and wireless modems, such as GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), PDC (Personal Digital Cellular), CDMA (Code Division Multiple Access), PHS (Personal Handy-phone System), Bluetooth (registered trademark), and wireless LAN (Local Area Network). Data is exchanged between a carrier's radio base station (not shown) and a communication antenna (not shown), and the received signal is output to the control unit 2.

The sensor unit 5 is connected by wire or wirelessly to sensors installed at various points in the vehicle and supplies their outputs to the control unit 2. In this embodiment, sensors detect the state of the instruments on the meter panel, such as the speedometer, tachometer, fuel gauge, water temperature gauge, and odometer. An infrared sensor that detects objects in the vehicle's direction of travel and a steering angle sensor that detects the steering angle are also provided.
The ROM 11 holds an application that issues a warning when a sensor output is abnormal. The CPU 10 displays a warning screen (video and text) on the display unit 8 and at the same time supplies warning text data to the speech synthesis unit 6, so that a warning message is output from the voice output unit 9.

The display unit 8 is an FPD (Flat Panel Display) such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display, and shows map information, guidance information, and other information processed by the car navigation device 1. A CRT (Cathode Ray Tube) can of course also be used as the display unit 8.

The input unit 7 is used to make various settings through user operation. It consists of a touch panel 14 provided on the display unit 8 and operation keys 15. An interface for inputting data from external devices may also be provided.

The voice output unit 9 is connected by wire or wirelessly to vehicle-mounted speakers (not shown) and outputs the synthesized speech data generated by the speech synthesis unit 6, described later, as speech. It may instead transmit the audio by radio to a car audio system capable of receiving the FM band. The output speech may accompany images shown on the display unit 8, such as the current position or instructions at right and left turns, or may convey information such as nearby traffic congestion by voice.

Next, the speech synthesis unit 6 will be described. In the field of speech synthesis, a recording-and-editing method is used in which a sound piece database is built from sound piece data, that is, words or phrases recorded in advance from human speech, and speech is synthesized by matching this sound piece data against text data representing the content to be spoken. In contrast, there is a mechanical synthesis method (hereinafter simply "rule-based speech synthesis processing") that generates artificial sound pieces from waveform data prepared per character and joins them into speech data. Because the recording-and-editing method is based on actual human speech, it reproduces speech closer to a natural voice and is therefore easier to hear than mechanically synthesized speech.
This embodiment uses a speech synthesizer based on a combined method that adds mechanical synthesis to the recording-and-editing method (hereinafter simply the "hybrid speech synthesis method"). In the hybrid speech synthesis method, sound piece data and segment data are stored in separate storage devices or areas, and the text data to be spoken is synthesized by combining the two.
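
The idea of combining the two data sources can be illustrated with a toy Python sketch. It assumes the text has already been split into phrases, that recorded sound pieces are keyed by phrase, and that segments are keyed by single character; byte strings stand in for waveforms. The function name and data layout are hypothetical, and prosody and waveform smoothing at the joins are omitted.

```python
from typing import Dict, List


def synthesize_hybrid(phrases: List[str],
                      sound_pieces: Dict[str, bytes],
                      segments: Dict[str, bytes]) -> bytes:
    """Concatenate recorded pieces, falling back to per-character segments."""
    output = bytearray()
    for phrase in phrases:
        piece = sound_pieces.get(phrase)
        if piece is not None:
            output += piece             # recording-and-editing path
        else:
            for ch in phrase:           # rule-based fallback path
                output += segments[ch]  # KeyError if a segment is missing
    return bytes(output)


# Toy data: "turn" is recorded, "left" has to be built from segments.
pieces = {"turn": b"[TURN]", "right": b"[RIGHT]"}
letters = {c: c.encode("ascii") for c in "abcdefghijklmnopqrstuvwxyz"}
print(synthesize_hybrid(["turn", "left"], pieces, letters))  # b'[TURN]left'
```

Recorded pieces are preferred whenever they exist because, as noted above, they sound closer to a natural voice; the segment path only fills the gaps.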

FIG. 2 shows the configuration of the speech synthesis unit 6.
The speech synthesis unit 6 comprises a language processing unit 20, a sound piece editing unit 21, an acoustic processing unit 23, a character switching processing unit 22, a sound piece data search unit 24, a sound piece database 30, a segment data search unit 25, and a segment database 31.

The sound piece database 30 is a rewritable nonvolatile memory such as a hard disk or an EEPROM (Electrically Erasable Programmable Read Only Memory). It holds sound piece data, which is speech data of predetermined words, phrases, or sentences recorded in advance from human speech. The sound piece database 30 stores sound pieces for different characters.

Here, a "character" means the personality conveyed by a voice: a distinctive personality arising from a characteristic or individual tone, role, voice quality, or a combination of these. An example is a voice with a distinctive manner of speaking that uses the Kansai dialect. A character is not limited to a person; it may be an animal or the like, or a personified version of one. For example, it may be a dog's bark, a personified dog that speaks human language, or an existing animation character.

Storage area M1 of the sound piece database 30 stores sound piece data for the character voice linked to the navigation processing of the navigation unit 3. Similarly, storage area S1 stores sound piece data for the character that reads e-mail aloud, and storage area S2 stores sound piece data for the character that issues warnings based on the detection results of the sensor unit 5. Each of these groups of sound piece data is stored with an identifier, assigned per character type, that indicates its storage area. Specifically, the data is stored in directories (or folders) classified by character.

The sound piece database may consist of a single storage element or of a separate storage element for each character. In this embodiment, the sound piece data of one character is described as being stored in the same "storage area", but this does not necessarily mean that the data occupies a physically contiguous region of the storage element. On a hard disk, for example, the sound piece data of one character may be stored in physically separate locations in units of disks, tracks, or sectors.

The segment database 31 is composed of a nonvolatile memory such as a PROM (Programmable Read Only Memory) or a hard disk. The segments are recorded in advance from human utterances, and each item of segment data stored in the segment database 31 is uttered by the same character as the corresponding character in the sound piece database 30. Accordingly, the segment data stored in storage area MS1 of the segment database is by the same character as the sound piece data stored in storage area M1 of the sound piece database. Likewise, storage areas S1 and SS1, and storage areas S2 and SS2, each hold utterances by the same character. Because each item of segment data thus has basically the same waveform data as the sound piece data, the voice quality and the like are almost identical between the two, and the synthesized speech sounds natural.
As with the sound piece database, a storage area of the segment database does not necessarily mean a physically contiguous region.

The language processing unit 20 performs morphological analysis on the ideographic character string contained in the input text data and divides it into words, or into phrases in which a particle or auxiliary verb is attached to a word. It then converts each word or phrase into a phonetic character string representing its reading, referring to a word dictionary (not shown) stored in the ROM 11. The word dictionary stores words containing ideographic characters such as kanji, together with the phonetic characters that represent their readings.
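The dictionary conversion described above can be illustrated with a minimal Python sketch. It is not the patent's morphological analyzer: greedy longest-match segmentation is used here only as a simple stand-in, and the function name `to_phonetic`, the dictionary entries, and the omission of accent and devoicing marks are all assumptions made for illustration.

```python
def to_phonetic(text, dictionary):
    """Split text by greedy longest match and return the readings in order.

    `dictionary` maps a word or phrase written with ideographic characters
    to its phonetic reading (a hypothetical stand-in for the word
    dictionary held in ROM 11).
    """
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            reading = dictionary.get(text[i:j])
            if reading is not None:
                units.append(reading)
                i = j
                break
        else:
            raise ValueError(f"no dictionary entry at position {i}")
    return units


# Toy entries; accent and devoicing symbols are left out for brevity.
WORD_DICTIONARY = {
    "明りょうな": "メーリョーナ",
    "音声を": "オンセイオ",
    "合成します": "ゴーセーシマス",
}
```

Calling `to_phonetic("明りょうな音声を合成します", WORD_DICTIONARY)` yields one reading per phrase, which is the granularity at which the later sound piece search operates.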

Phonetic characters are ideographic characters converted into a data format readable by the sound piece editing unit 21, the acoustic processing unit 23, the sound piece data search unit 24, and the segment data search unit 25. A phonetic character string consists of a control character (symbol) string for the speech to be synthesized, a voice character string, and delimiters. FIG. 3 schematically shows an example of its data structure.

The control character string is data that defines various settings for the uttered voice, such as voice loudness, speech rate, volume, high-frequency emphasis, and intonation. In the example of FIG. 3, the control character string is "3S5V7T02". This means that the voice loudness is 3 (of 5 levels), the speech rate "S" is 5 (of 5 levels), the volume "V" is 7 (of 10 levels), the high-frequency emphasis is 0 (= none), and the intonation "I" is 2 (of 3 levels).

The voice character string is continuous voice data representing the speech obtained by uttering a plurality of sound pieces in succession. For example, if the character string is 『明りょうな音声を合成します。』 ("Synthesizes clear speech."), the morphological analysis and dictionary conversion described above convert it into 『メーリョーナオ’ ンセイオゴーセーシマ’ ス％』, as shown in FIG. 3. Each syllable symbol such as 「メーリョーナ」 (corresponding to a phrase in the ideographic text) is composed of katakana, Roman letters, "%" as a devoicing symbol, and "&" as a nasalization symbol. As accent symbols, a single quote (') places an accent on the immediately preceding syllable, an asterisk (*) places a weak accent on it, and a double quote (") places a still weaker accent on it.
A delimiter marks the end of a phrase or sentence, or a pause within a sentence: for example, "？" for a question, "。" for a declarative sentence, and "、" within a sentence.
The phonetic character string converted in this way is supplied to the sound piece editing unit 21.

The character switching processing unit 22 manages the character type codes attached in advance to the various text data supplied from the control unit 2. For example, when the control unit 2 sends text data such as "Turn left at the next traffic light." based on the processing of the navigation unit 3, a character type code such as "00", indicating that the text is for the navigation character, is attached to it. Similarly, a character type code of "01" is attached to e-mail text, and "10" to the text data of warning messages based on the sensor unit 5. The character type codes are associated in advance, through a predetermined table, with the identifiers attached to the sound piece data. On receiving a character type code, the character switching processing unit 22 instructs the sound piece data search unit 24, the acoustic processing unit 23, and the segment data search unit 25 which areas of the sound piece database 30 and the segment database 31 to access.
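The correspondence between character type codes and storage areas might be sketched as a simple lookup table. The codes ("00", "01", "10") and area names (M1/MS1, S1/SS1, S2/SS2) come from the description; the table layout, the `CharacterAreas` type, and the function name `select_access_areas` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterAreas:
    sound_piece_area: str  # area in the sound piece database 30
    segment_area: str      # area in the segment database 31


# Hypothetical form of the "predetermined table" linking codes to areas.
CHARACTER_TABLE = {
    "00": CharacterAreas("M1", "MS1"),  # navigation guidance
    "01": CharacterAreas("S1", "SS1"),  # e-mail read-out
    "10": CharacterAreas("S2", "SS2"),  # sensor-based warnings
}


def select_access_areas(character_type_code: str) -> CharacterAreas:
    """Return the storage areas to search for the given character type code."""
    try:
        return CHARACTER_TABLE[character_type_code]
    except KeyError:
        raise ValueError(
            f"unknown character type code: {character_type_code!r}"
        ) from None
```

Keeping both areas in one table entry reflects the requirement that sound pieces and segments for a given message come from the same character.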

The sound piece editing unit 21 reads the sound piece data corresponding to the phonetic character string from the sound piece database 30 via the sound piece data search unit 24. When necessary, it also reads rule-synthesized speech data from the acoustic processing unit 23. It combines these data in the order of the phonetic character string to generate synthesized speech data (that is, hybrid speech synthesis). The speech synthesis procedure performed by the sound piece editing unit 21 is described later.

The sound piece data search unit 24 searches the sound piece database 30 for the sound piece data corresponding to the phonetic character string supplied from the sound piece editing unit 21 and supplies the result to the sound piece editing unit 21. For the search, the storage area to be accessed is specified by the character type code supplied from the character switching processing unit 22. If the specified storage area contains no matching sound piece data, the phonetic character string for which no sound piece data was found is supplied to the acoustic processing unit 23.

The acoustic processing unit 23 receives the phonetic character strings for which the sound piece database has no matching sound piece data and performs rule-based speech synthesis on them. That is, it divides the phonetic character string into segments, associates them with the segment data recorded in the segment database 31, and generates synthesized speech data in units of words, phrases, or sentences. A segment is the minimum unit of speech: voice data for one cycle of the speech that constitutes a phoneme represented by a phonetic character in the phonetic character string. Based on the character switching signal supplied from the character switching processing unit 22, the acoustic processing unit 23 accesses the designated storage area via the segment data search unit 25, retrieves the relevant segment data, and generates rule-synthesized speech data. For example, if the phonetic character string to be synthesized is navigation text, it accesses storage area MS1 and performs rule-based speech synthesis from the segment data stored there.

The speech synthesis processing in the car navigation apparatus 1 configured as above is described with reference to the flowchart of FIG. 4. The following processing is executed by the CPU 10 in accordance with a program. The text data to be synthesized here is navigation (route guidance) text data produced by the processing of the navigation unit 3. The same processing is applied to e-mail text data based on the processing of the communication unit 4 and to warning guidance text data based on the processing of the sensor unit 5.

When the navigation unit 3 of the car navigation apparatus 1, while running, detects that a planned right-turn point is approaching, the CPU 10 receives a trigger signal (step S101). On receiving the trigger signal, the CPU 10 outputs the guidance image data to be displayed on the display unit 8. At the same time, it reads the text data corresponding to the trigger signal (speech synthesis processing request) from the ROM 11 and supplies it to the language processing unit 20 of the speech synthesis unit 6. It also selects a character by referring to a table that gives the character type code for each type of trigger signal, reads the character type code corresponding to the navigation unit 3 from the ROM 11, and supplies that code to the character switching processing unit 22 (step S102).

The language processing unit 20 performs morphological analysis on the ideographic character string contained in the supplied text data and converts it into a phonetic character string while referring to the dictionary data (step S103). It then supplies the converted phonetic character string to the sound piece editing unit 21. At this point, the character switching processing unit 22 has supplied the character type code for each phonetic character string to the sound piece editing unit 21.

Based on the character type code (for example, "00") supplied from the character switching processing unit 22, the sound piece editing unit 21 accesses storage area M1 of the sound piece database 30 and searches for the sound piece data corresponding to the phonetic character string (step S104).

If the target sound piece data exists (step S105: YES), the corresponding sound piece data is supplied to the sound piece editing unit 21 (step S106).

Next, it is determined whether all the data to be searched has been processed (step S109). If no further data remains to be searched (step S109: YES), the sound piece editing unit 21 performs speech synthesis and outputs synthesized speech data (step S110).

If, on the other hand, no sound piece data exists for the phonetic character string in step S105 (step S105: NO), the phonetic character string is supplied to the acoustic processing unit 23 (step S107).

Based on the character type code of the supplied phonetic character string, the acoustic processing unit 23 accesses storage area MS1 of the segment database 31, reads the segment data, and performs rule-based speech synthesis. Storage area MS1 stores segment data uttered by the same character as storage area M1. The generated rule-synthesized speech data is then supplied to the sound piece editing unit 21 (step S108).

The sound piece editing unit 21 performs speech synthesis from the sound piece data alone, or from the sound piece data together with the rule-synthesized speech data, and outputs the synthesized speech data to the voice output unit 9 (step S110).
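The flow of steps S104 to S110 can be summarized in a short Python sketch. This is a simplified model under stated assumptions, not the apparatus itself: waveforms are represented as byte strings, rule-based synthesis is reduced to concatenating one hypothetical segment per character of the unit, and the names `hybrid_synthesize`, `NAV_PIECES`, and `NAV_SEGMENTS` are invented for illustration.

```python
def hybrid_synthesize(units, sound_pieces, segments):
    """Combine recorded sound pieces with rule-synthesized fallbacks.

    units        -- phonetic units in utterance order
    sound_pieces -- recorded data for one character (e.g. area M1)
    segments     -- segment data for the same character (e.g. area MS1)
    """
    parts = []
    for unit in units:
        piece = sound_pieces.get(unit)      # S104: search the sound pieces
        if piece is not None:               # S105: YES
            parts.append(piece)             # S106: use the recorded piece
        else:                               # S105: NO -> S107
            # S108: stand-in for rule-based synthesis from segment data.
            parts.append(b"".join(segments[ch] for ch in unit))
    return b"".join(parts)                  # S110: combine in order


# Toy data for a single character; contents are placeholders.
NAV_PIECES = {"tsugino": b"[piece:tsugino]"}
NAV_SEGMENTS = {ch: f"<{ch}>".encode() for ch in "abcdefghijklmnopqrstuvwxyz"}
```

For example, `hybrid_synthesize(["tsugino", "kado"], NAV_PIECES, NAV_SEGMENTS)` takes the first unit from the recorded pieces and builds the second from segments, because only the first has a recorded entry. Passing both dictionaries for the same character mirrors the pairing of M1 with MS1.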

Next, the operation of the car navigation apparatus 1 when a speech synthesis processing request (trigger signal) for other text data arrives while the speech synthesis unit 6 is still processing particular text data is described with reference to the flowchart of FIG. 5.
The following processing is executed by the CPU 10 in accordance with a program. It is assumed that the speech synthesis unit 6 is synthesizing navigation (route guidance) text data based on the processing of the navigation unit 3, and that an e-mail is received while this text data is being processed. It is further assumed that the speech synthesis request for the e-mail text data has the higher priority.

When the communication unit 4 detects incoming e-mail data, it sends a trigger signal indicating the arrival of the e-mail to the control unit 2 (step S201).

On receiving this trigger signal, the CPU 10 determines whether any text data is currently being processed by the speech synthesis unit 6 (step S202). If none is (step S202: NO), the e-mail text data is supplied to the speech synthesis unit 6, speech synthesis is performed, and the speech is output (step S210).

If text data is currently being processed (step S202: YES), the CPU 10 refers to a table that defines priorities and determines which of the navigation text data and the e-mail text data takes precedence (step S203). Possible criteria for the priority include the time an e-mail was sent or received, the name of the sender or recipient, and the subject; the priority settings based on these criteria may be left to the user.

If the e-mail text data has been assigned the lower priority (step S203: NO), processing of the navigation text data takes precedence, and processing of the e-mail text data waits until it finishes. During this wait, it is determined whether speech synthesis of the e-mail text data is still necessary (step S211). That is, the processing standby time code attached to the text data is compared with the actual waiting time. If the actual waiting time exceeds the processing standby time (synthesis is no longer necessary; step S211: NO), the e-mail text data is discarded (step S212). If it does not (synthesis is still necessary; step S211: YES), the process returns to step S201, and speech synthesis is performed after the speech processing of the navigation text data ends (step S210).
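The standby-time comparison of step S211 amounts to a single inequality. The sketch below assumes times are expressed in seconds as plain numbers; the function name `still_needed` and its parameters are hypothetical, and the description does not say how the boundary case (waiting time exactly equal to the standby time) is treated, so keeping the data in that case is an assumption.

```python
def still_needed(queued_at: float, standby_limit: float, now: float) -> bool:
    """Return True if waiting text data should still be synthesized (S211).

    queued_at     -- time at which the text data started waiting
    standby_limit -- processing standby time attached to the text data
    now           -- current time
    The data is discarded (S212) only when the actual waiting time
    strictly exceeds the standby limit.
    """
    actual_wait = now - queued_at
    return actual_wait <= standby_limit
```

A caller would check `still_needed(...)` once the higher-priority synthesis finishes and drop the queued text data when it returns `False`.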

In this example the e-mail text data has priority (step S203: YES), so processing of the navigation text data is suspended (step S204), and the processing status (log) of the navigation text data is sent to the RAM 12 and recorded (step S205).

Processing of the e-mail text data then starts (step S206). That is, the e-mail text data is supplied to the speech synthesis unit 6, which generates and outputs synthesized speech data representing its content.

It is then determined whether processing of the e-mail text data has finished (step S207). If it has (step S207: YES), the RAM 12 is accessed and the log is checked (step S208).

After the log is checked in step S208, it is determined whether speech processing of the navigation text data is still necessary (step S209), because some navigation content no longer has any value as information by the time it could be voiced. Suppose, for example, that the voice information announces an approaching parking area. If the vehicle has already passed the parking area while the synthesized speech of the e-mail was being output, there is no longer any need to output this information. In such a case the navigation text data is discarded (step S213). The discard decision is made in the same way as in step S211: the processing standby time data of the navigation text data, the time recorded in the log, and the current time are compared.

If, on the other hand, the navigation content is traffic congestion information or the like, it remains valid after the synthesized speech of the e-mail has been output, so speech synthesis resumes from where it left off, based on the processing status recorded in the log, and the synthesized speech is output (step S210). The discard decision may instead be made by comparing the current position of the vehicle with the area that the navigation guidance concerns.
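The interrupt handling of steps S203 to S213 can be sketched as follows. This is a simplified model under stated assumptions: a job is a list of phonetic units, "voicing" a unit just appends it to an output list, a larger `priority` value means higher priority, and equal priorities do not interrupt. The names `Job`, `handle_interrupt`, and the `still_valid` callback are hypothetical, and the standby-time check for a waiting low-priority job (step S211) is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    units: list     # phonetic units to voice, in order
    priority: int   # larger value = higher priority (assumed convention)


def handle_interrupt(current, done_count, incoming, still_valid):
    """Return the (job name, unit) pairs voiced after a request arrives.

    current     -- job being synthesized when the trigger arrives
    done_count  -- number of units of `current` already voiced
    incoming    -- newly requested job
    still_valid -- callable deciding whether a suspended job is still useful
    """
    voiced = []
    if incoming.priority <= current.priority:            # S203: NO
        voiced += [(current.name, u) for u in current.units[done_count:]]
        voiced += [(incoming.name, u) for u in incoming.units]
        return voiced
    # S204-S205: suspend the current job and record its status as a log.
    log = {"job": current, "resume_at": done_count}
    # S206-S207: synthesize the higher-priority job to completion.
    voiced += [(incoming.name, u) for u in incoming.units]
    # S208-S209: check the log and decide whether to resume.
    if still_valid(log["job"]):
        job = log["job"]
        # S210: resume from the recorded position.
        voiced += [(job.name, u) for u in job.units[log["resume_at"]:]]
    # Otherwise the suspended job is discarded (S213).
    return voiced
```

The `still_valid` callback is where either the time comparison or the position-based comparison mentioned above could be plugged in.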

As described above, the car navigation apparatus 1 to which the present invention is applied can change the character according to the content of the guidance being voiced. Watching the navigation screen while driving a vehicle is dangerous, so there is a limit to how much information can be obtained visually. Voice guidance delivered only in a uniform, single character voice is also easily missed. To address these problems, the car navigation apparatus 1 gives variation to the synthesized speech and differentiates guidance content by the difference in character.

In addition, priorities for speech synthesis processing can be set in advance, and the necessary voice guidance can be output as appropriate based on these settings. Given the importance of voice guidance while driving, hearing the higher-priority information in a different character's voice is effective for acquiring, recognizing, and understanding the information.

Furthermore, car navigation systems tend to become multifunctional, adding e-mail functions, video and audio output functions, and the like to navigation guidance. The text data these functions output for speech synthesis is highly varied, but because the car navigation apparatus 1 can synthesize the voices of a plurality of characters and adopts the hybrid speech synthesis method, it can readily handle text data with the various tones and wordings output by these functions.

The best mode for carrying out the present invention has been described above, but the present invention is not limited to the examples given.

A schematic diagram showing the configuration of a car navigation apparatus to which the present invention is applied.
A block diagram showing the functional configuration of the speech synthesis unit of the car navigation apparatus shown in FIG. 1.
A flowchart showing the speech synthesis processing of the car navigation apparatus shown in FIG. 1.
A flowchart showing the processing procedure when speech synthesis processing overlaps in the car navigation apparatus shown in FIG. 1.
A flowchart showing the interrupt speech synthesis processing of the car navigation apparatus shown in FIG. 1.

Explanation of symbols

1 Car navigation apparatus
2 Control unit
3 Navigation unit
4 Communication unit
5 Sensor unit
6 Speech synthesis unit
7 Input unit
8 Display unit
9 Voice output unit
10 CPU
11 ROM
12 RAM
13 GPS signal receiving unit
14 Touch panel
15 Operation keys
20 Language processing unit
21 Sound piece editing unit
22 Character switching processing unit
23 Acoustic processing unit
24 Sound piece data search unit
25 Segment data search unit
30 Sound piece database
31 Segment database
M1, S1, S2, Sn Sound piece data storage areas for each character
MS1, SS1, SS2, SSn Segment data storage areas for each character

Claims

A speech synthesizer that outputs voice guidance corresponding to text data indicating guidance content, comprising:
sound piece data storage means for storing sound piece data of a plurality of characters with an identifier attached for each character;
sound piece data detecting means for detecting, from the sound piece data storage means, sound piece data corresponding to the text data;
search specifying means for specifying, when the sound piece data detecting means detects sound piece data, a search for sound piece data having the identifier corresponding to a character type code included in the text data;
sound piece editing means for generating synthesized speech data from the sound piece data detected by the sound piece data detecting means;
voice output means for outputting, as voice, the synthesized speech data generated by the sound piece editing means;
priority comparison means for comparing, when interrupt synthesis processing of other text data different from the text data is to be performed while speech synthesis of the text data is in progress, the priorities of the text data being speech-synthesized and the other text data; and
processing status recording means for temporarily recording the processing status of the text data whose speech synthesis was in progress before the interrupt synthesis processing of the other text data,
wherein a time code representing the length of time for which the text data is temporarily recorded in the processing status recording means is further attached to each item of text data,
the processing status recording means stores the time at which the temporary recording of the lower-priority text data was made,
when the comparison by the priority comparison means shows that the other text data subject to interrupt synthesis processing has the higher priority, speech synthesis of the other text data is performed and, after it ends, speech synthesis of the text data whose speech synthesis was in progress before the interrupt synthesis processing of the other text data is performed based on the temporary record of the processing status recording means, and
when the lower-priority text data is to be re-synthesized, the re-synthesis is not performed if the interval between the time recorded by the processing status recording means and the current time exceeds the value of the time code.

The speech synthesizer according to claim 1, further comprising character selection and instruction means for selecting, when a trigger signal indicating a speech synthesis request is received, a character based on a table in which trigger signals are associated with characters, and for including the character type code corresponding to the selected character in the text data.

The speech synthesizer according to claim 1 or 2, further comprising:
segment data storage means for storing segment data of characters corresponding to the characters of the sound piece data, with an identifier attached for each character; and
acoustic processing means for generating, based on the segment data, voice data of a phonetic character string for which there is no sound piece data, when the sound piece data detecting means detects sound piece data corresponding to the text data and the sound piece data storage means has no corresponding sound piece data,
wherein the sound piece editing means synthesizes voice data from the sound piece data detected from the sound piece storage means and the voice data generated by the acoustic processing means.

複数のキャラクタによる音片データをキャラクタ毎に識別子を付して記憶する音片データ記憶手段と、前記音片データ記憶手段から前記文章データと対応する音片データを検出する音片データ検出手段と、により案内内容を示す文章データに対応する音声案内を音声合成して出力する音声合成方法であって、
前記音片データ検出手段が音片データを検出する際、前記文章データに含まれるキャラクタ種別コードと対応する前記識別子を有する音片データの検索を指定する工程と、
前記音片データ検出手段が検出した音片データから合成音声データを生成する工程と、
前記生成した合成音声データを音声として出力する工程と、
前記文章データの音声合成が行われているときに、該文章データと異なる他の文章データの割込み合成処理を行う場合に、前記音声合成を行っている文章データと前記他の文章データとの優先順位を比較する工程と、
前記他の文章データの割込み合成処理の前に音声合成が行われていた文章データの処理状況の一時記録を行う工程と、
前記優先順位が低い文章データの一時記録を行った時間を記憶する工程と、
前記優先順位を比較した結果、割込み合成処理を行う前記他の文章データの優先順位が高い場合、該他の文章データの音声合成を行い、当該他の文章データの音声合成が終了した後に、前記一時記録に基づいて前記他の文章データの割込み合成処理前に音声合成の処理が行われていた文章データの音声合成を行う工程と、
前記優先順位が低い文章データの再合成処理の際、前記記録した時間と現在時刻との間隔が、前記文章データ毎に付され、前記一時記録される長さを表した時間コードの値を超える場合、前記優先順位が低い文章データの再合成処理を行わない工程と、
を備えることを特徴とする音声合成方法。 A speech synthesis method for synthesizing and outputting voice guidance corresponding to sentence data indicating guidance content, using sound piece data storage means for storing sound piece data of a plurality of characters with an identifier for each character, and sound piece data detecting means for detecting sound piece data corresponding to the sentence data from the sound piece data storage means, the method comprising:
a step of designating, when the sound piece data detecting means detects sound piece data, a search for sound piece data having the identifier corresponding to a character type code included in the sentence data;
a step of generating synthesized speech data from the sound piece data detected by the sound piece data detecting means;
a step of outputting the generated synthesized speech data as speech;
a step of comparing, when interrupt synthesis processing of other sentence data different from the sentence data is to be performed while the sentence data is being speech-synthesized, the priorities of the sentence data being speech-synthesized and the other sentence data;
a step of temporarily recording the processing status of the sentence data that was being speech-synthesized before the interrupt synthesis processing of the other sentence data;
a step of storing the time at which the temporary recording of the lower-priority sentence data was performed;
a step of, when the comparison of the priorities shows that the other sentence data to be interrupt-synthesized has the higher priority, performing speech synthesis of the other sentence data and, after that speech synthesis has finished, performing speech synthesis, based on the temporary record, of the sentence data that was being speech-synthesized before the interrupt synthesis processing of the other sentence data; and
a step of not performing resynthesis of the lower-priority sentence data when, at the time of its resynthesis processing, the interval between the recorded time and the current time exceeds the value of a time code that is attached to each piece of sentence data and represents the length of time for which the temporary record is kept.
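
By way of a non-limiting illustration, the priority comparison, temporary recording, and time-code check recited in this method might be sketched as follows. The names `Sentence`, `Suspended`, `interrupt`, and `should_resume` are hypothetical, as are the assumptions that a larger number means a higher priority and that the time code is expressed in seconds; the claim itself fixes none of these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    """Sentence data to be synthesized (hypothetical layout)."""
    text: str
    priority: int       # assumption: larger value means higher priority
    time_code: float    # assumption: seconds for which a temporary record stays valid
    position: int = 0   # processing status: index of the next unit to synthesize


@dataclass
class Suspended:
    """Temporary record of an interrupted sentence."""
    sentence: Sentence
    saved_at: float     # time at which the temporary record was made


def interrupt(current: Sentence, other: Sentence, now: float) -> Optional[Suspended]:
    """Compare priorities; if `other` wins, return a temporary record of `current`."""
    if other.priority > current.priority:
        return Suspended(sentence=current, saved_at=now)
    return None  # `other` does not interrupt; `current` keeps playing


def should_resume(record: Suspended, now: float) -> bool:
    """Resynthesize only if the elapsed interval does not exceed the time code."""
    return (now - record.saved_at) <= record.sentence.time_code
```

In this sketch the caller would synthesize the interrupting sentence, then call `should_resume` with the current time and either continue from `record.sentence.position` or discard the stale guidance.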

請求項４に記載の音声合成方法において、
音声合成要求を示すトリガー信号を受信した際、前記トリガー信号と前記キャラクタとの対応付けが行われたテーブルに基づいてキャラクタを選択し、選択したキャラクタに対応するキャラクタ種別コードを前記文章データに含める工程を更に備えることを特徴とする音声合成方法。 The speech synthesis method according to claim 4,
further comprising a step of, when a trigger signal indicating a speech synthesis request is received, selecting a character based on a table in which trigger signals and characters are associated with each other, and including a character type code corresponding to the selected character in the sentence data.
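
As an illustrative sketch only, the table lookup of this claim could look like the following. The trigger names, the character type codes, the dictionary layout of the sentence data, and the default character used for unknown triggers are all assumptions introduced for the example.

```python
from typing import Dict

# Hypothetical table associating trigger signals with character type codes.
TRIGGER_TO_CHARACTER: Dict[str, str] = {
    "route_guidance": "CHAR_A",
    "traffic_info": "CHAR_B",
}

# Assumption: unknown triggers fall back to a default character.
# The claim does not say what happens in that case.
DEFAULT_CHARACTER = "CHAR_A"


def build_sentence_data(trigger: str, text: str) -> Dict[str, str]:
    """Select a character from the table and embed its type code in the sentence data."""
    code = TRIGGER_TO_CHARACTER.get(trigger, DEFAULT_CHARACTER)
    return {"character_type_code": code, "text": text}
```

The character type code embedded here is what the search designation step later matches against the identifiers attached to the stored sound piece data.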

請求項４又は５に記載の音声合成方法において、
前記音片データのキャラクタに対応するキャラクタの素片データをキャラクタ毎に識別子を付して記憶する工程と、
前記音片データ検出手段が前記文章データと対応する音片データを検出する際、前記音片データ記憶手段に対応する音片データがない場合に、前記素片データに基づいて前記音片データのない表音文字列の音声データを生成する工程と、
を備え、
前記音片データ検出手段が検出した音片データから合成音声データを生成する際に、前記音片記憶手段から検出した音片データと音響処理手段により生成された音声データとから音声データを合成することを特徴とする音声合成方法。 The speech synthesis method according to claim 4 or 5, comprising:
a step of storing segment data of characters corresponding to the characters of the sound piece data, with an identifier for each character; and
a step of generating, based on the segment data, speech data of a phonogram string for which there is no sound piece data, when the sound piece data detecting means detects the sound piece data corresponding to the sentence data and there is no corresponding sound piece data in the sound piece data storage means,
wherein, when synthesized speech data is generated from the sound piece data detected by the sound piece data detecting means, speech data is synthesized from the sound piece data detected from the sound piece storage means and the speech data generated by the acoustic processing means.
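
The fallback recited in this claim can be pictured with the following minimal sketch. It assumes, purely for illustration, that both stores are dictionaries keyed by (character identifier, text), that waveforms are byte strings, and that the acoustic processing amounts to concatenating one stored segment per phonogram; the function name `synthesize` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stores: (character identifier, phrase or phonogram) -> waveform bytes.
Store = Dict[Tuple[str, str], bytes]


def synthesize(phrases: List[str], char_id: str, pieces: Store, segments: Store) -> bytes:
    """Use a recorded sound piece when one exists, else build speech from segment data."""
    out: List[bytes] = []
    for phrase in phrases:
        piece = pieces.get((char_id, phrase))
        if piece is not None:
            out.append(piece)  # sound piece found for this character
        else:
            # Stand-in for the acoustic processing: one segment per phonogram.
            # Raises KeyError if a phonogram has no segment for this character.
            out.append(b"".join(segments[(char_id, ch)] for ch in phrase))
    return b"".join(out)
```

Because both lookups are keyed by the same character identifier, the generated portion stays in the same character's voice as the recorded sound pieces.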

案内内容を示す文章データに対応する音声案内を出力するコンピュータに、
複数のキャラクタによる音片データをキャラクタ毎に識別子を付して記憶する音片データ記憶機能と、
前記音片データ記憶機能から前記文章データと対応する音片データを検出する音片データ検出機能と、
前記音片データ検出機能により音片データを検出する際、前記文章データに含まれるキャラクタ種別コードと対応する前記識別子を有する音片データの検索を指定する検索指定機能と、
前記音片データ検出機能により検出した音片データから合成音声データを生成する音片編集機能と、
前記音片編集機能が生成した合成音声データを音声として出力する音声出力機能と、
前記文章データの音声合成が行われているときに、該文章データと異なる他の文章データの割込み合成処理を行う場合に、前記音声合成を行っている文章データと前記他の文章データとの優先順位を比較する優先順位比較機能と、
前記他の文章データの割込み合成処理の前に音声合成が行われていた文章データの処理状況の一時記録を行う処理状況記録機能と、
前記処理状況記録手段により、前記優先順位が低い文章データの一時記録を行った時間を記憶する機能と、
前記優先順位比較手段の比較の結果、割込み合成処理を行う前記他の文章データの優先順位が高い場合、該他の文章データの音声合成を行い、当該他の文章データの音声合成が終了した後に、前記処理状況記録機能の一時記録に基づいて前記他の文章データの割込み合成処理前に音声合成の処理が行われていた文章データの音声合成を行う機能と、
前記優先順位が低い文章データの再合成処理の際、前記処理状況記録機能の記録した時間と現在時刻との間隔が、前記文章データ毎に付され、前記処理状況記録機能に一時記録される長さを表した時間コードの値を超える場合、前記優先順位が低い文章データの再合成処理を行わない機能と、を実現させることを特徴とするプログラム。 A program causing a computer that outputs voice guidance corresponding to sentence data indicating guidance content to realize:
a sound piece data storage function for storing sound piece data of a plurality of characters with an identifier for each character;
a sound piece data detection function for detecting sound piece data corresponding to the sentence data from the sound piece data storage function;
a search designation function for designating, when sound piece data is detected by the sound piece data detection function, a search for sound piece data having the identifier corresponding to a character type code included in the sentence data;
a sound piece editing function for generating synthesized speech data from the sound piece data detected by the sound piece data detection function;
a speech output function for outputting the synthesized speech data generated by the sound piece editing function as speech;
a priority comparison function for comparing, when interrupt synthesis processing of other sentence data different from the sentence data is to be performed while the sentence data is being speech-synthesized, the priorities of the sentence data being speech-synthesized and the other sentence data;
a processing status recording function for temporarily recording the processing status of the sentence data that was being speech-synthesized before the interrupt synthesis processing of the other sentence data;
a function for storing the time at which the processing status recording means temporarily recorded the lower-priority sentence data;
a function for, when the comparison by the priority comparison means shows that the other sentence data to be interrupt-synthesized has the higher priority, performing speech synthesis of the other sentence data and, after that speech synthesis has finished, performing speech synthesis, based on the temporary record of the processing status recording function, of the sentence data that was being speech-synthesized before the interrupt synthesis processing of the other sentence data; and
a function for not performing resynthesis of the lower-priority sentence data when, at the time of its resynthesis processing, the interval between the time recorded by the processing status recording function and the current time exceeds the value of a time code that is attached to each piece of sentence data and represents the length of time for which it is temporarily recorded by the processing status recording function.

請求項７に記載のプログラムを格納したコンピュータ読み取り可能な情報記録媒体 A computer-readable information recording medium storing the program according to claim 7.