JP3983313B2

JP3983313B2 - Speech synthesis apparatus and speech synthesis method

Info

Publication number: JP3983313B2
Application number: JP01039996A
Authority: JP
Inventors: 良明寺本; 伸之片江; 秀敏辻内; 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-01-24
Filing date: 1996-01-24
Publication date: 2007-09-26
Anticipated expiration: 2016-01-24
Also published as: JPH09204434A

Abstract

PROBLEM TO BE SOLVED: To read out text accurately when it contains notations that have more than one reading, by providing a hierarchically structured dictionary in which the notations and readings of such items, for example addresses, are organized into levels. SOLUTION: When text is entered through a sentence input part 1, a hierarchical dictionary retrieval part 2 looks up candidate readings for the words in the text in a stored hierarchical dictionary 3. The dictionary covers word groups, such as addresses, in which a word that appears first determines the reading of the word that appears next, and it stores the notation and reading of each word together with hierarchical information that arranges the words in their connection order. Using this hierarchical information, a sentence parsing part 4 selects, from the candidates, the reading of each word string in the text that matches a word string in the hierarchical dictionary 3, and converts that word string to the dictionary reading. A voice waveform generating part 6 generates a voice waveform from the reading information. Word strings such as addresses are thus read out with the correct reading.

Description

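[0001]
[Technical field of the invention]
The present invention relates to a speech synthesizer and a speech synthesis method that read out address text in which a single notation has multiple readings, using the correct reading.
[0002]
[Prior art]
When text is input to a conventional speech synthesizer, the synthesizer converts the text into readings by consulting a dictionary that stores, for each word notation, the reading of the word and information such as the accent needed for synthesis. It then generates a speech waveform from the readings and the accent information and reads the text aloud as synthesized speech.
[0003]
[Problems to be solved by the invention]
However, when the text contains an address or similar item in which one notation has multiple readings, for example the notation "本町", which can be read "ホ′ンチョー", "ホ′ンマチ", or "モト′マチ" ("′" is the accent symbol), the synthesizer cannot determine reliably which reading to use and may read the text aloud incorrectly.
[0004]
The present invention was made to solve this problem. Its object is to provide a speech synthesizer and a speech synthesis method that hold a hierarchically structured dictionary, in which the notations and readings of addresses whose notations have multiple readings are organized into levels, and that thereby read out such address text with the correct reading.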
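[0005]
[Means for solving the problems]
FIG. 1 is a basic block diagram of the speech synthesizer of the present invention.
A text input unit 1 inputs text either directly from a recording medium such as a CD-ROM or magneto-optical disk or via a public line or the like. A hierarchical dictionary 3 stores the notation and reading of each word in word strings made up of address words, in which an address word that appears earlier determines in advance the reading of the address word that follows it, together with hierarchical information that arranges the words in levels, for example in the order prefecture, city or county, then town, village, or ward. When text is input, a hierarchical dictionary search unit 2 searches the hierarchical dictionary 3 for candidate readings of the word strings in the input character string that match address word strings. A sentence analysis unit 4 uses the hierarchical information to select, from those candidates, the reading of each word string in the text that matches an address word string in the hierarchical dictionary 3, and converts the word string to the reading given in the dictionary. The synthesizer also includes a speech waveform generation unit 6, which generates a speech waveform from the reading information, and a speaker 7, which outputs the corresponding sound.
As a result, a word string representing an address whose notation has multiple readings is read aloud with the correct reading.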
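[0006]
In addition, the speech synthesizer of the present invention designates an area, based on input of area-specifying information or on a determination from place names contained in the text, and starts the search from the levels that belong to the designated area, which shortens the search time.
[0007]
In addition, in the speech synthesizer of the present invention, the hierarchical information in the hierarchical dictionary is information that identifies, for each word, the parent word in the next higher level that determines its reading. The synthesizer refers to this information in the candidate readings retrieved from the hierarchical dictionary and sets the connection relationships among the candidates for word strings in the text that match address word strings.
[0008]
In addition, the speech synthesizer of the present invention searches the hierarchical dictionary using word strings in which the word at some level is omitted, so that the text is read correctly even when part of a word string in the document has been left out.
[0009]
In addition, the speech synthesizer of the present invention adopts the dictionary reading for a character or word only when at least a predetermined number of characters or words match the notation in a word string stored in the hierarchical dictionary. This prevents an ordinary word that is not part of a multi-reading address, but whose notation coincides with part of such a word string, from being read aloud with the address reading by mistake.
[0010]
In addition, the speech synthesizer of the present invention has a suffix dictionary that stores the notations and readings of suffixes that attach to the word strings. It searches the suffix dictionary for a notation that matches the text immediately after a word string matching the hierarchical dictionary, and selects the matching suffix-dictionary reading as the reading of that following text. A suffix whose reading differs from that of the ordinary word when it attaches to such a word string is thus read correctly.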
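[0011]
[Embodiments of the invention]
FIG. 2 is a schematic diagram of an example of the speech synthesizer of the present invention.
In this example, a recording medium D such as a magneto-optical disk or CD-ROM, on which the dictionaries described below (an address dictionary, an address suffix dictionary, and so on) and a computer program for the speech synthesis method are recorded, is inserted into the disk drive of a general-purpose personal computer with a sound output function, and the program is loaded. The synthesizer synthesizes speech from text input from a text database disk or via a public line (not shown) and outputs the synthesized speech from a speaker connected to the computer or via the public line.
[0012]
Besides loading software onto a general-purpose personal computer as described above, the speech synthesizer of the present invention may be a dedicated speech synthesis device, for example one that reads out character-string information such as traffic information received from FM multiplex broadcasting as synthesized speech. In that case the text is input via an antenna.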
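[0013]
[Embodiment 1]
FIG. 3 is a configuration diagram of Embodiment 1 of the speech synthesizer of the present invention.
A text input unit 101 inputs text from a CD-ROM, magneto-optical disk, or the like. An address dictionary search unit 102 searches a hierarchically structured address dictionary 103, in which the notations and readings of address word strings are stored in levels together with the hierarchical information, for word strings that match the address word strings contained in the text.
A sentence analysis unit 104 adopts the address dictionary 103 reading for words found in the address dictionary 103, looks up all other, ordinary words in a basic dictionary 105, and converts the text into readings. A speech waveform generation unit 106 generates a speech waveform from the readings, accent information, and so on, and synthesized speech that reads the text aloud is output from a speaker 107.
[0014]
FIG. 4 is a conceptual diagram of an example of the address dictionary 103.
The address dictionary 103 stores the notation and reading of each word in an address word string in levels. The top level holds the names of the 47 prefectures. Below it, in connection order, come cities, wards, towns, or counties, then town or village names, and then smaller divisions such as oaza (大字) and koaza (小字). Besides this hierarchical information, each word carries its kanji notation, which serves as the search key, and its pronunciation information, that is, its reading, including prosodic information such as accent phrase boundaries and accent type.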
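[0015]
FIG. 5 is a flowchart of the basic search algorithm for the address dictionary 103 in Embodiment 1.
First, a text pointer is set to the head of the text buffer that holds the input text (S101), and an address dictionary search pointer is set to the top of the hierarchy of the address dictionary 103 (S102). The text pointer is advanced one character at a time, and the word starting at the text pointer position is compared with the candidate words in the address dictionary 103 to determine whether they match (S103, S104). In this way the synthesizer checks whether words from the address dictionary 103 occur in the text buffer. If the address dictionary 103 contains a word that starts at the text pointer, the search pointer is moved to the next level (S105), the text pointer is moved to the next word position (S106), and the synthesizer checks whether a word from the next level occurs at that position (S107, S108).
[0016]
When no further word matches, the span that matched words in the address dictionary 103 is treated as an address span. The sequence of readings set for the matched word string is assigned as the pronunciation information for that span (S109) and passed to the sentence analysis unit 104. Because the text after the address span may contain further addresses, the search pointer is reset to the top of the hierarchy (S110) and the same processing is repeated over the whole text to its end (S111, S112).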
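The search of FIG. 5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the nested-dict layout, the names `ADDRESS_DICT`, `match_word`, and `find_addresses`, and the sample entries and readings (written without accent symbols) are all assumptions made for the example.

```python
# Hypothetical miniature address dictionary (illustrative entries only).
# Each level maps a notation to (reading, next-level dict).
ADDRESS_DICT = {
    "東京都": ("トーキョート", {
        "中野区": ("ナカノク", {"本町": ("ホンチョー", {})}),
    }),
    "大阪府": ("オーサカフ", {
        "大阪市": ("オーサカシ", {
            "中央区": ("チューオーク", {"本町": ("ホンマチ", {})}),
        }),
    }),
}


def match_word(text, pos, level):
    """Return the longest notation in this level that starts at text[pos], or None."""
    best = None
    for notation in level:
        if text.startswith(notation, pos) and (best is None or len(notation) > len(best)):
            best = notation
    return best


def find_addresses(text, root=ADDRESS_DICT):
    """Return (start, end, readings) for every address span found in text."""
    spans = []
    pos = 0                                    # S101: text pointer at buffer head
    while pos < len(text):                     # S111/S112: continue to end of text
        level = root                           # S102/S110: search pointer at top level
        start, readings = pos, []
        while True:
            word = match_word(text, pos, level)    # S103/S104 and S107/S108
            if word is None:
                break
            reading, children = level[word]
            readings.append(reading)
            pos += len(word)                   # S106: move to the next word position
            level = children                   # S105: descend one level
        if readings:
            spans.append((start, pos, readings))   # S109: readings for the address span
        else:
            pos += 1                           # no match: shift the pointer one character
    return spans
```

With these sample entries, the same notation "本町" receives a different reading depending on the words that precede it, which is the point of walking the hierarchy in order.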
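[0017]
[Embodiment 2]
FIG. 6 is a configuration diagram of Embodiment 2. Parts identical to those in Embodiment 1 carry the same reference numerals and are not described again. Embodiment 2 handles the case where a word is absent from the basic dictionary 105 used by the sentence analysis unit 104 but its pronunciation is known from the search of the address dictionary 103. Instead of registering the word, an address pronunciation setting unit 108 embeds the pronunciation information in the input text as a pronunciation designation string, and a pronunciation specification analysis unit 109 recognizes the embedded string and interprets it as a pronunciation designation.
[0018]
Suppose the format of the pronunciation designation string is defined as "〈発音：漢字表記：発音情報〉" (〈pronunciation: kanji notation: pronunciation information〉). Here "〈", "：", and "〉" are symbols given a special meaning within the text, and the string "発音" is the keyword that identifies a pronunciation designation. The kanji notation field serves only as a comment, and the pronunciation information field holds the pronunciation written in katakana with accent symbols.
[0019]
The operation is as follows.
When a sentence such as "東京都大田区に住んでいます。" ("I live in Ota-ku, Tokyo.") is input from the text input unit 101, the address dictionary search unit 102 searches the address dictionary 103 as in Embodiment 1 and determines that the span "東京都大田区" is an address word string whose pronunciation is "トーキョ′ートオータ′ク" ("′" is the accent symbol and "" the accent phrase boundary symbol). The address pronunciation setting unit 108 then replaces the address span in the input text with the pronunciation designation string described above and outputs the string "〈発音：東京都大田区：トーキョ′ートオータ′ク〉に住んでいます。" to the pronunciation specification analysis unit 109.
[0020]
The pronunciation specification analysis unit 109 recognizes the characters enclosed in the bracket symbols (〈〉) and interprets them as a pronunciation designation: it treats that part as containing a noun pronounced "トーキョ′ートオータ′ク" and passes the rest of the text unchanged to the sentence analysis unit 104. The sentence analysis unit 104 analyzes the pronunciation designation together with the remaining text and sets the correct reading information.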
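The embed-then-parse round trip described for units 108 and 109 can be illustrated with a short sketch. The tag format is the one given in the text; the function names `embed_pronunciation` and `parse_pronunciation`, the regular expression, and the segment tuples are assumptions for illustration only.

```python
import re

# Tag format from the text: 〈発音：kanji notation：pronunciation〉
TAG = re.compile(r"〈発音：([^：〉]*)：([^〉]*)〉")


def embed_pronunciation(text, start, end, reading):
    """Replace text[start:end] with a pronunciation designation string (role of unit 108)."""
    notation = text[start:end]
    return f"{text[:start]}〈発音：{notation}：{reading}〉{text[end:]}"


def parse_pronunciation(tagged):
    """Split tagged text into ('pron', notation, reading) and ('text', s) segments (role of unit 109)."""
    segments, last = [], 0
    for m in TAG.finditer(tagged):
        if m.start() > last:
            segments.append(("text", tagged[last:m.start()]))
        segments.append(("pron", m.group(1), m.group(2)))
        last = m.end()
    if last < len(tagged):
        segments.append(("text", tagged[last:]))
    return segments


tagged = embed_pronunciation("東京都大田区に住んでいます。", 0, 6, "トーキョ′ートオータ′ク")
segments = parse_pronunciation(tagged)
```

The "text" segments would go on to ordinary analysis against the basic dictionary, while the "pron" segments already carry their reading.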
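[0021]
In Embodiment 2, the address pronunciation setting unit 108 is connected to the address dictionary search unit 102, and the pronunciation specification analysis unit 109, placed between the address pronunciation setting unit 108 and the sentence analysis unit 104, has an input path for text from the text input unit 101 in addition to the input path from the address pronunciation setting unit 108. When text containing no address is to be read out, the text can therefore be fed directly to the sentence analysis unit 104 and converted to pronunciation information using the basic dictionary 105. In other words, the address reading section enclosed by the broken line in the figure can be built as an independent device, and the text input path can be chosen selectively.
[0022]
First, when the address reading section is built as an independent device, parallel processing becomes possible. The address dictionary 103 generally holds more than 100,000 words, and the basic dictionary 105 of the language processing unit holds from tens of thousands to more than 100,000 words, so the search load is heavy. With this configuration, two CPUs can process the address reading and the remaining language processing for speech synthesis in parallel, which prevents the processing time from increasing.
Second, when the address reading section is implemented in software, the address reading section and the language processing unit can be provided as separate commands, which makes software maintenance and system changes easier.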
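[0023]
[Embodiment 3]
FIG. 7(a) is a configuration diagram of Embodiment 3. Parts identical to those in the embodiments above carry the same reference numerals and are not described again.
This embodiment adds a designated area input unit 110, which sets an area name each time text is input, and a search start position storage buffer 111, which holds the designated area as a search start point in the hierarchically structured address dictionary 103. When an area name has been designated, the search starts from the levels of the hierarchy that belong to that area, which greatly reduces the processing time needed for the search.
[0024]
An address is sometimes written in full, starting from one of the 47 prefecture names, but the prefecture name is often omitted when it is obvious from context or when the place is well known. The hierarchical address dictionary 103 can be searched from an intermediate level when the text contains an address that starts partway down the hierarchy, but the number of words to search grows as the level gets lower and may reach several hundred thousand or more.
In this embodiment, therefore, the point in the hierarchy from which the search starts is specified by area.
[0025]
FIG. 8 is a conceptual diagram of the search start position storage buffer 111 when, for example, the area "北海道旭川市" (Asahikawa City, Hokkaido) is designated through the designated area input unit 110. The address dictionary search unit 102 looks up the designated area, and three search pointers into the hierarchy, for "top of hierarchy", "北海道", and "旭川市", are stored in the search start position storage buffer 111. The address dictionary search unit 102 then searches the address dictionary 103 starting from each of the search pointers in the buffer.
The search start position information is not limited to one area; the buffer may hold information for several areas.
[0026]
FIG. 7(b) is a configuration diagram of a modification of Embodiment 3. It differs from Embodiment 3 in that a designated area acquisition unit 112, which obtains area information from the input text, replaces the designated area input unit 110. When the address dictionary search unit 102 searches the address dictionary 103 and finds candidate word strings that match words in the text, the designated area acquisition unit 112 obtains the area of the place name in the text and stores it in the search start position storage buffer 111. The search start position information is then used as described above.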
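[0027]
[Embodiment 4]
FIG. 9 is a configuration diagram of Embodiment 4 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again.
Searching every level below some intermediate point of the address dictionary in order to find place names that start partway down the hierarchy takes a long time. To remove this drawback, Embodiment 4 structures the hierarchical address dictionary 103 as shown in FIG. 10 and adds a connection relationship setting unit 113, which determines the connection relationships between words.
[0028]
In the address dictionary 103 of FIG. 10, every word in the hierarchy is expressed as a parent-child relationship: its notation and reading are stored with parent information, such as a parent number, that identifies the parent word in the level above for which the word takes that reading (for example, parent word #1, parent word #2, ...). The entries are also sorted by the character code of the notation to simplify lookup by notation. A word with one notation and several readings carries parent information for each reading. For words that have different notations but the same reading, each reading can carry several pieces of parent information, which compresses the data.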
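[0029]
FIG. 11 is a flowchart of the algorithm of the connection relationship setting unit 113; the steps enclosed by the broken line are performed by that unit.
The search character position is set to the first character of the text (S201), and words starting at that position are looked up (S202). Whether such a word exists is determined (S203). If none exists, the search character position is moved to the next character (S204) and checked against the end of the text (S205). If it is not the last character, processing returns to step S202 and words starting at the new position are looked up. In this way the address dictionary search unit 102 searches all levels of the address dictionary 103 for every word in the text and extracts all candidates that match words in the text.
[0030]
If step S203 finds a word starting at the position, a word with differing readings or associated words is split into separate candidates (S206).
The connection relationship setting unit 113 determines whether all words have been processed (S207). If not, it determines whether there is a word that ends at the start position (S208); if there is none, it records that the word has no parent (S209).
If step S208 finds a word ending at the start position, the unit determines whether any such word matches the parent number (S210). If none matches, it records that the word has no parent (S209); if one matches, it sets a pointer to the parent word (S211).
[0031]
When step S207 finds that all words have been processed, processing moves to step S204, the search character position is advanced to the next character, and steps S204 to S211 are repeated until the last character of the text is reached.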
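The candidate extraction and parent linking of FIG. 11 can be sketched as below. The flat entry layout (entry number, notation, reading, parent number), the sample entries and readings, and the names `ENTRIES`, `BY_NOTATION`, and `link_candidates` are hypothetical; the sketch only shows the idea of linking a candidate to a preceding candidate whose entry number equals its parent number.

```python
from collections import defaultdict

# Hypothetical flat entries: (entry_no, notation, reading, parent_no); parent_no 0 = top level.
ENTRIES = [
    (1, "東京都", "トーキョート", 0),
    (2, "中野区", "ナカノク", 1),
    (3, "本町", "ホンチョー", 2),
    (4, "大阪府", "オーサカフ", 0),
    (5, "大阪市", "オーサカシ", 4),
    (6, "中央区", "チューオーク", 5),
    (7, "本町", "ホンマチ", 6),
]

BY_NOTATION = defaultdict(list)
for entry in ENTRIES:
    BY_NOTATION[entry[1]].append(entry)


def link_candidates(text):
    """Extract every dictionary match in text, then link each one to its parent candidate."""
    candidates = []
    for start in range(len(text)):                        # S201-S205: try every position
        for notation, entries in BY_NOTATION.items():
            if text.startswith(notation, start):
                for e in entries:                         # S206: one candidate per reading
                    candidates.append({"start": start, "end": start + len(notation),
                                       "entry": e, "parent": None})
    for cand in candidates:                               # S207-S211
        for prev in candidates:
            ends_here = prev["end"] == cand["start"]      # S208: a word ends where this one starts
            is_parent = prev["entry"][0] == cand["entry"][3]   # S210: parent number matches
            if ends_here and is_parent:
                cand["parent"] = prev                     # S211: pointer to the parent word
                break
    return candidates


def linked_readings(text):
    """Readings of the candidates that have a parent in the text."""
    return [c["entry"][2] for c in link_candidates(text) if c["parent"] is not None]
```

Because linking depends only on adjacency and parent numbers, the address may start at any level: "中央区本町" resolves without the prefecture or city being present.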
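[0032]
In this embodiment, when a word has a parent word, the reading associated with that word is always selected. Structuring the address dictionary 103 this way yields the correct reading whichever level of the hierarchy an address starts from.
The address dictionary may instead attach a hash index to the word for each level of an address and look up readings through the hash index. Even then, the procedure of extracting the candidates that match the notation from the address dictionary and afterwards determining their connection relationships is the same as for the structure of FIG. 10.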
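[0033]
Looking up word strings in the address dictionary 103 can also produce more than one match. For example, when an address dictionary 103 containing the place names 《東京都−港区−白金》 and 《東京都−港区−白金台》 is used to synthesize the sentence 『東京都港区白金台は、…。』, or when an address dictionary 103 containing 《山形県−南陽市−宮内−新町》 and 《熊本県−荒尾市−宮内−新町》 is used to synthesize 『荒尾市宮内新町は、…。』, both place names match.
In the first example, 《東京都−港区−白金》 has a total length of 7 characters and 《東京都−港区−白金台》 has 8, so the reading of the longer one is selected. In the second example, the Yamagata place name matches only the 4 characters of 《宮内−新町》, whereas the Kumamoto place name matches the 7 characters of 《荒尾市−宮内−新町》, so the longer Kumamoto reading is selected.
[0034]
FIG. 12 shows a flowchart of this algorithm.
Candidate word strings are built from all the words (S301), and the word string with the most characters is selected (S302). The selected string, and every candidate string whose span overlaps it, are removed from the buffer (not shown) that holds the candidate word strings (S303). Whether the buffer is empty is determined (S304), and steps S302 and S303 are repeated until it is.
This algorithm also handles text that contains more than one address.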
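The longest-match selection of FIG. 12 amounts to a greedy loop over candidate spans. The sketch below assumes candidates are given as `(start, end, label)` tuples; the function name `select_longest` and the labels are illustrative, and the third candidate is an invented second address used only to show that non-overlapping spans survive.

```python
def select_longest(candidates):
    """Greedy selection: take the longest span, drop everything overlapping it, repeat."""
    pool = list(candidates)                               # S301: buffer of candidate strings
    chosen = []
    while pool:                                           # S304: stop when the buffer is empty
        best = max(pool, key=lambda c: c[1] - c[0])       # S302: longest candidate
        chosen.append(best)
        # S303: remove the selected candidate and all candidates whose span overlaps it
        pool = [c for c in pool if c[1] <= best[0] or c[0] >= best[1]]
    return sorted(chosen)


# Candidates for 『東京都港区白金台は、…。』 plus a hypothetical later address.
CANDIDATES = [
    (0, 7, "東京都-港区-白金"),
    (0, 8, "東京都-港区-白金台"),
    (10, 13, "another address"),
]
```

Selecting by character length rather than by word count matches the two worked examples in the text, where 8 characters win over 7 and 7 over 4.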
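[0035]
[Embodiment 5]
FIGS. 13(a) and 13(b) are configuration diagrams of Embodiment 5 of the speech synthesizer of the present invention. FIG. 13(a) shows the configuration when using the address dictionary 103 of FIG. 4, which keeps the hierarchy as is; FIG. 13(b) shows the configuration when using the address dictionary 103 of FIG. 10, which expresses the hierarchy as parent-child relationships between words. Parts identical to those in the embodiments above carry the same reference numerals and are not described again. This embodiment differs in that a level omission information setting unit 114 is provided.
[0036]
When a place name is written, the omitted part is not always the leading part that starts with the prefecture name; an intermediate level of the address may be omitted instead. For example, the full form 『山梨県西八代郡上九一色村』 may be written 『山梨県上九一色村』 with the county name omitted, and the full form 『神奈川県横浜市緑区長津田』 may be written 『神奈川県横浜市長津田』 with the ward name omitted. When the text contains an address in which one or more levels of the dictionary hierarchy are omitted, searching the hierarchical address dictionary as is finds no matching candidate, and the address cannot be read correctly.
[0037]
Embodiment 5 therefore adds a rule that combinations skipping one level are also allowed during word search or connection relationship setting.
FIG. 13(a) is explained for the method that searches for words while moving the pointer level by level. In this case, when words are searched at a particular level or at every level, not only the words at the level indicated by the search pointer are searched but also all the words one or more levels below them. The level omission information setting unit 114 is the means that sets this level omission information for the search.
[0038]
FIG. 13(b) is explained for the method that looks up every word at every level that appears in the text and then has the connection relationship setting unit 113 determine the connection relationships between the words found. In this case the connection relationship setting unit 113 must treat words as parent and child even when one or more levels between them are omitted, and the level omission information setting unit 114 provides the control for this. Specifically, by following the parent pointer twice, it is easy to determine that a connection exists even when one level has been omitted.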
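Following the parent pointer twice can be shown with the county-omission example from the text. In this sketch the parent table is keyed by notation for brevity, which is a simplification (real entries would be keyed by entry number, since notations repeat); the names `PARENT` and `is_connected` and the `max_skipped` parameter are assumptions.

```python
# Hypothetical parent table for 山梨県 > 西八代郡 > 上九一色村 (keyed by notation for brevity).
PARENT = {
    "上九一色村": "西八代郡",
    "西八代郡": "山梨県",
    "山梨県": None,
}


def is_connected(child, preceding, max_skipped=1):
    """True if `preceding` is an ancestor of `child` with at most `max_skipped` levels omitted."""
    node = PARENT.get(child)
    for _ in range(max_skipped + 1):          # one hop per allowed level, plus the direct parent
        if node is None:
            return False
        if node == preceding:
            return True
        node = PARENT.get(node)               # follow the parent pointer once more
    return False
```

With `max_skipped=1`, the written form 『山梨県上九一色村』 is accepted because 山梨県 is reached on the second hop; with `max_skipped=0` only the direct parent 西八代郡 would be accepted.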
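[0039]
Another way to solve the problem is to store, as an attribute of the word in the address dictionary 103, the fact that a word such as a county name or ward name may be omitted, and to use it during word search or connection relationship setting.
FIGS. 14(a) and 14(b) are configuration diagrams of a modification of Embodiment 5. FIG. 14(a) shows the configuration when using the address dictionary 103 of FIG. 4, which keeps the hierarchy as is; FIG. 14(b) shows the configuration when using the address dictionary 103 of FIG. 10, which expresses the hierarchy as parent-child relationships between words.
[0040]
FIG. 14(a) is explained for the method that searches for words while moving the pointer level by level. Here a level omission information acquisition unit 115 checks the attributes of the words in the address dictionary 103 and, only when a word is marked as possibly omitted, searches those words and all the words one or more levels below them, as above. Because only the parts that may be omitted are searched when levels are skipped, this method has the advantage that the processing load does not increase.
[0041]
FIG. 14(b) is explained for the method that looks up every word at every level that appears in the text and then has the connection relationship setting unit 113 determine the connection relationships between the words found. Here the level omission information acquisition unit 115 checks the attributes of the words in the address dictionary 103 and, only when a word is marked as possibly omitted, follows the parent pointer twice, so that a connection can be recognized even when the address is written with one level omitted. Alternatively, a word may hold several pieces of parent information; for example, when its parent word may be omitted, it also holds the parent of its parent as parent information.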
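[0042]
[Embodiment 6]
FIG. 15 is a configuration diagram of Embodiment 6 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again.
This embodiment adds a word string candidate selection unit 116, which builds all possible word string candidates and then selects the most plausible one, and an address reading determination unit 117, which decides whether to adopt the address reading for a candidate word string in the address dictionary 103 that matched the text.
[0043]
If the address dictionary reading is always used whenever text matches the address dictionary, non-address parts of the text may be replaced with address readings. Some place names have readings that differ from those of the common nouns written the same way, for example 『化石（バケ′イシ）』, 『三角（ミカド、ミ′スミ）』, 『山寺（ヤ′マジ）』, 『小文字（コモンジ）』, and 『大文字（ダ′イモンジ）』. Even when the reading is the same, the accent type or accent combination attribute may differ, so the accent of the spoken word differs. Such words must not be read with the address reading. To avoid this problem, Embodiment 6 does not select the address reading when fewer than a predetermined number of words or characters match.
[0044]
For example, suppose the place names 《岡山県（オカヤマ′ケン）−上房郡（ジョーボ′ーグン）−賀陽町（カヨーチョー）−北（キ′タ）−門（カド）》 and 《大分県（オーイタ′ケン）−東国東郡（ヒガシクニサキ′グン）−国見町（クニミチョー）−中（ナ′カ）−下（シモ）》 are registered in the address dictionary 103. When text such as 『北門で待つ』 or 『上中下』 is input, only the two characters "北門" or "中下" match. If the address reading is not selected in such cases and the basic dictionary 105 is consulted instead, the text is read correctly.
[0045]
FIG. 16 is a flowchart of the algorithm of Embodiment 6.
The number of words or characters in the candidate word string is obtained (S401) and compared with a threshold (S402). If it is equal to or greater than the threshold, the address reading is selected (S403). If it is smaller than the threshold, processing ends without selecting the address reading.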
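The threshold test of FIG. 16 is small enough to state directly. The text does not give a threshold value, so the default of 3 characters below is purely an assumption chosen so that the two-character matches "北門" and "中下" are rejected; the function name `use_address_reading` is also hypothetical.

```python
def use_address_reading(matched_words, min_chars=3):
    """Decide whether a matched word string is long enough to be read as an address.

    matched_words: the dictionary words that matched, in order.
    min_chars: assumed threshold; the source does not specify a value.
    """
    total_chars = sum(len(w) for w in matched_words)      # S401: count matched characters
    return total_chars >= min_chars                       # S402/S403: select only at or above threshold
```

A variant could count words instead of characters, as the text allows either measure.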
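[0046]
[Embodiment 7]
FIG. 17 is a configuration diagram of Embodiment 7 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again. Embodiment 7 adds a prosodic boundary setting unit 118, which sets prosodic boundary symbols so that the synthesized speech sounds natural.
[0047]
When an address with a large total number of moras is read out without breath group boundaries and phrase boundaries, the pitch drops too low or the speech sounds breathless because there are no pauses. For example, the address 『北海道札幌市南区定山渓定山渓豊羽鉱山くるみ沢』 is read "ホッカ′イドーサッポロ′シミナミ′クジョーザ′ンケージョーザンケートヨハコ′ーザンクルミ′サワ", which sounds very unnatural when read in one breath. Too many breaks also sound unnatural. Prosodic boundary symbols such as phrase boundaries or breath group boundaries therefore need to be set at suitable positions.
[0048]
A first method of setting boundary symbols between words is to accumulate the mora counts of the words matched in the address dictionary 103 and set a prosodic boundary symbol, such as a phrase boundary or breath group boundary, so that the cumulative mora count does not exceed a threshold, or each time it does.
[0049]
FIG. 18 is a flowchart of this algorithm.
The cumulative mora count is set to the mora count of the current word (S501). The reading of the place-name word is written to a buffer (not shown) (S502), and the place-name word pointer is advanced (S503). Whether the place-name word candidate string has ended is determined (S504); if not, the mora count of the place-name word is added (S505). Whether the cumulative mora count has exceeded the threshold is determined (S506), and steps S502 to S505 are repeated until it does.
When the cumulative mora count exceeds the threshold, a breath group symbol is written to the buffer (S507), processing returns to step S501, and steps S501 to S507 are repeated until the place-name word candidate string ends.
[0050]
In the example 『北海道札幌市南区定山渓定山渓豊羽鉱山くるみ沢』, the mora counts of the words are {6, 5, 4, 6, 13, 5}, so a threshold of 13 moras divides them into {(6, 5), (4, 6), 13, 5}. Writing the breath group symbol as "・", the reading becomes "ホッカ′イドーサッポロ′シ・ミナミ′クジョーザ′ンケー・ジョーザンケートヨハコ′ーザン・クルミ′サワ", which can be read out naturally.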
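The first method can be checked against the worked example with a few lines of code. The loop below is a compact restatement of the accumulate-and-break idea rather than a step-by-step copy of FIG. 18; the function name `insert_breath_marks` and the `(reading, mora_count)` input format are assumptions, while the readings, mora counts, and threshold of 13 come from the text.

```python
def insert_breath_marks(words, threshold=13, mark="・"):
    """Join word readings, inserting a breath group mark before a word
    whenever adding its moras would push the running total past the threshold.

    words: list of (reading, mora_count) pairs in address order.
    """
    out = []
    total = 0
    for reading, moras in words:
        if out and total + moras > threshold:
            out.append(mark)      # start a new breath group
            total = 0
        out.append(reading)
        total += moras
    return "".join(out)


# Example from the text: mora counts {6, 5, 4, 6, 13, 5}.
EXAMPLE = [
    ("ホッカ′イドー", 6),
    ("サッポロ′シ", 5),
    ("ミナミ′ク", 4),
    ("ジョーザ′ンケー", 6),
    ("ジョーザンケートヨハコ′ーザン", 13),
    ("クルミ′サワ", 5),
]
```

Running `insert_breath_marks(EXAMPLE)` groups the words as (6, 5), (4, 6), 13, 5, the same division the text gives for a threshold of 13 moras.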
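[0051]
A second method of setting boundary symbols between words is to include boundary symbols in the hierarchical data structure of the place-name words in the address dictionary 103 and to set prosodic boundary symbols, such as phrase boundaries or breath group boundaries, by referring to them.
FIG. 19 is a flowchart of this algorithm.
The reading of the place-name word is written to the buffer (S601). The place-name word pointer is advanced (S602), and whether the place-name word candidate string has ended is determined (S603). If it has not, any prosodic boundary symbol is obtained and set (S604), processing returns to step S601, and steps S601 to S604 are repeated until the candidate string ends.
[0052]
The second method does not look only at mora counts; phrase boundary and breath group boundary symbols can be entered in advance as distinct symbols, so the speech sounds more natural. In the example above, 『札幌市』 and 『南区』 sound more natural when spoken together, so that information is stored in the address dictionary 103. Writing the breath group boundary symbol as "・" and the phrase boundary symbol as "／", the example can be read "ホッカ′イドー／サッポロ′シミナミ′ク／ジョーザ′ンケー・ジョーザンケートヨハコ′ーザン／クルミ′サワ".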
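[0053]
[Embodiment 8]
FIG. 20 is a configuration diagram of Embodiment 8 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again. Embodiment 8 adds a block-number symbol conversion unit 119, which converts symbols used in block numbers (番地), such as the minus sign and the long vowel mark, to the reading used for block numbers.
When text containing an address written 『東京都千代田区丸の内１−６−１』 is input, reading the block number 『１−６−１』 as "イチ・マイナス・ロク・マイナス・イチ" sounds wrong for an address. When an address contains consecutive numbers like this, it is therefore read as 『１の６の１』, that is, "イチ′ノ・ロク′ノ・イチ".
[0054]
FIG. 21 is a flowchart of this algorithm.
After the address determined by the matching word string in the address dictionary 103 has been recognized, the characters following the address span are examined. Whether the next character is a numeral (０ to ９, 〇 to 九, 十, 百, 千, 万) is determined (S701); if so, the pointer is advanced one character (S702) and processing returns to step S701. If not, whether the next word is a counter (番地, 番, 丁目, 号) is determined (S703); if so, the pointer is advanced by the length of the word (S704) and processing returns to step S701.
[0055]
If the next word is not a counter, whether the next character is a separator (−, ー) is determined (S705); if so, the character is replaced with 『の』 (S706). The long vowel mark 『ー』 is included as a separator because the minus sign 『−』 is often mistakenly written as a long vowel mark, and the text should be readable even then.
If the next item is none of these characters or words, processing ends. This processing allows block numbers written with separator characters between the numbers to be read correctly.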
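The scan of FIG. 21 can be sketched as a single loop over the characters that follow the address span. The numeral, counter, and separator sets come from the text; the function name `convert_block_number` and the assumption that scanning continues after a separator has been replaced are illustrative choices, since the text does not state where control goes after S706.

```python
DIGITS = set("０１２３４５６７８９〇一二三四五六七八九十百千万")
COUNTERS = ("番地", "丁目", "番", "号")     # longer forms first so 番地 wins over 番
SEPARATORS = set("−ー")                     # minus sign and long vowel mark


def convert_block_number(text, pos):
    """Starting right after an address span, replace separators between numbers with の.

    Returns (converted_text, end_position_of_block_number).
    """
    chars = list(text)
    while pos < len(chars):
        if chars[pos] in DIGITS:                           # S701/S702: numeral, advance one character
            pos += 1
            continue
        counter = next((c for c in COUNTERS if text.startswith(c, pos)), None)
        if counter is not None:                            # S703/S704: counter word, skip its length
            pos += len(counter)
            continue
        if chars[pos] in SEPARATORS:                       # S705/S706: separator becomes の
            chars[pos] = "の"
            pos += 1                                       # assumed: keep scanning after replacement
            continue
        break                                              # anything else ends the block number
    return "".join(chars), pos
```

The replacement is one character for one character, so positions computed on the original text stay valid while the loop runs.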
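[0056]
[Embodiment 9]
FIG. 22 is a configuration diagram of Embodiment 9 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again.
For example, when the text 『東京都豊島区立第三中学校』 is input and only the address span determined by the matching word string in the address dictionary 103 is converted to pronunciation information, the text is interpreted as "トーキョ′ートトシマ′ク" + 『立第三中学校』, and the sentence analysis unit 104 ends up reading 『立』 as "タ′チ".
[0057]
Embodiment 9 therefore adds three parts. An address suffix dictionary 120 stores address suffixes, drawn from a vocabulary such as 『立、内、民、行き、発、着、…』, together with their accent combination attributes and the like. An address suffix dictionary search unit 121 searches the address suffix dictionary 120 for a word matching the word that immediately follows the address span determined by the matching word string in the address dictionary 103. When that search finds an address suffix after the address, an accent combination processing unit 122 extends the address span to include the suffix, combines the accents of the last word of the matched word string and the suffix according to the accent combination attribute set for the suffix, and sets the reading.
[0058]
FIG. 23 is a flowchart of the algorithm of Embodiment 9.
Whether the word following the address word string is in the address suffix dictionary 120 is determined (S801); if it is, the accent combination processing unit 122 performs accent combination (S802).
With this processing, the text in the example can be passed to the sentence analysis unit 104 as "トーキョ′ートトシマク′リツ" + 『第三中学校』, so it is read correctly.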
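[0059]
[Embodiment 10]
FIG. 24 is a configuration diagram of Embodiment 10 of the speech synthesizer of the present invention. Parts identical to those in the embodiments above carry the same reference numerals and are not described again.
Consider text in which an address appears and a word from that address then appears again, for example 『神奈川県中原区上小田中に、上小田中公民館はあります。』. The place name 『神奈川県中原区上小田中』 is read correctly, but the later 『上小田中公民館は…』 may be read incorrectly even though the word was read correctly earlier, because another reading is registered for the same notation. With a configuration like that of Embodiment 6, the later occurrence may also fail to be handled as an address reading because its number of characters (or words) is below the threshold.
[0060]
Embodiment 10 therefore adds an address word learning unit 123, which stores the kanji notation and pronunciation information of every word in an address processed by the address dictionary search unit 102 as a pair in a learning word buffer 124; the learning word buffer 124 itself, which holds the notations and readings of recently used addresses; and a learning word search unit 125, which searches the text for learned words after the hierarchical address dictionary 103 has been searched and, where a learned word matches, embeds the corresponding pronunciation information in that part of the text. Readings stored in the learning word buffer 124 take priority, so once an address has appeared in the text, a later occurrence of part of it is read correctly.
[0061]
A learning word deletion unit 126 that initializes the contents of the learning word buffer 124 may also be provided. The buffer may normally be left uninitialized and cleared only when the user requests it, or it may be cleared each time text is input. It may also be cleared when the number of sentences input since registration exceeds a predetermined number.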
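[0062]
The speech synthesis method carried out in the speech synthesizer of the present invention need not be written into the ROM of the synthesizer. As shown in FIG. 25, it may instead be recorded on a recording medium D such as a compact disc, which is loaded into the disk drive of a personal computer to perform speech synthesis.
[0063]
The embodiments above describe the case where the word strings are addresses, but the word strings are not limited to addresses; the same effect is obtained for any word strings that have a hierarchical structure.
[0064]
[Effects of the invention]
As described above, the speech synthesizer and speech synthesis method of the present invention hold a hierarchically structured dictionary in which the notations and readings of addresses whose notations have multiple readings are organized into levels, and can therefore read out such address text with the correct reading.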
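[Brief description of the drawings]
[FIG. 1] Basic block diagram of the speech synthesizer of the present invention.
[FIG. 2] Schematic diagram of an example of the speech synthesizer of the present invention.
[FIG. 3] Configuration diagram of Embodiment 1 of the speech synthesizer of the present invention.
[FIG. 4] Conceptual diagram of an example of the address dictionary.
[FIG. 5] Flowchart of the algorithm of Embodiment 1.
[FIG. 6] Configuration diagram of Embodiment 2 of the speech synthesizer of the present invention.
[FIG. 7] Configuration diagrams of Embodiment 3 of the speech synthesizer of the present invention and its modification.
[FIG. 8] Conceptual diagram of the search start position storage buffer.
[FIG. 9] Configuration diagram of Embodiment 4 of the speech synthesizer of the present invention.
[FIG. 10] Conceptual diagram of another example of the address dictionary.
[FIG. 11] Flowchart of the algorithm of Embodiment 4 (part 1).
[FIG. 12] Flowchart of the algorithm of Embodiment 4 (part 2).
[FIG. 13] Configuration diagrams of Embodiment 5 of the speech synthesizer of the present invention.
[FIG. 14] Configuration diagrams of a modification of Embodiment 5 of the speech synthesizer of the present invention.
[FIG. 15] Configuration diagram of Embodiment 6 of the speech synthesizer of the present invention.
[FIG. 16] Flowchart of the algorithm of Embodiment 6.
[FIG. 17] Configuration diagram of Embodiment 7 of the speech synthesizer of the present invention.
[FIG. 18] Flowchart of the algorithm of Embodiment 7 (part 1).
[FIG. 19] Flowchart of the algorithm of Embodiment 7 (part 2).
[FIG. 20] Configuration diagram of Embodiment 8 of the speech synthesizer of the present invention.
[FIG. 21] Flowchart of the algorithm of Embodiment 8.
[FIG. 22] Configuration diagram of Embodiment 9 of the speech synthesizer of the present invention.
[FIG. 23] Flowchart of the algorithm of Embodiment 9.
[FIG. 24] Configuration diagram of Embodiment 10 of the speech synthesizer of the present invention.
[FIG. 25] Conceptual diagram of the recorded state of the recording medium.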
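[Explanation of reference numerals]
1 Text input unit
2 Hierarchical dictionary search unit
3 Hierarchical dictionary
4 Sentence analysis unit
5 Basic dictionary
6 Speech waveform generation unit
7 Speaker
[0001]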
BACKGROUND OF THE INVENTION
The present invention relates to a speech synthesizer and a speech synthesis method that read out address text in which a single notation has multiple readings, using the correct reading.
[0002]
[Prior art]
When text is input to a conventional speech synthesizer, the synthesizer converts the text into readings by consulting a dictionary that stores, for each word notation, the reading of the word and information such as the accent needed for synthesis. It then generates a speech waveform from the readings and the accent information and reads the text aloud as synthesized speech.
[0003]
[Problems to be solved by the invention]
However, when the text contains an address or similar item in which one notation has multiple readings, for example the notation "本町", which can be read "ホ′ンチョー", "ホ′ンマチ", or "モト′マチ" ("′" is the accent symbol), the synthesizer cannot determine reliably which reading to use and may read the text aloud incorrectly.
[0004]
The present invention was made to solve this problem. Its object is to provide a speech synthesizer and a speech synthesis method that hold a hierarchically structured dictionary, in which the notations and readings of addresses whose notations have multiple readings are organized into levels, and that thereby read out such address text with the correct reading.
[0005]
[Means for Solving the Problems]
  FIG. 1 is a basic block diagram of a speech synthesizer according to the present invention.
A text input unit 1 inputs text either directly from a recording medium such as a CD-ROM or magneto-optical disk or via a public line or the like. A hierarchical dictionary 3 stores the notation and reading of each word in word strings made up of address words, in which an address word that appears earlier determines in advance the reading of the address word that follows it, together with hierarchical information that arranges the words in levels, for example in the order prefecture, city or county, then town, village, or ward. When text is input, a hierarchical dictionary search unit 2 searches the hierarchical dictionary 3 for candidate readings of the word strings in the input character string that match address word strings. A sentence analysis unit 4 uses the hierarchical information to select, from those candidates, the reading of each word string in the text that matches an address word string in the hierarchical dictionary 3, and converts the word string to the reading given in the dictionary. The synthesizer also includes a speech waveform generation unit 6, which generates a speech waveform from the reading information, and a speaker 7, which outputs the corresponding sound.
As a result, a word string representing an address whose notation has multiple readings is read aloud with the correct reading.
[0006]
In addition, the speech synthesizer of the present invention starts a search from a hierarchy belonging to the designated area by designating the area based on input of information specifying the area, determination from the place name included in the text information, and the like. To reduce search time.
[0007]
In addition, in the speech synthesizer of the present invention, the hierarchical information in the hierarchical dictionary is information that identifies, for each word, the parent word in the next higher level that determines its reading. The synthesizer refers to this information in the candidate readings retrieved from the hierarchical dictionary and sets the connection relationships among the candidates for word strings in the text that match address word strings.
[0008]
In addition, the speech synthesizer of the present invention searches the hierarchical dictionary using word strings in which the word at some level is omitted, so that the text is read correctly even when part of a word string in the document has been left out.
[0009]
In addition, the speech synthesizer of the present invention adopts the dictionary reading for a character or word only when at least a predetermined number of characters or words match the notation in a word string stored in the hierarchical dictionary. This prevents an ordinary word that is not part of a multi-reading address, but whose notation coincides with part of such a word string, from being read aloud with the address reading by mistake.
[0010]
In addition, the speech synthesizer of the present invention has a suffix dictionary that stores the notations and readings of suffixes that attach to the word strings. It searches the suffix dictionary for a notation that matches the text immediately after a word string matching the hierarchical dictionary, and selects the matching suffix-dictionary reading as the reading of that following text. A suffix whose reading differs from that of the ordinary word when it attaches to such a word string is thus read correctly.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 2 is a schematic diagram showing an example of a speech synthesizer according to the present invention.
In the speech synthesizer of this example, a dictionary such as an address dictionary and an address suffix dictionary as described below and a computer program of the speech synthesis method are recorded in a disk drive of a general-purpose personal computer having a speech output function. A computer program is loaded by loading a recording medium D such as a magneto-optical disk, CD-ROM, etc., and voice is synthesized from text information input from a text database disk or via a public line (not shown). The synthesized voice is output from a speaker connected to the PC or via a public line.
[0012]
Besides loading software onto a general-purpose personal computer as described above, the speech synthesizer of the present invention may be a dedicated speech synthesis device, for example one that reads out character-string information such as traffic information received from FM multiplex broadcasting as synthesized speech. In that case the text is input via an antenna.
[0013]
[Embodiment 1]
FIG. 3 is a block diagram of Embodiment 1 of the speech synthesizer of the present invention.
A text input unit 101 inputs text from a CD-ROM, magneto-optical disk, or the like. An address dictionary search unit 102 searches a hierarchically structured address dictionary 103, in which the notations and readings of address word strings are stored in levels together with the hierarchical information, for word strings that match the address word strings contained in the text.
The sentence analysis unit 104 uses the reading of the word searched from the address dictionary 103 as the reading of the address dictionary 103, searches the basic dictionary 105 for words that match other general words, converts the sentence information into readings, The speech waveform generation unit 106 generates a speech waveform from the reading and accent information, and the synthesized speech that reads out the text information from the speaker 107 is output.
[0014]
FIG. 4 is a conceptual diagram of an example of the address dictionary 103.
The address dictionary 103 stores the notation and reading of each word in an address word string in levels. The top level holds the names of the 47 prefectures. Below it, in connection order, come cities, wards, towns, or counties, then town or village names, and then smaller divisions such as oaza (大字) and koaza (小字). Besides this hierarchical information, each word carries its kanji notation, which serves as the search key, and its pronunciation information, that is, its reading, including prosodic information such as accent phrase boundaries and accent type.
[0015]
FIG. 5 is a flowchart of a basic search algorithm for the address dictionary 103 in the first embodiment.
First, a text pointer is set at the head of a text buffer for storing input text information (S101), and an address dictionary search pointer is set at the head of the hierarchical structure of the address dictionary 103 (S102). While shifting the text pointer one character at a time, the word starting at the text pointer position is compared with the search candidate word in the address dictionary 103 to determine whether or not they match (S103, 104). Check if it exists in the buffer. If there is a word starting with the text pointer in the address dictionary 103, the address dictionary search pointer is set to the next level (S105), the text pointer is set to the next word position (S106), and the text It is checked whether or not a word of the next hierarchy exists at the next word position (S107, S108).
[0016]
When there is no longer a word that matches the word in the address dictionary 103, the section that matches the word in the address dictionary 103 is regarded as the address section, and the reading column set in the matched word string is used as the address section. Pronunciation information is set as a reading (S109) and passed to the sentence analysis unit 104. Also, since there is a possibility that the address is also included in the text after the address section, the address dictionary search pointer is set at the head of the hierarchical structure (S110), and the same processing is performed on the entire text up to the end of the text information. (S111, S112).
[0017]
[Embodiment 2]
FIG. 6 is a configuration diagram of Embodiment 2. Parts identical to those in Embodiment 1 carry the same reference numerals and are not described again. Embodiment 2 handles the case where a word is absent from the basic dictionary 105 used by the sentence analysis unit 104 but its pronunciation is known from the search of the address dictionary 103. Instead of registering the word, an address pronunciation setting unit 108 embeds the pronunciation information in the input text as a pronunciation designation string, and a pronunciation specification analysis unit 109 recognizes the embedded string and interprets it as a pronunciation designation.
[0018]
Suppose the format of the pronunciation designation string is defined as "〈発音：漢字表記：発音情報〉" (〈pronunciation: kanji notation: pronunciation information〉). Here "〈", "：", and "〉" are symbols given a special meaning within the text, and the string "発音" is the keyword that identifies a pronunciation designation. The kanji notation field serves only as a comment, and the pronunciation information field holds the pronunciation written in katakana with accent symbols.
[0019]
Next, the operation will be described.
When a sentence such as "東京都大田区に住んでいます。" ("I live in Ota-ku, Tokyo.") is input from the text input unit 101, the address dictionary search unit 102 searches the address dictionary 103 as in Embodiment 1 and determines that the span "東京都大田区" is an address word string whose pronunciation is "トーキョ′ートオータ′ク" ("′" is the accent symbol and "" the accent phrase boundary symbol). The address pronunciation setting unit 108 then replaces the address span in the input text with the pronunciation designation string described above and outputs the string "〈発音：東京都大田区：トーキョ′ートオータ′ク〉に住んでいます。" to the pronunciation specification analysis unit 109.
[0020]
The pronunciation specification analysis unit 109 recognizes the characters enclosed in the bracket symbols (〈〉) and interprets them as a pronunciation designation: it treats that part as containing a noun pronounced "トーキョ′ートオータ′ク" and passes the rest of the text unchanged to the sentence analysis unit 104. The sentence analysis unit 104 analyzes the pronunciation designation together with the remaining text and sets the correct reading information.
[0021]
In the second embodiment, an address pronunciation setting unit 108 is connected to the address dictionary search unit 102, and in addition to the input route from the address pronunciation setting unit 108, the text is set between the address pronunciation setting unit 108 and the sentence analysis unit 104. By connecting a pronunciation specification analysis unit 109 having an input path of sentence information from the input unit 101, when reading a sentence that does not include an address, the sentence information is converted to pronunciation information with reference to the basic dictionary 105. Direct input to the analysis unit 104 is possible. That is, in the figure, the address reading section surrounded by a broken line can be configured as an independent device, or the text information input path can be selectively used.
[0022]
First, when the address reading section is built as an independent device, parallel processing becomes possible. The address dictionary 103 generally holds more than 100,000 words, and the basic dictionary 105 of the language processing unit holds from tens of thousands to more than 100,000 words, so the search load is heavy. With this configuration, two CPUs can process the address reading and the remaining language processing for speech synthesis in parallel, which prevents the processing time from increasing.
Second, when the address reading section is implemented in software, the address reading section and the language processing unit can be provided as separate commands, which makes software maintenance and system changes easier.
[0023]
[Embodiment 3]
FIG. 7A is a configuration diagram of the third embodiment. The same parts as those in the above-described embodiments are denoted by the same reference numerals, and their description is omitted.
This embodiment is provided with a designated area input unit 110 that sets an area name each time text information is input, and a search start position storage buffer 111 that holds the designated area as a search start point in the hierarchical address dictionary 103. When an area name is specified, the search starts from the layers of the hierarchical structure that belong to the specified area, which greatly reduces the processing time required for the search.
[0024]
An address may be written in full, starting from the name of one of the 47 prefectures, but the prefecture name is often omitted when it is obvious from the context or when the place name is well known. When the text contains an address that starts in the middle of the hierarchy, the hierarchical address dictionary 103 can also be searched from an intermediate layer, but the number of search target words grows as the hierarchy goes down and may reach hundreds of thousands.
For this reason, in this embodiment, the designated area specifies which point in the hierarchy is used as the starting point of the search.
[0025]
FIG. 8 is a conceptual diagram of the search start position storage buffer 111 when, for example, the area "Hokkaido Asahikawa City" is designated by the designated area input unit 110. When this area is designated, the address dictionary search unit 102 looks up the designated area, and a search pointer into the hierarchical structure is stored in the search start position storage buffer 111 for each of the three levels "hierarchy top", "Hokkaido", and "Asahikawa City". The address dictionary search unit 102 then searches the address dictionary 103 using each search pointer in the search start position storage buffer 111 as a starting point.
Note that the search start position information is not limited to one region, and may be configured to store information on a plurality of regions.
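The search start position storage buffer described above can be pictured with a minimal Python sketch. The `Node` class, the function names, the placeholder place names ("Town X", "Prefecture Y") and their readings are all hypothetical, and checking the most specific level first is an assumption of this sketch, not something stated in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word of a hierarchical address dictionary (illustrative only)."""
    notation: str
    reading: str
    children: dict = field(default_factory=dict)

    def add(self, notation, reading):
        child = Node(notation, reading)
        self.children[notation] = child
        return child


def build_start_buffer(root, region_path):
    """Store one search pointer for the hierarchy top and for each level of the region."""
    pointers = [root]
    node = root
    for notation in region_path:
        node = node.children[notation]
        pointers.append(node)
    return pointers


def lookup(start_buffer, notation):
    """Search only the direct children of each stored search pointer."""
    for node in reversed(start_buffer):  # assumption: most specific level first
        child = node.children.get(notation)
        if child is not None:
            return child.reading
    return None


# Hypothetical dictionary content.
root = Node("", "")
hokkaido = root.add("Hokkaido", "hokkaido")
asahikawa = hokkaido.add("Asahikawa City", "asahikawashi")
asahikawa.add("Town X", "reading-in-asahikawa")
root.add("Prefecture Y", "y").add("Town X", "reading-in-y")

buffer = build_start_buffer(root, ["Hokkaido", "Asahikawa City"])
print(lookup(buffer, "Town X"))
```

Because only three nodes are used as starting points, a place name written without its prefecture and city is resolved without scanning every lower layer of the dictionary.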
[0026]
FIG. 7B is a configuration diagram of a modification of the third embodiment. This modification differs from the third embodiment in that a designated area acquisition unit 112, which obtains area information from the input text information, is provided instead of the designated area input unit 110. When the address dictionary search unit 102 searches the address dictionary 103 and finds a word string candidate that matches words in the sentence, the designated area acquisition unit 112 acquires the area of the place name in the sentence. The search start position information stored in the search start position storage buffer 111 is then used in the same manner as described above.
[0027]
[Embodiment 4]
FIG. 9 is a block diagram of Embodiment 4 of the speech synthesizer of the present invention. The same parts as those in the above-described embodiments are denoted by the same reference numerals, and their description is omitted.
The fourth embodiment addresses the problem that, when a place name starting from an intermediate layer is searched for, searching all lower layers from the middle of the address dictionary takes a long time. To solve this problem, the hierarchical address dictionary 103 is configured as shown in FIG. 10, and a connection relationship setting unit 113 for obtaining the connection relationship between words is provided.
[0028]
That is, in the address dictionary 103 of FIG. 10, every word in the hierarchical structure carries, in addition to its notation and reading, parent information such as a parent number that identifies the parent word in the higher layer under which the word takes that reading. The hierarchy is thus expressed as parent-child relationships (e.g., parent word #1, parent word #2, ...). Furthermore, for easy retrieval by notation, the entries are sorted in order of notation code. A word with different readings for one notation has parent information for each of its readings, and words having the same reading in different notations have multiple pieces of parent information for each reading; the dictionary can also be compressed in this way.
[0029]
FIG. 11 is a flowchart of the algorithm of the connection relation setting unit 113, and steps surrounded by a broken line indicate processing in the connection relation setting unit 113.
The search character position is set to the first character of the text information (S201), and words starting at that position are searched for (S202). It is determined whether there is a word starting at that position (S203). If there is none, the search character position is moved to the next character (S204), and it is determined whether the last character of the text has been reached (S205). If not, the process returns to step S202 and words starting at the new position are searched for. In this way, the address dictionary search unit 102 searches all layers of the address dictionary 103 for every word contained in the sentence and extracts all candidates that match words in the sentence.
[0030]
If the determination in step S203 finds a word starting at the start position, entries with different readings are separated into individual words (S206).
The connection relationship setting unit 113 determines whether all words have been processed (S207). If not, it determines whether there is a word ending at the start position (S208); if there is none, information indicating that there is no parent word is set (S209).
On the other hand, if step S208 finds a word ending at the start position, it is determined whether any such word has a matching parent number (S210). If none does, information indicating that there is no parent word is set (S209); if one does, a pointer to that parent word is set (S211).
[0031]
If all words are processed as a result of the determination in step S207, the process proceeds to step S204, the search character position is set to the next character, and steps S204 to S211 are repeated until the last character in the text is reached.
[0032]
In this embodiment, when a parent word exists, the reading of the word that has the parent word is always selected. By configuring the address dictionary 103 in this way, the correct reading can be obtained for an address that starts from any level of the hierarchical structure.
Further, the address dictionary may be configured such that a hash index is assigned to a word representing each level of the address, and readings are searched through the hash index. Even when the address dictionary has such a structure, the procedure for obtaining the connection relationship after extracting candidates matching the notation from the address dictionary by searching the address dictionary is the same as that in the configuration of FIG.
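The connection relationship setting of steps S208 to S211 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Entry` and `Candidate` classes, the function names, and the placeholder notations and readings are hypothetical, and each reading is assumed to be stored as its own entry already (the result of step S206).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    number: int                    # word number used as a parent number by children
    notation: str
    reading: str
    parents: Tuple[int, ...] = ()  # parent numbers; empty for a top-level word


@dataclass
class Candidate:
    start: int
    entry: Entry
    parent: Optional["Candidate"] = None  # None means "no parent word" (S209)

    @property
    def end(self) -> int:
        return self.start + len(self.entry.notation)


def find_candidates(text, entries):
    """Extract every dictionary word found at every character position."""
    return [Candidate(pos, e)
            for pos in range(len(text))
            for e in entries
            if text.startswith(e.notation, pos)]


def set_connections(candidates):
    """Link a word to a word that ends where it starts and has a matching parent number."""
    for child in candidates:
        for other in candidates:
            if other.end == child.start and other.entry.number in child.entry.parents:
                child.parent = other  # pointer to the parent word (S211)
                break
    return candidates


# Hypothetical dictionary: "Town" is read differently under CityA and CityB.
ENTRIES = [
    Entry(1, "CityA", "city-a"),
    Entry(2, "CityB", "city-b"),
    Entry(3, "Town", "reading-under-a", (1,)),
    Entry(4, "Town", "reading-under-b", (2,)),
]

cands = set_connections(find_candidates("CityBTown", ENTRIES))
linked = [c.entry.reading for c in cands if c.parent is not None]
print(linked)
```

Only the reading whose parent number matches the preceding word receives a parent pointer, which is what lets the later selection step prefer it.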
[0033]
In addition, when word strings matching a word string in the sentence are obtained from the address dictionary 103, more than one match may occur. For example, with an address dictionary 103 that contains the place names <<Tokyo-Minato-Shirokane>> and <<Tokyo-Minato-Shirokanedai>>, synthesizing a sentence such as "Shirokanedai, Minato-ku, Tokyo ..." matches both place names. Likewise, with an address dictionary 103 that contains the place names <<Yamagata-Nanyo-Miyauchi-Shinmachi>> and <<Kumamoto-Arao-Miyauchi-Shinmachi>>, synthesizing a sentence such as "Miyauchi Shinmachi, Arao City, is ..." matches both place names.
In the former example, the total character length of <<Tokyo-Minato-Shirokane>> is 7 characters and that of <<Tokyo-Minato-Shirokanedai>> is 8 characters, so the reading of the longer match is selected. In the latter example, the place name in Yamagata Prefecture matches only 4 characters, "Miyauchi-Shinmachi", whereas the place name in Kumamoto Prefecture matches 7 characters, "Arao City-Miyauchi-Shinmachi", so the reading of the Kumamoto place name is selected.
[0034]
A flowchart of the above algorithm is shown in FIG.
Word candidate strings are created from all the words (S301), and the word string with the largest number of characters is selected (S302). The selected word string and any candidate strings whose sections overlap it are deleted from a buffer (not shown) that stores the word candidate strings (S303). It is determined whether the word candidate string buffer is empty (S304), and steps S302 and S303 are repeated until the buffer is empty.
In this algorithm, processing is performed in consideration of the case where a plurality of addresses are included in a sentence.
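The selection loop of steps S301 to S304 can be sketched as below. The tuple layout `(start, end, reading)` and the function name are assumptions made for illustration; the character positions in the example are arbitrary.

```python
def select_longest(candidates):
    """Pick the longest candidate, drop everything overlapping it, repeat until empty.

    Each candidate is (start, end, reading) with end exclusive.
    """
    buffer = list(candidates)
    selected = []
    while buffer:                                        # S304: stop when empty
        best = max(buffer, key=lambda c: c[1] - c[0])    # S302: longest match
        selected.append(best)
        # S303: remove the selected string and every candidate overlapping it
        buffer = [c for c in buffer if c[1] <= best[0] or c[0] >= best[1]]
    return sorted(selected)


# Two overlapping matches for one address, plus a second address later in the text.
candidates = [
    (0, 7, "shirokane"),
    (0, 8, "shirokanedai"),
    (20, 24, "second-address"),
]
print(select_longest(candidates))
```

Because candidates that do not overlap the selected one stay in the buffer, a sentence containing several addresses yields one selected word string per address.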
[0035]
[Embodiment 5]
FIGS. 13 (a) and 13 (b) are block diagrams of Embodiment 5 of the speech synthesizer of the present invention. FIG. 13 (a) shows an address dictionary 103 having the hierarchical structure shown in FIG. FIG. 13 (b) is a block diagram when using the address dictionary 103 having the configuration of FIG. 10 in which hierarchical information is expressed by a parent-child relationship between words. In the figure, the same parts as those in the above-described embodiment are denoted by the same reference numerals, and the description thereof is omitted. This embodiment is different in that a hierarchy omission information setting unit 114 is provided.
[0036]
That is, when a place name is written, not only the head part starting from the prefecture name but also a middle part of the address may be omitted. For example, the county name may be omitted, so that "Nishiyatsushiro-gun Kamikuishiki-mura, Yamanashi Prefecture" is written as "Kamikuishiki-mura, Yamanashi Prefecture", or the ward name may be omitted, so that an address in Nagatsuta is written as "Nagatsuta, Yokohama, Kanagawa". If an address notation in which one or more layers of the address dictionary are omitted appears in the text, no matching candidate is found when the hierarchical address dictionary is searched as it is, and the address cannot be read correctly.
[0037]
Therefore, in the fifth embodiment, a rule is provided that also allows combinations in which a layer is skipped at the time of word search or connection relation setting.
FIG. 13 (a) is based on the method of searching for words while shifting the pointer layer by layer. In this case, when words in a specific layer or in all layers are searched, not only the words in the layer indicated by the search pointer but also all words one or several layers below them are searched. The hierarchy omission information setting unit 114 is the means for setting this information on how many layers may be omitted.
[0038]
FIG. 13 (b) is based on the method in which all words of all layers appearing in the sentence are searched and the connection relationship between the words is then obtained by the connection relationship setting unit 113. In this case, the connection relationship setting unit 113 must treat words as having a parent-child relationship even when one or several layers are omitted, and the hierarchy omission information setting unit 114 performs the control for this. That is, by following the parent information pointer twice, it can easily be determined that a connection relationship exists even if one layer is omitted.
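Following the parent pointer more than once can be sketched as below. The `parent_of` mapping, the function name, and the `max_skipped` parameter are hypothetical; "Ward" is a placeholder for the omitted ward name.

```python
def is_connected(child, candidate_parent, parent_of, max_skipped=1):
    """Return True if candidate_parent is reached within max_skipped + 1 parent hops."""
    node = child
    for _ in range(max_skipped + 1):
        node = parent_of.get(node)
        if node is None:
            return False
        if node == candidate_parent:
            return True
    return False


# Hypothetical parent information: town -> ward -> city -> prefecture.
parent_of = {
    "Nagatsuta": "Ward",
    "Ward": "Yokohama",
    "Yokohama": "Kanagawa",
}

# "Nagatsuta, Yokohama" omits the ward, so the second hop finds the city.
print(is_connected("Nagatsuta", "Yokohama", parent_of))
```

Setting `max_skipped` to 0 reproduces the strict parent check of the fourth embodiment, while larger values permit several omitted layers at the cost of more accepted combinations.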
[0039]
As another method for solving the above problem, information indicating that a word, such as a county name or a ward name, may be omitted at the time of word search or connection relation setting can be held as a word attribute in the address dictionary 103.
FIGS. 14 (a) and 14 (b) are configuration diagrams of a modification of the fifth embodiment, and FIG. 14 (a) shows a case where the address dictionary 103 having the hierarchical structure as shown in FIG. 4 is used. FIG. 14 (b) is a configuration diagram in the case of using the address dictionary 103 having the configuration of FIG. 10 in which hierarchical information is expressed by a parent-child relationship between words.
[0040]
FIG. 14 (a) is based on the method of searching for words while shifting the pointer layer by layer. In this case, the hierarchy omission information acquisition unit 115 examines the attributes of the words in the address dictionary 103 and, only for words carrying the information that they may be omitted, searches all words one or several layers below, as described above. This method has the advantage that, when one or several layers are omitted, only the portions that may be omitted are searched, so the processing amount does not increase.
[0041]
FIG. 14 (b) is based on the method in which all words of all layers appearing in the sentence are searched and the connection relationship between the words looked up in the dictionary is then obtained by the connection relationship setting unit 113. In this case, the hierarchy omission information acquisition unit 115 checks the attributes of the words in the address dictionary 103, and only for words carrying the information that they may be omitted, a connection relationship can be determined to exist even when the address is written with one layer omitted. Further, when the parent word may be omitted, a word may be configured to hold multiple pieces of parent information, for example by holding the parent word of its parent word as parent information.
[0042]
[Embodiment 6]
FIG. 15 is a block diagram of Embodiment 6 of the speech synthesizer of the present invention. The same parts as those in the above-described embodiment are denoted by the same reference numerals, and description thereof is omitted.
This embodiment is provided with a word string candidate selection unit 116, which selects the most probable candidate after all possible word string candidates have been created, and an address reading determination unit 117, which determines whether to select the address reading for a candidate word string in the address dictionary 103 that matches the sentence.
[0043]
That is, if the address dictionary is used whenever a match with it is found, parts of the sentence other than addresses may be replaced with address readings. For example, for notations such as “Fossil”, “Triangle”, “Yaji Maji”, “Lowercase (Common)”, and “Uppercase”, the place name has a reading different from that of the common noun. Even when the reading is the same, the accent type or the accent combination attribute may differ, so the accent of the spoken output may differ. In such cases, the address reading must not be used. To avoid this problem, in the sixth embodiment the address reading is not selected when fewer than a predetermined number of words or characters match.
[0044]
For example, suppose that the place names <Okayama Prefecture-Jobo-gun-Kayo-cho-Kita-Kado> and <Oita Prefecture-Higashikunisaki-gun-Kunimi-cho-Naka-Shimo> are registered in the address dictionary 103. Even if text such as "Wait at the north gate" or "upper, middle, lower" is input, only the two characters for "north gate" or the two characters for "middle, lower" match, so the address reading is not selected and the basic dictionary 105 is referenced instead, and the text can be read correctly.
[0045]
FIG. 16 is a flowchart of the algorithm of the sixth embodiment.
The number of words or characters in the candidate word string is obtained (S401), and it is determined whether the obtained number is smaller than a threshold value (S402). If the number of words or characters in the candidate word string is greater than or equal to the threshold value, the address reading is selected (S403). If it is smaller than the threshold value, the process ends without selecting the address reading.
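A minimal sketch of steps S401 to S403 follows. The function name and the threshold of 3 are assumptions for illustration; the specification only says that a predetermined number is used.

```python
def use_address_reading(matched_words, threshold=3, count_characters=True):
    """Select the address reading only if enough characters (or words) matched."""
    if count_characters:
        count = sum(len(word) for word in matched_words)  # S401: number of characters
    else:
        count = len(matched_words)                        # S401: number of words
    return count >= threshold                             # S402 / S403


# Only the two characters of "north gate" match: fall back to the basic dictionary.
print(use_address_reading(["北", "門"]))

# A full address matches many characters: use the address dictionary reading.
print(use_address_reading(["東京都", "港区", "白金台"]))
```

The same function covers both variants mentioned in the text by switching between character count and word count.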
[0046]
[Embodiment 7]
FIG. 17 is a block diagram of Embodiment 7 of the speech synthesizer of the present invention. The same parts as those in the above-described embodiment are denoted by the same reference numerals and description thereof is omitted. In the seventh embodiment, a prosodic boundary setting unit 118 is provided for setting prosodic boundary symbols in order to obtain a synthesized speech with a natural utterance.
[0047]
That is, when an address with a large total number of morae is read and no exhalation paragraph boundaries or phrase boundaries are set, the pitch of the utterance drops too low, or the utterance, having no pause for breath, sounds strained. For example, the address "Kurumizawa, Jozankei Jozankei Toyoha Mine, Minami-ku, Sapporo, Hokkaido" has the reading "Hokkaido Sapporoshi Minamiku Jozankei Jozankei-Toyoha-Kozan Kurumizawa", which sounds very unnatural when read in one breath. It also sounds unnatural if it is divided too finely. Therefore, prosodic boundary symbols such as phrase boundaries or exhalation paragraph boundaries must be set at appropriate positions.
[0048]
As a first method for setting boundary symbols between words, the mora counts of the words matched with the address dictionary 103 are accumulated, and a prosodic boundary symbol such as a phrase boundary or an exhalation paragraph boundary is set so that the cumulative mora count does not exceed a threshold value, or each time it exceeds the threshold value.
[0049]
FIG. 18 is a flowchart of this algorithm.
The mora count of the current word is set as the cumulative mora count (S501). The reading of the place name word is set in a buffer (not shown) (S502), and the place name word pointer is advanced (S503). It is determined whether the place name word candidate string has ended (S504). If it has not, the mora count of the next place name word is added (S505). It is determined whether the cumulative mora count exceeds the threshold value (S506), and steps S502 to S505 are repeated until the threshold value is exceeded.
When the cumulative mora count exceeds the threshold value, an exhalation paragraph symbol is set in the buffer (S507), and the process returns to step S501; steps S501 to S507 are repeated until the place name word candidate string ends.
[0050]
In the previous example of "Kurumizawa, Jozankei Jozankei Toyoha Mine, Minami-ku, Sapporo, Hokkaido", the mora counts of the words are {6, 5, 4, 6, 13, 5}, so if the threshold value is set to 13 morae, the address is divided into {(6, 5), (4, 6), 13, 5}. If the exhalation paragraph symbol is represented by "•", the reading becomes "Hokkaido Sapporoshi • Minamiku Jozankei • Jozankei-Toyoha-Kozan • Kurumizawa".
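The mora accumulation of FIG. 18 can be sketched as a single pass over the place name words. The function name is hypothetical and the readings are romanized without accent marks for simplicity; the mora counts and the threshold of 13 come from the example above.

```python
BOUNDARY = "•"  # exhalation paragraph symbol


def insert_breath_boundaries(words, threshold):
    """Insert a boundary before any word that would push the running mora total past the threshold.

    words is a list of (reading, mora_count) pairs in address order.
    """
    output = []
    total = 0
    for reading, mora in words:
        if output and total + mora > threshold:  # S506: cumulative count exceeds threshold
            output.append(BOUNDARY)              # S507: set the boundary symbol
            total = 0                            # S501: restart from the current word
        output.append(reading)                   # S502: set the reading in the buffer
        total += mora                            # S505: accumulate the mora count
    return output


words = [
    ("hokkaido", 6),
    ("sapporoshi", 5),
    ("minamiku", 4),
    ("jozankei", 6),
    ("jozankeitoyohakozan", 13),
    ("kurumizawa", 5),
]
print(" ".join(insert_breath_boundaries(words, threshold=13)))
# hokkaido sapporoshi • minamiku jozankei • jozankeitoyohakozan • kurumizawa
```

The grouping reproduces {(6, 5), (4, 6), 13, 5}: a single 13-mora word does not exceed the threshold on its own, but adding any neighbor to it does.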
[0051]
As a second method for setting boundary symbols between words, boundary symbols can be included in the hierarchical data structure of place name words in the address dictionary 103, and prosodic boundary symbols such as phrase boundaries or exhalation paragraph boundaries can be set by referring to them.
FIG. 19 is a flowchart of this algorithm.
The reading of the place name word is set in the buffer (S601). The place name word pointer is advanced (S602), and it is determined whether the place name word candidate string has ended (S603). If it has not ended and a prosodic boundary symbol exists, the symbol is acquired and set (S604); the process then returns to step S601, and steps S601 to S604 are repeated until the place name word candidate string ends.
[0052]
In the second method, boundaries are not determined by mora count alone; phrase boundaries and exhalation paragraph boundaries can be distinguished in advance, so more natural speech can be produced. In the above example, it is more natural to say "Sapporo City" and "Minami Ward" together, so that information is stored in the address dictionary 103. When the exhalation paragraph boundary symbol is represented by "·" and the phrase boundary symbol by "/", the above example can be read out as "Hokkaido / Sapporoshi Minamiku / Jozankei Jozankei-Toyoha-Kozan / Kurumizawa".
[0053]
[Embodiment 8]
FIG. 20 is a block diagram of Embodiment 8 of the speech synthesizer of the present invention. In the figure, the same parts as those in the above-described embodiment are denoted by the same reference numerals and description thereof is omitted. In the eighth embodiment, there is provided an address symbol conversion unit 119 for converting an address symbol represented by a minus sign, a long sound symbol, etc. into an address reading.
That is, when a sentence including an address written as "1-6-1 Marunouchi, Chiyoda-ku, Tokyo" is input, reading the part "1-6-1" as "ichi mainasu roku mainasu ichi", with each symbol read as "minus", sounds wrong for an address. Therefore, when numerical values are joined in this way in an address, the part is read as "ichi no roku no ichi", that is, with each delimiter read as "no".
[0054]
FIG. 21 is a flowchart of this algorithm.
After the address section determined from the word string in the address dictionary 103 matched in the sentence has been recognized, the characters following the address section are examined. It is determined whether the next character is a numeral (Arabic numerals 0 to 9, kanji numerals for 0 to 9, or the kanji for 10, 100, 1,000, and 10,000) (S701). If it is a numeral, the pointer is advanced by one character (S702), and the process returns to step S701. If the next character is not a numeral, it is determined whether the next word is a classifier (banchi, ban, chome, go) (S703). If it is a classifier, the pointer is advanced by the length of that word (S704), and the process returns to step S701.
[0055]
If the next word is not a classifier, it is determined whether the next character is a delimiter (the minus sign "-" or the long sound symbol "ー") (S705). If it is a delimiter, the character is replaced with "no" (S706). The long sound symbol "ー" is included among the delimiters because the minus sign "-" is sometimes mistakenly entered as a long sound symbol, and the address should still be read correctly in that case.
If it is not any of the above characters or words, the process is terminated. By performing these processes, the address description including the delimiter character between the numerical values can be read out correctly.
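The scan of FIG. 21 can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, the exact character sets for numerals, classifiers, and delimiters are guesses based on the description, and the scan is assumed to continue after a delimiter has been replaced, which the text implies but does not state.

```python
DIGITS = set("0123456789０１２３４５６７８９〇一二三四五六七八九十百千万")
CLASSIFIERS = ("番地", "丁目", "番", "号")  # longer classifiers are tried first
DELIMITERS = set("-－−ー")                 # minus-like signs and the long sound symbol


def convert_address_symbols(text, start, replacement="の"):
    """Replace delimiters with "no" in the numeric part that follows an address section.

    start is the index of the first character after the recognized address section.
    """
    out = []
    i = start
    while i < len(text):
        classifier = next((c for c in CLASSIFIERS if text.startswith(c, i)), None)
        if text[i] in DIGITS:                 # S701 / S702
            out.append(text[i])
            i += 1
        elif classifier is not None:          # S703 / S704
            out.append(classifier)
            i += len(classifier)
        elif text[i] in DELIMITERS:           # S705 / S706
            out.append(replacement)
            i += 1
        else:                                 # any other character ends the scan
            break
    return text[:start] + "".join(out) + text[i:]


address = "東京都千代田区丸の内"
print(convert_address_symbols(address + "1-6-1にある", len(address)))
```

Starting the scan only after the recognized address section matters here, because kanji such as "千" in "千代田区" would otherwise be treated as numerals.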
[0056]
[Embodiment 9]
FIG. 22 is a block diagram of Embodiment 9 of the speech synthesizer of the present invention. The same parts as those in the above-described embodiments are denoted by the same reference numerals, and their description is omitted.
For example, suppose that a sentence containing "Tokyo Toshima Ward-run Third Junior High School" is input. If only the address section determined from the word string in the address dictionary 103 matched in the sentence is converted into pronunciation information, the remainder, which begins with the character meaning "run by" (read "ritsu"), is passed to the sentence analysis unit 104 on its own. As a result, the sentence analysis unit 104 misreads that character as "tachi".
[0057]
Therefore, the ninth embodiment is provided with the following: an address suffix dictionary 120 that stores address suffixes, which are words such as "-ritsu (run by), -nai (within), -min (residents of), -iki (bound for), -hatsu (departing from), -chaku (arriving at), ...", together with their accent combination attributes and the like; an address suffix dictionary search unit 121 that searches the address suffix dictionary 120 for a word matching the word immediately following the address section determined from the word string in the address dictionary 103 matched in the sentence; and an accent combination processing unit 122. If the search by the address suffix dictionary search unit 121 finds an address suffix after the address, the accent combination processing unit 122 corrects the address section, which consists of the word string in the address dictionary 103 matched in the sentence, so that it includes the address suffix, performs accent combination between the final word of that word string and the address suffix according to the accent combination attribute set for the address suffix, and sets the reading.
[0058]
FIG. 23 is a flowchart of the algorithm according to the ninth embodiment.
It is determined whether or not the next word in the address word string is in the address suffix dictionary 120 (S801), and if it is in the address suffix dictionary 120, the accent combination processing unit 122 performs accent combination (S802).
By performing such processing, the sentence information of the previous example can be passed to the sentence analysis unit 104 as the information "Tokyo-to Toshima-kuritsu" + "third junior high school", so it can be read correctly.
[0059]
[Embodiment 10]
FIG. 24 is a block diagram of Embodiment 10 of the speech synthesizer of the present invention. The same parts as those in the above-described embodiments are denoted by the same reference numerals, and their description is omitted.
An address may appear in a sentence and a word contained in that address may then appear again, as in "Kamikodanaka, Nakahara-ku, Kanagawa ... the Kamikodanaka public hall is ...". In this case, the place name "Kamikodanaka, Nakahara-ku, Kanagawa" can be read correctly, but the following "Kamikodanaka public hall is ..." may be read with an incorrect reading even though the same notation was read correctly just before, because another reading is registered for that notation. Further, when the configuration of the sixth embodiment is adopted, the number of characters (or words) may fall below the predetermined threshold value, so that the phrase is not processed as an address reading.
[0060]
Accordingly, the tenth embodiment is provided with the following: an address word learning unit 123 that stores, in a learning word buffer 124, the kanji notation and pronunciation information in pairs for all the constituent words of each address processed by the address dictionary search unit 102; the learning word buffer 124, which holds the notations and readings of recently used addresses; and a learning word search unit 125 that, after the hierarchical address dictionary 103 has been searched, searches the sentence for learning words and, when a matching learning word is found, embeds the corresponding pronunciation information in the corresponding part of the sentence. By giving priority to the readings of the words stored in the learning word buffer 124, a part of an address that has already appeared in the text is read correctly when it appears again.
[0061]
Further, a learning word deletion unit 126 that initializes the contents of the learning word buffer 124 may be provided. In this case, the learning word buffer 124 may be left uninitialized by default and initialized only when the user so specifies, or it may be initialized every time text information is input. Alternatively, it may be initialized when the number of sentences input after registration exceeds a predetermined number.
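The learning word buffer and its initialization policies can be sketched as a small class. The class name, method names, and the romanized reading are hypothetical; the sketch only illustrates storing notation and reading pairs and clearing them after a given number of sentences.

```python
class LearningWordBuffer:
    """Holds notation -> reading pairs for words of recently processed addresses."""

    def __init__(self, max_sentences=None):
        self.words = {}
        self.max_sentences = max_sentences  # None: clear only on explicit request
        self.sentences_since_registration = 0

    def learn(self, address_words):
        """Register (notation, reading) pairs for every word of a processed address."""
        self.words.update(address_words)
        self.sentences_since_registration = 0

    def lookup(self, notation):
        """Return the learned reading, or None if the notation was not learned."""
        return self.words.get(notation)

    def clear(self):
        """Initialize the buffer, e.g. on user request or on new text input."""
        self.words.clear()

    def next_sentence(self):
        """Count an input sentence and clear the buffer once the limit is exceeded."""
        self.sentences_since_registration += 1
        if (self.max_sentences is not None
                and self.sentences_since_registration > self.max_sentences):
            self.clear()


buffer = LearningWordBuffer(max_sentences=1)
buffer.learn([("上小田中", "kamikodanaka"), ("中原区", "nakaharaku")])
print(buffer.lookup("上小田中"))
```

Consulting this buffer before the basic dictionary is what lets a short repeated fragment keep its address reading even when it falls below the threshold of the sixth embodiment.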
[0062]
Note that the speech synthesis method implemented in the speech synthesizer of the present invention is recorded in a recording medium D such as a compact disk as shown in FIG. The recording medium D may be loaded into a disk drive of a personal computer and synthesized.
[0063]
In the above-described embodiment, the case where the word string is an address has been described. However, the word string is not limited to an address, and the same effect can be obtained if the word string has a hierarchical structure.
[0064]
[Effects of the Invention]
As described above, the speech synthesizer and the speech synthesis method of the present invention have a hierarchical dictionary in which the notation and reading information of addresses having multiple readings for one notation is organized hierarchically. They therefore have the excellent effect that text information of addresses having multiple readings for one notation can be read aloud with the correct reading.
[Brief description of the drawings]
FIG. 1 is a basic block diagram of a speech synthesizer according to the present invention.
FIG. 2 is a schematic diagram of an example of a speech synthesizer according to the present invention.
FIG. 3 is a configuration diagram of Embodiment 1 of the speech synthesizer of the present invention.
FIG. 4 is a conceptual diagram of an example of an address dictionary.
FIG. 5 is a flowchart of the algorithm of the first embodiment.
FIG. 6 is a configuration diagram of a second embodiment of the speech synthesizer of the present invention.
FIG. 7 is a configuration diagram of a third embodiment of the speech synthesizer of the present invention and a modification thereof.
FIG. 8 is a conceptual diagram of a search start position storage buffer.
FIG. 9 is a configuration diagram of Embodiment 4 of a speech synthesizer according to the present invention.
FIG. 10 is a conceptual diagram of another example of an address dictionary.
FIG. 11 is a flowchart (part 1) of the algorithm of the fourth embodiment.
FIG. 12 is a flowchart (part 2) of the algorithm of the fourth embodiment.
FIG. 13 is a configuration diagram of Embodiment 5 of a speech synthesis device of the present invention.
FIG. 14 is a configuration diagram of a modified example of the fifth embodiment of the speech synthesizer of the present invention.
FIG. 15 is a configuration diagram of a sixth embodiment of the speech synthesizer of the present invention.
FIG. 16 is a flowchart of the algorithm of the sixth embodiment.
FIG. 17 is a block diagram of a seventh embodiment of the speech synthesizer of the present invention.
FIG. 18 is a flowchart (part 1) of the algorithm of the seventh embodiment.
FIG. 19 is a flowchart (part 2) of the algorithm of the seventh embodiment.
FIG. 20 is a block diagram of Embodiment 8 of a speech synthesizer of the present invention.
FIG. 21 is a flowchart of the algorithm of the eighth embodiment.
FIG. 22 is a block diagram of Embodiment 9 of the speech synthesizer of the present invention.
FIG. 23 is a flowchart of the algorithm of the ninth embodiment.
FIG. 24 is a block diagram of Embodiment 10 of a speech synthesizer of the present invention.
FIG. 25 is a conceptual diagram of the recording state of the recording medium.
[Explanation of symbols]
1 Text input unit
2 Hierarchical dictionary search unit
3 Hierarchical dictionary
4 Sentence analysis unit
5 Basic dictionary
6 Speech waveform generation unit
7 Speaker

Claims

1. A speech synthesizer for synthesizing speech from sentence information, comprising:
input means for sentence information;
a hierarchical dictionary storing, for each word of a word string made up of a group of address words in which the reading of an address word appearing next is predetermined by the address word appearing before it, the notation and the reading of the word together with hierarchical structure information in which the words are hierarchized;
hierarchical dictionary search means for searching the hierarchical dictionary for reading candidates of a word string that matches an address word string among the character strings contained in the input sentence information;
sentence analysis means for selecting, from the reading candidates, the reading of the word string contained in the sentence information on the basis of the hierarchical structure information, and converting the word string into that reading;
speech waveform generation means for generating a speech waveform from the reading information; and
output means for outputting speech corresponding to the generated speech waveform.
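
The dictionary lookup and reading selection recited in claim 1 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiments: the names `Entry`, `ENTRIES`, and `read_address` are hypothetical, the sample place names are illustrative and not taken from the specification, the input is assumed to be already segmented into address words, and the hierarchical structure information is assumed to be a link to the parent entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    notation: str           # written form of the address word
    reading: str            # reading fixed by the word that precedes it
    parent: Optional[int]   # index of the parent entry, None at the top level


# Illustrative entries: the same notation "日本橋" has two readings,
# and the parent chain decides which one applies.
ENTRIES = [
    Entry("東京都", "とうきょうと", None),  # 0
    Entry("中央区", "ちゅうおうく", 0),      # 1
    Entry("日本橋", "にほんばし", 1),        # 2
    Entry("大阪府", "おおさかふ", None),     # 3
    Entry("大阪市", "おおさかし", 3),        # 4
    Entry("中央区", "ちゅうおうく", 4),      # 5
    Entry("日本橋", "にっぽんばし", 5),      # 6
]


def read_address(words: List[str]) -> List[str]:
    """Select one reading per word by following parent links from the top level."""
    readings = []
    parent = None
    for word in words:
        # Search step: every entry whose notation matches is a reading candidate.
        candidates = [i for i, e in enumerate(ENTRIES) if e.notation == word]
        # Selection step: keep the candidate whose parent is the previous word.
        chosen = [i for i in candidates if ENTRIES[i].parent == parent]
        if not chosen:
            raise KeyError(f"no entry for {word!r} under the preceding word")
        parent = chosen[0]
        readings.append(ENTRIES[parent].reading)
    return readings


print(read_address(["東京都", "中央区", "日本橋"]))
print(read_address(["大阪府", "大阪市", "中央区", "日本橋"]))
```

Under these assumptions the first call ends in "にほんばし" and the second in "にっぽんばし", although the last word is written identically in both inputs.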

2. The speech synthesizer according to claim 1, further comprising means for designating a region, wherein the hierarchical dictionary search means starts the search from a hierarchy level belonging to the designated region.

3. The speech synthesizer according to claim 1 or 2, wherein the hierarchical structure information is information identifying, for each word, the parent word in the higher hierarchy level that determines the reading of that word, the speech synthesizer further comprising connection relation setting means for setting, by referring to that information for the reading candidates retrieved from the hierarchical dictionary, the connection relations among the reading candidates of a word string that matches an address word string among the character strings contained in the sentence information.

4. The speech synthesizer according to any one of claims 1 to 3, wherein the hierarchical dictionary search means includes means for searching the hierarchical dictionary on the basis of a word string in which the notation of a word at any hierarchy level of the word string is omitted.
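
One way to picture the search with omitted hierarchy levels is to accept a candidate whenever the previously matched word is any ancestor of it, not only its direct parent. The sketch below works under that assumption; `ENTRIES`, `ancestors`, `resolve`, and `candidate_readings` are hypothetical names, the entries are illustrative tuples of (notation, reading, parent index), and the embodiments may handle omission differently.

```python
from typing import List

# (notation, reading, parent index or None); illustrative data only.
ENTRIES = [
    ("東京都", "とうきょうと", None),  # 0
    ("中央区", "ちゅうおうく", 0),      # 1
    ("日本橋", "にほんばし", 1),        # 2
    ("大阪府", "おおさかふ", None),     # 3
    ("大阪市", "おおさかし", 3),        # 4
    ("中央区", "ちゅうおうく", 4),      # 5
    ("日本橋", "にっぽんばし", 5),      # 6
]


def ancestors(index: int) -> List[int]:
    """Return the indices of all entries above the given entry."""
    chain = []
    parent = ENTRIES[index][2]
    while parent is not None:
        chain.append(parent)
        parent = ENTRIES[parent][2]
    return chain


def resolve(words: List[str]) -> List[List[int]]:
    """Return every chain of entries in which each word descends from the previous one."""
    chains: List[List[int]] = [[]]
    for word in words:
        candidates = [i for i, e in enumerate(ENTRIES) if e[0] == word]
        # The first word may sit at any level; later words only need the
        # previous word somewhere among their ancestors, so levels may be skipped.
        chains = [c + [i] for c in chains for i in candidates
                  if not c or c[-1] in ancestors(i)]
    return chains


def candidate_readings(words: List[str]) -> List[List[str]]:
    return [[ENTRIES[i][1] for i in chain] for chain in resolve(words)]


print(candidate_readings(["大阪市", "日本橋"]))
print(candidate_readings(["中央区", "日本橋"]))
```

In this sketch the first input, which omits both the prefecture and the ward, still resolves to a single chain. The second input stays ambiguous and returns two chains, which is the kind of case that a designated region or further context would have to settle.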

5. The speech synthesizer according to any one of claims 1 to 4, further comprising reading determination means for determining the reading in the hierarchical dictionary to be the reading of characters or words when the notations of at least a predetermined number of the characters or words match the notations of characters or words included in a word string stored in the hierarchical dictionary.

6. The speech synthesizer according to any one of claims 1 to 5, further comprising: a suffix dictionary storing the notations and readings of suffixes that attach to the word string; and suffix dictionary search means for searching the suffix dictionary for a notation that matches the notation immediately following the word string, and selecting the reading of the matching suffix dictionary notation as the reading of the notation immediately following the word string.
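
The suffix lookup of claim 6 might look like the sketch below. The dictionary contents and the names `SUFFIXES` and `match_suffix` are hypothetical, and preferring the longest matching suffix is an assumption of this sketch, not something the claim states.

```python
from typing import Optional, Tuple

# Illustrative suffix dictionary: notation -> reading.
SUFFIXES = {
    "駅": "えき",
    "駅前": "えきまえ",
    "郵便局": "ゆうびんきょく",
}


def match_suffix(text: str, pos: int) -> Optional[Tuple[str, str]]:
    """Return (notation, reading) of the suffix that starts at text[pos], if any.

    pos is the index just past the address word string already matched
    in the hierarchical dictionary.
    """
    # Try longer notations first so that "駅前" wins over "駅".
    for notation in sorted(SUFFIXES, key=len, reverse=True):
        if text.startswith(notation, pos):
            return notation, SUFFIXES[notation]
    return None


print(match_suffix("日本橋駅前", 3))
print(match_suffix("日本橋を渡る", 3))
```

Here the first call returns the entry for "駅前", and the second returns `None` because the text after the address word string is not in the suffix dictionary.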

7. A speech synthesis method for synthesizing speech from sentence information by referring to a hierarchical dictionary that stores, for each word of a word string made up of a group of address words in which the reading of an address word appearing next is predetermined by the address word appearing before it, the notation and the reading of the word together with hierarchical structure information in which the words are hierarchized according to their connection order, the method comprising:
inputting sentence information;
searching the hierarchical dictionary for reading candidates of a word string that matches an address word string among the character strings contained in the input sentence information;
selecting, from the reading candidates, the reading of the word string contained in the sentence information on the basis of the hierarchical structure information;
converting the word string into that reading;
generating a speech waveform from the reading information; and
outputting speech corresponding to the generated speech waveform.