JP3284976B2

JP3284976B2 - Speech synthesis device and computer-readable recording medium

Info

Publication number: JP3284976B2
Application number: JP18969198A
Authority: JP
Inventors: 淳若尾; 玲史近藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-06-19
Filing date: 1998-06-19
Publication date: 2002-05-27
Anticipated expiration: 2018-06-19
Also published as: JP2000010579A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesizer, and more particularly to a speech synthesizer that reads arbitrary text aloud.

[0002]

2. Description of the Related Art

When text is read aloud by speech synthesis, it is desirable that the text be read in a manner suited to its content. For example, when the text "123-4567" denotes a telephone number, it should be read "いちにいさんのよんごおろくなな" (ichi nii san no yon goo roku nana), and when the same text denotes a mathematical expression, it should be read "ひゃくにじゅうさんまいなすよんせんごひゃくろくじゅうなな" (hyaku nijuu san mainasu yonsen gohyaku rokujuu nana). Likewise, when the text contains a table, characters such as "-" and "|" that are used as ruled lines of the table have only a visual meaning and preferably should not be read aloud.

[0003] The simplest way to read text in a manner suited to its content is to have the user specify the way of reading, that is, the reading mode. An example is shown in FIG. 20. This speech synthesizer consists of speech synthesis means 4, input means 1 for inputting text, and mode selection means 6, which passes the user's reading-mode instruction to speech synthesis means 4. Speech synthesis means 4 reads the text input by input means 1 aloud in accordance with the reading mode specified by mode selection means 6.

[0004] For example, suppose there are two reading modes, "formula reading" and "telephone number reading". When the user has specified "telephone number reading" through mode selection means 6 and the text "123-4567" is input by input means 1, speech synthesis means 4 follows the "telephone number reading" instruction from mode selection means 6: digits are read one by one ("123" is read "いちにいさん", ichi nii san) and "-" is read "の" (no), so the text is read "いちにいさんのよんごおろくなな". On the other hand, when the user has specified "formula reading" through mode selection means 6 and the text "123-4567=-4444" is input by input means 1, speech synthesis means 4 follows the "formula reading" instruction: numbers are read with their place values ("123" is read "ひゃくにじゅうさん", hyaku nijuu san) and "-" is read "まいなす" (mainasu, "minus"), so the text is read "ひゃくにじゅうさんまいなすよんせんごひゃくろくじゅうなないこーるまいなすよんせんよんひゃくよんじゅうよん" (hyaku nijuu san mainasu yonsen gohyaku rokujuu nana ikooru mainasu yonsen yonhyaku yonjuu yon).

[0005] In this way, by selecting a reading mode, even the same character string can be read aloud correctly according to the meaning of the text.

[0006] However, in the speech synthesizer described above, in which the user specifies the reading mode, the user must know the content of the text to be read in advance, that is, before it is read aloud. The approach therefore cannot be applied when the text to be read is unknown to the user. In addition, because mode selection means 6 for inputting the user's reading-mode instruction must be provided on the speech synthesizer side, the approach is difficult to apply in cases such as obtaining necessary information by voice, over a telephone line, from a speech synthesizer installed at a remote location.

[0007] To solve these problems, the speech synthesizer itself must automatically determine the reading mode according to the content of the text. A conventional example of such a technique is described in Japanese Unexamined Patent Publication No. Hei 8-36395 (JP-A-8-36395).

[0008] In the technique described in that publication, when a predetermined special word or a numeric character string with a special pattern is detected in the text, the numeric character string located close before or after the special word, or the numeric character string with the special pattern, is given a reading that follows the rule associated with that special word or special pattern. The reading mode is thereby determined automatically according to the content of the text.

[0009]

PROBLEMS TO BE SOLVED BY THE INVENTION

The prior art described above focuses on the fact that a numeric character string requiring a special reading either contains a special word or appears as a special pattern. It is a useful technique as one way of automatically determining the reading mode according to the content of the text.

[0010] However, consider a rule which states that, for the special pattern "number-number", the numbers are read digit by digit and "-" is read "の" (no). With this rule, the text "123-4567" is read "いちにいさんのよんごおろくなな", but the text "123-4567=-4444" is also read "いちにいさんのよんごおろくなないこーるまいなすよんせんよんひゃくよんじゅうよん" (ichi nii san no yon goo roku nana ikooru mainasu yonsen yonhyaku yonjuu yon). Improving this takes more than the special pattern and rule above; new special patterns and rules must be added. In this example the number of additional special patterns and rules would probably not be large. When the mode of not reading table ruled lines is to be determined automatically, however, the arrangements of "-", "|", and similar characters used for ruled lines, that is, the special patterns, vary endlessly, so an enormous number of special patterns would have to be registered, and the approach becomes practically difficult to apply.
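The limitation described in paragraph [0010] can be illustrated with a short Python sketch. The regular expression and the function name are assumptions introduced here for illustration; they are not taken from the cited publication, and ASCII digits and hyphens stand in for the full-width characters of the original.

```python
import re

# Hypothetical special-pattern rule: "number-number" triggers
# telephone-number reading (digits one by one, "-" read as "no").
NUMBER_HYPHEN_NUMBER = re.compile(r"[0-9]+-[0-9]+")


def pattern_rule_fires(text: str) -> bool:
    """Return True when the special pattern occurs anywhere in the text."""
    return NUMBER_HYPHEN_NUMBER.search(text) is not None


# The rule fires for the telephone number, as intended ...
print(pattern_rule_fires("123-4567"))
# ... but it also fires for the formula, where it should not.
print(pattern_rule_fires("123-4567=-4444"))
```

Both calls report a match, which is why a pattern-only approach needs ever more patterns and rules to separate the two cases.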

[0011] Accordingly, an object of the present invention is to provide a speech synthesizer that can automatically determine a reading mode appropriate to the content of the text without relying on the detection of special patterns.

[0012]

MEANS FOR SOLVING THE PROBLEMS

The speech synthesizer of the present invention determines the reading mode based on the frequency of occurrence of each character in the text. More specifically, it comprises input means for inputting text, frequency extraction means for extracting the frequency of occurrence, over the entire text, of characters in the input text, mode determination means for determining a reading mode based on this frequency information, and speech synthesis means that operates according to the determined reading mode.

[0013] In the speech synthesizer of the present invention configured in this way, the input means inputs the text to be read aloud, the frequency extraction means extracts the frequency of occurrence, over the entire text, of characters in that text, the mode determination means determines the reading mode based on the extracted frequency information, and the speech synthesis means reads the text aloud in the determined reading mode.
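The flow of paragraph [0013] can be pictured with the following minimal Python sketch. The function names, the mode labels, and the placeholder rule and synthesizer are all assumptions made for illustration; the patent defines the four means only functionally.

```python
from collections import Counter
from typing import Callable


def extract_frequencies(text: str) -> Counter:
    """Frequency extraction: count each character over the whole text."""
    return Counter(text)


def run_pipeline(
    text: str,
    decide_mode: Callable[[Counter], str],
    read_aloud: Callable[[str, str], str],
) -> str:
    """Input -> frequency extraction -> mode determination -> synthesis."""
    frequencies = extract_frequencies(text)
    mode = decide_mode(frequencies)
    return read_aloud(text, mode)


# Placeholder rule and synthesizer, for illustration only.
def toy_decide(frequencies: Counter) -> str:
    return "formula" if frequencies["="] > 0 else "phone"


def toy_read(text: str, mode: str) -> str:
    return f"[{mode}] {text}"


print(run_pipeline("123-4567", toy_decide, toy_read))
print(run_pipeline("123-4567=-4444", toy_decide, toy_read))
```

Passing the rule and the synthesizer as parameters mirrors the statement that the rule depends on which reading modes are being switched.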

[0014] The speech synthesizer of the present invention also divides the input text and processes each divided text separately, so that even when sentences in various formats are mixed within one text, each can be read aloud in an appropriate reading mode. More specifically, it comprises input means for inputting text, dividing means for dividing the input text, frequency extraction means for extracting, for each divided text, the frequency of occurrence of the characters in that text over that entire text, mode determination means for determining a reading mode for each divided text based on this frequency information, and speech synthesis means that operates, for each divided text, according to the determined reading mode.

[0015] In the speech synthesizer of the present invention configured in this way, the dividing means divides the text to be read aloud that was input by the input means, the frequency extraction means extracts, for each divided text, the frequency of occurrence of the characters in that text over that entire text, the mode determination means determines a reading mode for each divided text based on this frequency information, and the speech synthesis means reads the text aloud in the determined reading mode.
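A rough Python sketch of the per-division processing in paragraph [0015] follows. The division rule (splitting on blank lines), the mode labels, and the toy decision rule are assumptions made here; this passage does not state how the dividing means chooses its boundaries.

```python
from collections import Counter
from typing import Callable, List, Tuple


def split_text(text: str) -> List[str]:
    """Dividing means (assumed rule): split the text on blank lines."""
    return [part for part in text.split("\n\n") if part.strip()]


def decide_modes(
    text: str, decide_mode: Callable[[Counter], str]
) -> List[Tuple[str, str]]:
    """Determine a reading mode separately for each divided text."""
    return [(part, decide_mode(Counter(part))) for part in split_text(text)]


# Toy rule for illustration only: any "=" means formula reading.
def toy_decide(frequencies: Counter) -> str:
    return "formula" if frequencies["="] > 0 else "phone"


mixed = "Call 123-4567\n\n1+2=3"
for part, mode in decide_modes(mixed, toy_decide):
    print(mode, repr(part))
```

Because each division gets its own frequency count, a formula in one part does not force formula reading on a telephone number in another.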

[0016] The types and number of reading modes that the present invention switches automatically are arbitrary. As one example, the invention can be applied to switching between a mode in which specific symbol characters such as "-", "|", and "*", which are often used for table ruled lines and the like, are not read aloud and a mode in which they are read aloud. In this case, the frequency extraction means extracts, in particular, the frequency of occurrence of the specific symbol characters, and the mode determination means determines, based on the frequency of occurrence of each symbol character, either the mode in which that symbol character is not read or the mode in which it is read.

[0017] As another example, the invention can be applied to switching between a mode in which alphabetic character strings are read as romanized Japanese (romaji) and a mode in which they are read as English. In this case, the frequency extraction means extracts, in particular, the frequencies of occurrence of alphabetic vowels and consonants, and the mode determination means determines, based on those frequencies, either the mode in which alphabetic character strings are read as romaji or the mode in which they are read as English. In a more preferred form, when the mode of reading alphabetic character strings as English is determined, the speech synthesis means also reads the numbers and symbols in the text as English.

[0018] As yet another example, the invention can be applied to switching between a mode in which a numeric character string containing specific symbol characters is read as a telephone number and a mode in which it is read as a mathematical expression. In this case, the frequency extraction means extracts the frequency of occurrence of specific symbol characters such as "-" and "+, =, ×, ÷", and the mode determination means determines, based on the frequency of occurrence of the specific symbol characters, either the mode in which the numeric character string is read as a telephone number or the mode in which it is read as a mathematical expression.

[0019]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Next, embodiments of the present invention will be described in detail with reference to the drawings.

[0020] (1) First Embodiment

FIG. 1 is a block diagram of a first embodiment of the present invention. The speech synthesizer of this embodiment comprises input means 1 for inputting text, frequency extraction means 2 for extracting the frequency of occurrence of characters in the input text, mode determination means 3 for determining a reading mode based on this frequency information, and speech synthesis means 4 that operates according to the determined reading mode.

[0021] Input means 1 inputs any text to be read aloud, such as text entered from a keyboard (not shown) or text stored on a hard disk or the like, and passes it to frequency extraction means 2 and speech synthesis means 4.

[0022] Frequency extraction means 2 extracts the frequency of occurrence of characters in the text received from input means 1 and passes the frequency information to mode determination means 3. The frequency of occurrence is extracted, for example, for each character such as "あ" or "い", or for each character type (category) such as "symbol" or "kanji". It is not always necessary to extract the frequencies of all characters and types. Which characters or types are counted is determined in advance according to the kinds of reading modes that are subject to automatic switching.
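The two granularities mentioned in paragraph [0022], per character and per category, might be sketched in Python as follows. The category names, the Unicode ranges used to recognise hiragana and kanji, and the `targets` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Set, Tuple


def category(ch: str) -> str:
    """Assign a coarse character category (this category set is assumed)."""
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    return "symbol"


def extract_frequencies(
    text: str, targets: Optional[Set[str]] = None
) -> Tuple[Counter, Counter]:
    """Count selected characters individually and all characters by category.

    `targets` limits the per-character count to the characters that matter
    for the reading modes being switched; None counts every character.
    """
    per_char = Counter(ch for ch in text if targets is None or ch in targets)
    per_category = Counter(category(ch) for ch in text)
    return per_char, per_category


per_char, per_category = extract_frequencies("あい*漢字*", targets={"*"})
print(per_char, per_category)
```

Restricting the per-character count with `targets` reflects the remark that not every character or type needs to be counted.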

[0023] Mode determination means 3 determines the reading mode from the frequency information received from frequency extraction means 2, according to a predetermined rule. Which rule is used is determined in advance according to the kinds of reading modes that the speech synthesizer switches automatically.

[0024] Speech synthesis means 4 reads the text supplied by input means 1 aloud in accordance with the reading-mode instruction from mode determination means 3. FIG. 2 shows an example configuration of speech synthesis means 4.

[0025] Speech synthesis means 4 in this example comprises text analysis means 41, which analyzes the text and generates phonetic symbols; mode storage means 42, which stores the reading mode; word dictionary 43, which stores words and their phonetic symbols; waveform dictionary 44, which stores speech waveforms in phoneme units; and speech waveform generation means 45, which generates a speech waveform from the phonetic symbols.

[0026] Mode storage means 42 stores the reading-mode instruction from mode determination means 3. Word dictionary 43 stores words together with their phonetic symbols, which carry reading and accent information. For words that must be read differently depending on the reading mode, the phonetic symbols for each reading mode (including the case in which the word is not read) are stored in word dictionary 43. Text analysis means 41 divides the text supplied by input means 1 into words and converts each word into phonetic symbols according to the contents of word dictionary 43 and mode storage means 42. Waveform dictionary 44 stores phoneme-unit speech waveforms and their phoneme labels. Based on the phonetic symbols output by text analysis means 41, speech waveform generation means 45 searches waveform dictionary 44 using phoneme labels as keys, extracts the required phoneme-unit speech waveforms, and edits them to generate a speech waveform.
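One way to picture the mode-dependent entries of word dictionary 43 is the Python sketch below. The data layout, the mode labels, and the "table" mode with a `None` entry (standing for "not read") are assumptions made for illustration; only the readings "mainasu" and "no" for the symbol "-" come from the description.

```python
from typing import Dict, Optional

# Hypothetical dictionary layout: word -> {reading mode -> pronunciation}.
# None stands for "not read aloud" in that mode.
WORD_DICTIONARY: Dict[str, Dict[str, Optional[str]]] = {
    "-": {"formula": "mainasu", "phone": "no", "table": None},
    "=": {"default": "ikooru"},
}


def lookup(word: str, mode: str) -> Optional[str]:
    """Return the pronunciation of `word` in the stored reading mode.

    Falls back to a mode-independent "default" entry, and returns None
    when the word is unknown or is not to be read in this mode.
    """
    entry = WORD_DICTIONARY.get(word, {})
    if mode in entry:
        return entry[mode]
    return entry.get("default")


print(lookup("-", "formula"))
print(lookup("-", "phone"))
print(lookup("-", "table"))
```

Keeping the mode as a key inside each entry lets the text analysis step stay the same for every mode; only the stored mode value changes.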

[0027] Next, examples of this embodiment will be described in detail with reference to the drawings.

[0028] (A) First Example

FIG. 3 is a flowchart of a first example of this embodiment, and FIGS. 4 and 5 show example texts. The speech synthesizer of this example has a function of automatically selecting, based on the frequency of occurrence of a symbol character, between a mode in which that symbol character is not read aloud and a mode in which it is read aloud.

[0029] In this example, frequency extraction means 2 extracts the frequency of occurrence of characters in the text and, in particular, extracts a separate frequency for each symbol character that has only a visual meaning in the text. The symbol characters concerned are "-", "|", "*", and the like. Mode determination means 3 is preset with the rule that when the frequency of occurrence of a symbol character exceeds a predetermined frequency, the mode in which that symbol character is not read aloud is selected, and otherwise the mode in which it is read aloud is selected.

[0030] The operation of this example is described below with reference to FIGS. 1 to 5.

[0031] First, input means 1 inputs the text and passes it to frequency extraction means 2 and speech synthesis means 4 (step S1). Next, frequency extraction means 2 obtains the frequency of occurrence of the characters in the text; for predetermined symbol characters such as "-", "|", and "*", it obtains the frequency separately for each symbol character (step S2). Mode determination means 3 then judges, for each symbol character, whether its frequency of occurrence is high or low (step S3) and adds symbol characters with a markedly high frequency to a list of characters that are not to be read (step S4). The list generated in this way is stored in mode storage means 42 of speech synthesis means 4 as the reading-mode information. If there is no character that is not to be read, the list is empty.

[0032] When the list has been stored in mode storage means 42, text analysis means 41 of speech synthesis means 4 starts processing the input text. First, it deletes from the text the symbol characters contained in the list stored in mode storage means 42 (step S5). It then divides the text into words (step S6) and converts each word into phonetic symbols by referring to word dictionary 43 (step S7). Next, speech waveform generation means 45 refers to waveform dictionary 44 on the basis of the phonetic symbols output by text analysis means 41, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S8). The electrical signal of the speech waveform is then output directly to a speaker or the like, or is output to another location over a telephone line (step S9).

[0033] Because the speech synthesizer of this example operates as described above, when a text such as that shown in FIG. 4 is input, for example, the text contains one symbol character "*" and 17 other characters. The frequency of occurrence of "*" is low, so "*" is not deleted, and the text is read "あすたりすくはわくをえがくときなどによくもちいられる" (asutarisuku wa waku o egaku toki nado ni yoku mochiirareru, "the asterisk is often used for drawing frames and the like"). In contrast, in the case of the text shown in FIG. 5, for example, the text contains 36 symbol characters "*" and 26 other characters. The frequency of occurrence of "*" is extremely high, so "*" is deleted from the text, and only the sentence inside the frame drawn with "*" is read aloud.
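Steps S2 to S5 of the first example could be sketched in Python as follows. The threshold value, the choice to measure frequency as a share of all characters, and the function names are assumptions; the description only speaks of a "predetermined frequency". The test strings merely reproduce the character counts reported for FIGS. 4 and 5, not the figures themselves, and ASCII symbols stand in for the full-width originals.

```python
from collections import Counter
from typing import Set

TARGET_SYMBOLS = "-|*"  # symbol characters with a purely visual role
THRESHOLD = 0.3         # assumed share of all characters; not from the patent


def symbols_not_to_read(text: str) -> Set[str]:
    """Steps S2-S4: list the symbols whose frequency exceeds the threshold."""
    counts = Counter(text)
    total = len(text)
    if total == 0:
        return set()
    return {s for s in TARGET_SYMBOLS if counts[s] / total > THRESHOLD}


def delete_unread_symbols(text: str) -> str:
    """Step S5: delete the listed symbols before word segmentation."""
    unread = symbols_not_to_read(text)
    return "".join(ch for ch in text if ch not in unread)


sparse = "*" + "a" * 17       # 1 "*" and 17 other characters, as for FIG. 4
framed = "*" * 36 + "a" * 26  # 36 "*" and 26 other characters, as for FIG. 5
print(symbols_not_to_read(sparse))
print(symbols_not_to_read(framed))
```

With these counts, "*" makes up about 6% of the first string and about 58% of the second, so any threshold between those values separates the two cases.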

[0034] In this example, characters that are not to be read are kept from being read aloud by deleting them from the text before the word-reading assignment stage in text analysis means 41. Alternatively, for predetermined symbol characters such as "-", "|", and "*", reading can be suppressed at the word-reading assignment stage by assigning no phonetic symbol to a symbol character that is present in the list.

[0035] (B) Second Example

FIG. 6 is a flowchart of a second example of this embodiment, FIGS. 7 and 8 show example texts, and FIG. 9 shows an example reading of the text of FIG. 8. The speech synthesizer of this example has a function of automatically selecting, based on the frequencies of occurrence of alphabetic vowels and consonants, between a mode in which alphabetic character strings are read as romaji and a mode in which they are read as English.

[0036] In this example, frequency extraction means 2 extracts a frequency of occurrence for the alphabetic vowels (a, i, u, e, o) and another for the alphabetic consonants. Mode determination means 3 is preset with a rule such as the following: when the value of (frequency of alphabetic vowels) ÷ (frequency of alphabetic consonants) is greater than a predetermined value, the mode in which alphabetic character strings in the text are read as romaji is selected, and otherwise the mode in which they are read as English is selected.

[0037] The operation of this example is described below with reference to FIGS. 1, 2, and 6 to 9.

[0038] First, input means 1 inputs the text and passes it to frequency extraction means 2 and speech synthesis means 4 (step S11). Next, frequency extraction means 2 totals the number of occurrences of alphabetic vowels and of alphabetic consonants separately and obtains the frequency of occurrence of each category (step S12). Mode determination means 3 then computes the value of (frequency of alphabetic vowels) ÷ (frequency of alphabetic consonants). When that value is greater than a predetermined value it determines the romaji reading mode, and otherwise it determines the English reading mode; it sets the result in mode storage means 42 of speech synthesis means 4 (steps S13 to S15).

[0039] When the mode has been set in mode storage means 42, text analysis means 41 of speech synthesis means 4 starts processing the input text. First, it divides the text into words (step S16). Next, if the romaji reading mode is set in mode storage means 42, it treats the alphabetic characters in the text as romaji and converts them into hiragana (steps S17, S19). For example, "tanaka" is converted into "たなか". If the English reading mode is set in mode storage means 42, the alphabetic parts of the text are left as they are, but the numbers and symbols in the text are converted into their English readings (steps S17, S18). For example, "24" is converted into "twenty-four". Text analysis means 41 then converts each word into phonetic symbols by referring to word dictionary 43 and outputs them to speech waveform generation means 45 (step S20).
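The two conversions named in paragraph [0039] can be sketched in Python as follows. Both tables are deliberately partial and are assumptions introduced for illustration: the romaji table covers only a few syllables, and the number converter handles only 0 to 99. The patent does not describe how either conversion is implemented.

```python
# Partial, illustrative romaji table; a real table covers every syllable.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ta": "た", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def romaji_to_hiragana(word: str) -> str:
    """Romaji mode (step S19): greedy longest-match conversion to hiragana."""
    out = []
    i = 0
    while i < len(word):
        for length in (2, 1):
            chunk = word[i:i + length]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(word[i])  # unknown letter: pass through unchanged
            i += 1
    return "".join(out)


def number_to_english(n: int) -> str:
    """English mode (step S18): spell out a number from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0 to 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


print(romaji_to_hiragana("tanaka"))
print(number_to_english(24))
```

The two calls reproduce the examples given in the paragraph, "tanaka" and "24".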

[0040] Speech waveform generation means 45 refers to waveform dictionary 44 on the basis of the phonetic symbols output by text analysis means 41, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S21). It then outputs the electrical signal of the speech waveform directly to a speaker or the like, or outputs it to another location over a telephone line (step S22).

[0041] Because the speech synthesizer of this example operates as described above, when an English text such as that shown in FIG. 7 is input, for example, frequency extraction means 2 obtains the frequency information that alphabetic vowels occur 26 times and alphabetic consonants occur 36 times. Since consonants are more frequent than vowels, mode determination means 3 determines the English reading mode. As a result, the text of FIG. 7 is read aloud as English without conversion. The text of FIG. 7 contains no numbers or symbols, but if it did, they would be read in English. On the other hand, when the romaji text shown in FIG. 8 is input, for example, frequency extraction means 2 obtains the frequency information that alphabetic vowels occur 52 times and alphabetic consonants occur 46 times. Since vowels are more frequent than consonants, mode determination means 3 determines the romaji reading mode. As a result, the text of FIG. 8 is read aloud as shown in FIG. 9.
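The decision of steps S12 to S15 can be checked against the counts reported for FIGS. 7 and 8 with a short Python sketch. The threshold of 1.0 is an assumption chosen to match the "more vowels than consonants" reasoning of paragraph [0041]; the description only speaks of a predetermined value. The handling of a text without consonants is likewise an assumption.

```python
from typing import Tuple

VOWELS = set("aiueo")  # the vowel set named in the description


def count_vowels_consonants(text: str) -> Tuple[int, int]:
    """Step S12: total the alphabetic vowels and consonants in the text."""
    vowels = consonants = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            if ch in VOWELS:
                vowels += 1
            else:
                consonants += 1
    return vowels, consonants


def decide_alphabet_mode(vowels: int, consonants: int,
                         threshold: float = 1.0) -> str:
    """Steps S13-S15: romaji if vowels / consonants exceeds the threshold."""
    if consonants == 0:
        # Assumed edge case: vowels only counts as romaji, no letters as English.
        return "romaji" if vowels > 0 else "english"
    return "romaji" if vowels / consonants > threshold else "english"


print(decide_alphabet_mode(26, 36))  # counts reported for FIG. 7
print(decide_alphabet_mode(52, 46))  # counts reported for FIG. 8
print(count_vowels_consonants("tanaka"))
```

Romanized Japanese alternates consonants and vowels almost strictly, which is why its vowel share tends to be higher than that of English text.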

[0042] In this example, the romaji reading mode or the English reading mode is determined according to whether the value of (frequency of alphabetic vowels) ÷ (frequency of alphabetic consonants) is greater than a predetermined value. The mode can also be determined from the difference between the frequency of alphabetic vowels and the frequency of alphabetic consonants, or from the proportion of alphabetic vowels or consonants among all alphabetic characters (vowels and consonants).
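The alternative measures in paragraph [0042] can be compared on the counts reported for FIGS. 7 and 8 (26 vowels and 36 consonants; 52 vowels and 46 consonants). The function names below are assumptions, and no thresholds are implied by the patent for these measures.

```python
def ratio(vowels: int, consonants: int) -> float:
    """Vowel frequency divided by consonant frequency."""
    return vowels / consonants


def difference(vowels: int, consonants: int) -> int:
    """Vowel frequency minus consonant frequency."""
    return vowels - consonants


def vowel_share(vowels: int, consonants: int) -> float:
    """Proportion of vowels among all alphabetic characters."""
    return vowels / (vowels + consonants)


for label, v, c in [("FIG. 7 counts", 26, 36), ("FIG. 8 counts", 52, 46)]:
    print(label, round(ratio(v, c), 3), difference(v, c),
          round(vowel_share(v, c), 3))
```

On these counts all three measures fall on opposite sides of their natural midpoints (1, 0, and 0.5 respectively), so each would separate the two texts in the same way.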

[0043] (C) Third Example

FIG. 10 is a flowchart of a third example of this embodiment, FIGS. 11 and 13 show example texts, FIG. 12 shows an example reading of the text of FIG. 11, and FIG. 14 shows an example reading of the text of FIG. 13. The speech synthesizer of this example has a function of extracting the frequency of occurrence of specific symbol characters and, based on that frequency, automatically selecting between a mode in which a numeric character string containing those symbol characters is read as a telephone number and a mode in which it is read as a mathematical expression.

[0044] In this example, frequency extraction means 2 extracts the frequency of occurrence in the text of specific symbol characters. In this example the symbol characters concerned are "=, +, ÷, ×". Mode determination means 3 is preset with the rule that when the frequency of occurrence of the specific symbol characters is greater than a predetermined value, the mode in which a numeric character string containing those symbol characters is read as a mathematical expression is selected, and otherwise the mode in which it is read as a telephone number is selected. In addition, word dictionary 43 stores, as phonetic symbols for the symbol character "-", "まいなす" (mainasu) for the formula reading mode and "の" (no) for the telephone number reading mode.

[0045] The operation of this example is described below with reference to FIGS. 1, 2 and 10 to 14.

[0046] First, the input means 1 inputs the text and passes it to the frequency extraction means 2 and the speech synthesis means 4 (step S31). Next, the frequency extraction means 2 totals the number of occurrences of the symbol characters "＝", "＋", "÷" and "×" in the text to obtain their combined appearance frequency (step S32). The mode determination means 3 then selects the formula reading mode when that frequency is greater than a predetermined value and the telephone number reading mode otherwise, and sets the result in the mode storage means 42 of the speech synthesis means 4 (steps S33 to S35).
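
Steps S32 to S35 amount to a count followed by a threshold comparison. The following Python sketch shows one possible form, offered as an illustration only. The name `decide_number_mode`, the mode labels, the inclusion of ASCII variants of the symbols, and the default threshold of 2 are assumptions; the specification says only that the value is predetermined.

```python
# Symbols whose combined frequency is counted (step S32).
# Fullwidth and ASCII variants are both included as an assumption.
FORMULA_SYMBOLS = "＝＋÷×=+"


def decide_number_mode(text: str, threshold: int = 2) -> str:
    """Return "formula" or "phone" from the combined symbol count."""
    count = sum(text.count(symbol) for symbol in FORMULA_SYMBOLS)
    # Steps S33 to S35: more than the predetermined value means formulas.
    return "formula" if count > threshold else "phone"
```

For a list of names and telephone numbers the count is 0 and the telephone number reading mode results; for a block of equations the count is well above the threshold and the formula reading mode results.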

[0047] Once a mode has been set in the mode storage means 42, the text analysis means 41 of the speech synthesis means 4 starts processing the input text. It first divides the text into words (step S36). If the formula reading mode is set in the mode storage means 42, the numbers in the text are converted to a place-value reading (steps S37, S38); for example, "24" becomes "nijuuyon" (にじゅうよん). If the telephone number reading mode is set, the numbers are converted to a digit-by-digit reading (steps S37, S39); for example, "24" becomes "nii yon" (にいよん). The text analysis means 41 then refers to the word dictionary 43, converts each word into phonetic symbols, and outputs them to the speech waveform generation means 45 (step S40). At this point each "−" in the text is converted to the phonetic symbols for "mainasu" (まいなす) in the formula reading mode and for "no" (の) in the telephone number reading mode.
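
The two number readings and the mode-dependent reading of "−" can be sketched in Python as follows. This is an illustrative sketch, not the dictionary-driven implementation of the specification: the function names, the limitation to 0 through 9999, and the reading "ぜろ" for zero are assumptions. The long digit readings "にい" and "ごお" follow the examples in the text, and the irregular forms such as "さんびゃく" are ordinary Japanese readings.

```python
import re

DIGIT_READINGS = ["ぜろ", "いち", "にい", "さん", "よん",
                  "ごお", "ろく", "なな", "はち", "きゅう"]
UNITS = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
IRREGULAR = {(3, "ひゃく"): "さんびゃく", (6, "ひゃく"): "ろっぴゃく",
             (8, "ひゃく"): "はっぴゃく", (3, "せん"): "さんぜん",
             (8, "せん"): "はっせん"}
MINUS_READING = {"formula": "まいなす", "phone": "の"}


def read_digit_by_digit(digits: str) -> str:
    """Telephone-style reading: one reading per digit (step S39)."""
    return "".join(DIGIT_READINGS[int(d)] for d in digits)


def read_by_place(n: int) -> str:
    """Place-value reading for 0..9999 (step S38)."""
    if not 0 <= n <= 9999:
        raise ValueError("sketch only handles 0..9999")
    if n == 0:
        return "ぜろ"
    parts = []
    for power, name in ((1000, "せん"), (100, "ひゃく"), (10, "じゅう")):
        digit, n = divmod(n, power)
        if digit == 0:
            continue
        if (digit, name) in IRREGULAR:
            parts.append(IRREGULAR[(digit, name)])
        elif digit == 1:
            parts.append(name)  # "ひゃく", not "いちひゃく"
        else:
            parts.append(UNITS[digit] + name)
    parts.append(UNITS[n])  # remaining ones digit, empty when zero
    return "".join(parts)


def read_numeric_string(text: str, mode: str) -> str:
    """Read digit runs and minus signs in text according to the mode."""
    out = []
    for token in re.findall(r"\d+|[-−－]", text):
        if token.isdigit():
            out.append(read_by_place(int(token)) if mode == "formula"
                       else read_digit_by_digit(token))
        else:
            out.append(MINUS_READING[mode])
    return "".join(out)
```

With these definitions, `read_numeric_string("123-4567", "phone")` gives the telephone-style reading and `read_numeric_string("123-4567", "formula")` gives the subtraction-style reading discussed in the background section.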

[0048] Based on the phonetic symbols output from the text analysis means 41, the speech waveform generation means 45 refers to the waveform dictionary 44, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S41). The electrical signal of the speech waveform is then output directly to a speaker or the like, or sent elsewhere over a telephone line (step S42).

[0049] Because the speech synthesizer of this example operates as described above, when a text listing names and telephone numbers such as the one shown in FIG. 11 is input, the frequency extraction means 2 obtains an appearance frequency of 0 for the symbol characters "＝", "＋", "÷" and "×", and the mode determination means 3 selects the telephone number reading mode. As a result, the text of FIG. 11 is read out as shown in FIG. 12. When, on the other hand, a text of mathematical expressions such as the one shown in FIG. 13 is input, the frequency extraction means 2 obtains an appearance frequency of 7 for those symbol characters; because this frequency is high, the mode determination means 3 selects the formula reading mode. As a result, the text of FIG. 13 is read out as shown in FIG. 14.

[0050] In this example, the formula reading mode is selected when the appearance frequency of the symbol characters "＝", "＋", "÷" and "×" exceeds a predetermined value, and the telephone number reading mode otherwise. Alternatively, the appearance frequency of the symbol character "−" may be obtained instead, with the telephone number reading mode selected when that frequency exceeds a predetermined value and the formula reading mode otherwise.

[0051] (2) Second Embodiment. FIG. 15 is a block diagram of the second embodiment of the present invention. The speech synthesizer of this embodiment comprises input means 1 for inputting text, dividing means 5 for dividing the input text, frequency extraction means 2 for extracting, for each divided text, the appearance frequency of the characters in it, mode determination means 3 for determining a reading mode for each divided text based on this frequency information, and speech synthesis means 4 that operates, for each divided text, according to the determined reading mode.

[0052] The input means 1 inputs any text to be read aloud, such as text typed on a keyboard (not shown) or text stored on a hard disk or the like, and passes it to the dividing means 5.

[0053] The dividing means 5 divides the text into units such as sentences and paragraphs, using blanks, line breaks, "。" and the like as delimiters. The divided texts are passed to the frequency extraction means 2 and the speech synthesis means 4.
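
As one possible illustration of the dividing means, the Python sketch below splits a text at blank lines, which is the delimiter used in the worked example of this embodiment. The function name `split_on_blank_lines` is hypothetical, and other delimiters named in the text (such as "。") are deliberately left out of this sketch.

```python
def split_on_blank_lines(text: str) -> list:
    """Split text into segments separated by one or more blank lines."""
    segments, current = [], []
    for line in text.splitlines():
        if line.strip():            # non-blank line: extend the current segment
            current.append(line)
        elif current:               # blank line closes the current segment
            segments.append("\n".join(current))
            current = []
    if current:                     # flush the last segment
        segments.append("\n".join(current))
    return segments
```

Lines containing only whitespace, including the fullwidth space, are treated as blank because `str.strip` removes Unicode whitespace.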

[0054] The frequency extraction means 2 extracts the appearance frequency of characters in each divided text received from the dividing means 5 and passes the frequency information to the mode determination means 3. The appearance frequency is extracted, for example, per character (such as "あ" or "い") or per character type (category) such as "symbol" or "kanji". It is not necessary to extract the frequencies of every character or type; which characters or types are counted is determined in advance according to the reading modes subject to automatic switching.
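
Counting per character and per category might look like the following Python sketch. The category names and the classification rules based on the standard `unicodedata` module are assumptions chosen for illustration; the specification does not define how categories are detected.

```python
import unicodedata
from collections import Counter


def char_category(ch: str) -> str:
    """Classify one character into a coarse category (illustrative rules)."""
    if ch.isdigit():
        return "digit"
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if "HIRAGANA" in name or "KATAKANA" in name:
        return "kana"
    if "LATIN" in name and ch.isalpha():
        return "alphabet"
    if unicodedata.category(ch)[0] in "PS":  # punctuation or symbol
        return "symbol"
    return "other"


def extract_frequencies(text: str):
    """Return (per-character counts, per-category counts), ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(chars), Counter(char_category(c) for c in chars)
```

A mode determination rule can then look up only the entries it needs, for example the count of "−" or the total for the "symbol" category.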

[0055] Based on the frequency information received from the frequency extraction means 2, the mode determination means 3 determines a reading mode for each divided text according to predetermined rules. Which rules are used is determined in advance according to the reading modes subject to automatic switching.

[0056] Once the mode determination means 3 has determined the reading mode, the speech synthesis means 4 reads aloud, for each divided text, the text supplied by the dividing means 5 in accordance with the reading mode indicated by the mode determination means 3. As in the first embodiment, the speech synthesis means 4 is configured, for example, as shown in FIG. 2.

[0057] FIG. 16 is a flowchart of this embodiment. First, the input means 1 inputs the text to be read aloud and passes it to the dividing means 5 (step S51). The dividing means 5 then divides the text and passes each divided text to the frequency extraction means 2 and the speech synthesis means 4 (step S52). The processing by the frequency extraction means 2, the mode determination means 3 and the speech synthesis means 4 is then repeated for every divided text (steps S53, S54).
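
The control flow of FIG. 16 can be summarized by the Python sketch below. It is a structural illustration only: the five callables stand in for the means of the embodiment and are passed as parameters, and the name `read_segments` is hypothetical. Waveforms are collected first and output afterwards, matching the description that generated waveforms are held until output.

```python
def read_segments(text, split, extract, decide, synthesize, output):
    """Run the per-segment loop of the second embodiment.

    split(text) -> iterable of segments            (dividing means 5)
    extract(segment) -> frequency information      (frequency extraction means 2)
    decide(freq) -> reading mode                   (mode determination means 3)
    synthesize(segment, mode) -> waveform          (speech synthesis means 4)
    output(waveform) -> None                       (speaker or telephone line)
    """
    waveforms = []
    for segment in split(text):                    # step S52, loop S53/S54
        mode = decide(extract(segment))
        waveforms.append(synthesize(segment, mode))
    for waveform in waveforms:                     # output in original order
        output(waveform)
    return waveforms
```

Because the mode is decided inside the loop, two segments of the same text can be read with different modes.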

[0058] The processing applied to one divided text in step S54 depends on the example. In the example that automatically selects, based on the appearance frequency of a symbol character, between a mode that does not read the symbol character aloud and a mode that does, it corresponds to steps S2 to S8 of FIG. 3. In the example that automatically selects, based on the appearance frequencies of alphabet vowels and consonants, between reading alphabet character strings as romaji and reading them as English, it corresponds to steps S12 to S21 of FIG. 6. In the example that automatically selects, based on the appearance frequency of specific symbol characters, between reading a numeric character string containing those characters as a telephone number and reading it as a mathematical expression, it corresponds to steps S32 to S41 of FIG. 10.

[0059] Finally, the speech synthesis means 4 outputs the electrical signals of the speech waveforms generated for the divided texts directly to a speaker or the like, or sends them elsewhere over a telephone line (step S55).

[0060] An example of this embodiment is now described in detail with reference to the drawings. The example taken up is the one that automatically selects, based on the appearance frequency of a symbol character, between not reading that character aloud and reading it aloud. The text shown in FIG. 17 is used as the text to be read. It is the body of an e-mail and consists of three parts: ordinary sentences, a table, and the sender information block known as a signature. The "−" and "｜" used as the ruled lines of the table and the "＊" forming the frame of the signature have only visual meaning, so it is desirable not to read them aloud.

[0061] In this example, the frequency extraction means 2 extracts the appearance frequency of characters in each divided text, and for symbol characters that have only visual meaning it extracts the frequency separately for each symbol character. The target symbol characters include "−", "｜" and "＊". The mode determination means 3 is preset with the rule that when the appearance frequency of a symbol character exceeds a predetermined frequency, that symbol character is not read aloud, and otherwise it is read aloud. In step S54 of FIG. 16, the processing of steps S2 to S8 of FIG. 3 is executed.

[0062] In operation, the input means 1 first inputs the text of FIG. 17 and passes it to the dividing means 5 (step S51). The dividing means 5 then divides the text at blank lines into three parts: lines 1 to 3, lines 5 to 11, and lines 13 to 16 (step S52). The divided texts are passed to the frequency extraction means 2 and the speech synthesis means 4 in order, starting from the beginning of the text.

[0063] The divided text of lines 1 to 3 is processed first (steps S53, S54). Specifically, the frequency extraction means 2 extracts the appearance frequency of each character in the divided text and, for the predetermined symbol characters such as "−", "｜" and "＊", obtains the frequency for each individual symbol character (step S2 of FIG. 3). The mode determination means 3 then judges whether the frequency of each symbol character is high or low (step S3) and adds symbol characters with a markedly high frequency to a list of characters not to be read (step S4). In the divided text of lines 1 to 3 of FIG. 17, the only such symbol character is "−", and it occurs just once, so its frequency is low and the list is empty. This list is stored in the mode storage means 42 of the speech synthesis means 4 as the reading-mode information for the first divided text.
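
Steps S2 to S4 can be illustrated by the short Python sketch below, which builds the list of characters not to be read for one divided text. The name `symbols_to_skip`, the set of candidate symbols (fullwidth and ASCII variants) and the default threshold of 10 are assumptions; the specification states only that the frequency is compared with a predetermined value.

```python
from collections import Counter

# Candidate symbols that may have only visual meaning (assumed set).
VISUAL_SYMBOLS = ("−", "-", "｜", "|", "＊", "*")


def symbols_to_skip(segment: str, threshold: int = 10) -> list:
    """Return the symbols whose count in this segment exceeds the threshold."""
    counts = Counter(segment)                       # step S2
    return [s for s in VISUAL_SYMBOLS if counts[s] > threshold]  # steps S3, S4
```

A segment containing a single "−" yields an empty list, whereas a table segment with 107 occurrences of "−" and 14 of "｜" yields a list containing both symbols.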

[0064] When the list has been stored in the mode storage means 42, the text analysis means 41 of the speech synthesis means 4 starts processing the first divided text. It first deletes from the text the symbol characters in the list stored in the mode storage means 42 (step S5); in this case the list is empty, so no symbol characters are deleted. It then divides the text into words (step S6) and, referring to the word dictionary 43, converts each word into phonetic symbols (step S7). Next, based on the phonetic symbols output from the text analysis means 41, the speech waveform generation means 45 refers to the waveform dictionary 44, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S8). The generated speech waveform is held until the time of speech output.

[0065] When processing of the first divided text is finished, the divided text of lines 5 to 11 of FIG. 17 is processed (steps S53, S54). As before, the frequency extraction means 2 obtains, for each predetermined symbol character such as "−", "｜" and "＊", its appearance frequency in the divided text (step S2 of FIG. 3); the mode determination means 3 compares each frequency with a predetermined value (step S3) and, if the frequency exceeds that value, adds the symbol character to the list of characters not to be read (step S4). For the divided text of lines 5 to 11 of FIG. 17, the frequency information obtained is, for example, 2 kanji, 6 alphabet letters, 30 digits, 107 occurrences of the symbol "−", 14 of the symbol "｜" and 4 of the symbol "、". Because "−" and "｜" occur frequently, they are judged to be characters not to be read and are added to the list. The resulting list is stored in the mode storage means 42 of the speech synthesis means 4 as the reading-mode information for the second divided text.

[0066] When the list has been stored in the mode storage means 42, the text analysis means 41 of the speech synthesis means 4 starts processing the second divided text and deletes the listed symbol characters "−" and "｜" from it (step S5). It then divides the text into words (step S6) and, referring to the word dictionary 43, converts each word into phonetic symbols (step S7). Next, based on the phonetic symbols, the speech waveform generation means 45 refers to the waveform dictionary 44, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S8). The generated speech waveform is held until the time of speech output.
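
The deletion of step S5 is a plain character filter. A minimal Python sketch follows, assuming the list of characters not to be read is available as a list of single-character strings; the function name `strip_symbols` is hypothetical.

```python
def strip_symbols(segment: str, skip_list: list) -> str:
    """Delete every occurrence of the listed symbol characters (step S5)."""
    # str.maketrans with a third argument maps those characters to None.
    table = str.maketrans("", "", "".join(skip_list))
    return segment.translate(table)
```

Note that every occurrence in the divided text is removed, so a "−" that carries meaning inside a table cell is deleted together with the ruled lines; the specification accepts this because the decision is made per divided text.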

[0067] When processing of the second divided text is finished, the divided text of lines 13 to 16 of FIG. 17 is processed (steps S53, S54). As before, the frequency extraction means 2 obtains, for each predetermined symbol character, its appearance frequency in the divided text (step S2 of FIG. 3); the mode determination means 3 compares each frequency with a predetermined value (step S3) and, if the frequency exceeds that value, adds the symbol character to the list of characters not to be read (step S4). For the divided text of lines 13 to 16, the frequency information obtained is 7 kanji, 15 alphabet letters, 36 occurrences of the symbol "＊" and 3 other symbols. Because "＊" occurs frequently, it is judged to be a character not to be read and is added to the list. The resulting list is stored in the mode storage means 42 of the speech synthesis means 4 as the reading-mode information for the third divided text.

[0068] When the list has been stored in the mode storage means 42, the text analysis means 41 of the speech synthesis means 4 starts processing the third divided text and deletes the listed symbol character "＊" from it (step S5). It then divides the text into words (step S6) and, referring to the word dictionary 43, converts each word into phonetic symbols (step S7). Next, based on the phonetic symbols, the speech waveform generation means 45 refers to the waveform dictionary 44, obtains the required phoneme-unit speech waveforms, and edits them to generate a speech waveform (step S8). The generated speech waveform is held until the time of speech output.

[0069] Processing of all the divided texts of the input text is now complete, so the speech synthesis means 4 outputs the electrical signals of the stored speech waveforms to a speaker or the like in the order of the first, second and third divided texts (step S55). The text of FIG. 17 is thereby read out as shown in FIG. 18. In this case the "−" on line 3 of FIG. 17 is read aloud, whereas the "−" characters on lines 5 to 11 are not. In other words, in addition to the effects of the first embodiment, the second embodiment reads text aloud correctly even when sentences in various formats are mixed within it.

[0070] The example given for the second embodiment automatically selects, based on the appearance frequency of a symbol character, between not reading that character aloud and reading it aloud. The embodiment can of course also be applied to automatically selecting, based on the appearance frequencies of alphabet vowels and consonants, between reading alphabet character strings as romaji and reading them as English, and to automatically selecting, based on the appearance frequency of specific symbol characters, between reading a numeric character string containing those characters as a telephone number and reading it as a mathematical expression.

[0071] FIG. 19 is a block diagram showing an example of a computer system to which the present invention is applied. The computer system of this example comprises a computer main body 100 including a CPU, main memory and the like, together with a keyboard 101, an auxiliary storage device 102, an audio output device 103 such as a speaker, and a recording medium 104 connected to it. The recording medium 104 is a machine-readable recording medium such as a CD-ROM, semiconductor memory or magnetic disk. The program recorded on it is installed in the computer main body 100, for example when the computer system starts up, and controls the operation of the computer main body 100 so as to realize on it the input means 1, frequency extraction means 2, mode determination means 3 and speech synthesis means 4 of the first embodiment described above, and also the input means 1, frequency extraction means 2, mode determination means 3, speech synthesis means 4 and dividing means 5 of the second embodiment described above.

[0072] Embodiments of the present invention have been described above, but the invention is not limited to them, and various additions and modifications are possible. For example, a speech synthesizer combining several functions is also conceivable as another embodiment: the function of selecting, based on the appearance frequency of a symbol character, between not reading and reading that character; the function of selecting, based on the appearance frequencies of alphabet vowels and consonants, between reading alphabet character strings as romaji and as English; and the function of selecting, based on the appearance frequency of specific symbol characters, between reading numeric character strings containing them as telephone numbers and as mathematical expressions. In that case, for each word that must be read differently depending on the reading mode, its phonetic symbols can be stored in the word dictionary 43 for each combination of reading modes. For example, the reading of the symbol "−" can be stored as "no" (の) for the combination of telephone number reading and romaji reading, as "mainasu" (まいなす) for formula reading combined with either English reading or romaji reading, and as "" (not read) for telephone number reading combined with English reading.
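
The per-combination dictionary entry for "−" described above could be represented as in the following Python sketch. The mode labels and the name `minus_reading` are assumptions made for illustration; only the four readings themselves come from the text.

```python
# Reading of the symbol "−" keyed by (number mode, alphabet mode).
MINUS_READINGS = {
    ("phone", "romaji"): "の",
    ("formula", "english"): "まいなす",
    ("formula", "romaji"): "まいなす",
    ("phone", "english"): "",   # not read aloud
}


def minus_reading(number_mode: str, alphabet_mode: str) -> str:
    """Look up how "−" is read for the active combination of modes."""
    return MINUS_READINGS[(number_mode, alphabet_mode)]
```

Keying the table on the tuple of active modes keeps the lookup a single dictionary access however many mode-selection functions are combined.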

【００７３】[0073]

[Effects of the Invention] As described above, the present invention determines the reading mode based on the appearance frequency of characters in the text. Consequently, even for kinds of reading modes that were in practice difficult to switch automatically with the conventional technique of detecting special patterns, such as deciding not to read the ruled lines of a table or choosing between romaji reading and English reading, the correct mode can be determined automatically without the user selecting a mode. The invention is also flexible enough to be applied to automatic switching between reading numeric character strings as telephone numbers and reading them as mathematical expressions, although this naturally requires conditions such as telephone numbers or similar notations appearing in the text with a certain frequency.

[0074] Furthermore, in the configuration that divides the text based on blank lines and the like, each part can be read aloud in an appropriate reading mode even when sentences in various formats are mixed within the text.

【図面の簡単な説明】[Brief description of the drawings]

FIG. 1 is a block diagram of the first embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration example of the speech synthesis means.

FIG. 3 is a flowchart of the first example of the first embodiment of the present invention.

FIG. 4 is a diagram showing an example of the text used in the first example of the first embodiment of the present invention.

FIG. 5 is a diagram showing another example of the text used in the first example of the first embodiment of the present invention.

FIG. 6 is a flowchart of the second example of the first embodiment of the present invention.

FIG. 7 is a diagram showing an example of the text used in the second example of the first embodiment of the present invention.

FIG. 8 is a diagram showing another example of the text used in the second example of the first embodiment of the present invention.

FIG. 9 is a diagram showing an example of reading out the text of FIG. 8.

FIG. 10 is a flowchart of the third example of the first embodiment of the present invention.

FIG. 11 is a diagram showing an example of the text used in the third example of the first embodiment of the present invention.

FIG. 12 is a diagram showing an example of reading out the text of FIG. 11.

FIG. 13 is a diagram showing another example of the text used in the third example of the first embodiment of the present invention.

FIG. 14 is a diagram showing an example of reading out the text of FIG. 13.

FIG. 15 is a block diagram of the second embodiment of the present invention.

FIG. 16 is a flowchart of an example of the second embodiment of the present invention.

FIG. 17 is a diagram showing an example of the text used in an example of the second embodiment of the present invention.

FIG. 18 is a diagram showing an example of reading out the text of FIG. 17.

FIG. 19 is a block diagram showing an example of a computer system to which the present invention is applied.

FIG. 20 is a block diagram of a speech synthesizer of the type in which the user specifies the reading mode.

【符号の説明】[Explanation of symbols]

1: input means
2: frequency extraction means
3: mode determination means
4: speech synthesis means
5: dividing means

Continuation of the front page
(56) References: JP-A-4-20998 (JP, A); JP-A-4-306766 (JP, A); JP-A-9-6379 (JP, A); JP-A-4-253098 (JP, A); JP-A-8-36395 (JP, A)
(58) Fields investigated (Int. Cl.⁷, DB name): G10L 13/00 - 13/08; G06F 17/28

Claims

(57)【特許請求の範囲】(57) [Claims]

[Claim 1] A speech synthesizer comprising: input means for inputting text; frequency extraction means for extracting the appearance frequency, over the entire text, of characters in the input text; mode determination means for determining a reading mode based on this frequency information; and speech synthesis means that operates according to the determined reading mode.

[Claim 2] A speech synthesizer comprising: input means for inputting text; dividing means for dividing the input text; frequency extraction means for extracting, for each divided text, the appearance frequency of characters in that text over the text as a whole; mode determination means for determining a reading mode for each divided text based on this frequency information; and speech synthesis means that operates, for each divided text, according to the determined reading mode.

[Claim 3] The speech synthesis device according to claim 1 or 2, wherein the frequency extraction means is configured to extract the appearance frequency of character categories.
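As an illustration of the category-level frequency extraction recited in claim 3, the following minimal Python sketch counts how often each character category occurs in a text. The category set, the helper names `char_category` and `category_frequencies`, and the use of Unicode properties are assumptions made for illustration; the claim does not specify how categories are defined.

```python
from collections import Counter
from typing import Dict
import unicodedata


def char_category(ch: str) -> str:
    """Map one character to a coarse category (hypothetical category set)."""
    if ch.isspace():
        return "space"
    # NFKC folds full-width forms such as '１' or 'Ａ' to ASCII equivalents.
    folded = unicodedata.normalize("NFKC", ch)
    if len(folded) == 1 and folded.isascii():
        if folded.isdigit():
            return "digit"
        if folded.isalpha():
            return "alphabet"
        return "symbol"
    # Unicode general categories starting with P or S are punctuation/symbols.
    if unicodedata.category(ch)[0] in ("P", "S"):
        return "symbol"
    return "other"


def category_frequencies(text: str) -> Dict[str, float]:
    """Return the relative frequency of each category over the whole text."""
    counts = Counter(char_category(ch) for ch in text)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```

Working with categories rather than individual characters keeps the frequency table small, which may make simple threshold rules easier to state.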

[Claim 4] The speech synthesis device according to claim 1 or 2, wherein the mode determination means is configured to determine, based on the appearance frequency of a symbol character, either a mode in which that symbol character is not read aloud or a mode in which it is read aloud.
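The decision in claim 4 might look like the sketch below. It assumes that a symbol making up a large share of the visible characters is ruled-line decoration (as with the table borders mentioned in the background section) and should be skipped. The 0.2 threshold, the direction of the comparison, and the function name are illustrative assumptions; the claim only states that the decision is based on the symbol's appearance frequency.

```python
def symbol_reading_mode(text: str, symbol: str, threshold: float = 0.2) -> str:
    """Decide whether `symbol` should be read aloud in `text`.

    Assumption: a symbol whose share of the non-whitespace characters is at
    least `threshold` is treated as ruled-line decoration and skipped.
    Returns "skip" or "read".
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return "read"  # nothing to judge from; arbitrary default
    share = visible.count(symbol) / len(visible)
    return "skip" if share >= threshold else "read"
```

With this rule, the hyphens in a text-drawn table border would be skipped, while a single hyphen inside ordinary prose would still be read.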

[Claim 5] The speech synthesis device according to claim 1 or 2, wherein the mode determination means is configured to determine, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji (romanized Japanese) or a mode in which they are read as English.
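One way to picture claim 5 is a vowel-ratio test: romanized Japanese tends to alternate consonants and vowels, so its share of vowels is usually higher than in English. The sketch below is a hedged illustration only; the 0.45 threshold, the choice of a, e, i, o, u as vowels, the default for texts without letters, and the function name are assumptions and are not taken from the patent.

```python
VOWELS = set("aeiou")


def alphabet_reading_mode(text: str, vowel_ratio_threshold: float = 0.45) -> str:
    """Choose "romaji" or "english" from the vowel share of ASCII letters.

    Assumption: a vowel share at or above the threshold suggests romanized
    Japanese; a lower share suggests English.
    """
    letters = [ch.lower() for ch in text if ch.isascii() and ch.isalpha()]
    if not letters:
        return "romaji"  # no alphabetic evidence; arbitrary default
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    return "romaji" if vowel_ratio >= vowel_ratio_threshold else "english"
```

A single ratio is a coarse signal and can misjudge short strings, so a practical system would likely need a longer text or additional cues.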

[Claim 6] The speech synthesis device according to claim 5, wherein the speech synthesis means is configured so that, when the mode in which alphabetic character strings are read as English is determined, numerals and symbols are also read as English.

[Claim 7] The speech synthesis device according to claim 1 or 2, wherein the mode determination means is configured to determine, based on the appearance frequency of a specific symbol character, either a mode in which a numeric character string containing the specific symbol character is read as a telephone number or a mode in which it is read as a mathematical expression.
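Claim 7 does not say how the frequency of the specific symbol maps to the two modes, so the following sketch adopts one possible rule purely as an assumption: if the hyphen accounts for nearly all arithmetic-looking symbols in the text, it is treated as a separator (telephone number); if other operators appear alongside it, it is treated as a minus sign (mathematical expression). The symbol set, the 0.8 threshold, and the function name are hypothetical.

```python
ARITHMETIC_SYMBOLS = set("+-*/=")


def numeric_reading_mode(text: str, symbol: str = "-", threshold: float = 0.8) -> str:
    """Choose how digit strings containing `symbol` are read.

    Assumption: when `symbol` makes up at least `threshold` of all
    arithmetic-looking symbols, it is acting as a separator ("telephone");
    otherwise it is acting as an operator ("formula").
    """
    symbols = [ch for ch in text if ch in ARITHMETIC_SYMBOLS]
    if not symbols:
        return "telephone"  # no symbols at all; arbitrary default
    share = symbols.count(symbol) / len(symbols)
    return "telephone" if share >= threshold else "formula"
```

Under this rule the background example "123-4567" in a list of contact numbers would be read digit by digit, while the same string in a text full of "+" and "=" would be read as a subtraction.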

[Claim 8] A speech synthesis device comprising: input means for inputting text; frequency extraction means for extracting the appearance frequency of characters in the input text; mode determination means, being means for determining a reading mode based on this frequency information, which determines, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji or a mode in which they are read as English; and speech synthesis means that operates according to the determined reading mode.

[Claim 9] A speech synthesis device comprising: input means for inputting text; division means for dividing the input text; frequency extraction means for extracting, for each divided text, the appearance frequency of characters in that text; mode determination means, being means for determining a reading mode for each divided text based on this frequency information, which determines, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji or a mode in which they are read as English; and speech synthesis means that operates, for each divided text, according to the determined reading mode.

[Claim 10] The speech synthesis device according to claim 8 or 9, wherein the speech synthesis means is configured so that, when the mode in which alphabetic character strings are read as English is determined, numerals and symbols are also read as English.

[Claim 11] A computer-readable recording medium on which is recorded a program for causing a computer to function as: input means for inputting text; frequency extraction means for extracting the frequency with which characters in the input text appear in the text as a whole; mode determination means for determining a reading mode based on this frequency information; and speech synthesis means that operates according to the determined reading mode.

[Claim 12] A computer-readable recording medium on which is recorded a program for causing a computer to function as: input means for inputting text; division means for dividing the input text; frequency extraction means for extracting, for each divided text, the frequency with which characters in that text appear in the text as a whole; mode determination means for determining a reading mode for each divided text based on this frequency information; and speech synthesis means that operates, for each divided text, according to the determined reading mode.

[Claim 13] The computer-readable recording medium according to claim 11 or 12, wherein the frequency extraction means is configured to extract the appearance frequency of character categories.

[Claim 14] The computer-readable recording medium according to claim 11 or 12, wherein the mode determination means is configured to determine, based on the appearance frequency of a symbol character, either a mode in which that symbol character is not read aloud or a mode in which it is read aloud.

[Claim 15] The computer-readable recording medium according to claim 11 or 12, wherein the mode determination means is configured to determine, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji or a mode in which they are read as English.

[Claim 16] The computer-readable recording medium according to claim 15, wherein the speech synthesis means is configured so that, when the mode in which alphabetic character strings are read as English is determined, numerals and symbols are also read as English.

[Claim 17] The computer-readable recording medium according to claim 11 or 12, wherein the mode determination means is configured to determine, based on the appearance frequency of a specific symbol character, either a mode in which a numeric character string containing the specific symbol character is read as a telephone number or a mode in which it is read as a mathematical expression.

[Claim 18] A computer-readable recording medium on which is recorded a program for causing a computer to function as: input means for inputting text; frequency extraction means for extracting the appearance frequency of characters in the input text; mode determination means, being means for determining a reading mode based on this frequency information, which determines, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji or a mode in which they are read as English; and speech synthesis means that operates according to the determined reading mode.

[Claim 19] A computer-readable recording medium on which is recorded a program for causing a computer to function as: input means for inputting text; division means for dividing the input text; frequency extraction means for extracting, for each divided text, the appearance frequency of characters in that text; mode determination means, being means for determining a reading mode for each divided text based on this frequency information, which determines, based on the appearance frequencies of alphabetic vowels and consonants, either a mode in which alphabetic character strings are read as romaji or a mode in which they are read as English; and speech synthesis means that operates, for each divided text, according to the determined reading mode.

[Claim 20] The computer-readable recording medium according to claim 18 or 19, wherein the speech synthesis means is configured so that, when the mode in which alphabetic character strings are read as English is determined, numerals and symbols are also read as English.