JPH06119144A

JPH06119144A - Document read-aloud device

Info

Publication number: JPH06119144A
Application number: JP4263985A
Authority: JP
Inventors: Hiromi Saito; 裕美斎藤; Kenichiro Kobayashi; 賢一郎小林
Original assignee: Toshiba Corp; Toshiba AVE Co Ltd
Current assignee: Toshiba Corp; Toshiba AVE Co Ltd
Priority date: 1992-10-02
Filing date: 1992-10-02
Publication date: 1994-04-28

Abstract

PURPOSE: To always read document data aloud appropriately by distinguishing the readings of homographs as needed. CONSTITUTION: A Japanese analytic part 4 performs Japanese analysis of the document data to be read aloud, held in a document data file 1, by referring to a word dictionary 6. For a word with more than one pronunciation, it reads the co-occurrence information for each pronunciation out of the word dictionary 6 and supplies it to a co-occurrence information retrieval part 8. The co-occurrence information retrieval part 8 searches the document data before and after the word for a co-occurring word according to the given co-occurrence information, gives top priority to the pronunciation in a co-occurrence relation with the retrieved word, holds this priority in a co-occurrence information use storage part 9, and also passes it to the Japanese analytic part 4. The Japanese analytic part 4 stores the pronunciation having the top priority in an analytic result buffer 7 as an analytic result. The analytic result is converted by a speech data generation part 10 into speech data, which are passed through a speech synthesizing device 13 and output as speech from a speech output part 14.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document read-aloud device that analyzes Japanese document data, converts the analysis result into voice data, and then applies voice synthesis processing to the voice data so that the content of the document data is output as voice.

[0002]

[Prior Art] Conventionally, a read-aloud device of this type performs Japanese analysis on document data created with a word processor or the like by referring to a word dictionary or the like, converts the analysis result into voice data in a voice data generation unit, converts the voice data into a voice signal in a voice synthesizer, and outputs the signal from a speaker or the like. In this Japanese analysis, however, a word with more than one reading was always output with its representative reading according to a predetermined rule, so the reading could not be changed to suit the content of the sentence or its context. For example, when the kanji word "市場" ("market") appears in the document data, the correct reading is "ichiba" if the market in question sells fish and the like, and "shijo" if stocks and the like are traded there. Conventional devices could not distinguish the readings of such homographs (words written identically but read differently) case by case.

[0003]

[Problems to Be Solved by the Invention] As described above, a conventional document read-aloud device, which reads aloud document data represented in a form that a computer can process, cannot distinguish the readings of homographs such as "市場" (ichiba or shijo) according to the context within the document or the meaning of the text. As a result, it sometimes reads a word in a way that does not fit the sentence and sounds unnatural to the listener.

[0004] Accordingly, an object of the present invention is to eliminate the above drawbacks and to provide a document read-aloud device that distinguishes the readings of homographs case by case and can always read document data aloud appropriately.

[0005]

[Means for Solving the Problems] The present invention is a document read-aloud device that performs Japanese analysis on document data to obtain voice data, converts the voice data into an electrical voice signal with a voice synthesizer, and outputs the signal as voice from a voice output device. The device comprises: a word dictionary that collects co-occurrence information for each reading of headwords having a plurality of readings; co-occurrence information reading means that, when a word under Japanese analysis has a plurality of readings, reads the co-occurrence information for each reading from the word dictionary; co-occurrence information search means that searches, based on the co-occurrence information read by the co-occurrence information reading means, whether a co-occurring word exists in the document data before and after the word under analysis; reading determination means that gives first priority to the reading of the word under analysis that is in a co-occurrence relation with the word found by the co-occurrence information search means; and voice data creation means that creates voice data using the reading given first priority by the reading determination means.

[0006]

[Operation] In the document read-aloud device of the present invention, the word dictionary collects co-occurrence information for each reading of headwords having a plurality of readings. When a word under Japanese analysis has a plurality of readings, the co-occurrence information reading means reads the co-occurrence information for each reading from the word dictionary. The co-occurrence information search means searches, based on the co-occurrence information read by the co-occurrence information reading means, whether a co-occurring word exists in the document data before and after the word under analysis. The reading determination means gives first priority to the reading of the word under analysis that is in a co-occurrence relation with the co-occurring word found by the co-occurrence information search means. The voice data creation means creates voice data using the reading given first priority by the reading determination means.

[0007]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the document read-aloud device of the present invention. Reference numeral 1 denotes a document data file that stores document data in a form a computer can handle; 2, an input device for entering various setting data for reading aloud; 3, a control unit that performs overall control when document data is read aloud; 4, a Japanese analysis unit that analyzes the document data to be read aloud morphologically, syntactically, and semantically by referring to the word dictionary 6; 5, a setting buffer in which the various setting data for reading aloud are saved; 6, a word dictionary in which the headwords, parts of speech, readings, accents, meanings, co-occurrence information, and other items used to analyze document data are collected as a list; 7, an analysis result buffer that saves the analysis results of the Japanese analysis unit 4; 8, a co-occurrence information search unit that searches the document data before and after the word being analyzed by the Japanese analysis unit 4 for co-occurring words; 9, a co-occurrence information use storage unit that holds the reading priority information determined by the co-occurrence information search unit 8; 10, a voice data generation unit that generates the corresponding voice data from the analysis result of the Japanese analysis unit 4, the co-occurrence information, and the like; 11, a voice data generation rule file that stores the voice data generation rules referred to when the voice data generation unit 10 generates voice data; 12, a voice data file that saves the voice data generated by the voice data generation unit 10; 13, a voice synthesizer that synthesizes a voice signal based on the voice data; 14, a voice output unit that outputs the voice signal; 15, a display data file that saves display data; 16, a display data creation unit that creates, from the voice data, the display data to be shown on the display device 17; and 17, a display device that displays the display data.

[0008] Next, the operation of this embodiment is described with reference to the flowchart of FIG. 2. The document data file 1 stores in advance computer-processable document data created with a Japanese word processor or the like. The document data stored in the document data file 1 may also be text read into the computer by OCR or the like. In this state, when the operator enters a read-aloud instruction and the range of document data to be read aloud from the input device 2 into the control unit 3 in step 201, the control unit 3 starts the control for reading the selected document data aloud. After entering the range of the document to be read aloud, the operator enters, in step 202, setting conditions such as those shown in FIG. 3 from the input device 2 into the control unit 3, and the control unit 3 saves these conditions in the setting buffer 5. For this task, the operator can enter the various setting conditions while viewing the menus of a setting screen output on the display device 17. The conditions that can be set include the basic reading speed, voice quality, pitch, loudness, the set time at which reading ends, whether emphasized characters are read in a special way, what is changed in the special reading of emphasized characters, whether reading aloud is performed, and the length of pauses. FIG. 3 shows an example of the setting conditions set in the setting buffer 5. The setting conditions include data such as reading speed, voice quality, pitch, loudness, and end time, and the operator can set these conditions freely.

[0009] The control unit 3 reads the document data selected by the operator from the document data file 1 one word at a time from the beginning and passes the text it has read to the Japanese analysis unit 4. In step 203, the Japanese analysis unit 4 refers to the word dictionary 6, whose contents are as shown in FIG. 4 and include the morphological information, reading information, and accent information of words and the co-occurrence information between words. It performs morphological, syntactic, and semantic analysis of the input text, segments the document data into words, and obtains an analysis result that gathers the information in the word dictionary 6 for each word. The control unit 3 writes the analysis result obtained in this way into the analysis result buffer 7. At this time, if the word under analysis (a homograph) has a plurality of readings, the Japanese analysis unit 4 reads the co-occurrence information for each reading from the word dictionary 6 and passes it to the co-occurrence information search unit 8. In step 204, the co-occurrence information search unit 8 searches the sentences before and after the word currently being analyzed by the Japanese analysis unit 4 for a word that co-occurs according to the co-occurrence information it was given. If no co-occurring word is found in step 205, processing proceeds to step 213; if one is found, processing proceeds to step 206. In step 206, the co-occurrence information search unit 8 changes the priority of the reading that is in a co-occurrence relation with the word found. In step 207, it saves the resulting priority information in the co-occurrence information use storage unit 9 and also passes it to the Japanese analysis unit 4. Then, in step 208, the Japanese analysis unit 4 stores in the analysis result buffer 7 the priority information passed from the co-occurrence information search unit 8, together with the final analysis result obtained by analyzing the syntax and meaning of the document data, and processing proceeds to step 209.

[0010] If processing instead proceeds to step 213, it is determined whether priority information is saved in the co-occurrence information use storage unit 9. If it is saved, processing proceeds to step 214; if not, processing proceeds to step 215. In step 215, the reading priority of the homograph analyzed by the Japanese analysis unit 4 is left at its default order, and processing proceeds to step 208. In step 214, the reading priority of the homograph analyzed by the Japanese analysis unit 4 is changed from the default to the reading priority stored in the co-occurrence information use storage unit 9, and processing then proceeds to step 208.
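The branch structure of steps 205 to 215 can be sketched as a small Python function. This is only an illustrative sketch, not the patented implementation: the function name `order_readings`, the use of a plain dictionary keyed by headword to stand in for the co-occurrence information use storage unit 9, and the default reading order are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def order_readings(
    headword: str,
    default_order: List[str],
    hit: Optional[str],
    memory: Dict[str, List[str]],
) -> List[str]:
    """Return the readings of `headword`, best first (steps 205 to 215).

    `hit` is the reading backed by a co-occurring word found in step 204,
    or None when the search found nothing.
    `memory` stands in for the co-occurrence information use storage unit 9
    (keying it by headword is an assumption of this sketch).
    """
    if hit is not None:
        # Steps 206-207: promote the supported reading and remember the order.
        order = [hit] + [r for r in default_order if r != hit]
        memory[headword] = order
        return order
    if headword in memory:
        # Steps 213-214: no cue nearby, so reuse the stored priority.
        return memory[headword]
    # Step 215: nothing stored either, so keep the default order.
    return list(default_order)


memory: Dict[str, List[str]] = {}
default = ["ichiba", "shijo"]  # assumed default order, for illustration only

first = order_readings("市場", default, None, memory)      # no cue seen yet
second = order_readings("市場", default, "shijo", memory)  # cue for "shijo" found
third = order_readings("市場", default, None, memory)      # later occurrence, no cue
print(first[0], second[0], third[0])  # prints: ichiba shijo shijo
```

The third call shows the carry-over behavior: once a cue has promoted one reading, later occurrences without any cue of their own inherit that stored priority.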

[0011] When the Japanese analysis unit 4 notifies the control unit 3 that analysis of the document data to be read aloud is complete, the control unit 3 activates the voice data generation unit 10 in step 209 to generate voice data. The voice data generation unit 10 creates voice data from the analysis result saved in the analysis result buffer 7 according to the generation rules stored in the voice data generation rule file 11, and stores the resulting voice data in the voice data file 12. In doing so, the voice data generation unit 10 refers to the reading conditions set in the setting buffer 5 and stores in the voice data file 12, together with the voice data, the data needed to read aloud with the voice quality and speed that correspond to those conditions. When creation of the voice data is complete, the voice data generation unit 10 notifies the control unit 3. On receiving this notification, the control unit 3 sends the voice data in the voice data file 12 to the voice synthesizer 13 and to the display data creation unit 16.

[0012] In step 210, the display data creation unit 16 converts the input voice data into the display data format and stores the resulting display data in the display data file 15. Meanwhile, in step 211, the voice synthesizer 13 converts the voice data input from the voice data file 12 into an electrical signal according to the reading conditions attached to the voice data, such as speed, voice quality, pitch, and loudness, and outputs the signal to the voice output unit 14. The voice output unit 14 converts the input electrical signal into sound and outputs it from a speaker or the like. At the same time, the control unit 3 reads the display data from the display data file 15 and shows it on the display device 17, and also reads the document data in the document data file 1 that corresponds to that display data and shows it on the same screen of the display device 17. Then, in step 212, the control unit 3 determines whether the designated document data has been read to the end and analyzed for reading aloud. If not, processing returns to step 203; if analysis and reading aloud are finished, processing ends.

[0013] Next, the document read-aloud operation of the device is described in more detail using a concrete example. Suppose that part of the document data designated for reading aloud by the operator through the input device 2 contains an example sentence such as that shown in FIG. 5. The Japanese analysis unit 4 analyzes this sentence by referring to the word dictionary 6; the analysis result is as shown in FIG. 6 and is stored in the analysis result buffer 7. In FIG. 6, ACS0 indicates that a word has no accent, and ACS1 indicates that the accent falls on the first mora. According to FIG. 6, "私" (watashi, "I") has no accent anywhere, while the particle "は" has its accent on the first mora.

[0014] When the Japanese analysis unit 4 analyzes document data to derive readings and finds that a word has a plurality of readings, it activates the co-occurrence information search unit 8 and selects the appropriate reading from among them based on the co-occurrence information recorded in the word dictionary 6. The co-occurrence information is recorded in the word dictionary 6 for each reading of a word that has a plurality of readings, as shown in FIG. 4. Suppose the document data to be read aloud contains the sentence shown in FIG. 7, "２月の市場の株価は値を上げた。" ("Stock prices on the market rose in February."). Looking up the word dictionary 6 for this sentence shows that "市場" has two readings, "shijo" and "ichiba"; FIG. 8 shows the lookup state in this case. In FIG. 8, "市場" has the two readings "shijo" and "ichiba", and co-occurrence information is recorded for each reading (株, 金融, 株価, ... for the former; 魚, 野菜, ... for the latter). The Japanese analysis unit 4 therefore passes this co-occurrence information to the co-occurrence information search unit 8 and activates it. Once activated, the co-occurrence information search unit 8 searches, based on the co-occurrence information it was given, whether a word that co-occurs with each reading exists before or after the word "市場". In the example above, "shijo" is in a co-occurrence relation with "株価" (kabuka, "stock price"), whereas no word that co-occurs with "ichiba" is present. The co-occurrence information search unit 8 therefore raises the priority of the reading "shijo", writes this priority information into the co-occurrence information use storage unit 9, and passes it to the Japanese analysis unit 4.

[0015] Similarly, "値" in the example above has two readings, "ne" and "atai", but because of its co-occurrence relation with "株価" (kabuka), the co-occurrence information search unit 8 selects "ne" as the reading and raises its priority. For the sentence "そして市場は大変賑わった" ("And the market was very lively"), which appears in the latter half of the example shown in FIG. 7, no word is present that co-occurs with either reading of "市場", "shijo" or "ichiba". In that case, the Japanese analysis unit 4 reads from the co-occurrence information use storage unit 9 the priority of the reading "shijo", which was given priority earlier through the co-occurrence relation, and adopts it.

[0016] As shown in FIG. 9, suppose instead that the document data to be read aloud contains the text "２月の市場は魚が大量に入った。そして市場は大変賑わった。" ("In February a large quantity of fish came into the market. And the market was very lively."). For the first sentence, looking up the word dictionary 6 shows, as in the example above, that "市場" has two readings, "shijo" and "ichiba". When the co-occurrence information search unit 8 searches for co-occurring words before and after the word based on the co-occurrence information for each reading, it finds that "市場" and "魚" (sakana, "fish") are in a co-occurrence relation, so for this sentence it raises the priority of the reading "ichiba". For the sentence that follows, "And the market was very lively.", no word is present that co-occurs with either reading of "市場", but in this case the reading "ichiba", whose priority was raised earlier through the co-occurrence relation, is adopted. In other words, the two examples shown in FIG. 7 and FIG. 9 differ in their first sentence but share the same second sentence, "And the market was very lively." Because the first sentence contains a co-occurring word, however, the word "市場" in the second sentence is read differently in the two examples: "shijo" in FIG. 7 and "ichiba" in FIG. 9. FIG. 10 shows the result of the Japanese analysis unit 4 analyzing the example of FIG. 7; the rightmost column indicates whether priority information was stored in the co-occurrence information use storage unit 9. Similarly, FIG. 11 shows the result of the Japanese analysis unit 4 analyzing the example of FIG. 9.

[0017] When the control unit 3 learns that the Japanese analysis unit 4 has finished analyzing the document, it activates the voice data generation unit 10. The voice data generation unit 10 creates voice data based on the contents of the analysis result buffer 7 while referring to the rules in the voice data generation rule file 11, and stores it in the voice data file 12. FIG. 12 shows part of the voice data generation rules. The voice data that the voice data generation unit 10 creates based on such rules is as shown in FIG. 13. The rule shown in FIG. 12 is an example stating that, for a godan (five-grade) verb whose accent type is not type 0, the accent type is changed to type 0 when the verb is in the mizenkei (irrealis) form. FIG. 13 shows an example of the format of the voice data stored in the voice data file 12. In FIG. 13, the katakana strings represent the voice data, "^" marks the position of the accent, and "." indicates a pause of the length set in the setting buffer 5. When all the contents of the analysis result buffer 7 have been processed, the control unit 3 is notified that the voice data has been created. On receiving this notification, the control unit 3 outputs the voice data in the voice data file 12 to the voice synthesizer 13 while referring to the contents of the setting buffer 5.
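The generation rule described for FIG. 12 can be written as a short Python function. This is only a sketch: the `Word` record, its field names, and the sample verb with its accent type are assumptions for illustration and do not reproduce the rule file format of FIG. 12.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    surface: str       # text of the word as segmented
    conjugation: str   # conjugation class, e.g. "godan"
    form: str          # inflected form, e.g. "mizenkei"
    accent_type: int   # 0 means no accent (ACS0)


def apply_godan_mizenkei_rule(word: Word) -> Word:
    """Set the accent type to 0 for an accented godan verb in the mizenkei form."""
    if (
        word.conjugation == "godan"
        and word.form == "mizenkei"
        and word.accent_type != 0
    ):
        return replace(word, accent_type=0)
    return word


# Illustrative input; the accent type given here is an assumed value.
verb = Word(surface="書か", conjugation="godan", form="mizenkei", accent_type=1)
print(apply_godan_mizenkei_rule(verb).accent_type)  # prints: 0
```

Returning a new record instead of mutating the input keeps the analysis result unchanged, which makes it easy to apply several such rules in sequence.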

[0018] The voice synthesizer 13 is a device that receives data consisting of a phoneme string and various control codes and converts it into an electrical voice signal; this data has the same format as the voice data shown in FIG. 13. Given a character string of voice data in the format shown in FIG. 13, the voice synthesizer 13 can synthesize voice by rule, applying the specified speed, voice quality, pitch, and loudness to the character strings sent after that information. As a result, the voice synthesizer 13 converts the voice data into an electrical synthesized voice signal and outputs this signal to the voice output unit 14, which is equipped with a speaker or the like, so that it is emitted as actual sound.
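To make the voice data format more concrete, the following Python sketch splits a voice data string into katakana runs, accent marks, and pauses. It is a hedged illustration only: the function name `tokenize_voice_data` is hypothetical, the sample string is made up and does not reproduce FIG. 13, and the sketch makes no assumption about whether the accent mark precedes or follows the accented mora.

```python
from typing import List, Tuple

ACCENT_MARKS = {"^", "＾"}  # accent position marker (ASCII and full-width)
PAUSE_MARKS = {".", "．"}   # pause of the length set in the setting buffer


def tokenize_voice_data(data: str) -> List[Tuple[str, str]]:
    """Split voice data into ("kana", text), ("accent", mark), ("pause", mark)."""
    tokens: List[Tuple[str, str]] = []
    kana = ""
    for ch in data:
        if ch in ACCENT_MARKS or ch in PAUSE_MARKS:
            if kana:
                # Flush the katakana run collected so far.
                tokens.append(("kana", kana))
                kana = ""
            kind = "accent" if ch in ACCENT_MARKS else "pause"
            tokens.append((kind, ch))
        else:
            kana += ch
    if kana:
        tokens.append(("kana", kana))
    return tokens


# Made-up sample string, for illustration only.
for kind, text in tokenize_voice_data("キョ^ーワ."):
    print(kind, text)
```

A synthesizer front end could walk such a token list, emitting sound for the katakana runs and inserting silence of the configured length at each pause token.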

[0019] Meanwhile, at the same time as it sends the voice data in the voice data file 12 to the voice synthesizer 13, the control unit 3 also sends the voice data to the display data creation unit 16. The display data creation unit 16 recasts the voice data it receives into display data that makes the accents and readings easy to follow. The display data created in this way is first stored in the display data file 15 by the control unit 3 and is then sent to the display device 17, where it is shown together with the corresponding document data. FIG. 14 shows an example of the display screen of the display device 17 at this time. In this figure, the "^" mark attached to a reading indicates that the accent falls on that mora when the word is spoken. At the same time as it sends the voice data to the voice synthesizer 13, the control unit 3 applies character decoration such as coloring, reverse video, or underlining to the document data shown on the display device 17 and to the corresponding voice data, in time with the voice output, so that the operator can see which part is currently being read aloud. FIG. 15 shows an example of such a display screen, in which the mora "キョ" (kyo) of the word "今日" ("today") is currently being read aloud. The word currently being read in the document data shown in the upper part of the screen is displayed in reverse video, and in the display data in the lower part of the screen, which was created from the voice data, the portion read so far is highlighted in reverse video. These displays indicate the timing of the spoken reading.

【００２０】本実施例によれば、読み上げ対象の文字デ
ータを日本語解析する際に参照される単語辞書６には、
見出しに対する複数の各読みに対して共起情報が格納さ
れており、解析対象の単語に複数の読みがあった場合
は、この単語の各読みに対する共起情報を共起情報検索
部８に渡し、この共起情報検索部８により前記単語の前
後に共起する単語があるかないかを検索し、あった場合
は前記単語と共起関係にある読みの優先順位を上げた
後、この優先順位情報と前記日本語解析による解析結果
に基づいて音声データを作成し、この音声データを音声
合成装置１３によって音声信号に変換し、これを音声出
力装置４から出力する構成を採っているため、１つの単
語に対して複数の読みがある同字異音語の読み分けを、
この単語が含まれる文書の意味や状況によって適切に読
み分けることができる。According to this embodiment, the word dictionary 6, which is referred to when the text data to be read aloud undergoes Japanese-language analysis, stores co-occurrence information for each of the readings of a headword. When a word under analysis has more than one reading, the co-occurrence information for each reading is passed to the co-occurrence information search unit 8, which searches the text before and after the word for a co-occurring word. If one is found, the priority of the reading that is in a co-occurrence relation with it is raised. Voice data is then created from this priority information and the result of the Japanese-language analysis, converted into a voice signal by the voice synthesizer 13, and output from the voice output unit 14. As a result, a homograph with several readings (a word written with the same characters but pronounced differently) can be read correctly according to the meaning and context of the document in which it appears.
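
As a rough illustration of the reading selection summarized in this paragraph, the following Python sketch picks a reading for "市場", which can be read either "シジョウ" or "イチバ". The dictionary contents, the default order of the readings, and the names `Reading`, `WORD_DICTIONARY`, and `choose_reading` are assumptions made for this illustration and are not taken from the embodiment or its figures.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    kana: str                                       # pronunciation of the headword
    cooccurring: set = field(default_factory=set)   # words expected to appear nearby


# Hypothetical dictionary entry; readings are listed in default priority order.
WORD_DICTIONARY = {
    "市場": [
        Reading("シジョウ", {"株式", "経済", "取引"}),
        Reading("イチバ", {"魚", "野菜", "買い物"}),
    ],
}


def choose_reading(word, context_words):
    """Return the reading of `word` to speak, given the words around it."""
    readings = WORD_DICTIONARY[word]
    context = set(context_words)
    if len(readings) > 1:
        for reading in readings:
            # Co-occurrence search: does any listed word appear around `word`?
            if reading.cooccurring & context:
                return reading.kana  # this reading is raised to first priority
    return readings[0].kana  # nothing found: keep the default order


print(choose_reading("市場", ["魚", "を", "買う"]))    # イチバ
print(choose_reading("市場", ["株式", "の", "動向"]))  # シジョウ
print(choose_reading("市場", ["に", "行く"]))          # シジョウ (default)
```

The sketch treats the surrounding text as a flat list of words; how wide a window is searched before and after the word is left open here.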

【００２１】[0021]

【発明の効果】以上記述した如く本発明の文書読み上げ
装置によれば、同字異音語を場合により読み分けて、常
に適切な文書データの読み上げを行うことができる。As described above, the document reading device of the present invention distinguishes between the readings of homographs as the situation requires, so that document data can always be read aloud appropriately.

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明の文書読み上げ装置の一実施例を示した
ブロック図。FIG. 1 is a block diagram showing an embodiment of a document reading device of the present invention.

【図２】図１に示した装置の文書読み上げ動作を示した
フローチャート。FIG. 2 is a flowchart showing the document reading operation of the apparatus shown in FIG. 1.

【図３】図１に示した設定バッファ内に設定される読み
上げ条件例を示した図。FIG. 3 is a diagram showing an example of the reading conditions set in the setting buffer shown in FIG. 1.

【図４】図１に示した単語辞書の構成例を示した図。FIG. 4 is a diagram showing a configuration example of the word dictionary shown in FIG. 1.

【図５】図１の装置で読み上げられる文書データ例の一
部を示した図。FIG. 5 is a diagram showing part of an example of document data read aloud by the apparatus of FIG. 1.

【図６】図１に示した解析結果バッファ７内に格納され
る解析結果例を示した図。FIG. 6 is a diagram showing an example of the analysis results stored in the analysis result buffer 7 shown in FIG. 1.

【図７】図１の装置で読み上げられる文書データの他の
例の一部を示した図。FIG. 7 is a diagram showing part of another example of document data read aloud by the apparatus of FIG. 1.

【図８】図７に示した「市場」に対する日本語解析結果
例を示した図。FIG. 8 is a diagram showing an example of the Japanese analysis result for the word "市場" (market) shown in FIG. 7.

【図９】図１の装置で読み上げられる文書データの他の
例の一部を示した図。FIG. 9 is a diagram showing part of another example of document data read aloud by the apparatus of FIG. 1.

【図１０】図７に示した文書に対する日本語解析結果例
を示した図。FIG. 10 is a diagram showing an example of the Japanese analysis results for the document shown in FIG. 7.

【図１１】図９に示した文書に対する日本語解析結果例
を示した図。FIG. 11 is a diagram showing an example of the Japanese analysis results for the document shown in FIG. 9.

【図１２】図１に示した音声データ生成規則ファイル内
の生成規則例を示した図。FIG. 12 is a diagram showing an example of generation rules in the audio data generation rule file shown in FIG. 1.

【図１３】図１に示した音声データファイルに格納され
る音声データ例を示した図。FIG. 13 is a diagram showing an example of the voice data stored in the voice data file shown in FIG. 1.

【図１４】図１に示した表示装置に表示される読み上げ
文書データとこれに対応する音声データの表示例を示し
た図。FIG. 14 is a diagram showing a display example of the read-aloud document data and the corresponding voice data displayed on the display device shown in FIG. 1.

【図１５】図１４に示した表示データ例に読み上げタイ
ミングを表示した場合の画面例を示した図。FIG. 15 is a diagram showing an example of the screen when the reading timing is displayed on the display data example shown in FIG. 14.

【符号の説明】[Explanation of symbols]

１…文書データファイル
２…入力装置
３…制御部
４…日本語解析部
５…設定バッファ
６…単語辞書
７…解析結果バッファ
８…共起情報検索部
９…共起情報使用記憶部
１０…音声データ生成部
１１…音声データ生成規則ファイル
１２…音声データファイル
１３…音声合成装置
１４…音声出力部
１５…表示データファイル
１６…表示データ作成部
１７…表示装置

1 ... Document data file
2 ... Input device
3 ... Control unit
4 ... Japanese analysis unit
5 ... Setting buffer
6 ... Word dictionary
7 ... Analysis result buffer
8 ... Co-occurrence information search unit
9 ... Co-occurrence information use storage unit
10 ... Voice data generation unit
11 ... Voice data generation rule file
12 ... Voice data file
13 ... Voice synthesizer
14 ... Voice output unit
15 ... Display data file
16 ... Display data creation unit
17 ... Display device

Claims

【特許請求の範囲】[Claims]

【請求項１】文書データを日本語解析して音声データ
を得、この音声データを音声合成装置により電気的な音
声信号に変換し、これを音声出力装置により音声にして
出力する文書読み上げ装置において、複数の読みがある
見出し語の前記各読みに対する共起情報を収集した単語
辞書と、日本語解析対象の単語に複数の読みがあると、
前記単語辞書から前記各読みに対する共起情報を読み出
す共起情報読出手段と、この共起情報読出手段により読
み出された共起情報に基づいて前記日本語解析対象の単
語の前後の文書データ内に共起する単語があるか否かを
検索する共起情報検索手段と、この共起情報検索手段に
より検索された単語と共起関係にある前記日本語解析対
象の単語の読みの優先順位を第１位とする読み決定手段
とを具備し、この読み決定手段により優先順位第１位と
なった読みにて音声データを作成する音声データ作成手
段とを具備したことを特徴とする文書読み上げ装置。1. A document reading device that performs Japanese-language analysis on document data to obtain voice data, converts the voice data into an electrical voice signal with a voice synthesizer, and outputs the signal as speech from a voice output device, the document reading device comprising:
a word dictionary that stores co-occurrence information for each reading of a headword having a plurality of readings;
co-occurrence information reading means for reading, when a word under Japanese-language analysis has a plurality of readings, the co-occurrence information for each of the readings from the word dictionary;
co-occurrence information search means for searching, on the basis of the co-occurrence information read by the co-occurrence information reading means, whether a co-occurring word exists in the document data before and after the word under Japanese-language analysis;
reading determination means for giving first priority to the reading of the word under Japanese-language analysis that is in a co-occurrence relation with the word found by the co-occurrence information search means; and
voice data creation means for creating voice data using the reading given first priority by the reading determination means.

【請求項２】前記読み決定手段により優先順位が１位
となった読みの単語が再度出現した場合に、この単語の
読みに共起する単語が前記共起情報検索手段により検索
されない限り、前回決定された優先順位が１位の読みを
再度採用することを特徴とする請求項１記載の文書読み
上げ装置。2. The document reading device according to claim 1, wherein, when a word whose reading was given first priority by the reading determination means appears again, the previously determined first-priority reading is adopted again unless the co-occurrence information search means finds a word that co-occurs with a reading of that word.
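
The reuse of an earlier decision recited in claim 2 might be sketched in Python as follows. This is an illustration only: the class `ReadingSelector`, its `last_choice` table, and the sample dictionary contents are hypothetical and not taken from the specification.

```python
class ReadingSelector:
    """Chooses a reading per occurrence and remembers co-occurrence-based choices."""

    def __init__(self, dictionary):
        # dictionary: word -> list of (reading, set of co-occurring words),
        # with the default reading listed first (an assumption of this sketch).
        self.dictionary = dictionary
        self.last_choice = {}  # word -> reading last chosen via co-occurrence

    def choose(self, word, context_words):
        context = set(context_words)
        for reading, cooccurring in self.dictionary[word]:
            if cooccurring & context:
                self.last_choice[word] = reading  # remember for later occurrences
                return reading
        # No co-occurring word here: reuse the earlier decision if there is one,
        # otherwise fall back to the default reading.
        return self.last_choice.get(word, self.dictionary[word][0][0])


selector = ReadingSelector({
    "市場": [("シジョウ", {"株式", "経済"}), ("イチバ", {"魚", "野菜"})],
})
print(selector.choose("市場", ["魚", "を", "買う"]))  # イチバ (co-occurs with 魚)
print(selector.choose("市場", ["は", "混雑"]))        # イチバ (earlier decision reused)
print(selector.choose("市場", ["株式", "の"]))        # シジョウ (new co-occurrence wins)
```

Keeping the remembered reading per word, rather than per document position, is a design choice of this sketch; it lets a later co-occurrence override the stored reading.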