JP2006098552A

JP2006098552A - Speech information generating device, speech information generating program and speech information generating method

Info

Publication number: JP2006098552A
Application number: JP2004282693A
Authority: JP
Inventors: Yuuji Shimizu; 勇詞清水
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2004-09-28
Filing date: 2004-09-28
Publication date: 2006-04-13

Abstract

PROBLEM TO BE SOLVED: To make it easy to generate a speech intermediate language document without expertise in accents and syllables.
SOLUTION: A speech information generating device comprises an input processing part that receives a character string in hiragana (the cursive Japanese syllabary) entered by a user; a display selection part that displays, for the user to choose from, candidates in kanji-mixed (Chinese-character) notation associated with the hiragana string, and selects the kanji-mixed notation chosen by the user; and a speech intermediate language generation part that generates speech intermediate language data, combining a reading for speech synthesis with pronunciation symbols, on the basis of the intermediate language character string associated with the kanji-mixed notation selected in the display selection part.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a speech information generation device, a speech information generation program, and a speech information generation method that generate intermediate language data, written in speech-synthesis symbols, for synthesizing speech, and in particular to generating such intermediate language data from a character string entered through an input device.

Conventionally, speech synthesis technologies running on personal computers, PDAs, and embedded boards fall broadly into two types. The first takes an ordinary document containing kanji as the direct input for synthesis: it performs morphological analysis on the document, predicts reading, accent, and pause length information according to various rules, and synthesizes speech from that information. This is the TTS (text-to-speech) method.

The second does not use an ordinary document. Instead, the user enters a speech intermediate language character string that expresses reading, accent, and pause length information in a specific format, and speech is synthesized from that string. This is the speech intermediate language input method (see, for example, Non-Patent Document 1).

In the first technique, the TTS method, a failure of morphological analysis can cause the input document to be interpreted differently from its actual content, so the expected reading may not be output. Unless morphological analysis can be made 100% accurate, the expected reading cannot be guaranteed. It is therefore currently difficult to use the TTS method where reading errors are absolutely unacceptable, such as reading out news or the terms of a contract.

By contrast, the speech intermediate language method performs no morphological analysis: the user enters the reading, accent, and pause length information directly as a speech intermediate language character string, so reading errors cannot occur.

Non-Patent Document 1: Japan Electronic Industry Development Association (JEIDA), "JEIDA-62-2000: Standard of Symbols for Japanese Text-to-Speech Synthesis", JEIDA, March 2000.

However, for a user to enter a speech intermediate language character string directly, the user must be familiar, in the linguistic sense, with accents, syllables (units of pronunciation, also called morae), and the like.

For example, suppose a specification in which the reading is written in hiragana, an accent mark "^" is placed at the accent position, "/" marks a short pause at a phrase boundary, and "." marks the pause at the end of a sentence. Under this specification, the speech intermediate language for the sentence 「中国からパンダ」 ("a panda from China") is 「ちゅ^うごくから/ぱ^んだ.」 (chu^ugoku kara / pa^nda.). To write it, the user must first know where the accents fall in 「中国」 (China) and 「パンダ」 (panda). Moreover, the accent mark here goes immediately after the first syllable unit, and in 「ちゅうごく」 (chuugoku) the first syllable unit is 「ちゅ」 (chu), not 「ち」 (chi).
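
The mora rule in this example is the kind of detail a tool can handle on the user's behalf. The Python sketch below is a hedged illustration, not part of the described device: it assumes the accent position is given as a 1-based mora count (0 meaning no accent mark), treats only the small kana ぁぃぅぇぉゃゅょゎ as combining with the preceding kana, and uses ASCII "^", "/", "." for the symbols. `split_morae` and `mark_accent` are hypothetical names.

```python
# Small kana that merge with the preceding kana into one mora (e.g. ち + ゅ -> ちゅ).
# Assumption: っ, ん and the long-vowel mark ー each count as a mora of their own.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎ")


def split_morae(reading: str) -> list[str]:
    """Split a hiragana reading into morae (syllable units)."""
    morae: list[str] = []
    for ch in reading:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # attach the small kana to the previous mora
        else:
            morae.append(ch)
    return morae


def mark_accent(reading: str, nucleus: int) -> str:
    """Insert '^' after the nucleus-th mora (1-based); 0 leaves the reading unmarked."""
    morae = split_morae(reading)
    if not 0 <= nucleus <= len(morae):
        raise ValueError(f"accent position {nucleus} out of range for {reading!r}")
    if nucleus == 0:
        return reading
    return "".join(morae[:nucleus]) + "^" + "".join(morae[nucleus:])
```

With these helpers, `mark_accent("ちゅうごく", 1) + "から/" + mark_accent("ぱんだ", 1) + "."` yields `ちゅ^うごくから/ぱ^んだ.`, placing the mark after 「ちゅ」 rather than after 「ち」.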

Generating a speech intermediate language character string thus requires considerable skill, so it is difficult for a beginner to produce a speech intermediate language document by typing readings and accent marks directly.

The present invention was made in view of the above, and its object is to provide a speech information generation device, a speech information generation program, and a speech information generation method with which a speech intermediate language document can be generated easily, even by a user unfamiliar with accents and syllables, simply by entering a character string and selecting a suitable kanji-mixed notation from the displayed candidates.

To solve the problems described above and achieve this object, the present invention comprises: character string correspondence storage means for storing character string correspondence information that associates a phonetic character string representing a reading entered by a user, a kanji-mixed notation as used in ordinary documents, and a speech intermediate language character string combining the reading of that kanji-mixed notation with tone information, that is, information specifying the tone in which the notation is to be pronounced during speech synthesis; input processing means for processing the phonetic character string entered by the user; display selection means for displaying, in a user-selectable form and according to the character string correspondence information stored in the storage means, the candidate kanji-mixed notations associated with the phonetic character string input through the input processing means, and for selecting the kanji-mixed notation the user chooses from the displayed candidates; and intermediate language generation means for generating, according to the character string correspondence information stored in the storage means, a speech intermediate language document combining a reading for speech synthesis with the tone information, on the basis of the speech intermediate language character string associated with the kanji-mixed notation selected by the display selection means.

The present invention also causes a computer to execute: an input processing procedure for processing a phonetic character string representing a reading entered by a user; a display selection procedure for displaying, in a user-selectable form and according to character string notation correspondence information that associates the phonetic character string with kanji-mixed notations used in ordinary documents, the candidate kanji-mixed notations associated with the phonetic character string input by the input processing procedure, and for selecting the kanji-mixed notation the user chooses from the displayed candidates; and an intermediate language generation procedure for generating a speech intermediate language document combining a reading for speech synthesis with tone information, on the basis of the speech intermediate language character string associated with the kanji-mixed notation selected by the display selection procedure, according to notation intermediate language correspondence information that associates the kanji-mixed notation with a speech intermediate language character string combining the reading of that notation with tone information specifying the tone in which it is to be pronounced during speech synthesis.

The present invention further provides a method comprising: an input processing step of processing a phonetic character string representing a reading entered by a user; a display selection step of displaying, in a user-selectable form and according to character string notation correspondence information that associates the phonetic character string with kanji-mixed notations used in ordinary documents, the candidate kanji-mixed notations associated with the phonetic character string input in the input processing step, and of selecting the kanji-mixed notation the user chooses from the displayed candidates; and an intermediate language generation step of generating a speech intermediate language document combining a reading for speech synthesis with tone information, on the basis of the speech intermediate language character string associated with the kanji-mixed notation selected in the display selection step, according to notation intermediate language correspondence information that associates the kanji-mixed notation with a speech intermediate language character string combining the reading of that notation with tone information specifying the tone in which it is to be pronounced during speech synthesis.

According to the present invention, the user enters a reading and selects a suitable candidate from the displayed kanji-mixed notations, and a speech intermediate language document is generated from the speech intermediate language character string associated with that candidate. As a result, an intermediate language document for speech synthesis can be generated easily even by a user unfamiliar with accents and syllables.

Preferred embodiments of a speech intermediate language generation device that implements the speech information generation device, speech information generation program, and speech information generation method according to the present invention are described in detail below with reference to the accompanying drawings.

(First Embodiment)
FIG. 1 is a block diagram showing the configuration of a speech intermediate language generation device 100 according to the first embodiment. As shown in the figure, the speech intermediate language generation device 100 comprises an input processing unit 101, a display selection unit 110, a notation output unit 102, a speech intermediate language generation unit 103, a speech intermediate language output unit 104, and a storage unit 105. With this configuration, the device 100 displays kanji-mixed notation candidates for a hiragana character string entered from the keyboard 151 and, when the user selects an appropriate candidate, generates intermediate language data from the intermediate language character string associated with that candidate.

Here, intermediate language data means data containing a hiragana reading for speech synthesis together with pronunciation symbols that indicate accents, pause lengths, and other aspects of pronunciation, so that the speech is pronounced correctly during synthesis. An intermediate language character string is, like the intermediate language data, a character string (for example, a word) made up of a hiragana reading and pronunciation symbols. Examples of pronunciation symbols are "^" for an accent, "." for a 1-second pause, and "/" for a 0.3-second pause.
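
To make the notation concrete, the hypothetical Python helper below splits an intermediate language string at the pause symbols listed above and attaches each pause length to the phrase before it. The function name and the ASCII stand-ins for the full-width symbols are assumptions for illustration; an actual synthesizer would consume the string according to its own specification.

```python
# Pause lengths as stated above: "/" = 0.3 s, "." = 1 s.
PAUSES = {"/": 0.3, ".": 1.0}


def split_phrases(intermediate: str) -> list[tuple[str, float]]:
    """Split intermediate-language text into (phrase, pause-after-in-seconds) pairs.

    Accent marks ("^") stay inside the phrase; a trailing phrase with no
    pause symbol after it gets a pause of 0.0.
    """
    result: list[tuple[str, float]] = []
    current = ""
    for ch in intermediate:
        if ch in PAUSES:
            result.append((current, PAUSES[ch]))
            current = ""
        else:
            current += ch
    if current:
        result.append((current, 0.0))
    return result
```

For instance, `split_phrases("はし^わ/わたらな^かった.")` gives `[("はし^わ", 0.3), ("わたらな^かった", 1.0)]`.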

The input processing unit 101 performs input processing on character strings entered by the user through the keyboard 151, on selections from the displayed candidates, and so on. Specifically, when the character string entered from the keyboard 151 is a hiragana string, as in input to a Japanese word processor, the unit processes it and then outputs it to the input buffer 111 described later. Although this embodiment processes character strings entered from the keyboard 151, the input device is not limited to the keyboard 151; for example, a microphone may be used as the input device and a character string read aloud by the user may be processed.

After a character string has been accumulated in the input buffer 111, if the user instructs the device to display kanji-mixed notation candidates, the input processing unit 101 passes that instruction to the candidate display processing unit 112 described later. In this embodiment, whether such an instruction has been given is determined by whether the user pressed the space key on the keyboard 151: if the space key is pressed after the character string has been input, candidate display is deemed to have been requested. If the return key is pressed instead of the space key, no candidates are displayed and the entered character string is stored in the candidate selection buffer 116 described later.
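
The space/return distinction can be sketched as a small key dispatcher. The key names `"space"` and `"return"`, the callback used to hand the reading to the candidate display step, and the list-based buffers are all assumptions of this Python sketch rather than details from the specification.

```python
from typing import Callable, Union

# A selection-buffer entry: a tuple of chosen word numbers, or raw hiragana.
Entry = Union[tuple[int, ...], str]


def on_key(key: str, input_buffer: list[str], selection_buffer: list[Entry],
           show_candidates: Callable[[str], None]) -> None:
    """Dispatch one key press while hiragana is being typed into input_buffer."""
    text = "".join(input_buffer)
    if key == "space":
        show_candidates(text)  # request kanji-mixed candidates for the reading
    elif key == "return":
        selection_buffer.append(text)  # no conversion: store the hiragana as-is
        input_buffer.clear()
    else:
        input_buffer.append(key)  # ordinary kana input
```

Typing 「どうしても」 and pressing return stores the string unconverted, whereas pressing space after 「はしは」 passes `"はしは"` to `show_candidates`.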

When kanji-mixed notation candidates are displayed on the CRT 152 and the numeral corresponding to one of them is entered, the input processing unit 101 treats that candidate as selected and outputs the entered numeral to the display selection unit 110 described later. In this embodiment a candidate is treated as selected when a numeral is entered from the keyboard 151, but candidates could also be selected with a mouse, cursor keys, or the like.

The storage unit 105 holds a word dictionary and a word connection dictionary, which are used when the candidate display processing unit 112 displays candidates and when the speech intermediate language generation unit 103 generates speech intermediate language data (both described later).

FIG. 2 shows an example of the word dictionary used in this embodiment. As shown, each word is held as a record associating a word number with a reading, a kanji-mixed notation, accent information, a part of speech, and properties.

FIG. 3 shows an example of the word connection dictionary used in this embodiment. As shown, each record associates a preceding word, a following word, a pause length between the words, and an accent shift. With this structure, the word connection dictionary can hold predetermined rules or property information concerning the accent information and pause lengths that arise when words are connected. The predetermined rules held in the word connection dictionary are not limited to accent information and pause length; any information needed for appropriate speech synthesis may be held.
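
The two record layouts can be modelled as plain data classes. This is a minimal sketch: the class and field names are hypothetical, and the sample values echo examples given later in the text (No. 1 「中国」, No. 101 「は」, No. 401 「人」) but are otherwise assumptions, since the figures themselves are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WordEntry:
    """One record of the word dictionary (cf. FIG. 2); field names are illustrative."""
    number: int      # word number
    reading: str     # kana reading
    notation: str    # kanji-mixed notation
    accent: str      # accent information: reading with "^" after the accent nucleus
    pos: str         # part of speech
    properties: dict = field(default_factory=dict)  # e.g. reading or accent overrides


@dataclass
class ConnectionRule:
    """One record of the word connection dictionary (cf. FIG. 3)."""
    prev_category: str                  # category of the preceding word
    next_category: str                  # category of the following word
    pause: Optional[str] = None         # pause symbol between the words; None = unspecified
    accent_shift: Optional[str] = None  # accent movement; None = unspecified


# Sample records (hypothetical values).
CHUUGOKU = WordEntry(1, "ちゅうごく", "中国", "ちゅ^うごく", "noun")
WA = WordEntry(101, "は", "は", "は", "binding particle", {"read_as": "わ"})
JIN = WordEntry(401, "じん", "人", "じん", "suffix",
                {"accent": "last mora of preceding word"})
RULES = [
    ConnectionRule("independent word", "ancillary word"),  # nothing specified
    ConnectionRule("binding particle は", "verb", pause="/"),
    ConnectionRule("noun", "suffix", accent_shift="follow properties"),
]
```

Keeping properties as an open dictionary reflects the text's point that the rules are not limited to accent and pause information.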

Returning to FIG. 1, the display selection unit 110 displays kanji-mixed notation candidates on the CRT 152 and selects the kanji-mixed notation chosen by the user. The display selection unit 110 comprises an input buffer 111, a candidate display processing unit 112, a candidate selection buffer 116, and a notation display processing unit 117.

The input buffer 111 accumulates the character strings processed by the input processing unit 101. In principle, the character strings accumulated from the user's input are hiragana strings representing readings.

Based on the character string accumulated in the input buffer 111, the candidate display processing unit 112 obtains the candidates into which the string might be converted by kana-kanji conversion and performs the processing needed to display them on the CRT 152. The candidate display processing unit 112 comprises a candidate acquisition unit 113, a candidate buffer 114, and a candidate output unit 115.

The candidate acquisition unit 113 obtains as candidates, from the word dictionary held in the storage unit 105, every combination of kanji-mixed notations whose reading matches the character string accumulated in the input buffer 111.

For example, if the character string accumulated in the input buffer 111 is 「はしは」 (hashiha), the unit searches the word dictionary shown in FIG. 2 and obtains the candidates formed by combining words with matching readings: word No. 3 + word No. 101, word No. 4 + word No. 101, and word No. 5 + word No. 101.
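
Obtaining "every combination" amounts to enumerating every way of segmenting the input into dictionary readings. A minimal recursive sketch, assuming a toy dictionary that maps word numbers to readings only (the entries are illustrative, not the contents of FIG. 2, and `find_candidates` is a hypothetical name):

```python
# Toy word dictionary: word number -> reading. Entries are illustrative only.
READINGS = {3: "はし", 4: "はし", 5: "はし", 101: "は"}


def find_candidates(text: str) -> list[tuple[int, ...]]:
    """Return every sequence of word numbers whose readings concatenate to `text`."""
    if not text:
        return [()]  # exactly one way to segment an empty string
    results: list[tuple[int, ...]] = []
    for number, reading in sorted(READINGS.items()):
        if reading and text.startswith(reading):
            # Recurse on the remainder and prefix each of its segmentations.
            for rest in find_candidates(text[len(reading):]):
                results.append((number,) + rest)
    return results
```

Here `find_candidates("はしは")` returns `[(3, 101), (4, 101), (5, 101)]`. The exhaustive recursion can grow exponentially with input length; memoizing on the remaining text, or building a word lattice, is the usual remedy.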

The candidate buffer 114 accumulates the one or more candidates obtained by the candidate acquisition unit 113. For example, if the obtained candidates are word No. 3 + word No. 101, word No. 4 + word No. 101, and word No. 5 + word No. 101, the candidate buffer 114 stores (No. 3, No. 101), (No. 4, No. 101), (No. 5, No. 101).

The candidate output unit 115 converts the word numbers of the candidates accumulated in the candidate buffer 114 into kanji-mixed notation using the word dictionary and outputs the result to the CRT 152.

FIG. 4 shows an example of the kanji-mixed notation candidates displayed by the candidate output unit 115 of the speech intermediate language generation device 100 according to this embodiment. Specifically, it is a screen on which the candidate output unit 115 has converted the word numbers (No. 3, No. 101), (No. 4, No. 101), (No. 5, No. 101) accumulated in the candidate buffer 114 into the kanji-mixed notations 「箸は」 ("the chopsticks"), 「端は」 ("the edge"), and 「橋は」 ("the bridge") using the word dictionary, and output them to the CRT 152.
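
The conversion to a numbered menu can be sketched in a few lines of Python. The `NOTATION` table follows the word-number-to-kanji mapping stated for FIG. 4 and is otherwise an assumption; `format_candidates` is a hypothetical name, and the "n. text" layout mirrors the menu entries quoted in the text.

```python
# Toy notation table following the mapping stated for FIG. 4.
NOTATION = {3: "箸", 4: "端", 5: "橋", 101: "は"}


def format_candidates(candidates: list[tuple[int, ...]]) -> list[str]:
    """Render word-number tuples as a numbered menu of kanji-mixed notations."""
    lines = []
    for index, numbers in enumerate(candidates, start=1):
        notation = "".join(NOTATION[n] for n in numbers)
        lines.append(f"{index}. {notation}")
    return lines
```

`format_candidates([(3, 101), (4, 101), (5, 101)])` produces `["1. 箸は", "2. 端は", "3. 橋は"]`, i.e. the three entries described for FIG. 4.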

The candidate selection buffer 116 takes from the candidate buffer 114 the combination of word numbers for the kanji-mixed notation that the user selected among the candidates displayed on the CRT 152, and accumulates it. Each word-number combination is stored in the candidate selection buffer 116 with its parentheses intact.

For example, on the screen shown in FIG. 4, the user selects a candidate from those displayed on the CRT 152 by typing "number + Enter" on the keyboard 151. Specifically, when "3 + Enter" is input via the input processing unit 101, it is determined that "3. 橋は" has been selected, and the word-number combination (No. 3, No. 101) held in the candidate buffer 114 for the selected kanji-mixed notation is taken and accumulated in the candidate selection buffer 116.

If the input processing unit 101 has not issued an instruction to display candidates, the candidate selection buffer 116 accumulates the processed character string as it is.

For example, if the hiragana string 「どうしても」 (doushitemo, "no matter what") is entered without candidate conversion, the candidate selection buffer 116 stores (どうしても). If (どうしても) is stored after (No. 3, No. 101), the contents of the candidate selection buffer 116 are (No. 3, No. 101), (どうしても).

The notation display processing unit 117 converts the word numbers accumulated in the candidate selection buffer 116 into kanji-mixed notation and displays them on the CRT 152. Because the kanji-mixed notation corresponding to the stored word numbers is shown on the CRT 152, the user can easily check the content of the speech intermediate language data that will be generated. Character strings stored in the candidate selection buffer 116 without candidate conversion are displayed by the notation display processing unit 117 as they are.
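
The selection buffer therefore holds two kinds of entries: word-number tuples and unconverted hiragana. A minimal sketch of how the notation display step might render such a mixed buffer, assuming a Python list of tuples and strings; the container choice, the `NOTATION` table (which follows the FIG. 4 mapping given above), and `render_selection` are illustrative assumptions.

```python
from typing import Union

# A selection-buffer entry: a tuple of chosen word numbers, or raw hiragana.
Entry = Union[tuple[int, ...], str]

# Toy notation table following the FIG. 4 mapping (an assumption).
NOTATION = {3: "箸", 4: "端", 5: "橋", 101: "は"}


def render_selection(buffer: list[Entry]) -> str:
    """Render the selection buffer as the text shown to the user for checking."""
    parts: list[str] = []
    for entry in buffer:
        if isinstance(entry, tuple):
            parts.append("".join(NOTATION[n] for n in entry))
        else:
            parts.append(entry)  # unconverted hiragana is shown unchanged
    return "".join(parts)
```

Rendering `[(3, 101), "どうしても"]` under this table gives `"箸はどうしても"`.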

When displaying kanji-mixed candidates, the candidate display processing unit 112 may also display additional information such as the part of speech. This is because, if a word with the wrong part of speech is selected, the connections between words may be misidentified, and the speech intermediate language generation unit 103 described later may then generate intermediate language data containing inappropriate pronunciation symbols. This is particularly useful when choosing among candidates written only in hiragana.

The speech intermediate language generation unit 103 uses the word numbers accumulated in the candidate selection buffer 116 to obtain each word's accent information, part of speech, and properties from the word dictionary. It then generates speech intermediate language data while using the word connection dictionary to modify accents according to the connections between words, modify accents according to properties, and so on.

In other words, when generating speech intermediate language data, the speech intermediate language generation unit 103 must modify the reading and accent information registered in the word dictionary depending on the word itself or on how words connect. Pause lengths may likewise be determined by a single word or by a connection between words, so the unit must insert symbols indicating pause lengths when generating the intermediate language data. The speech intermediate language generation unit 103 therefore uses the word connection dictionary to change readings and accent information and to insert pause-length symbols.

For example, suppose the word numbers accumulated in the candidate selection buffer 116 are (No. 3, No. 101), (No. 300, No. 202, No. 205, No. 502). First, the speech intermediate language generation unit 103 determines that No. 3 is an independent word because it is a noun and that No. 101 is an ancillary word because it is a binding particle, and obtains from the word connection dictionary the rule for an independent word followed by an ancillary word. In the word connection dictionary shown in FIG. 3, when the preceding word is an independent word and the following word is an ancillary word, neither a pause between the words nor an accent shift is specified. Under the general rule, the unit would therefore generate the intermediate language character string 「はし^は」 (hashi^ha) from No. 3 and No. 101 according to the accent information in the word dictionary. However, because the property information of No. 101 specifies that it is to be read 「わ」 (wa), the unit instead generates 「はし^わ」 (hashi^wa). This reflects the Japanese rule that the binding particle 「は」 is pronounced 「わ」. In this way, the speech intermediate language generation unit 103 also applies changes specified by the property information of words in the word dictionary.

Next, since No. 101 and No. 300 are the binding particle 「は」 and a verb, the unit obtains from the word connection dictionary the rule for the binding particle 「は」 followed by a verb. In the word connection dictionary, "/" is specified as the pause between the binding particle 「は」 and a verb. The unit therefore inserts "/" to indicate the pause and joins the reading of No. 300, producing the intermediate language character string 「はし^わ/わたら」 (hashi^wa/watara). Since No. 101 and No. 300 are also an ancillary word and an independent word, the unit may first search the word connection dictionary for the rule for an ancillary word followed by an independent word and, after confirming that there is no match, search for the rule under the more specific condition "binding particle 「は」 followed by a verb". Conversely, it may search for rules under specific conditions first and fall back to rules under broader conditions when no match is found.
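
Both lookup orders can be expressed with one helper that walks ordered category lists. The sketch below is hypothetical: the category labels, the `RULES` table, and the convention that `""` means "rule found, no pause" while `None` means "no rule" are assumptions, not the contents of FIG. 3.

```python
from typing import Optional

# Hypothetical connection rules: (preceding category, following category) -> pause symbol.
RULES: dict[tuple[str, str], str] = {
    ("particle_wa", "verb"): "/",
    ("independent", "ancillary"): "",
}


def lookup(prev_cats: list[str], next_cats: list[str]) -> Optional[str]:
    """Return the pause for the first (prev, next) category pair found in RULES.

    Each list is ordered from most specific to most general, e.g.
    ["particle_wa", "ancillary"], so specific rules are tried first.
    Passing both lists reversed gives the general-first order instead.
    """
    for prev in prev_cats:
        for nxt in next_cats:
            if (prev, nxt) in RULES:
                return RULES[(prev, nxt)]
    return None
```

For No. 101 followed by No. 300, `lookup(["particle_wa", "ancillary"], ["verb", "independent"])` returns `"/"`; with the lists reversed, the general pair ("ancillary", "independent") is tried first, misses, and the search still ends at the specific rule.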

Next, since No. 300 and No. 202 are an independent word and an ancillary word, the unit joins them without inserting a pause, in accordance with the word connection dictionary; since No. 202 and No. 205 are both ancillary words, it likewise joins them without a pause; and for No. 502 it inserts ".", meaning a 1-second pause, in accordance with the properties in the word connection dictionary. The speech intermediate language generation unit 103 thus generates the intermediate language data 「はし^わ/わたらな^かった.」 (hashi^wa/watarana^katta.) from (No. 3, No. 101), (No. 300, No. 202, No. 205, No. 502).

As another example, when the word numbers stored in candidate buffer 114 are (No.1, No.401), No.1 is a noun and No.401 is a suffix, so unit 103 retrieves from the word connection dictionary the rule for a noun followed by a suffix. For this pair the dictionary specifies no pause, but specifies that accent movement follows the property information of the suffix. Consequently, although the accent of word No.1 is registered as "ちゅ＾うごく", it is changed according to the rule "accent at the end of the immediately preceding word" defined in the properties of No.401. By default, unit 103 would produce "ちゅ＾うごくじん"; under the rule in the property information it produces "ちゅうごく＾じん". The rule exists because, in Japanese, the suffix "人" (-jin) shifts the accent to the end of the noun it attaches to, and unit 103 generates the intermediate language string accordingly.
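The accent-shift property of the suffix "人" can be sketched as a small string transformation. This assumes, hypothetically, that the property is exposed as a boolean flag and that the accent mark is the full-width "＾" used throughout the text; placing the mark after the noun's last character puts it after the noun's final mora.

```python
ACCENT = "＾"  # full-width accent mark used in the intermediate language


def attach_suffix(noun_reading: str, suffix_reading: str, accent_to_end: bool) -> str:
    """Join a noun reading and a suffix reading.

    accent_to_end models the (hypothetical) suffix property
    "accent at the end of the immediately preceding word".
    """
    if accent_to_end:
        # Discard the noun's registered accent and mark the end of the noun instead.
        return noun_reading.replace(ACCENT, "") + ACCENT + suffix_reading
    # Default: keep the noun's registered accent unchanged.
    return noun_reading + suffix_reading


print(attach_suffix("ちゅ＾うごく", "じん", accent_to_end=True))   # ちゅうごく＾じん
print(attach_suffix("ちゅ＾うごく", "じん", accent_to_end=False))  # ちゅ＾うごくじん
```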

In short, for every word number stored in candidate selection buffer 116, unit 103 obtains the intermediate language string held as accent information in the word dictionary, adjusts each string to the appropriate reading and accent using the word connection dictionary, and concatenates all the adjusted strings to generate the intermediate language data.
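The overall procedure (look up each word's reading, apply the connection rule between neighbours, concatenate) can be sketched as follows, reproducing the example "はし＾わ／わたらな＾かった．". The readings and categories assigned to word numbers 3, 101, 300, 202, 205 and 502 are assumptions inferred from that example, and the rule table is a hypothetical excerpt of the word connection dictionary; accent shifts and fallback searches are omitted.

```python
# Hypothetical word dictionary excerpt: word number -> (reading, category).
# Readings and categories are inferred from the worked example, not from the patent.
WORD_DICT = {
    3: ("はし＾", "noun"),          # 橋
    101: ("わ", "particle_wa"),     # binding particle は
    300: ("わたら", "verb"),        # 渡ら
    202: ("な＾かっ", "auxiliary"),  # なかっ
    205: ("た", "auxiliary"),       # た
    502: ("．", "punct"),           # sentence end: one-second pause
}

# Hypothetical word connection dictionary: (left category, right category) -> pause.
CONNECTION_RULES = {
    ("noun", "particle_wa"): "",
    ("particle_wa", "verb"): "／",
    ("verb", "auxiliary"): "",
    ("auxiliary", "auxiliary"): "",
    ("auxiliary", "punct"): "",
}


def generate(word_numbers: list[int]) -> str:
    """Concatenate readings, inserting the pause chosen for each adjacent pair."""
    out: list[str] = []
    prev_cat = None
    for num in word_numbers:
        reading, cat = WORD_DICT[num]
        if prev_cat is not None:
            # Pairs without a rule get no pause (an assumption of this sketch).
            out.append(CONNECTION_RULES.get((prev_cat, cat), ""))
        out.append(reading)
        prev_cat = cat
    return "".join(out)


print(generate([3, 101, 300, 202, 205, 502]))  # はし＾わ／わたらな＾かった．
```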

In this embodiment, unit 103 itself determines that a noun is an independent word and that a binding particle is an attached word; alternatively, such associations (for example, that nouns are independent words) may be held in a separate table.

The speech intermediate language output unit 104 outputs the speech intermediate language data generated by unit 103 as a file. Speech can then be synthesized by feeding this file to a speech synthesis device.

The notation output unit 102 converts the word numbers stored in candidate selection buffer 116 into kanji-kana mixed notation using the word dictionary and outputs the resulting document as notation data. Because unit 102 outputs a kanji-kana mixed document that corresponds to the speech intermediate language data output by unit 104, the user can easily check the content of that data.

While the notation display processing unit 117 of apparatus 100 displays the kanji-kana mixed notation on CRT 152, the user may edit it, for example by deleting or changing words. The output intermediate language data can then be edited by correcting the word numbers stored in candidate selection buffer 116 to match the user's changes. Other editing functions found in word processors can readily be applied to apparatus 100 as well, for example word registration, in which a reading, its kanji-kana mixed notation, part-of-speech information and so on are added to the word dictionary.

Next, the processing performed by apparatus 100, configured as above, from the input of a character string to the output of intermediate language data is described. FIG. 5 is a flowchart of this overall processing in the speech intermediate language generation apparatus 100 according to the present embodiment.

The input processing unit 101 processes the character string entered by the user (step S401). The processed string is assumed to be in hiragana. The input buffer 111 stores the string processed by unit 101 (step S402).

Next, unit 101 determines whether the user has instructed it to display candidates for the processed string (step S403).

If unit 101 determines that candidate display has been instructed (step S403: Yes), the candidate acquisition unit 113 searches the word dictionary by reading and acquires, as candidates, combinations of words whose readings match the hiragana string stored in input buffer 111 (step S404). The candidate buffer 114 then stores the candidates acquired by unit 113 (step S405).

Next, the candidate output unit 115 displays the candidates stored in candidate buffer 114 on CRT 152 (step S406). Candidate selection buffer 116 then stores the word numbers corresponding to the kanji-kana mixed notation the user selects from the displayed candidates (step S407).

If unit 101 determines that candidate display has not been instructed (step S403: No), candidate selection buffer 116 stores the processed string as is (step S408).

The notation display processing unit 117 then converts the word numbers stored in buffer 116 into kanji-kana mixed notation and displays it on CRT 152 (step S409). If buffer 116 holds the input string as is because no candidate display was requested, the input string itself is displayed.

Unit 101 then determines whether input has ended (step S410). In this embodiment, input is judged to have ended when the ESC key is pressed. If the ESC key has not been pressed, input is judged not to have ended (step S410: No) and processing returns to the character string input of step S401.

If input is judged to have ended (step S410: Yes), unit 103 generates speech intermediate language data from the word numbers or character strings stored in buffer 116 (step S411). Stored character strings are used as speech intermediate language data unchanged.

Unit 104 then outputs the speech intermediate language data generated by unit 103 as a file (step S412).
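Steps S401 to S412 can be sketched as a loop over user input events. Everything named here is hypothetical: `InputEvent`, the one-entry `WORD_DICT` and `READING_INDEX`, and the end of the event list standing in for the ESC key. The display step S409 and the word-connection processing of step S411 are omitted, and falling back to storing the raw string when no candidate is found is an assumption the text does not cover.

```python
from dataclasses import dataclass

# Hypothetical word dictionary: word number -> (notation, intermediate-language reading)
WORD_DICT = {1: ("中国", "ちゅ＾うごく")}
# Hypothetical reading index for candidate acquisition: reading -> candidate word-number lists
READING_INDEX = {"ちゅうごく": [[1]]}


@dataclass
class InputEvent:
    text: str              # hiragana typed by the user (S401)
    show_candidates: bool  # did the user ask for candidates? (S403)
    choice: int = 0        # index of the candidate picked by the user (S407)


def run_session(events: list[InputEvent]) -> str:
    selection_buffer: list = []  # stands in for candidate selection buffer 116
    for ev in events:  # the end of the list stands in for the ESC key (S410)
        input_buffer = ev.text  # S402
        if ev.show_candidates:  # S403: Yes
            candidates = READING_INDEX.get(input_buffer, [])  # S404, S405
            if candidates:
                selection_buffer.append(candidates[ev.choice])  # S406, S407
                continue
        selection_buffer.append(input_buffer)  # S408 (also the no-candidate fallback)
    # S411: word numbers become dictionary readings, stored strings are used as-is
    parts: list[str] = []
    for item in selection_buffer:
        if isinstance(item, str):
            parts.append(item)
        else:
            parts.extend(WORD_DICT[n][1] for n in item)
    return "".join(parts)  # S412 would write this string to a file


print(run_session([InputEvent("ちゅうごく", True), InputEvent("です", False)]))
# ちゅ＾うごくです
```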

The processing of this embodiment is not limited to the overall procedure described above; for example, speech intermediate language data may also be output as an intermediate result, not only at the end.

FIG. 6 shows the hardware configuration of the speech intermediate language generation apparatus 100 according to the present embodiment. Apparatus 100 has the hardware configuration of an ordinary computer: a CPU (Central Processing Unit) 501 serving as the control device; a ROM (Read Only Memory) 502 and a RAM (Random Access Memory) 503 serving as storage devices; an external storage device 506 such as an HDD (Hard Disk Drive) or CD drive; a display device 507 such as a monitor; an input device 504 such as a keyboard and mouse; a communication interface 505 for communicating with external devices; and a bus 508 connecting them.

The speech intermediate language generation program executed by apparatus 100 is provided as a file in installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, flexible disk (FD), CD-R, or DVD (Digital Versatile Disk).

The program may instead be stored on a computer connected to a network such as the Internet and provided by download over that network. It may also be provided or distributed via a network such as the Internet.

The program may also be provided preinstalled in ROM or similar memory.

The program executed by apparatus 100 has a modular configuration comprising the units described above (input processing unit 101, display selection unit 110, notation output unit 102, speech intermediate language generation unit 103, and speech intermediate language output unit 104). On the actual hardware, CPU (processor) 501 reads the program from the storage medium and executes it, which loads these units into main memory so that input processing unit 101, display selection unit 110, notation output unit 102, speech intermediate language generation unit 103, and speech intermediate language output unit 104 are created there.

In the word dictionary stored in storage unit 105 of this embodiment, inflected words such as the verb "渡ら" (watara) and the auxiliary verb "なかっ" (nakat) are stored as separate entries for each inflected form. However, the inflectional endings of verbs and similar words change regularly. An alternative procedure is therefore to register only the stem without its ending, for example "わた；渡", in the word dictionary; candidate acquisition unit 113 then retrieves "わた；渡", expands its endings into "わたら；渡ら", "わたり；渡り", "わたる；渡る", "わたれ；渡れ", "わたろ；渡ろ" and so on, and compares these forms with the input string.
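The stem-plus-ending alternative can be sketched as expanding each registered stem into its inflected forms before comparing readings. The stem entry and the ending list (the five forms listed in the text) are illustrative assumptions; a practical conjugation table would cover more forms, such as the euphonic form "わたっ".

```python
# Hypothetical stem-only dictionary entries: (reading stem, notation stem)
STEM_ENTRIES = [("わた", "渡")]
# The five endings listed in the text; a full conjugation table would have more.
ENDINGS = ["ら", "り", "る", "れ", "ろ"]


def expand(stem_reading: str, stem_notation: str) -> list[tuple[str, str]]:
    """Attach every ending to a stem, giving (reading, notation) pairs."""
    return [(stem_reading + e, stem_notation + e) for e in ENDINGS]


def lookup_inflected(text: str) -> list[str]:
    """Return notations of expanded forms whose reading equals the input."""
    hits = []
    for stem_reading, stem_notation in STEM_ENTRIES:
        for reading, notation in expand(stem_reading, stem_notation):
            if reading == text:
                hits.append(notation)
    return hits


print(lookup_inflected("わたら"))  # ['渡ら']
```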

Because the speech intermediate language data output by apparatus 100 carries appropriate readings and pronunciation symbols derived from the kanji-kana mixed notation, feeding it to a speech synthesizer produces speech in the intended tone.

Moreover, with apparatus 100, a user who is not versed in accents and syllables can generate speech intermediate language data using the same procedure as kana-kanji conversion in a word processor.

Conventionally, a script to be read aloud had to be written in a general-purpose word processor, and the corresponding speech intermediate language data had to be generated afterwards. Because apparatus 100 includes both speech intermediate language output unit 104 and notation output unit 102, speech intermediate language data with the intended readings is generated at the same time as the script is written.

For example, there are currently services in which car navigation systems display article text from Internet web pages and read it aloud. In these services, the displayed articles are written by professional writers, and the corresponding speech intermediate language data is generated in a separate task. With apparatus 100, the speech intermediate language data is complete as soon as the writer finishes the article, which reduces the cost of the generation work and shortens the time until the data is available.

(Second Embodiment)
In the first embodiment, apparatus 100 displays candidates whose combined word readings from the word dictionary match the reading of the input string. With the apparatus of the first embodiment, however, the user cannot set accents freely. The speech intermediate language generation apparatus according to the second embodiment therefore also displays multiple candidates in which accents have been added to the input string.

FIG. 7 is a block diagram of the speech intermediate language generation apparatus 600 according to the second embodiment. It differs from apparatus 100 of the first embodiment in three respects: display selection unit 110 is replaced by a display selection unit 610 with different processing, speech intermediate language generation unit 103 is replaced by a speech intermediate language generation unit 602 with different processing, and a candidate speech output unit 601 is added. In the following description, components identical to those of the first embodiment carry the same reference numerals and are not described again.

Display selection unit 610 displays on CRT 152 candidates in kanji-kana mixed notation and in accented katakana notation, and selects whichever one the user chooses. Unit 610 includes input buffer 111, a candidate display processing unit 611, a candidate selection buffer 613, and notation display processing unit 117.

Candidate display processing unit 611 acquires from the word dictionary the candidates into which the hiragana string stored in input buffer 111 may be converted by kana-kanji conversion, additionally generates multiple candidates by adding the accent symbol "＾" to the hiragana string, and displays all of these candidates on CRT 152. Unit 611 includes candidate acquisition unit 113, a candidate generation unit 612, candidate buffer 114, and candidate output unit 115.

Candidate generation unit 612 generates, from the hiragana string stored in input buffer 111, multiple candidates that each carry a different accent, and stores them in candidate buffer 114. Unit 612 thus makes differently accented candidates available for any input hiragana string.
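One plausible way for candidate generation unit 612 to produce differently accented candidates is to enumerate every accent position: no mark, then "＾" after each mora. The text does not specify the enumeration, so this exhaustive scheme and the small-kana mora rule below are assumptions; FIG. 8 shows only two of the resulting candidates.

```python
ACCENT = "＾"
# Small kana that combine with the preceding kana into one mora (e.g. "ちゅ").
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎ")


def split_morae(kana: str) -> list[str]:
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)  # "っ", "ん" and long vowels count as separate morae
    return morae


def accent_candidates(kana: str) -> list[str]:
    """Unmarked string first, then one candidate per accent position."""
    morae = split_morae(kana)
    candidates = ["".join(morae)]
    for i in range(1, len(morae) + 1):
        candidates.append("".join(morae[:i]) + ACCENT + "".join(morae[i:]))
    return candidates


print(accent_candidates("かなりすき")[:3])  # ['かなりすき', 'か＾なりすき', 'かな＾りすき']
```

A practical generator would likely prune or rank these candidates rather than list every position.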

Specifically, in the first embodiment, if the hiragana string stored in input buffer 111 is "かなりすき", only "1. かなり好き" is displayed on CRT 152, and selecting '1' outputs "か＾なりすき" as intermediate language data. Apparatus 100 of the first embodiment therefore cannot output the pattern "かなりすき" (with no accent mark) used in the Tokyo speech of young people as of 2004.

In the second embodiment, by contrast, when the hiragana string in input buffer 111 is "かなりすき", the candidates generated by candidate generation unit 612 are stored in candidate buffer 114 alongside the candidate "1. かなり好き" acquired by candidate acquisition unit 113, and candidate output unit 115 displays all of them on CRT 152.

FIG. 8 shows an example of the candidates displayed by candidate output unit 115 of apparatus 600. As shown, in addition to the kanji-kana mixed candidate "1. かなり好き", katakana strings with accent symbols applied to the input string, such as "2. カナリスキ" and "3. カ＾ナリスキ", are displayed. These candidates are shown in katakana so that the user can easily tell them apart from kanji-kana mixed candidates.

Returning to FIG. 7, when candidate output unit 115 displays candidates on CRT 152, candidate speech output unit 601 reads each candidate aloud in turn. This audio output lets the user hear the actual accent before selecting a candidate.

Because the candidates generated by candidate generation unit 612 are displayed as combinations of accent symbols and katakana, choosing among them would otherwise require knowledge of accents. Candidate speech output unit 601 therefore synthesizes speech for each displayed candidate string, so that a user without such knowledge can still choose the katakana string with the most suitable accent marking. Candidate speech output unit 601 may also be included in apparatus 100 of the first embodiment.

When the user selects a kanji-kana mixed candidate from those displayed on CRT 152, candidate selection buffer 613 stores the word numbers of the selected candidate; when the user selects an accented katakana string, buffer 613 stores the corresponding accented intermediate language string.

Specifically, if the user selects '2', buffer 613 stores "かなりすき"; if the user selects '3', it stores "か＾なりすき". Although the candidates are displayed in katakana, they are stored in hiragana.
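The split between display and storage (katakana on screen, hiragana in the buffer) can be sketched with a fixed code-point offset: in Unicode, hiragana U+3041 to U+3096 map to katakana U+30A1 to U+30F6 by adding 0x60. The menu structure, the function names, and the word numbers [10, 11] are hypothetical.

```python
def to_katakana(s: str) -> str:
    # Hiragana U+3041..U+3096 -> katakana U+30A1..U+30F6; other characters (e.g. "＾") unchanged
    return "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in s)


def build_menu(kanji_candidates, accented_readings):
    """kanji_candidates: (notation, word numbers) pairs;
    accented_readings: hiragana strings, possibly containing "＾".
    Returns (displayed text, value to store) pairs."""
    entries = list(kanji_candidates)
    # Accented candidates are shown in katakana but stored in hiragana.
    entries += [(to_katakana(r), r) for r in accented_readings]
    return entries


def select(entries, number: int):
    """number is 1-based, matching the numbering in FIG. 8."""
    return entries[number - 1][1]


menu = build_menu([("かなり好き", [10, 11])], ["かなりすき", "か＾なりすき"])
print([text for text, _ in menu])  # ['かなり好き', 'カナリスキ', 'カ＾ナリスキ']
print(select(menu, 3))             # か＾なりすき
```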

When such accented strings are stored in buffer 613, speech intermediate language generation unit 602 uses them as speech intermediate language data as they are, skipping the word-connection-dictionary processing described in the first embodiment. When word numbers are stored in buffer 613, it performs the same processing as in the first embodiment.

With apparatus 600, the user can easily generate speech intermediate language data in the desired tone simply by selecting from the displayed candidates.

In this embodiment, candidates combining reading and accent are added to the display; other embodiments that add further candidates, for example by also varying pause length, are readily conceivable.

Although candidate speech output unit 601 of this embodiment synthesizes and outputs the displayed candidates one after another, it may instead output only the candidate the user specifies.

(Third Embodiment)
In the second embodiment, when the string entered by the user is reading information, an intermediate language string with accents and other speech synthesis marks is generated. In a typical Japanese word processor, however, users may also enter numerals, symbols, and alphabetic characters. The speech intermediate language generation apparatus according to the third embodiment therefore also adds readings to input strings other than hiragana reading information.

FIG. 9 is a block diagram of the speech intermediate language generation apparatus 800 according to the third embodiment. It differs from apparatus 600 of the second embodiment in that input processing unit 101, display selection unit 610, and speech intermediate language generation unit 602 are replaced by an input processing unit 801, a display selection unit 810, and a speech intermediate language generation unit 802, each with different processing, and storage unit 105 is replaced by a storage unit 803 that stores different tables. In the following description, components identical to those of the second embodiment carry the same reference numerals and are not described again.

In addition to the input processing performed by input processing unit 101 in the second embodiment, input processing unit 801 splits off any numerals, symbols, and alphabetic characters in the input and then stores the resulting segments in input buffer 111.

For example, when "ごうけいは１０００えんです" is entered, unit 801 segments it as "ごうけいは／１０００／えんです" and stores it in input buffer 111.

Similarly, when "５０ｋｍ" is entered, unit 801 segments it as "５０／ｋｍ" before storing it in input buffer 111.

Likewise, when "ツリー（tree）構造" is entered, unit 801 segments it as "ツリー／（／tree／）／構造" before storing it in input buffer 111.
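The segmentation performed by input processing unit 801 can be sketched as splitting wherever the character class changes, with each symbol kept as its own segment. The four classes and their Unicode ranges are assumptions chosen to reproduce the three examples above; they are not taken from the patent.

```python
def char_class(c: str) -> str:
    """Coarse character class (assumed ranges, chosen for the examples in the text)."""
    if "0" <= c <= "9" or "０" <= c <= "９":
        return "digit"
    if (c.isascii() and c.isalpha()) or "Ａ" <= c <= "Ｚ" or "ａ" <= c <= "ｚ":
        return "alpha"
    if "\u3040" <= c <= "\u30ff" or "\u4e00" <= c <= "\u9fff":
        return "japanese"  # hiragana, katakana (including "ー") and common kanji
    return "symbol"


def segment(text: str) -> list[str]:
    segments: list[str] = []
    prev = None
    for ch in text:
        cls = char_class(ch)
        if segments and cls == prev and cls != "symbol":
            segments[-1] += ch  # same class continues the current segment
        else:
            segments.append(ch)  # a class change, or any symbol, starts a new segment
        prev = cls
    return segments


for s in ["ごうけいは１０００えんです", "５０ｋｍ", "ツリー（tree）構造"]:
    print("／".join(segment(s)))
```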

If the processed string consists of non-hiragana characters such as numerals, symbols, or alphabetic characters, display selection unit 810 stores it unchanged in candidate selection buffer 811, whether or not candidates are displayed. If the input is a hiragana string, unit 810 performs the same processing as display selection unit 610 of the second embodiment.

When the processed string consists of non-hiragana characters such as numerals, symbols, or alphabetic characters, candidate selection buffer 811 stores the string from input buffer 111 unchanged. Other processed strings are stored in the same way as in candidate selection buffer 613 of the second embodiment.

When non-hiragana strings such as numerals, symbols, or alphabetic characters are stored in buffer 811, speech intermediate language generation unit 802 adds readings to them and generates speech intermediate language data. When word numbers or accented strings are stored in buffer 811, it performs the same processing as in the second embodiment.
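The three kinds of buffer content handled by unit 802 can be sketched as a tagged dispatch. The tags, the word-number readings, and the `RAW_READINGS` table are hypothetical, accent marks are omitted, and the first-embodiment processing is reduced to plain dictionary lookup.

```python
# Hypothetical readings for word numbers (accent marks omitted)
WORD_DICT = {1: "ごうけい", 2: "わ", 3: "えん", 4: "です"}
# Hypothetical reading assignment for non-hiragana strings
RAW_READINGS = {"１０００": "せん"}


def generate(buffer: list[tuple[str, object]]) -> str:
    out: list[str] = []
    for kind, value in buffer:
        if kind == "words":        # word numbers: first-embodiment processing (simplified)
            out.extend(WORD_DICT[n] for n in value)
        elif kind == "accented":   # accented hiragana chosen as in the second embodiment: as-is
            out.append(value)
        elif kind == "raw":        # numerals, symbols, alphabet: a reading must be added
            out.append(RAW_READINGS[value])
        else:
            raise ValueError(f"unknown buffer entry kind: {kind}")
    return "".join(out)


print(generate([("words", [1, 2]), ("raw", "１０００"), ("words", [3, 4])]))
# ごうけいわせんえんです
```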

First, when a string stored in buffer 811 is a sequence of Arabic numerals, unit 802 treats it as a string whose part of speech is "numeral" and assigns a reading and accent by the processing described below. Many methods of assigning readings and accents to numbers are possible; in this embodiment, number tables holding readings and accents are stored in advance in storage unit 803, and unit 802 obtains the reading and accent from them. The number tables are split into several tables by digit position.

Specifically, the storage unit 803 stores, as the number tables, a set of tables that each cover four digits:
- a table holding the readings and accents for the 1st to 4th digits from the lowest, that is, 0 to 9,999;
- a table for the 5th to 8th digits, that is, 10,000 to 99,990,000;
- a table for the 9th to 12th digits, that is, 100,000,000 to 999,900,000,000;
- a table for the 13th to 16th digits, that is, 1,000,000,000,000 to 9,999,000,000,000,000;
- a table for the 17th to 20th digits, that is, 10,000,000,000,000,000 to 99,990,000,000,000,000,000.

The number of tables may be increased or decreased according to the number of digits that must be output as speech intermediate language data.

FIG. 10-1 shows the table holding readings and accents for the 1st to 4th digits from the lowest, and FIG. 10-2 shows the table for the 5th to 8th digits. The readings and accents for the 9th to 12th digits and for the 13th to 16th digits are held as tables in the same manner.

The speech intermediate language generation unit 802 then checks the number of digits in the Arabic numeral string. It obtains the pre-generated readings and accents for those digits from the tables held in the storage unit 803.

For example, suppose the character string stored in the candidate selection buffer 811 is "99990100". The speech intermediate language generation unit 802 splits it into "99990000" and "100" and obtains the reading and accent of each part from the corresponding table. From the number tables it obtains 「きゅうせ＾ん／き＾ゅ―ひゃく／き＾ゅーじゅー／きゅーま＾ん」 (9999 × 10^4) and 「ひゃく」 (100). It combines these into the intermediate language character string 「きゅうせ＾ん／き＾ゅ―ひゃく／き＾ゅーじゅー／きゅーま＾ん／ひゃく」.

As another example, suppose the character string stored in the candidate selection buffer 811 is "3230323". The speech intermediate language generation unit 802 splits it into "3230000" and "323" and obtains the reading and accent of each part from the corresponding table. From the number tables it obtains 「さ＾んびゃく／に＾じゅう／さんま＾ん」 (323 × 10^4) and 「さ＾んびゃく／に＾じゅうさん」 (323). It combines these into the intermediate language character string 「さ＾んびゃく／に＾じゅう／さんま＾ん／さ＾んびゃく／に＾じゅうさん」.
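The four-digit grouping and table lookup in these two examples can be sketched in Python as follows. `NUMBER_TABLES` is a hypothetical excerpt that contains only the entries used in the examples; the real tables cover every value in each range. All names are illustrative.

```python
# Hypothetical excerpt of the four-digit number tables: index 0 stands for
# FIG. 10-1 (0-9999), index 1 for FIG. 10-2 (10000-99990000). Only the
# entries needed for the two worked examples are filled in.
NUMBER_TABLES: list[dict[int, str]] = [
    {
        100: "ひゃく",
        323: "さ＾んびゃく／に＾じゅうさん",
    },
    {
        99990000: "きゅうせ＾ん／き＾ゅ―ひゃく／き＾ゅーじゅー／きゅーま＾ん",
        3230000: "さ＾んびゃく／に＾じゅう／さんま＾ん",
    },
]
JOINER = "／"


def split_groups(digits: str) -> list[tuple[int, int]]:
    """Split a digit string into (table index, key) pairs, highest group first.

    The key is the nonzero four-digit group restored to its place value,
    e.g. "99990100" -> [(1, 99990000), (0, 100)].
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an ASCII digit string: {digits!r}")
    value = int(digits)
    if value == 0:
        return [(0, 0)]  # zero lives in the lowest table
    pairs = []
    index = 0
    while value:
        group = value % 10_000
        if group:  # skip all-zero groups such as the 0000 in 10000
            pairs.append((index, group * 10_000 ** index))
        value //= 10_000
        index += 1
    return pairs[::-1]


def read_integer(digits: str) -> str:
    """Join the table reading of each group (KeyError if an entry is missing)."""
    return JOINER.join(NUMBER_TABLES[i][key] for i, key in split_groups(digits))
```

Joining the groups with the same ／ separator used inside each entry reproduces the combined strings of both examples. The description does not state whether the device inserts anything else at group boundaries.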

The case described so far is a numeral string in the candidate selection buffer 811 that is an unbroken sequence of Arabic numerals. A numeral string that includes digit-group separators, such as "1,000", may also be stored in the candidate selection buffer 811. In that case, the speech intermediate language generation unit 802 may delete the separators and apply the processing described above to the resulting digit sequence to generate the intermediate language character string.
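Here is a minimal sketch of the separator removal, assuming half-width ASCII digits with commas as separators. Checking that the commas fall every three digits is an added assumption; the description only requires that the separators be deleted. `strip_digit_separators` is a hypothetical name.

```python
import re

# Plain digits, or digits grouped in threes by commas; either may carry a
# decimal part. re.ASCII keeps \d from matching full-width digits.
_PLAIN = re.compile(r"\d+(?:\.\d+)?", re.ASCII)
_GROUPED = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?", re.ASCII)


def strip_digit_separators(text: str) -> str:
    """Remove thousands separators so the digit-only processing can be reused."""
    if _PLAIN.fullmatch(text):
        return text
    if _GROUPED.fullmatch(text):
        return text.replace(",", "")
    raise ValueError(f"not a well-formed number: {text!r}")
```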

A numeral string stored in the candidate selection buffer 811 may also contain a decimal fraction. The storage unit 803 therefore stores number tables that hold readings and accents for decimals. When the speech intermediate language generation unit 802 determines that the character string is a numeral sequence containing a decimal fraction, it obtains the readings and accents from these tables. These number tables for decimal numbers are described next.

FIG. 11-1 shows a number table holding readings and accents for integer parts of 1 to 4 digits followed by the decimal point. FIG. 11-2 shows a number table holding readings and accents for one or two characters after the decimal point. Storing these tables in the storage unit 803 makes it possible to generate intermediate language character strings for numeral strings that contain decimal fractions.

For a numeral string containing a decimal fraction, the speech intermediate language generation unit 802 obtains the reading and accent of the (up to four) digits before the decimal point from the number table of FIG. 11-1. For the part after the decimal point, it basically divides the digits into pairs and obtains their readings and accents from the number table of FIG. 11-2. If a single digit remains at the end, its reading and accent are also obtained from the table of FIG. 11-2.

For example, suppose the character string stored in the candidate selection buffer 811 is "1.224". The speech intermediate language generation unit 802 splits it into "1.", "22", and "4" and obtains the reading and accent of each part from the corresponding number table. From the number tables it obtains 「い＾ってん」, 「にーに＾―」, and 「よ＾ん」. It combines these into the intermediate language character string 「い＾ってんにーに＾―よ＾ん」. When combining the obtained strings, a pause length symbol may be inserted between them.
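The splitting rule for decimals can be sketched as below. `INTEGER_PART_TABLE` and `FRACTION_TABLE` are hypothetical excerpts of the tables of FIG. 11-1 and FIG. 11-2 that hold only the entries from the "1.224" example. The optional `pause` argument stands in for the pause length symbol, whose actual form is not given here.

```python
# Hypothetical excerpts: FIG. 11-1 maps an integer part plus "." to a reading,
# FIG. 11-2 maps one or two fractional digits to a reading. Only the entries
# for the "1.224" example are included.
INTEGER_PART_TABLE = {"1.": "い＾ってん"}
FRACTION_TABLE = {"22": "にーに＾―", "4": "よ＾ん"}


def fraction_chunks(fraction: str) -> list[str]:
    """Split fractional digits into pairs; an odd final digit stays on its own."""
    return [fraction[i:i + 2] for i in range(0, len(fraction), 2)]


def read_decimal(text: str, pause: str = "") -> str:
    """Read a number such as "1.224" from the two tables, joined by `pause`."""
    integer, dot, fraction = text.partition(".")
    if not (dot and 1 <= len(integer) <= 4 and integer.isdigit() and fraction.isdigit()):
        raise ValueError(f"expected 1-4 digits, a point, and fractional digits: {text!r}")
    parts = [INTEGER_PART_TABLE[integer + "."]]
    parts += [FRACTION_TABLE[chunk] for chunk in fraction_chunks(fraction)]
    return pause.join(parts)
```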

A numeral string that has been given a reading is treated as a word whose part of speech is "numeral". Rules for pauses, accent shifts, or reading changes can therefore be held in the word connection dictionary. After obtaining the reading for a string of one or more numerals as described above, the speech intermediate language generation unit 802 can then insert pauses or shift accents.

For example, consider a table obtained by adding a "reading change" column to the word connection dictionary shown in FIG. 3. In it, "a numeral whose reading ends in いち (ichi)" in the "preceding word" column is associated with "the numeral suffix 杯 (hai, a counter for cupfuls)" in the "following word" column, and 「いっぱい」 (ippai) is held as the "reading change". Then, when 「1杯」 (one cupful), displayed on the CRT 152 as a kanji-mixed candidate, is selected, the speech intermediate language generation unit 802 can generate the intermediate language character string 「いっぱい」.
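One way such a reading-change rule could be applied is sketched below. The sketch assumes that the stored reading replaces the final いち of the numeral together with the suffix, so that the rule also covers numerals such as 十一 (じゅういち). The `ReadingChangeRule` type, the default suffix reading はい, and the function name are illustrative, and accents are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingChangeRule:
    numeral_tail: str   # preceding word: a numeral whose reading ends with this
    next_word: str      # following word, here a numeral suffix (counter)
    replacement: str    # reading that replaces numeral_tail + the suffix


# The single rule from the example: "...いち" followed by 杯 is read "...いっぱい".
RULES = [ReadingChangeRule("いち", "杯", "いっぱい")]
# Assumed default reading of the suffix when no rule applies.
SUFFIX_READINGS = {"杯": "はい"}


def read_with_suffix(numeral_reading: str, suffix: str) -> str:
    """Apply the first matching reading-change rule, else concatenate readings."""
    for rule in RULES:
        if suffix == rule.next_word and numeral_reading.endswith(rule.numeral_tail):
            stem = numeral_reading[: -len(rule.numeral_tail)]
            return stem + rule.replacement
    return numeral_reading + SUFFIX_READINGS[suffix]
```

Other counters and sound changes (for example 三杯, read さんばい) would need further rules. This sketch would read it さんはい.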

Next, when the character string stored in the candidate selection buffer 811 consists of alphabetic characters, the speech intermediate language generation unit 802 assigns readings and accents by the processing described below.

FIG. 12 shows an alphabet table holding readings and accents for sequences of alphabetic characters. As shown in the figure, the table holds each alphabetic string together with its accent information, part of speech, and property.

When the character string stored in the candidate selection buffer 811 is a sequence of alphabetic characters, the speech intermediate language generation unit 802 compares it with the strings in the "alphabet" column of the alphabet table shown in FIG. 12. If a matching record is found, it generates the intermediate language character string from the reading and accent, part of speech, and property held in that record. The part of speech and property obtained are used in the same processing as in the first embodiment.

If the alphabet table has no matching record, the speech intermediate language generation unit 802 generates the intermediate language character string by concatenating the reading and accent information for each letter. The storage unit 803 stores the reading and accent information of each letter as an alphabet character table. For example, this table holds "a" in association with 「えー」.

For example, suppose the character string stored in the candidate selection buffer 811 is "abc" and the alphabet table has no matching record. The speech intermediate language generation unit 802 then obtains the reading and accent of each letter from the alphabet character table and generates the intermediate language character string 「えーびーしー」.
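The lookup-then-fallback behaviour can be sketched as follows. Case-insensitive matching, the hypothetical whole-string entry for "tv", and the `(reading, part_of_speech)` return shape are assumptions. Only the readings of a, b, and c are taken from the text, and accent marks are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlphabetEntry:
    reading: str                   # reading (accent marks omitted in this sketch)
    part_of_speech: str
    prop: Optional[str] = None     # the "property" column of FIG. 12


# Hypothetical whole-string entry in the format of FIG. 12.
ALPHABET_TABLE = {"tv": AlphabetEntry("てぃーぶい", "noun")}
# Per-letter readings (alphabet character table); a, b, c follow the "abc" example.
LETTER_TABLE = {"a": "えー", "b": "びー", "c": "しー"}


def read_alphabet(text: str) -> tuple[str, Optional[str]]:
    """Return (reading, part of speech); the part of speech is None on fallback."""
    key = text.lower()
    entry = ALPHABET_TABLE.get(key)
    if entry is not None:
        return entry.reading, entry.part_of_speech
    # No whole-string match: concatenate the reading of every letter.
    return "".join(LETTER_TABLE[ch] for ch in key), None
```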

Readings and accent information for symbol characters are likewise held in a symbol table. Suppose the candidate selection buffer 811 stores a symbol as a character string and that symbol matches one held in the symbol table. The speech intermediate language generation unit 802 then obtains the reading and accent from the matching record and generates an intermediate language character string. Symbols that should not be read aloud, such as "、", "。", and "「", are simply given no reading in the symbol table. In the symbol table, each single symbol character is treated as a word whose part of speech is "symbol".
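Here is a minimal sketch of the symbol lookup, assuming a dictionary keyed by single characters in which an empty reading marks a symbol that is not read aloud. The reading given for "%" is a hypothetical entry. Returning `None` for an unknown symbol is also an assumption, since the text does not say how such symbols are handled.

```python
from typing import Optional

# Hypothetical excerpt of the symbol table. An empty reading marks a symbol
# that is deliberately not read aloud.
SYMBOL_TABLE = {
    "%": "ぱーせんと",
    "、": "",
    "。": "",
    "「": "",
}


def read_symbol(symbol: str) -> Optional[tuple[str, str]]:
    """Return (reading, part of speech) for one symbol, or None if it is unknown."""
    if symbol not in SYMBOL_TABLE:
        return None
    return SYMBOL_TABLE[symbol], "symbol"  # each symbol is a one-character word
```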

In this embodiment, numbers without a decimal part, and the integer part of numbers with one, are read with place values. Alternatively, they could be read digit by digit, in the same way as the digits after the decimal point. For example, if the character string stored in the candidate selection buffer 811 is "921", the speech intermediate language generation unit 802 would generate the intermediate language character string 「きゅーに＾ーいち」. It is also conceivable, after input processing, to display both readings as candidates and let the user choose whether to read the number with place values.
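The digit-by-digit alternative could be sketched as below. Lengthening the one-mora digits に and ご to にー and ごー follows the "921" example. The full digit table and the function name are assumptions, and the accent mark shown in the example is not generated.

```python
# Assumed readings for reading digits one at a time; the one-mora digits
# に and ご are lengthened, as in the "921" example.
DIGIT_READINGS = {
    "0": "ぜろ", "1": "いち", "2": "にー", "3": "さん", "4": "よん",
    "5": "ごー", "6": "ろく", "7": "なな", "8": "はち", "9": "きゅー",
}


def read_digit_by_digit(digits: str) -> str:
    """Concatenate per-digit readings without place values (no accent marks)."""
    if not digits or not all(ch in DIGIT_READINGS for ch in digits):
        raise ValueError(f"expected ASCII digits: {digits!r}")
    return "".join(DIGIT_READINGS[ch] for ch in digits)
```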

In this embodiment, readings and accent information for numbers are held in tables in four-digit units. Alternatively, readings and accents could be assigned by fixed rules without such tables.
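As one possible rule-based alternative to the tables, the sketch below computes positional readings for numbers below 10^12 from single-digit readings. It includes the common sound changes for hundreds and thousands (さんびゃく, ろっぴゃく, はっぴゃく, さんぜん, はっせん). Accents are not assigned, units from 兆 upward are omitted, and all names are hypothetical. This is not the rule set used in the embodiment.

```python
ONES = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
# Sound changes for hundreds and thousands; other digits are regular.
HUNDREDS = {1: "ひゃく", 3: "さんびゃく", 6: "ろっぴゃく", 8: "はっぴゃく"}
THOUSANDS = {1: "せん", 3: "さんぜん", 8: "はっせん"}
UNITS = ["", "まん", "おく"]  # 10^0, 10^4, 10^8; 兆 and above omitted


def read_group(n: int) -> str:
    """Positional reading of 1-9999, without accent marks."""
    if not 1 <= n <= 9999:
        raise ValueError(f"group out of range: {n}")
    th, h, t, o = n // 1000, n // 100 % 10, n // 10 % 10, n % 10
    parts = []
    if th:
        parts.append(THOUSANDS.get(th, ONES[th] + "せん"))
    if h:
        parts.append(HUNDREDS.get(h, ONES[h] + "ひゃく"))
    if t:
        parts.append("じゅう" if t == 1 else ONES[t] + "じゅう")
    parts.append(ONES[o])  # empty string when the ones digit is 0
    return "".join(parts)


def read_number(digits: str) -> str:
    """Positional reading of a non-negative integer below 10^12."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an ASCII digit string: {digits!r}")
    value = int(digits)
    if value == 0:
        return "ぜろ"
    if value >= 10_000 ** len(UNITS):
        raise ValueError("too many digits for this sketch")
    parts = []
    for unit in UNITS:  # lowest four-digit group first
        group = value % 10_000
        if group:
            parts.append(read_group(group) + unit)
        value //= 10_000
    return "".join(reversed(parts))
```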

Similarly, the intermediate language strings for the digits after the decimal point are generated here from number tables. They could instead be generated by rules without tables, for example by rewriting 「いってん」 as 「い＾ってん」, or 「にに」 as 「にーに＾―」.

In this embodiment, the input character string may contain Arabic numerals, alphabetic characters, or symbols instead of a reading. In that case, the speech intermediate language generation unit 802 obtains their part of speech, reading, and accent through the processing described above. Intermediate language data for appropriate speech synthesis can therefore be generated even when the user types numbers or other non-reading input on the keyboard.

As described above, the speech information generating device, speech information generating program, and speech information generating method according to the present invention are useful as speech synthesis techniques. They are particularly useful for generating speech intermediate language data from input entered by a user through an input device.

FIG. 1 is a block diagram showing the configuration of the speech intermediate language generation device according to the first embodiment.
FIG. 2 shows an example of the word dictionary stored in the storage unit of the speech intermediate language generation device according to the first embodiment.
FIG. 3 shows an example of the word connection dictionary stored in the storage unit of the speech intermediate language generation device according to the first embodiment.
FIG. 4 shows an example of the display of candidates in kanji-mixed notation output by the candidate output unit of the speech intermediate language generation device according to the first embodiment.
FIG. 5 is a flowchart of the overall processing, from an input character string to the output of intermediate language data, in the speech intermediate language generation device according to the first embodiment.
FIG. 6 shows the hardware configuration of the speech intermediate language generation device according to the first embodiment.
FIG. 7 is a block diagram showing the configuration of the speech intermediate language generation device according to the second embodiment.
FIG. 8 shows an example of the display of candidates output by the candidate output unit of the speech intermediate language generation device according to the second embodiment.
FIG. 9 is a block diagram showing the configuration of the speech intermediate language generation device according to the third embodiment.
FIG. 10-1 shows the table, stored in the storage unit of the speech intermediate language generation device according to the third embodiment, that holds readings and accents for the 1st to 4th digits of a number from the lowest.
FIG. 10-2 shows the table, stored in the storage unit of the speech intermediate language generation device according to the third embodiment, that holds readings and accents for the 5th to 8th digits of a number from the lowest.
FIG. 11-1 shows the number table, stored in the storage unit of the speech intermediate language generation device according to the third embodiment, that holds readings and accents for integers of 1 to 4 digits followed by the decimal point.
FIG. 11-2 shows the number table, stored in the storage unit of the speech intermediate language generation device according to the third embodiment, that holds readings and accents for one or two characters after the decimal point.
FIG. 12 shows the alphabet table, stored in the storage unit of the speech intermediate language generation device according to the third embodiment, that holds readings and accents for alphabetic strings.

Explanation of Reference Numerals

100, 600, 800 Speech intermediate language generation device
101, 801 Input processing unit
102 Notation output unit
103, 602, 802 Speech intermediate language generation unit
104 Speech intermediate language output unit
105, 803 Storage unit
110, 610, 810 Display selection unit
111 Input buffer
112, 611 Candidate display processing unit
113 Candidate acquisition unit
114 Candidate buffer
115 Candidate output unit
116, 613, 811 Candidate selection buffer
117 Notation display processing unit
151 Keyboard
152 CRT
501 CPU
502 ROM
503 RAM
504 Input device
505 Communication interface
506 External storage device
507 Display device
508 Bus
601 Candidate speech output unit
612 Candidate generation unit

Claims

1. A speech information generating device comprising:
character string correspondence storage means for storing character string correspondence information that associates (a) a phonetic character string indicating a reading input by a user, (b) a kanji-mixed notation used in general documents, and (c) a speech intermediate language character string that combines the reading of the kanji-mixed notation with tone information, the tone information specifying the tone in which the kanji-mixed notation is to be pronounced during speech synthesis;
input processing means for performing input processing of the phonetic character string input by the user;
display selection means for displaying, on the basis of the character string correspondence information stored in the character string correspondence storage means, candidates of the kanji-mixed notation associated with the phonetic character string input by the input processing means so that the user can select among them, and for selecting the kanji-mixed notation chosen by the user from the displayed candidates; and
intermediate language generation means for generating, on the basis of the character string correspondence information stored in the character string correspondence storage means, a speech intermediate language document that combines a reading for speech synthesis with the tone information, based on the speech intermediate language character string associated with the kanji-mixed notation selected by the display selection means.

2. The speech information generating device according to claim 1, further comprising candidate generation means for generating, from the phonetic character string input by the input processing means, a candidate consisting of a speech intermediate language character string that combines the reading of the phonetic character string with the tone information,
wherein the display selection means further displays the candidate generated by the candidate generation means as a candidate associated with the phonetic character string input by the input processing means, and selects the speech intermediate language character string chosen by the user from the displayed candidates.

3. The speech information generating device according to claim 1 or 2, further comprising speech output means for outputting, as speech, the candidates displayed by the display selection means.

4. The speech information generating device according to any one of claims 1 to 3, further comprising special intermediate language correspondence storage means for storing special intermediate language correspondence information that associates special character strings, each containing one or more characters, numerals, or symbols indicating phonemes, with speech intermediate language character strings,
wherein the input processing means further performs input processing of the special character string, and
the intermediate language generation means further generates the speech intermediate language document, on the basis of the special intermediate language correspondence information, based on the speech intermediate language character string associated with the special character string input by the input processing means.

5. The speech information generating device according to any one of claims 1 to 4, further comprising notation output means for outputting a document in kanji-mixed notation based on the kanji-mixed notation selected by the display selection means.

6. The speech information generating device according to any one of claims 1 to 5, wherein the tone information consists of symbols indicating accent and pause length.

7. A speech information generating program causing a computer to execute:
an input processing procedure of performing input processing of a phonetic character string indicating a reading input by a user;
a display selection procedure of displaying, on the basis of character string notation correspondence information that associates the phonetic character string with a kanji-mixed notation used in general documents, candidates of the kanji-mixed notation associated with the phonetic character string input in the input processing procedure so that the user can select among them, and selecting the kanji-mixed notation chosen by the user from the displayed candidates; and
an intermediate language generation procedure of generating, on the basis of notation intermediate language correspondence information, a speech intermediate language document that combines a reading for speech synthesis with the tone information, based on the speech intermediate language character string associated with the kanji-mixed notation selected in the display selection procedure. The notation intermediate language correspondence information associates the kanji-mixed notation with a speech intermediate language character string that combines the reading of the kanji-mixed notation with tone information, the tone information specifying the tone in which the kanji-mixed notation is to be pronounced during speech synthesis.

8. A speech information generating method comprising:
an input processing step of performing input processing of a phonetic character string indicating a reading input by a user;
a display selection step of displaying, on the basis of character string notation correspondence information that associates the phonetic character string with a kanji-mixed notation used in general documents, candidates of the kanji-mixed notation associated with the phonetic character string input in the input processing step so that the user can select among them, and selecting the kanji-mixed notation chosen by the user from the displayed candidates; and
an intermediate language generation step of generating, on the basis of notation intermediate language correspondence information, a speech intermediate language document that combines a reading for speech synthesis with the tone information, based on the speech intermediate language character string associated with the kanji-mixed notation selected in the display selection step. The notation intermediate language correspondence information associates the kanji-mixed notation with a speech intermediate language character string that combines the reading of the kanji-mixed notation with tone information, the tone information specifying the tone in which the kanji-mixed notation is to be pronounced during speech synthesis.