JPS60186958A

JPS60186958A - Japanese processing system

Info

Publication number: JPS60186958A
Application number: JP59040571A
Authority: JP
Inventors: Tsuneo Nitta; 恒雄新田; Norimasa Nomura; 典正野村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-03-05
Filing date: 1984-03-05
Publication date: 1985-09-24

Abstract

PURPOSE: To provide a single word dictionary, collate a kanji (Chinese character) independent word against a kanji word retrieval table to read out its position in the word dictionary, and read the necessary information out of the word dictionary according to that position information. CONSTITUTION: For a character string input with kana (Japanese syllabary) keys, a kana independent word collation part 1 verifies the independent word part. The word dictionary 2 is used for this verification: the kana character string is collated from its head against the index words of the word dictionary 2. The collated part of the character string is removed, and an adjunct word verifying part 3 verifies the remaining character string as an adjunct word by using an adjunct word dictionary 4. For a character string that passes both the independent word verification and the adjunct word verification, the independent word part is replaced with kanji to generate a kanji-kana mixed character string, which is stored in a document storage part 5. A kanji independent word collation part 6 collates the kanji-kana mixed character string from its head against a kanji word retrieval table 7.

Description

[Detailed description of the invention]

[Technical field of the invention]

The present invention relates to a Japanese language processing system having a word dictionary.

[Technical background of the invention and its problems]

In recent years, document output devices, document file systems, and similar equipment that process Japanese text have been actively developed. Dictionary databases for various kinds of Japanese language processing have been created and built into these devices. FIG. 1 shows an example of the structure of a "kana-kanji conversion dictionary" used in a Japanese word processor. In addition to a serial number, such a dictionary usually contains a headword (the reading in kana notation), part-of-speech information, the kanji-mixed notation, and additional information. For example, a sentence input in phrase (bunsetsu) units with the kana keys undergoes morphological analysis, and the resulting combinations are then looked up in this dictionary. The additional information includes items such as the usage frequency of the headword, which is used to decide the priority order for display. When kana notation is used as the headword, entries are generally arranged in "a-i-u-e-o order" (the order of the kana syllabary), as in ordinary Japanese dictionaries. Dictionaries of this structure can take various forms, from simple ones for an "electronic Japanese dictionary" that merely list kana headwords alongside their kanji-mixed notation, to dictionaries for more advanced Japanese language processing systems such as a "speech recognition device", which record idioms, semantic attributes, and even prosodic information as additional information.
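The dictionary layout described above can be sketched as follows. This is only an illustration under stated assumptions: the `KanaEntry` class, the sample entries, and the frequency values are hypothetical, and Python's default string ordering is used as a rough stand-in for a-i-u-e-o order.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class KanaEntry:
    number: int      # serial number
    headword: str    # reading in kana notation
    pos: str         # part-of-speech information
    kanji: str       # kanji-mixed notation
    frequency: int   # additional information: usage frequency


# Hypothetical sample entries, kept sorted by kana headword.
# Code-point order only approximates a-i-u-e-o order (assumption).
ENTRIES = sorted(
    [
        KanaEntry(1, "かんじ", "noun", "漢字", 90),
        KanaEntry(2, "かんじ", "noun", "感じ", 70),
        KanaEntry(3, "かんじ", "noun", "幹事", 40),
        KanaEntry(4, "じしょ", "noun", "辞書", 80),
    ],
    key=lambda e: e.headword,
)
HEADWORDS = [e.headword for e in ENTRIES]


def candidates(reading: str) -> list:
    """Return kanji notations for a reading, most frequent first."""
    lo = bisect_left(HEADWORDS, reading)   # binary search on the sorted headwords
    hi = bisect_right(HEADWORDS, reading)
    ranked = sorted(ENTRIES[lo:hi], key=lambda e: e.frequency, reverse=True)
    return [e.kanji for e in ranked]
```

Keeping the entries sorted by headword is what makes the binary search possible; the frequency field is then used only to order the homophones for display.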

Furthermore, converting a Japanese document into speech or Braille requires a "kanji-kana conversion dictionary". FIG. 2 shows an example of its structure. When a kanji or kanji-mixed character string is used as the headword, entries are generally arranged in "JIS code order". Dictionaries of this structure can likewise take various forms, such as a dictionary for the language processing stage of a kanji OCR (optical character reader) that includes idioms and semantic attributes as additional information, or a dictionary for a "text read-back machine" whose additional information also includes prosodic information such as accent.
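As a rough illustration of "JIS code order", the sketch below sorts headwords by their EUC-JP byte sequence. This relies on the fact that EUC-JP stores each JIS X 0208 byte with the high bit set, so byte order follows JIS code order; the patent does not say how the ordering is implemented, and the sample headwords are hypothetical.

```python
def jis_order_key(text: str) -> bytes:
    # EUC-JP encodes a JIS X 0208 character as its two JIS bytes plus 0x80,
    # so comparing the encoded bytes compares the underlying JIS codes.
    return text.encode("euc_jp")


# Hypothetical headwords: a hiragana string and several kanji strings.
headwords = ["漢字", "辞書", "亜", "あ"]
jis_sorted = sorted(headwords, key=jis_order_key)
```

In this ordering the hiragana row precedes the kanji rows, and 亜, the first level-1 kanji, precedes the other kanji headwords.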

Meanwhile, Japanese language processing devices that need both the "kana headword" dictionary and the "kanji headword" dictionary described above are expected to be required in the future. Examples include a "Japanese word processor with a read-back function", a "voice input/output Japanese word processor" that accepts sentences by voice and outputs the results by voice, and a "kanji OCR with a read-back function". Such devices must contain both a "kana headword" dictionary (hereinafter simply called the word dictionary) and a "kanji headword" dictionary, so conventional methods need nearly twice the dictionary capacity. An efficient way of organizing the dictionary is therefore desired.

[Purpose of the invention]

In view of the above circumstances, an object of the present invention is to provide an efficient Japanese language processing system that requires only a small dictionary capacity.

[Summary of the invention]

According to the present invention, a Japanese language processing system that incorporates both word dictionary ("kana headword") matching and "kanji headword" dictionary matching has a single word dictionary. A kanji independent word is matched against a kanji word search table to read out its position in the word dictionary, and this position information makes it possible to extract the necessary information from the word dictionary. With the present invention, for example, a single word dictionary can serve both a function that converts kana or syllable input into a kanji-kana mixed character string and a function that converts a kanji-kana mixed character string into speech parameters and outputs it as synthesized speech.
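The idea of the summary can be sketched in a few lines: one word dictionary holds all per-word information, and a small kanji word search table stores only positions in that dictionary. The data, field names, and accent values below are hypothetical and are not taken from the patent figures.

```python
# The single word dictionary; list index = position in the dictionary.
WORD_DICTIONARY = [
    {"kana": "かんじ", "kanji": "漢字", "accent": 0},  # accent values are made up
    {"kana": "じしょ", "kanji": "辞書", "accent": 1},
]

# Kanji word search table: kanji notation -> position in WORD_DICTIONARY.
# It stores no readings or accents itself, only the position.
KANJI_TABLE = {"漢字": 0, "辞書": 1}


def kana_to_kanji(kana: str):
    """Kana-to-kanji direction: search the dictionary by its kana headword."""
    for entry in WORD_DICTIONARY:
        if entry["kana"] == kana:
            return entry["kanji"]
    return None


def kanji_to_entry(kanji: str):
    """Kanji-to-speech direction: go through the table, then read the dictionary."""
    position = KANJI_TABLE.get(kanji)
    return None if position is None else WORD_DICTIONARY[position]
```

Both directions read the same `WORD_DICTIONARY`; only the lightweight table is added for the kanji direction, which is where the capacity saving comes from.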

[Effect of the invention]

According to the present invention, processing substantially equivalent to that of a system containing two kinds of dictionaries can be performed with a small dictionary capacity.

[Embodiments of the invention]

FIG. 3 shows a first embodiment in which the present invention is applied to a Japanese speech output device. For a kana character string input with the kana keys, a kana independent word matching unit 1 verifies the independent word part. The word dictionary 2 is used for this verification: the kana character string is matched from its head against the headwords of the word dictionary 2. The matched part of the kana character string is removed, and an adjunct word verification unit 3 verifies the remaining character string as an adjunct word using an adjunct word dictionary 4.

For a kana character string that passes both the independent word verification and the adjunct word verification, the independent word part is replaced with its kanji notation to generate a kanji-kana mixed character string, which is stored in a document storage unit 5.
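The two-stage verification described above can be sketched as follows. The dictionaries are tiny hypothetical samples, and trying the longest headword first is an assumption of this sketch; the text only says that matching proceeds from the head of the string.

```python
# Hypothetical word dictionary: kana headword -> kanji notation.
WORD_DICTIONARY = {"かんじ": "漢字", "かん": "缶", "じしょ": "辞書"}

# Hypothetical adjunct word dictionary; "" allows a phrase with no adjunct.
ADJUNCT_DICTIONARY = {"", "を", "が", "は", "の", "に"}


def convert_phrase(kana: str):
    """Return the kanji-kana mixed string, or None if verification fails."""
    # Independent word verification: match a headword from the head of the string.
    for end in range(len(kana), 0, -1):
        head, rest = kana[:end], kana[end:]
        if head not in WORD_DICTIONARY:
            continue
        # Adjunct word verification on the remaining characters.
        if rest in ADJUNCT_DICTIONARY:
            # Replace the independent word part with its kanji notation.
            return WORD_DICTIONARY[head] + rest
    return None
```

For example, `convert_phrase("かんじを")` matches the headword かんじ, verifies the remainder を as an adjunct word, and yields 漢字を; the result would then be stored in the document storage unit.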

FIG. 4 shows the format of the word dictionary 2 used in the above process. The information held for each word consists of a word number, a headword (kana notation), grammatical information (part of speech, conjugation type, etc.), kanji notation, speech information (the accent type of the word, etc.), and frequency of occurrence.
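One possible in-memory form of such a record is sketched below. The field names, the encoding of the grammatical and accent information, and the sample values are assumptions; FIG. 4 itself is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    word_number: int   # position of the word in the dictionary (1-based here)
    headword: str      # kana notation
    grammar: str       # part of speech, conjugation type, etc.
    kanji: str         # kanji notation
    accent_type: int   # speech information (hypothetical encoding)
    frequency: int     # frequency of occurrence


# Hypothetical sample dictionary, stored in word-number order.
WORD_DICTIONARY = [
    WordEntry(1, "かんじ", "noun", "漢字", 0, 90),
    WordEntry(2, "じしょ", "noun", "辞書", 1, 80),
]


def entry_by_number(number: int) -> WordEntry:
    """Fetch a record directly from its word number."""
    return WORD_DICTIONARY[number - 1]
```

Because the word number doubles as a position, any other table only needs to store that number to reach the full record.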

Next, a kanji-kana mixed character string is converted into synthesized speech as follows. First, a kanji independent word matching unit 6 matches the kanji-kana mixed character string from its head against a kanji word search table 7. For each kanji word that is matched, information on where that word is located in the word dictionary 2 (its word number) is read out from the kanji word search table 7. Based on this position information, the phonological information, prosodic information, and so on for that word are extracted from the word dictionary 2.
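The lookup path just described (mixed string, then search table, then word number, then word dictionary) might look like this in outline. The data and the longest-match strategy are assumptions made for the sketch.

```python
# Hypothetical word dictionary keyed by word number.
WORD_DICTIONARY = {
    1: {"kana": "かんじ", "kanji": "漢字", "accent": 0},
    2: {"kana": "じしょ", "kanji": "辞書", "accent": 1},
}

# Kanji word search table: kanji notation -> word number.
KANJI_TABLE = {"漢字": 1, "辞書": 2}


def lookup_leading_word(text: str):
    """Match a kanji word at the head of text.

    Returns (dictionary entry, remaining text), or (None, text) if no
    kanji word in the table starts the string.
    """
    for end in range(len(text), 0, -1):      # longest candidate first (assumption)
        number = KANJI_TABLE.get(text[:end])  # read the word number from the table
        if number is not None:
            return WORD_DICTIONARY[number], text[end:]
    return None, text
```

The returned entry carries the reading and accent data that the later phonological and prosodic stages would consume, while the remaining text holds the adjunct part.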

A phonological transformation verification unit 8 checks the phonological information for phonological changes such as nasalization. A prosodic information verification unit 9 uses the word prosodic information to check for changes such as accent shifts that occur when an adjunct word is attached to a word (independent word).

A speech parameter generation unit 10 converts the phonological and prosodic information obtained in this way into speech parameters, and a speech synthesis unit 12 is driven on the basis of these speech parameters to output synthesized speech.

The kanji word search table 7 used in the conversion to synthesized speech has the format shown in FIG. 5, and it can be generated automatically from the word dictionary having the structure shown in FIG. 4. A single dictionary therefore suffices as the word dictionary. Not only is less storage capacity needed for the dictionary, but the so-called dictionary maintenance work of correcting registered words or adding words is also far lighter than when multiple dictionaries are used.
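Automatic generation of the search table can be sketched as a single pass over the word dictionary. The record layout and the choice to keep a list of word numbers per kanji notation (to cope with words that share a spelling but differ in reading) are assumptions of this sketch, not details given for FIG. 5.

```python
# Hypothetical word dictionary records.
WORD_DICTIONARY = [
    {"number": 1, "kana": "かんじ", "kanji": "漢字"},
    {"number": 2, "kana": "じしょ", "kanji": "辞書"},
    {"number": 3, "kana": "いちば", "kanji": "市場"},
    {"number": 4, "kana": "しじょう", "kanji": "市場"},
]


def build_kanji_table(word_dictionary):
    """Derive the kanji word search table: kanji notation -> word numbers."""
    table = {}
    for entry in word_dictionary:
        table.setdefault(entry["kanji"], []).append(entry["number"])
    return table


KANJI_TABLE = build_kanji_table(WORD_DICTIONARY)
```

Since the table is derived data, correcting or adding a word only requires editing the word dictionary and rebuilding the table, which matches the maintenance benefit described above.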

As described above, the dictionary scheme of the present invention is highly effective in practice.

FIG. 6 shows a second embodiment of the present invention. In this embodiment, input is given by voice or by characters (either handwritten or printed), the recognition results are matched against the dictionary and then stored in a document file, and the created document is proofread while being checked by voice.

First, the input speech is recognized by a speech recognition unit 21, and the resulting candidate syllable string is transferred to a kana independent word matching unit 22, which verifies the independent word part. A word dictionary 25 is used for this verification: the candidate syllable string is matched from its head against the headwords. Next, the matched part of the syllable string is removed, and an adjunct word verification unit 23 verifies the remaining candidate syllable string as an adjunct word using an adjunct word dictionary 26. For a kana character string that passes both the independent word verification and the adjunct word verification, the independent word part is replaced with its kanji notation to generate a kanji-kana character string, which is stored in a document storage unit 24.

The word dictionary 25 used here is the same as the word dictionary of the first embodiment (see FIG. 4). The function of converting the kanji-kana mixed character string in the document storage unit into synthesized speech is exactly the same as in the first embodiment.

Next, the case is described in which handwritten or printed kanji-kana mixed characters are read to create a character file, which is then checked by synthesized speech. The input character string is recognized by a kanji character recognition unit 28, and a kanji independent word matching unit 29 matches the candidate character strings obtained as recognition results from the head against a kanji word search table 27. For each kanji word that is matched, information on where that word is located in the word dictionary 25 is read out from the kanji word search table, and based on this position information, part-of-speech information, frequency information, and so on for that word are extracted from the word dictionary 25. The frequency information is used, for example, to rank the candidates when multiple kanji words are detected. Next, the matched kanji word part is removed, and an adjunct word verification unit 30 verifies the adjunct word with the adjunct word dictionary 26, based on the remaining candidate character string sent from the kanji independent word matching unit and the part-of-speech information of the independent word part. Kanji-kana character strings that pass both the independent word verification and the adjunct word verification are transferred to the document storage unit 24 and stored there.
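A simplified sketch of this candidate filtering follows. The recognizer output, the dictionaries, and the frequency values are hypothetical, and the sketch ignores the part-of-speech information that the adjunct word verification unit 30 also uses.

```python
# Hypothetical word dictionary keyed by word number.
WORD_DICTIONARY = {
    1: {"kanji": "大学", "pos": "noun", "frequency": 95},
    2: {"kanji": "大字", "pos": "noun", "frequency": 10},
}
KANJI_TABLE = {"大学": 1, "大字": 2}          # kanji notation -> word number
ADJUNCT_DICTIONARY = {"", "の", "を", "が", "は", "に"}


def verify_candidates(candidates):
    """Keep candidates that pass both verifications, most frequent word first."""
    accepted = []
    for text in candidates:
        for end in range(len(text), 0, -1):
            number = KANJI_TABLE.get(text[:end])   # independent word verification
            if number is not None and text[end:] in ADJUNCT_DICTIONARY:
                frequency = WORD_DICTIONARY[number]["frequency"]
                accepted.append((frequency, text))
                break
    # Frequency information decides the priority among surviving candidates.
    accepted.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in accepted]
```

Given the recognition candidates 大字の, 大学の, and 犬学の, the third is rejected because 犬学 is not in the table, and 大学の is ranked ahead of 大字の because its word has the higher frequency.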

The kanji word search table 27 used here is the same as that of the first embodiment (FIG. 5). The process of converting the kanji-kana mixed character string in the document storage unit into synthesized speech is exactly the same as described above.

[Brief description of the drawings]

FIG. 1 is a diagram showing a structural example of a word dictionary ("kana headword"). FIG. 2 is a diagram showing a structural example of a kanji word dictionary ("kanji headword"). FIG. 3 is a diagram showing the first embodiment of the present invention. FIG. 4 is a diagram showing a structural example of the word dictionary used in the embodiments of the present invention. FIG. 5 is a diagram showing a structural example of the kanji word search table used in the embodiments of the present invention. FIG. 6 is a diagram showing the second embodiment of the present invention.

Reference numerals:
1 Kana independent word matching unit
2 Word dictionary storage unit
3 Adjunct word verification unit
4 Adjunct word dictionary storage unit
5 Document storage unit
6 Kanji independent word matching unit
7 Kanji word search table storage unit
8 Phonological transformation verification unit
9 Prosodic information verification unit
10 Speech parameter generation unit
11 Speech segment file storage unit
12 Speech synthesis unit
21 Speech recognition unit
22 Kana independent word matching unit
23 Adjunct word verification unit
24 Document storage unit
25 Word dictionary storage unit
26 Adjunct word dictionary storage unit
27 Kanji word search table storage unit
28 Kanji character recognition unit
29 Kanji independent word matching unit
30 Adjunct word verification unit
31 Kanji independent word matching unit
32 Phonological transformation verification unit
33 Prosodic information verification unit
34 Speech parameter generation unit
35 Speech synthesis unit
36 Speech segment file storage unit

Agent: Patent attorney 則近憲佑 (Kensuke Norichika) and one other

Claims

[Scope of claims]

A Japanese language processing system having both a word dictionary matching function and a kanji headword dictionary matching function, characterized in that a word dictionary is incorporated as the independent word dictionary, a kanji independent word is matched against a kanji word search table to read out its position in the word dictionary, and this position information is used to extract the necessary information from the word dictionary.