JP2839419B2

JP2839419B2 - Machine translation device with idiom registration function

Info

Publication number: JP2839419B2
Application number: JP4293372A
Authority: JP
Inventors: 至幸小山; いち子佐田; 陽士福持; 等鈴木
Original assignee: Consejo Superior de Investigaciones Cientificas CSIC
Current assignee: Consejo Superior de Investigaciones Cientificas CSIC
Priority date: 1992-10-30
Filing date: 1992-10-30
Publication date: 1998-12-16
Anticipated expiration: 2013-12-16
Also published as: JPH06139272A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an electronic dictionary, an information retrieval apparatus equipped with an electronic dictionary, or a machine translation apparatus equipped with an electronic dictionary, and more particularly to a dictionary search apparatus with an idiom registration function that can register idioms and then search for and translate them.

[0002]

[Prior Art] Language processing devices in practical use today include word processors, which support human document preparation, and machine translation devices, which translate a document written in one language into another language. Each of these devices is provided with a dictionary holding information suited to its purpose. A dictionary here means a collection, for the language handled by the device, of many entries, each entry being one unit consisting of a headword and the set of various information attached to it, arranged systematically so that a desired entry can easily be retrieved by its headword.

[0003] As a rule, a dictionary is recorded in machine-readable form on a machine-readable non-volatile medium. A dictionary recorded in this way is called an electronic dictionary. When an electronic dictionary is used for machine translation, the headword is a word string of the source language (including a string consisting of a single word), and the information attached to that word string includes its part-of-speech information and the corresponding word string in the target language.

[0004] If a document that a user is processing or creating with such a language processing device contains words that are not listed as headwords in the device's dictionary, work efficiency drops markedly. The dictionary should therefore contain as many headwords as possible. Similarly, in machine translation it is desirable for translation efficiency to adopt idioms, not just individual source-language words, as headwords, to pair each with the corresponding target-language expression, and to register as many such pairs as possible. An idiom normally consists of variable parts and fixed parts. Variable parts are, for example, the "~" and "..." in "from ~ to ...": slots filled by words or phrases that share some grammatical feature. Fixed parts are the words or word strings, such as "from" and "to", that form the skeleton of the idiom.
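The split of an idiom into fixed and variable parts can be illustrated with a small data structure. This is only a sketch: the class names `Fixed` and `Variable` and the feature label `noun_phrase` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Fixed:
    """A skeleton word that must appear literally, e.g. "from" or "to"."""
    word: str


@dataclass(frozen=True)
class Variable:
    """A slot filled by any word or phrase sharing one grammatical feature."""
    feature: str  # hypothetical feature label, not taken from the patent


IdiomPattern = List[Union[Fixed, Variable]]

# "from ~ to ...": two fixed parts and two variable parts.
FROM_TO: IdiomPattern = [
    Fixed("from"), Variable("noun_phrase"),
    Fixed("to"), Variable("noun_phrase"),
]


def fixed_words(pattern: IdiomPattern) -> List[str]:
    """Return the skeleton words of the idiom, in order."""
    return [part.word for part in pattern if isinstance(part, Fixed)]


def variable_count(pattern: IdiomPattern) -> int:
    """Count the variable slots of the idiom."""
    return sum(isinstance(part, Variable) for part in pattern)
```

One pattern of this kind stands for every concrete phrase that fills the two slots, which is why a single registered entry can cover many surface forms.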

[0005]

[Problems to Be Solved by the Invention] However, registering idioms as headwords raises the following problems. Many idioms contain variable parts, such as numerals, possessive pronouns and reflexive pronouns, whose form can change depending on the subject or on other words. To raise translation efficiency as described above, the same idiom must then be registered many times, once for each concrete word substituted into these variable parts. This places a heavy burden on the person registering dictionary entries. In addition, the storage capacity required for the dictionary grows with every added headword, so this way of registering idioms is undesirable.

[0006] To keep the dictionary's storage requirement from growing, one could, for example, introduce a representative symbol that stands for a single word that may fill a variable part of an idiom, and thereby reduce the number of idioms to be registered.

[0007] With such a notation for variable parts, registration and search are simple when the idiom has only one variable part and that part is filled by exactly one word, always of the same part of speech. Some idioms, however, have several variable parts, and in others a variable part can be filled by several parts of speech, such as an adjective or an adverb, and by a word string (phrase) of several words rather than a single word.

[0008] If a representative symbol were introduced for each individual word of a variable part in order to register idioms of these various forms, the number of idioms to register would instead increase, the registration work would become burdensome, and the required storage capacity might not decrease.

[0009] The present invention has been made in view of the above circumstances. It provides a machine translation device with an idiom registration function in which the variable parts of an idiom are expressed by representative symbols, so that idioms can be registered and searched for easily even when an idiom has several variable parts or when a variable part is filled by a phrase rather than a single word, thereby shortening registration time, reducing translation effort, and preventing growth in the required storage capacity.

[0010]

[Means for Solving the Problems] FIG. 1 is a block diagram of the configuration of the present invention. As shown in the figure, the invention provides a machine translation device with an idiom registration function comprising: input means 1 for inputting character strings and symbols; idiom registration means 2 for registering the headword and translation of an idiom, the headword containing one or more representative symbols, each standing for a set of words or word strings that share a predetermined attribute, with any part that may be filled by words or word strings of several attributes expressed as a combination of several representative symbols; storage means 9 for storing the dictionaries needed for idiom registration and translation processing and the processing results; dictionary lookup and morphological analysis means 3 for decomposing an input word string into morphemes and performing grammatical analysis; idiom translation means 4 for matching the input character string, or part of it, against the registered idiom headwords and generating a translation of the character string corresponding to the matched idiom headword; syntax analysis means 5; syntax conversion means 6; translated sentence generation means 7; and output means 8 for outputting the translated sentence. The device is characterized in that, when registering an idiom made up of several representative symbols, the idiom registration means 2 registers it in a form that designates the representative symbol corresponding to the central word, that is, the word that represents the part of speech of the idiom as a whole.

[0011] The storage means 9 preferably comprises a dictionary memory 9a holding the grammar and translation-word information used to translate an input character string, a buffer memory 9b that stores the results of the processing leading up to translation generation, and an idiom registration memory 9c that stores the idioms registered by the idiom registration means 2.

[0012] The dictionary lookup and morphological analysis means 3 preferably comprises a part-of-speech extraction unit 3a, which decomposes the input character string into words and generates part-of-speech information for each word, and a translation-word extraction unit 3b, which generates candidate translations for each word.

[0013] The idiom translation means 4 preferably comprises: an idiom search unit 4a, which searches the idiom registration memory and selects candidate idiom headwords whose form can match the decomposed word string; an idiom identification unit 4b, which narrows the candidates to the single idiom headword for which the attributes of the word or word string at each representative-symbol position match the attributes assigned to that representative symbol; an idiom analysis unit 4c, which parses the word or word string corresponding to each representative symbol and generates the sentence structure of the whole idiom; and an idiom translation generation unit 4d, which generates the translation of the idiom portion of the input word string on the basis of that sentence structure.

[0014] When an idiom made up of several representative symbols is registered, the representative symbol corresponding to the central word, which represents the part of speech of the idiom as a whole, may be designated.

[0015]

[Operation] When the headword and translation of an idiom are registered, they are registered in the idiom registration memory using predefined representative symbols, which distinguish the literally identified words from the attributes of the words that fill the idiom's variable parts. As a result, the headwords of several idioms that have the same form and whose variable parts share the same attributes can be registered as a single headword.

[0016] The input character string is decomposed into words, and candidate idiom headwords whose form can match part of the decomposed word string are retrieved. The idiom is then identified by checking whether the attributes of the word or word string at the position of each representative symbol in a candidate headword match the attributes assigned to that representative symbol.
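The two steps just described, a form-based search followed by an attribute check, might be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes every representative symbol is filled by exactly one word (the patent also allows phrases), and the tables `SYMBOLS`, `LEXICON` and `IDIOMS` are hypothetical toy data.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical symbol table: representative symbol -> attribute it requires.
SYMBOLS: Dict[str, str] = {"*a": "adjective"}

# Hypothetical toy lexicon for slot fillers: word -> parts of speech.
LEXICON: Dict[str, Set[str]] = {"happy": {"adjective"}, "quickly": {"adverb"}}

# Registered idiom headwords, each a list of fixed words and symbols.
IDIOMS: List[List[str]] = [["as", "*a", "as", "can", "be"]]


def form_matches(pattern: List[str], words: List[str], start: int) -> bool:
    """Search step: fixed words must match literally; a symbol accepts any word."""
    window = words[start:start + len(pattern)]
    if len(window) != len(pattern):
        return False
    return all(p in SYMBOLS or p == w for p, w in zip(pattern, window))


def attributes_match(pattern: List[str], words: List[str], start: int) -> bool:
    """Identification step: each slot's word must carry the symbol's attribute."""
    window = words[start:start + len(pattern)]
    return all(
        SYMBOLS[p] in LEXICON.get(w, set())
        for p, w in zip(pattern, window)
        if p in SYMBOLS
    )


def identify(words: List[str]) -> Optional[Tuple[int, List[str]]]:
    """Return (start index, headword) of the first idiom identified, else None."""
    for start in range(len(words)):
        candidates = [p for p in IDIOMS if form_matches(p, words, start)]
        for pattern in candidates:
            if attributes_match(pattern, words, start):
                return start, pattern
    return None
```

With this toy data, "she is as happy as can be" is identified at word index 2, while "he ran as quickly as can be" passes the form search but fails the attribute check because "quickly" is not an adjective.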

[0017] Next, the word or word string corresponding to each representative symbol of the idiom is parsed, and the translation of the part of the input character string corresponding to the idiom is generated on the basis of the dependency relations between the words. If a representative symbol corresponding to the central word of the idiom has been designated, the translation is generated with reference to the part of speech that this representative symbol denotes.

[0018] The whole input character string is then parsed, translations for the whole string are generated on the basis of the dependency relations between words, and the translated sentence is produced by supplying the words needed for a well-formed expression in the target language.

[0019] As described above, according to the present invention, expressing the variable parts of an idiom by representative symbols makes it possible to register and search for idioms easily even when an idiom has several variable parts or when a variable part is filled by a phrase rather than a single word, and to generate an appropriate translation of an input sentence that contains an idiom.

[0020] Designating the representative symbol corresponding to the central word, which represents the part of speech of the idiom as a whole, simplifies idiom registration and further helps generate an appropriate translation of an input sentence containing the idiom.

[0021]

[Embodiments] The present invention is described in detail below on the basis of the embodiment shown in the drawings. The invention is not limited to this embodiment. Before the embodiment is described, the concept of machine translation is explained briefly. As FIG. 2 shows, the analysis performed in machine translation has various levels. Machine translation is a process that, given the source language shown at the upper left of FIG. 2, carries out the processing of each level in turn and finally obtains the target language shown on the right of FIG. 2. For example, when source-language text is input, processing proceeds through dictionary lookup at level L1, morphological analysis at level L2, syntax analysis at level L3, and so on, until morpheme generation at level L10 finally produces the target-language text.

[0022] Machine translation methods fall broadly into two types, according to the level to which analysis is carried. The first is the pivot method, which analyzes the input up to the intermediate language shown at level L6 and generates the target language from there. The second is the transfer method, which carries analysis to one of levels L2 to L5 above to obtain an internal structure of the source language, converts it into an internal structure of the target language at the same level, and then generates the target language.

[0023] The intermediate language used in the pivot method is a representation that depends on neither the source language nor the target language. Once the source language has been analyzed into the intermediate language, several target languages can be generated from it, so the method is considered advantageous for multilingual translation. However, whether such an intermediate language, the basic premise of the pivot method, can actually be defined remains an open question.

[0024] The transfer method is a compromise that sidesteps this problem of the pivot method, and many current systems adopt it. The following description concerns the transfer method, and the machine translation device of the embodiment described later also uses it.

[0025] The content of each analysis process shown in FIG. 2 is described below.

(1) Dictionary lookup and morphological analysis

In this process, the input sentence is first divided into a morpheme string (word string), for example by the longest-match method, with reference to a dictionary in which morphemes are stored. Grammatical information such as the part of speech, together with a translation, is then obtained for each resulting word, and tense, person, number and the like are analyzed.
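The longest-match method mentioned above can be sketched as a greedy segmenter. This is a generic illustration of the technique, not code from the patent; the function name and the toy dictionary are assumptions, and unknown characters are simply emitted one at a time.

```python
from typing import List, Set


def longest_match_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedily split `text`, always taking the longest dictionary entry
    that starts at the current position."""
    max_len = max((len(entry) for entry in dictionary), default=0)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No entry starts here: emit the single character as unknown.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy morpheme dictionary.
TOY_DICTIONARY = {"this", "th", "i", "is", "a", "pen"}
```

For the unspaced input "thisisapen", the segmenter prefers "this" over the shorter entry "th" and yields the four morphemes this, is, a, pen. Greedy longest match can pick a wrong split on ambiguous input, which is one reason later analysis levels are needed.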

[0026] (2) Syntax analysis

In this process, the structure of the sentence (a structural analysis tree), such as the dependency relations between words, is assembled and determined on the basis of parts of speech, inflected forms and the like. No judgment is made at this stage as to whether the resulting structure expresses a correct meaning.

[0027] (3) Semantic analysis

From the several structural analysis trees obtained by syntax analysis, those that are semantically correct are distinguished from those that are not, and only the correct ones are adopted.

[0028] (4) Context analysis

Context analysis understands the topic of the input text and resolves the omissions and ambiguities contained in it.

[0029] Next, the block diagram of FIG. 3, which shows a machine translation device according to one embodiment of the invention, is described. The device includes a main CPU (central processing unit) 31; a bus 37 to which the main CPU 31 is connected; a main memory 32 connected to the bus 37; a display device 33, such as a CRT (cathode ray tube) or LCD (liquid crystal display), connected to the bus 37; a keyboard 34; a translation module 35 connected to the bus 37; and a dictionary memory 36, connected to the translation module 35, which stores the knowledge base for translation, including the dictionaries, grammar rules and tree-structure conversion rules.

[0030] The translation module 35 receives a source-language sentence, translates it by a predetermined procedure, and outputs the target-language text. FIG. 4 shows a block diagram of the translation module 35.

[0031] The translation module 35 includes: a translation CPU 45, connected to the bus 37, which translates the source language (English in this embodiment) input via the bus 37 according to a predetermined translation program and outputs the result to the bus 37 as the target language (Japanese in this embodiment); a translation program memory 46, connected to the bus 37, which stores the translation program executed by the translation CPU 45; a buffer A (40), which stores the input source-language text word by word; a buffer B (41), which stores the part of speech, translation and other information obtained for each word in buffer A (40) by referring to the dictionary in the dictionary memory 36; a buffer C (42), which stores information on the structural analysis tree of the source language; a buffer D (43), which stores the target-language structural analysis tree obtained by converting the source-language tree held in buffer C (42); and a buffer E (44), which stores the sentence put into proper Japanese form by supplementing the Japanese structural analysis tree held in buffer D (43) with suitable particles, auxiliary verbs and the like. The translation module configured in this way is assumed to carry analysis up to level L3 shown in FIG. 2.

[0032] FIG. 5 shows the processing structure of the translation module. In the figure, 51 is a dictionary lookup and morphological analysis unit, which performs dictionary lookup and morphological analysis on the source language; 52 is an idiom translation unit, which identifies and analyzes an idiom and generates its translation when the source text contains one; 53 is a syntax analysis unit, which parses the morphologically analyzed input sentence; 54 is a syntax conversion unit, which converts the result of syntax analysis to generate the structural analysis tree of the target language; and 55 is a translated sentence generation unit, which generates the target-language translation from the structural analysis tree produced by the syntax conversion unit 54.
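The order in which units 51 to 55 run can be pictured as a chain of stages applied to one translation job. The sketch below shows only that ordering: the stages are placeholders that record their own names, and `Job`, `make_stage` and `run` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """One sentence moving through the translation module."""
    source: str
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Job], Job]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(job: Job) -> Job:
        job.trace.append(name)
        return job
    return stage


# Stage names follow the reference numerals of FIG. 5.
PIPELINE: List[Stage] = [make_stage(name) for name in (
    "51 dictionary lookup / morphological analysis",
    "52 idiom translation",
    "53 syntax analysis",
    "54 syntax conversion",
    "55 translated sentence generation",
)]


def run(job: Job, pipeline: List[Stage] = PIPELINE) -> Job:
    """Apply every stage in order and return the finished job."""
    for stage in pipeline:
        job = stage(job)
    return job
```

The point of the arrangement is that idiom translation sits between morphological analysis and full syntax analysis, so the parser sees an idiom as an already-resolved unit.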

[0033] The dictionary lookup and morphological analysis unit 51 consists of a part-of-speech extraction unit 51a, which decomposes the input character string into words and generates part-of-speech information for each word, and a translation-word extraction unit 51b, which generates candidate translations for each word.

[0034] The idiom translation unit 52 consists of: an idiom search unit 52a, which selects, from the idioms registered in advance, candidate idiom headwords whose form can match the word string decomposed by the dictionary lookup and morphological analysis unit 51; an idiom identification unit 52b, which narrows the candidates selected by the idiom search unit 52a to a single idiom headword; an idiom analysis unit 52c, which parses the word string corresponding to the identified idiom and generates the sentence structure of the whole idiom; and an idiom translation generation unit, which generates the translation of the idiom portion of the input word string on the basis of that sentence structure. The processing performed by units 51 to 55 is described in detail later.

[0035] The English-to-Japanese translation operation of the machine translation device of this embodiment is described below with reference to FIGS. 3 to 10. An English sentence containing no idiom, "This is a pen.", is used as an example to outline how the device translates English into Japanese.

[0036] First, the source text that has been read in is decomposed into morphemes by morphological analysis and stored in buffer A (40) (see FIG. 4), as shown in FIG. 6. Then, under the control of the translation CPU 45 running the program stored in the translation program memory 46, the dictionary lookup and morphological analysis unit 51 shown in FIG. 5 refers to the dictionary stored in the dictionary memory 36 and obtains, for each word of the source text held in buffer A (40), information such as its translation and part of speech. This information is stored in buffer B (41) shown in FIG. 4.

[0037] This information includes the part of speech of each word, which is stored as shown in FIG. 7. "this" is a multi-part-of-speech word with two parts of speech, pronoun and demonstrative adjective. The part of speech of "is" is verb. The parts of speech of "a" and "pen" are likewise stored in buffer B (41). Although "this" has more than one part of speech, its part of speech in the sentence is determined uniquely by the processing corresponding to the syntax analysis unit 53 shown in FIG. 5.

[0038] In the part of the translation program corresponding to the syntax analysis unit 53 shown in FIG. 5, a structural analysis tree representing the dependency relations between the words is determined, for example as shown in FIG. 8, according to the dictionary and grammar rules stored in the dictionary memory 36. The result of this syntax analysis is stored in buffer C (42) of FIG. 4.

[0039] The structural analysis tree is determined as follows. Among the grammar rules stored in the dictionary memory 36, those for English include the following:

sentence → subject, predicate
subject → noun phrase
predicate → verb, noun phrase
noun phrase → pronoun
noun phrase → article, noun

The first rule, for example, states that "a sentence consists of a subject and a predicate." The other rules are read in the same way. The structural analysis tree is determined according to these rules. Similar grammar rules are provided for Japanese, and the English grammar rules are associated with the Japanese ones.
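The five English rules above are enough to parse the example sentence. The following sketch applies them with a small top-down backtracking parser; the parser itself, the tuple-based tree format and the identifier spellings (`noun_phrase`, `demonstrative_adjective`) are assumptions made for illustration, since the patent does not specify the parsing algorithm.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# The grammar rules listed in paragraph [0039].
GRAMMAR: Dict[str, List[List[str]]] = {
    "sentence": [["subject", "predicate"]],
    "subject": [["noun_phrase"]],
    "predicate": [["verb", "noun_phrase"]],
    "noun_phrase": [["pronoun"], ["article", "noun"]],
}

# Parts of speech per word, as held in buffer B for the example sentence.
POS: Dict[str, Set[str]] = {
    "this": {"pronoun", "demonstrative_adjective"},
    "is": {"verb"},
    "a": {"article"},
    "pen": {"noun"},
}


def parse(symbol: str, words: List[str], i: int) -> Iterator[Tuple[tuple, int]]:
    """Yield (tree, next index) for every way `symbol` can start at words[i]."""
    if symbol not in GRAMMAR:  # a part-of-speech category
        if i < len(words) and symbol in POS.get(words[i], set()):
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, i, ())


def expand(label: str, rhs: List[str], words: List[str], i: int,
           children: tuple) -> Iterator[Tuple[tuple, int]]:
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (label, *children), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from expand(label, rhs[1:], words, j, children + (child,))


def parse_sentence(words: List[str]) -> Optional[tuple]:
    """Return the first tree that covers all the words, or None."""
    for tree, end in parse("sentence", words, 0):
        if end == len(words):
            return tree
    return None
```

Parsing ["this", "is", "a", "pen"] selects the pronoun reading of "this", because no rule accepts a demonstrative adjective in subject position. This mirrors the statement that the part of speech of a multi-part-of-speech word is fixed during syntax analysis.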

[0040] In the part of the translation program corresponding to the syntax conversion unit 54 shown in FIG. 5, the structure of the structural analysis tree of the input English sentence (see FIG. 8) is converted into the structure of the structural analysis tree for the Japanese sentence shown in FIG. 9. Like the syntax analysis unit 53 described above, this conversion uses rules stored in the dictionary memory 36, here the "tree structure conversion rules". In terms of FIG. 2, the conversion corresponds to going from level L3 to level L9 of the target language. The result is stored in buffer D (43) shown in FIG. 4. The example sentence used in this description, "This is a pen.", is converted by this step into the Japanese character string 「これ ペン である」 (literally "this pen is").
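A tree conversion of this kind can be sketched as a recursive walk that reorders children and swaps in translations. The rule table `REORDER`, the tuple tree format and the empty translation for the article "a" are assumptions chosen so that the sketch reproduces the intermediate string given in the text; the actual tree structure conversion rules are not disclosed in this passage.

```python
from typing import Dict, List

# English structural analysis tree for "This is a pen." (tuple format assumed).
SOURCE_TREE = (
    "sentence",
    ("subject", ("noun_phrase", ("pronoun", "this"))),
    ("predicate", ("verb", "is"),
     ("noun_phrase", ("article", "a"), ("noun", "pen"))),
)

# Hypothetical conversion rule: node label -> new order of its children.
REORDER: Dict[str, List[int]] = {"predicate": [1, 0]}  # verb, NP -> NP, verb

# Toy translation table; mapping "a" to an empty string is an assumption.
TRANSLATIONS: Dict[str, str] = {
    "this": "これ", "is": "である", "a": "", "pen": "ペン",
}


def is_leaf(node: tuple) -> bool:
    """A leaf is a (part of speech, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def convert(node: tuple) -> tuple:
    """Translate leaves and reorder children according to REORDER."""
    if is_leaf(node):
        return (node[0], TRANSLATIONS[node[1]])
    label = node[0]
    children = [convert(child) for child in node[1:]]
    order = REORDER.get(label, range(len(children)))
    return (label, *[children[i] for i in order])


def leaves(node: tuple) -> List[str]:
    """Collect the non-empty words of a tree from left to right."""
    if is_leaf(node):
        return [node[1]] if node[1] else []
    return [word for child in node[1:] for word in leaves(child)]
```

Reading the leaves of `convert(SOURCE_TREE)` from left to right gives これ, ペン, である, the word order of the Japanese tree before particles are added.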

[0041] The part of the translation program corresponding to the translated sentence generation unit 55 of FIG. 5 attaches the appropriate particle 「は」 and auxiliary verbs to the resulting Japanese character string 「これ ペン である」, puts it into grammatical Japanese form as shown in FIG. 10, and stores it in buffer E (44) shown in FIG. 4. This processing corresponds to the conversion from level L9 to level L10 shown in FIG. 2. The resulting Japanese sentence 「これはペンである。」 ("This is a pen.") is output from the translation module 35 shown in FIG. 3, stored in the main memory 32, and displayed on the display device 33.
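The generation step for the example sentence can be pictured as attaching a particle to each constituent that needs one and closing the sentence. The role names and the `PARTICLES` table below are hypothetical; a real generator would choose particles and auxiliary verbs from the target-language tree and its grammar rules.

```python
from typing import Dict, List, Tuple

# Hypothetical table: grammatical role -> particle attached after it.
PARTICLES: Dict[str, str] = {"subject": "は"}


def generate(constituents: List[Tuple[str, str]]) -> str:
    """Join (role, word) pairs, adding particles, and end with a full stop."""
    parts = [word + PARTICLES.get(role, "") for role, word in constituents]
    return "".join(parts) + "。"


EXAMPLE = [("subject", "これ"), ("complement", "ペン"), ("copula", "である")]
```

Applied to `EXAMPLE`, the function returns the sentence これはペンである。 that the text gives as the final output.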

[0042] Next, the registration of idioms in which words or word strings are expressed by representative symbols is described. FIG. 11 shows an example of the representative symbol table. This table is assumed to be stored in advance in the dictionary memory 36. As the figure shows, representative symbols can be classified into terminal representative symbols, which stand for the various parts of speech, and non-terminal representative symbols, which stand for word strings, chiefly those called sentences, phrases and clauses.

【００４３】図１２〜図１６に、イディオムの登録例を
示す。図に示すように、登録されたイディオムは英単語
及びその品詞、訳語及びその品詞で構成され、これらは
すべて辞書メモリ３６に記憶される。FIGS. 12 to 16 show examples of idiom registration. As shown in the figures, each registered idiom consists of an English entry with its part of speech and a translation with its part of speech, all of which are stored in the dictionary memory 36.

【００４４】図１２は、〔ａｓＡａｓｃａｎｂ
ｅ〕というイディオムを代表記号を利用して登録したも
のである。ここでの‘Ａ’は単語、句又は文を表わすも
のとする。このイディオム例では、‘Ａ’として対応す
るものは形容詞の一単語のみである。そこで、図１１の
代表記号テーブルにおいて、形容詞を表す代表記号は＊
ａであるので、英単語の欄に〔ａｓ＊ａａｓｃａ
ｎｂｅ〕という形式で登録する。FIG. 12 shows the idiom [as A as can be] registered using a representative symbol. Here 'A' stands for a word, phrase, or sentence. In this idiom, only a single adjective can fill 'A'. Since the representative symbol for an adjective in the table of FIG. 11 is *a, the idiom is registered in the English-word field in the form [as *a as can be].

【００４５】続く品詞の欄はイディオムの品詞を登録す
るものであるが、この場合形容詞句であり、＊ａの形容
詞と同義であるので指定しない。このように指定しない
のは、今の場合代表記号に入るのが形容詞だけと特定で
きたが、形容詞か副詞が入るという場合、入った品詞に
よって、イディオムの品詞が変わるからである。例え
ば、副詞が入った時は、そのイディオムは副詞句とな
り、形容詞が入った時は形容詞句になるという様に指定
する。The next field, part of speech, holds the part of speech of the idiom. Here the idiom is an adjective phrase, which coincides with the adjective category of *a, so the field is left unspecified. It is left blank because, although in this case only an adjective can fill the representative symbol, in cases where either an adjective or an adverb can fill it the part of speech of the idiom depends on what is filled in: the idiom is treated as an adverb phrase when an adverb is filled in, and as an adjective phrase when an adjective is filled in.

【００４６】続いて訳語を「この上もなく＊ａ」のよう
に、＊ａが日本語のどの部分に当たるのかを明確にして
おく。最後の品詞の欄は訳語の品詞指定であるが、訳語
が＊ａのように代表記号で終っており、＊ａの活用によ
って変わるので品詞は特定できない。そこで何も指定し
ないのであるが、訳語の品詞は後の処理で特定される。Next, the translation is entered as 「この上もなく＊ａ」, making explicit which part of the Japanese corresponds to *a. The last field specifies the part of speech of the translation. Because the translation ends with a representative symbol, *a, and its form changes with the inflection of *a, the part of speech cannot be fixed. The field is therefore left blank, and the part of speech of the translation is determined in later processing.

【００４７】次に図１３は〔ａｓＡａｓＢ〕で
‘Ａ’が形容詞、‘Ｂ’にｔｏ不定詞が入るイディオム
である。形容詞の一単語は代表記号テーブルを参照して
＊ａであり、ｔｏ不定詞は＊Ｉである。そこで図１２と
同じように登録するが、代表記号が２つあり、次の項目
の品詞指定で品詞を指定しなくてもよいようにするた
め、どちらが中心語であるかを指定する。ここでは、＊
ａが中心語となるので、＊の記号を付け加えて、＊＊ａ
とし、英単語の項目に〔ａｓ＊＊ａａｓ＊Ｉ〕と
登録する。Next, FIG. 13 shows the idiom [as A as B], in which 'A' is an adjective and 'B' is a to-infinitive. From the representative symbol table, a single adjective is *a and a to-infinitive is *I. Registration proceeds as in FIG. 12, but because there are two representative symbols, the central word is designated so that the part of speech need not be specified in the following part-of-speech field. Here *a is the central word, so an additional * is prefixed to give **a, and [as **a as *I] is registered in the English-word field.

【００４８】続いて品詞については、このイディオムは
形容詞句であり、中心語の＊ａと一致するため指定しな
い。訳語は「＊Ｉ［体：］ほど＊ａ」とする。ここで、
‘＊Ｉ［体：］’はｔｏ不定詞を訳す時に連体形にする
ように指定する記号である。最後の品詞は図１２の時と
同様に＊ａで終っていて特定できないので書かない。As for the part of speech, this idiom is an adjective phrase, which agrees with the central word *a, so it is not specified. The translation is 「＊Ｉ［体：］ほど＊ａ」. Here '＊Ｉ［体：］' is a notation specifying that the to-infinitive is to be rendered in the attributive form (連体形) when translated. As in FIG. 12, the final part of speech is not written, because the translation ends with *a and cannot be fixed.

【００４９】図１４は、〔ＡｔｈｒｏｕｇｈＢ〕と
いう形式のイディオムを登録したものである。このイデ
ィオムでは、‘Ａ’には数詞又は名詞句が入るので、図
１１の代表記号として＊ｍ又は＊Ｎが対応する。このよ
うに複数個の代表記号が対応する時は、まとめて＊ｍＮ
と書くことにする。この時の記号の順はｔｅｒｍｉｎａ
ｌ記号、ｎｏｎ−ｔｅｒｍｉｎａｌ記号の順とし、ｔｅ
ｒｍｉｎａｌ記号どうし、ｎｏｎ−ｔｅｒｍｉｎａｌ記
号どうしの場合は順序を問わない。この順序は後の処理
の順序に関係し、先頭の代表記号の方、ここでは‘ｍ’
の方が優先的に処理される。FIG. 14 shows a registered idiom of the form [A through B]. In this idiom 'A' is filled by a numeral or a noun phrase, so *m or *N of FIG. 11 corresponds. When several representative symbols correspond in this way, they are written together as *mN. The symbols are written with terminal symbols first and non-terminal symbols after; among terminal symbols, or among non-terminal symbols, the order does not matter. This order determines the order of later processing: the leading symbol, here 'm', is processed first.

【００５０】また‘Ｂ’にも数詞又は名詞句が入り、
‘Ａ’と同様に＊ｍＮとなるが、‘Ａ’と‘Ｂ’の区別
をするために、＊ｍＮ１，＊ｍＮ２というように代表記
号のすぐ後に数字を振り当てて記述する。このように、
代表記号の列が複数個一致する時は１から順に数字を振
り当てる。また、この場合、品詞指定の項目で品詞を設
定するようにするので中心語を設定する必要はない。品
詞指定に関して、このイディオムでは代表記号の所に数
詞が入った場合でも、名詞句が入った場合でも、名詞句
として扱うのが望ましいと考えられるので名詞句と指定
する。'B' is likewise filled by a numeral or a noun phrase and, like 'A', becomes *mN. To distinguish 'A' from 'B', a number is appended immediately after the representative symbol, giving *mN1 and *mN2. In general, when the same representative symbol string occurs more than once, numbers are assigned in order starting from 1. In this case the part of speech is set in the part-of-speech field, so no central word needs to be designated. As for the part of speech, this idiom is best treated as a noun phrase whether the representative symbols are filled by numerals or by noun phrases, so noun phrase is specified.

【００５１】続く訳語に関しても、「＊ｍＮ１から＊ｍ
Ｎ２まで」のように英単語の項目と同じように番号を明
確に表現して指定する。最後の品詞は、この場合訳語が
「〜まで」と活用しない形で終っているので、‘その
他’を指定する。The translation is likewise specified with the numbers made explicit, as in 「＊ｍＮ１から＊ｍＮ２まで」, just as in the English-word field. For the final part-of-speech field, the translation here ends in the non-inflecting form 「〜まで」, so 'Other' is specified.
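The four-field registration format described for FIGS. 12 to 14 can be sketched as a small record type together with a parser for the symbol notation. This is only an illustration under stated assumptions: the class `IdiomEntry`, the function `parse_symbol`, the field names, and the use of half-width ASCII in place of the full-width characters are all hypothetical, and the regular expression covers only the notations that appear in these three examples.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdiomEntry:
    english: str                    # headword pattern containing representative symbols
    pos: Optional[str]              # part of speech of the idiom; None when left blank
    translation: str                # Japanese template containing the same symbols
    translation_pos: Optional[str]  # None when it is decided in later processing


# Entries transcribed from the description of FIGS. 12 to 14.
ENTRIES = [
    IdiomEntry("as *a as can be", None, "この上もなく*a", None),
    IdiomEntry("as **a as *I", None, "*I[体:]ほど*a", None),
    IdiomEntry("*mN1 through *mN2", "noun phrase", "*mN1から*mN2まで", "other"),
]

# '*', an optional second '*' marking the central word, one or more category
# letters (or a single digit category such as 3), then an optional index number.
SYMBOL = re.compile(r"^\*(\*?)([A-Za-z]+|[0-9])([0-9]*)$")


def parse_symbol(token: str) -> Optional[dict]:
    """Split a representative symbol into its parts; return None for a literal word."""
    m = SYMBOL.match(token)
    if m is None:
        return None
    central, categories, index = m.groups()
    return {
        "head": central == "*",
        "categories": list(categories),
        "index": int(index) if index else None,
    }
```

For instance, `parse_symbol("**a")` reports a central-word symbol with the single category `a`, while `parse_symbol("*mN1")` reports the two alternatives `m` and `N` with index 1.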

【００５２】図１５及び図１６はどちらも〔Ａｔｉｍ
ｅｓａｓＢａｓＣ〕のイディオムを登録したも
のである。このように二つに分けて登録するのは、
‘Ｃ’の部分に文または名詞句が来る時と、人称代名詞
の主格＋助動詞が来る場合では、代表記号に置き換えた
時に、一方は‘＊ＣＮ’、他方は‘＊３＊ｘ’と表現
されこの部分の文法上の構造が異なり、また代表記号の
数が一致しないので、一つにまとめることはできないか
らである。しかしながらこのように複数にわけて書くこ
とを認めることにより、より多くのイディオムを登録可
能とし、様々なタイプの入力文に対応できる。FIGS. 15 and 16 both register the idiom [A times as B as C]. It is registered as two separate entries because the case where 'C' is a sentence or a noun phrase and the case where it is a nominative personal pronoun followed by an auxiliary verb are written differently in representative symbols, one as '*CN' and the other as '*3 *x'. The grammatical structure of this part differs and the number of representative symbols does not agree, so the two cannot be merged into one entry. Allowing an idiom to be written as several entries in this way, however, makes more idioms registrable and lets the device handle various types of input sentence.

【００５３】また図１６の訳語で、「＊ｍ倍＊Ｃ
［体：］Ｎより＊ａｄ」となっているが、これは、＊Ｃ
の時は連体形で活用し、Ｎの時は活用しないことを示
す。このように複数のカテゴリーを指定する時で、活用
形があるときは、一つごとに指定する。例えば、一つの
可変部分に名詞句、形容詞句、又は副詞句が入り、形容
詞句は連体形で活用し、副詞句は終止形で活用する場合
は、＊ＮＡ［体：］Ｄ［終：］と書くようにする。The translation in FIG. 16 is 「＊ｍ倍＊Ｃ［体：］Ｎより＊ａｄ」. This indicates that the *C alternative is rendered in the attributive form (連体形), whereas the N alternative is not inflected. When several categories are specified in this way and some of them have an inflected form, the form is specified for each category individually. For example, if one variable part can take a noun phrase, an adjective phrase, or an adverb phrase, with the adjective phrase rendered in the attributive form and the adverb phrase in the terminal form (終止形), it is written ＊ＮＡ［体：］Ｄ［終：］.
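The per-category inflection notation above can be read mechanically. The sketch below splits a composite symbol into its alternatives, each paired with the inflection tag that follows it, if any. The function name `parse_alternatives` is hypothetical, the symbols are written in half-width ASCII for convenience, and the sketch assumes a symbol without a leading central-word marker or trailing index number.

```python
import re

# One category character, optionally followed by an inflection tag such as [体:].
PART = re.compile(r"([A-Za-z0-9])(?:\[([^\]:]+):\])?")


def parse_alternatives(symbol: str) -> list:
    """Return (category, inflection or None) pairs for a composite symbol."""
    if not symbol.startswith("*"):
        raise ValueError("a representative symbol starts with '*'")
    body = symbol[1:]
    parts = []
    pos = 0
    while pos < len(body):
        m = PART.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos} in {symbol!r}")
        parts.append((m.group(1), m.group(2)))
        pos = m.end()
    return parts
```

Applied to `"*NA[体:]D[終:]"`, this yields the noun phrase with no inflection, the adjective phrase tagged 体, and the adverb phrase tagged 終, in the order in which they are written.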

【００５４】次に図１７に示すフローチャートを用いて
代表記号によって登録されたイディオムと入力文字列と
の同定処理及び訳語出力処理について説明する。イディ
オムの登録の例である図１２〜図１６のイディオムがす
でに辞書メモリ３６の中に登録されているものとする。Next, the process of matching an input character string against idioms registered with representative symbols, and the process of outputting the translation, are described with reference to the flowchart of FIG. 17. It is assumed that the idioms of FIGS. 12 to 16, the registration examples given above, have already been registered in the dictionary memory 36.

【００５５】図１７は、代表記号によって登録されたイ
ディオムと入力文字列との同定から構文解析処理へのデ
ータの受渡しまでの処理を示したフローチャートであ
る。図１８は文法規則の一部であり、実施例で使用され
るステップＳ６用の文法規則である。この文法規則はあ
らかじめ必要な数だけ辞書メモリ３６に記憶しておく。
以下、［Ｔｈｉｓａｐｐｌｅｉｓｔｈｒｅｅｔ
ｉｍｅｓａｓｂｉｇａｓｔｈａｔｏｒａｎｇ
ｅ．］という文が入力された場合を例にとり説明する。FIG. 17 is a flowchart showing the processing from matching the input character string against idioms registered with representative symbols up to handing the data over to syntax analysis. FIG. 18 shows part of the grammar rules, namely the grammar rules used for step S6 in this embodiment. As many of these grammar rules as needed are stored in the dictionary memory 36 in advance. The following description takes as an example the case where the sentence [This apple is three times as big as that orange.] is input.

【００５６】まず、ステップＳ１で辞書引きが行なわれ
る。これにより品詞情報および、細分類の情報が翻訳モ
ジュール３５の中の品詞バッファにセットされる。この
例文の品詞バッファのメモリ内容を表したものが図１９
である。First, dictionary lookup is performed in step S1. This sets the part-of-speech information and sub-classification information in the part-of-speech buffer inside the translation module 35. FIG. 19 shows the contents of the part-of-speech buffer for this example sentence.

【００５７】ステップＳ２では、図１１の代表記号テー
ブルと図１２から図１６に示す代表記号によって登録さ
れたイディオムが参照され、登録されたイディオムの中
に図１９に示された品詞バッファの各語及び品詞とマッ
チできるものがあるかどうかを探索する。図１９に示さ
れた品詞バッファを見て登録されたイディオムとマッチ
するものを捜す。In step S2, the representative symbol table of FIG. 11 and the idioms registered with representative symbols in FIGS. 12 to 16 are consulted, and the registered idioms are searched for any that can match the words and parts of speech held in the part-of-speech buffer of FIG. 19.

【００５８】この例文の場合、まず品詞バッファ５の
‘ｔｉｍｅｓ’が図１５によって登録されたイディオム
の‘ｔｉｍｅｓ’とマッチし、次に品詞バッファ６の
‘ａｓ’、品詞バッファ８の‘ａｓ’が図１５のそれぞ
れの‘ａｓ’とマッチする。また、図１６のイディオム
も同じ‘ｔｉｍｅｓ’、‘ａｓ’、‘ａｓ’を持ってい
るのでマッチし、結局、図１５及び図１６とも入力文と
マッチする可能性のあるイディオムとなる。ここで、生
成されたイディオムバッファの数は２に設定される。For this example sentence, 'times' in part-of-speech buffer entry 5 first matches 'times' in the idiom registered in FIG. 15; then 'as' in entry 6 and 'as' in entry 8 match the respective 'as' of FIG. 15. The idiom of FIG. 16 contains the same 'times', 'as', and 'as', so it matches as well. As a result, both FIG. 15 and FIG. 16 are idioms that may match the input sentence, and the number of idiom buffers generated is set to 2.
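The candidate search of step S2 can be approximated by checking that the literal words of each registered pattern occur in the input in the same order. The sketch below is an assumption about how such a filter might look, not the procedure of FIG. 17 itself; the pattern strings are reconstructed from the description of FIGS. 12 to 16 in half-width ASCII, and `literal_words` and `is_candidate` are hypothetical names.

```python
def literal_words(pattern: str) -> list:
    """Words of a pattern that are not representative symbols."""
    return [w for w in pattern.split() if not w.startswith("*")]


def is_candidate(pattern: str, tokens: list) -> bool:
    """True if every literal word of the pattern occurs in tokens, in order."""
    remaining = iter(tokens)
    # 'word in remaining' advances the iterator, so order is respected.
    return all(word in remaining for word in literal_words(pattern))


PATTERNS = [
    "as *a as can be",             # FIG. 12
    "as **a as *I",                # FIG. 13
    "*mN1 through *mN2",           # FIG. 14
    "*m times as **ad as *3 *x",   # FIG. 15
    "*m times as **ad as *CN",     # FIG. 16
]

tokens = "this apple is three times as big as that orange".split()
candidates = [p for p in PATTERNS if is_candidate(p, tokens)]
```

A filter based on literal words alone is deliberately loose. With the five patterns above it also lets the FIG. 13 pattern through, because both of its literal words ('as', 'as') occur in the input; such extra candidates would have to be rejected by the symbol-by-symbol checks that follow.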

【００５９】さらに、それぞれの代表記号が置き換えら
れる単語や句、文がイディオムバッファにセットされ
る。具体的に説明すると、図１５のイディオムの＊ｍに
当たるのが、‘ｔｈｒｅｅ’であり、＊＊ａｄに当たる
のが‘ｂｉｇ’、＊３が‘ｔｈａｔ’，＊ｘが‘ｏｒａ
ｎｇｅ’となりイディオムバッファ１にセットされる
（図１９参照）。Further, the words, phrases, or sentences that each representative symbol would stand for are set in the idiom buffers. Specifically, for the idiom of FIG. 15, *m corresponds to 'three', **ad to 'big', *3 to 'that', and *x to 'orange'; these are set in idiom buffer 1 (see FIG. 19).

【００６０】図１６のイディオムの場合は、図１５の対
応とほぼ同じであるが、最後の＊ＣＮの部分が‘ｔｈａ
ｔｏｒａｎｇｅ’となる。それらを格納したのが図２
０のイディオムバッファ２である。なお、ステップＳ２
において、イディオムとマッチしなければ、通常のステ
ップＳ１４の構文解析の処理に移る。For the idiom of FIG. 16 the correspondence is almost the same as for FIG. 15, except that the final *CN part becomes 'that orange'. These are stored in idiom buffer 2 of FIG. 20. If no idiom matches in step S2, processing moves to the ordinary syntax analysis of step S14.

【００６１】ステップＳ３では、カウンタ１の値を１に
初期化する。このカウンタ１は、イディオムバッファの
番号を処理するために利用する。ステップＳ４では、カ
ウンタ２の値を１に初期化する。このカウンタ２は、イ
ディオムバッファの内の代表記号の番号を処理するため
のものである。In step S3, counter 1 is initialized to 1. Counter 1 holds the number of the idiom buffer being processed. In step S4, counter 2 is initialized to 1. Counter 2 holds the position of the representative symbol being processed within the idiom buffer.

【００６２】ステップＳ５では、カウンタ１の値が指す
イディオムバッファの中にある代表記号のうちカウンタ
２が示す順番の代表記号に注目しｔｅｒｍｉｎａｌ記号
かどうか判断する。この場合はカウンタ１の値は１であ
るので、イディオムバッファ１を注目し、その中でカウ
ンタ２が示す順番、つまりこの場合は１番目＊ｍに注目
する。そしてその先頭の代表記号がｔｅｒｍｉｎａｌ記
号かどうかを判断する。この場合、＊ｍはｔｅｒｍｉｎ
ａｌ記号なので、ステップＳ８に処理を移す。In step S5, attention is given to the representative symbol, in the idiom buffer pointed to by counter 1, whose position is given by counter 2, and it is determined whether it is a terminal symbol. Here counter 1 is 1, so idiom buffer 1 is examined, and within it the symbol at the position given by counter 2, in this case the first symbol, *m. It is then determined whether the leading representative symbol is a terminal symbol. Since *m is a terminal symbol, processing moves to step S8.

【００６３】ステップＳ８では、代表記号と英単語がマ
ッチするかどうかを見る。この場合、注目されている英
単語は‘ｔｈｒｅｅ’であり、図１９の品詞バッファか
ら数詞、つまり代表記号の＊ｍにマッチすることがわか
り、ステップＳ９に処理を移す。In step S8 it is checked whether the representative symbol matches the English word. Here the English word of interest is 'three'; the part-of-speech buffer of FIG. 19 shows that it is a numeral and therefore matches the representative symbol *m, so processing moves to step S9.

【００６４】もしステップＳ８でマッチしなければ、ス
テップＳ８０に処理を移す。ステップＳ８０では代表記
号文字列がｎｏｎ−ｔｅｒｍｉｎａｌ記号を含むか判定
し、含む場合はステップＳ６に処理を移し、含まない時
はステップＳ１２に処理を移す。If no match is found in step S8, the process proceeds to step S80. In step S80, it is determined whether or not the representative symbol character string includes a non-terminal symbol. If it does, the process proceeds to step S6; otherwise, the process proceeds to step S12.

【００６５】ステップＳ９では、カウンタ２の値を１増
やす。今は、カウンタ２は１であったので２になる。ス
テップＳ１０は、カウンタ２の値がイディオムバッファ
の代表記号の数より大きいかどうか判定する。ここでカ
ウンタ２の値がイディオムバッファの数よりも大きけれ
ば、その候補のイディオムと完全にマッチしたことにな
る。今の場合は、カウンタ２の値は２であり、イディオ
ムバッファ内の代表記号の数は４であるので、失敗しス
テップＳ５に処理を移す。In step S9, counter 2 is incremented by 1; it was 1, so it becomes 2. Step S10 determines whether the value of counter 2 is greater than the number of representative symbols in the idiom buffer. If it is, the candidate idiom has matched completely. Here counter 2 is 2 and the idiom buffer contains 4 representative symbols, so the test fails and processing returns to step S5.

【００６６】ステップＳ５では、また注目する代表記号
がｔｅｒｍｉｎａｌ記号であるかどうか判定するが、今
回は、カウンタ２の値は２となっているため、図１５に
おける＊＊ａｄ、すなわち図２０における＊ａｄが先頭
として注目される。＊ａｄは図１１の代表記号テーブル
によればｔｅｒｍｉｎａｌ記号であるため、ステップＳ
８に処理を移す。In step S5 it is again determined whether the representative symbol of interest is a terminal symbol. This time counter 2 is 2, so **ad in FIG. 15, that is, *ad in FIG. 20, is the symbol examined. According to the representative symbol table of FIG. 11, *ad is a terminal symbol, so processing moves to step S8.

【００６７】ステップＳ８では、代表記号‘＊＊ａｄ’
と対応する英単語‘ｂｉｇ’とがマッチするかどうかを
判定する。英単語‘ｂｉｇ’は形容詞であるので、＊＊
ａｄの中の＊ａの方とマッチする。次にステップＳ９の
処理へ進む。In step S8 it is determined whether the representative symbol '**ad' matches the corresponding English word 'big'. Since 'big' is an adjective, it matches the *a alternative of **ad. Processing then proceeds to step S9.

【００６８】ステップＳ９の処理でカウンタ２の値が３
となり、ステップＳ１０で判定に失敗し、ステップＳ５
に戻る。ステップＳ５では、注目するのは＊３であり、
これはｔｅｒｍｉｎａｌ記号であるのでステップＳ８に
処理が移る。In step S9 counter 2 becomes 3; the test in step S10 fails and processing returns to step S5. In step S5 the symbol of interest is *3, which is a terminal symbol, so processing moves to step S8.

【００６９】ステップＳ８で‘ｔｈａｔ’と＊３がマッ
チするかどうかを判定するが、‘ｔｈａｔ’の品詞は図
１９の品詞情報バッファにより、代名詞か限定詞か関係
代名詞であるので、＊３の示す代名詞（主格）とマッチ
する。In step S8 it is determined whether 'that' matches *3. According to the part-of-speech information buffer of FIG. 19, 'that' may be a pronoun, a determiner, or a relative pronoun, so it matches the pronoun (nominative) denoted by *3.

【００７０】次にステップＳ９で、カウンタ２の値が４
となり、ステップＳ１０で判定に失敗し、ステップＳ５
に戻る。ステップＳ５で、注目するのは＊ｘであり、こ
れはｔｅｒｍｉｎａｌ記号であるのでステップＳ８に処
理が移る。Next, in step S9 counter 2 becomes 4; the test in step S10 fails and processing returns to step S5. In step S5 the symbol of interest is *x, which is a terminal symbol, so processing moves to step S8.

【００７１】ステップＳ８で‘ｏｒａｎｇｅ’と＊ｘが
マッチするかどうかを判定するが、‘ｏｒａｎｇｅ’の
品詞は図１９の品詞情報バッファによると、名詞であ
り、＊ｘは助動詞であるためマッチしない。そこでＳ８
０に処理を移す。ステップＳ８０では、今注目している
代表記号は＊ｘであり、ｎｏｎ−ｔｅｒｍｉｎａｌ記号
を含まないのでステップＳ１２に処理を移す。もしここ
で注目している代表記号が、＊ａＮのようなｎｏｎ−ｔ
ｅｒｍｉｎａｌ記号を含む場合はステップＳ６に処理を
移す。これはＮの方がマッチする可能性があるので、そ
れを調べるためである。In step S8 it is determined whether 'orange' matches *x. According to the part-of-speech information buffer of FIG. 19, 'orange' is a noun, whereas *x denotes an auxiliary verb, so they do not match, and processing moves to step S80. In step S80, the representative symbol of interest is *x, which contains no non-terminal symbol, so processing moves to step S12. If the symbol of interest had contained a non-terminal symbol, as in *aN, processing would move to step S6, because the N alternative might still match and has to be checked.

【００７２】ステップＳ１２は、カウンタ１の値を１増
やす処理である。今、カウンタ１は１であったので２に
なる。この処理は注目していたイディオムバッファを次
のものに移すためのものである。ステップＳ１３は、カウン
タ１の値がイディオムバッファの数より大きくないかど
うかを判定する。これはカウンタ１の値がイディオムバ
ッファの数よりも大きければ、入力文とマッチする可能
性のあったイディオムはすべて調べたことになり、処理
をステップＳ１４へ移す。今の場合は、カウンタ１の値
は２であり、かつイディオムバッファの数は２であるの
でステップＳ４に戻る。Step S12 increments counter 1 by 1. Counter 1 was 1, so it becomes 2. This moves the focus to the next idiom buffer. Step S13 determines whether the value of counter 1 is greater than the number of idiom buffers. If it is, every idiom that might have matched the input sentence has been examined, and processing moves to step S14. Here counter 1 is 2 and the number of idiom buffers is 2, so processing returns to step S4.

【００７３】ステップＳ４では、カウンタ２の値を１に
セットしなおす。これは今のままではカウンタ２の値は
３になっており、次の処理に進んだ時、イディオムバッ
ファの３番目の代表記号から処理が始まってしまうた
め、カウンタ２の値を１に初期化する。In step S4, counter 2 is reset to 1. As things stand, counter 2 holds the value 3, and if processing simply continued it would start from the third representative symbol of the idiom buffer; counter 2 is therefore re-initialized to 1.

【００７４】再びステップＳ５の処理であるが、今注目
しているのが、イディオムバッファ２の一番目、＊ｍで
ありｔｅｒｍｉｎａｌ記号であるので、ステップＳ８に
処理を移す。これ以降の処理においてカウンタ２の値が
３になり、代表記号＊ＣＮが注目されるまでの処理は前
記したイディオムバッファ１と同じ処理が繰り返される
ため、ここでは省略する。Step S5 is performed again. The symbol of interest is now *m, the first symbol of idiom buffer 2, which is a terminal symbol, so processing moves to step S8. From here until counter 2 becomes 3 and the representative symbol *CN comes into focus, the processing repeats what was described for idiom buffer 1 and is omitted here.

【００７５】今カウンタ２の値が３になりステップＳ５
の処理に入ったとする。代表記号は＊ＣＮであり、これ
はｎｏｎ−ｔｅｒｍｉｎａｌ記号であるから、ステップ
Ｓ６に処理を移す。Suppose counter 2 has now become 3 and step S5 is entered. The representative symbol is *CN, which is a non-terminal symbol, so processing moves to step S6.

【００７６】ステップＳ６では、単語列を構文解析して
文または句にするものである。今注目されているのは、
イディオムバッファ２の３番目の英単語列‘ｔｈａｔ
ｏｒａｎｇｅ’である。そこで１９図の品詞バッファに
記憶された‘ｔｈａｔｏｒａｎｇｅ’に対応する品詞
情報９，１０に対して、図１８に示す文法規則が適用さ
れる。ここで‘ｔｈａｔ’はＤＥＴ（限定詞）、‘ｏｒ
ａｎｇｅ’はＮＯＵＮ（名詞）と認識され、‘ｔｈａｔ
ｏｒａｎｇｅ’は規則６がマッチングする。すなわ
ち、‘ｔｈａｔｏｒａｎｇｅ’はＮＰ（名詞句）と認
定される。In step S6, the word string is parsed into a sentence or a phrase. The item of interest is the third English word string of idiom buffer 2, 'that orange'. The grammar rules of FIG. 18 are applied to part-of-speech information 9 and 10, which correspond to 'that orange' in the part-of-speech buffer of FIG. 19. Here 'that' is recognized as DET (determiner) and 'orange' as NOUN (noun), and rule 6 matches 'that orange'. That is, 'that orange' is recognized as an NP (noun phrase).
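The parse of step S6 can be illustrated with a toy rule table. Only the rule that rewrites DET NOUN as NP is taken from the description (its rule 6); the rest of FIG. 18 is not reproduced here. The lexicon below lists the three readings of 'that' mentioned earlier, and the names `RULES`, `LEXICON`, and `parse_phrase` are hypothetical.

```python
from itertools import product

# Right-hand side -> phrase category. Only this rule is given in the description.
RULES = {
    ("DET", "NOUN"): "NP",
}

# Possible parts of speech per word, as listed for the example sentence.
LEXICON = {
    "that": ["PRON", "DET", "RELPRON"],
    "orange": ["NOUN"],
}


def parse_phrase(words):
    """Try every part-of-speech assignment; return (category, tags) or None."""
    for tags in product(*(LEXICON[w] for w in words)):
        category = RULES.get(tags)
        if category is not None:
            return category, tags
    return None
```

With this table, `parse_phrase(["that", "orange"])` selects the determiner reading of 'that' and reports an NP, while a word string that no rule covers yields `None`, which corresponds to the failure branch of the flowchart.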

【００７７】もしステップＳ６における構文解析が失敗
した時は、ステップＳ６１において処理をステップＳ１
２へ移すが、今は成功したため、処理は次のステップＳ
７へ入る。ステップＳ７では、ステップＳ６で認定され
た名詞句に対応する訳語の生成を行なう。ここでの処理
は、図５における構文変換部５４及び翻訳文生成部５５
で行われるものを利用する。すなわち、‘ｔｈａｔｏ
ｒａｎｇｅ’は「あのオレンジ」と訳される。If parsing in step S6 had failed, step S61 would pass processing to step S12; here it succeeded, so processing enters the next step, S7. In step S7, a translation is generated for the noun phrase recognized in step S6. This uses the processing performed by the syntax conversion unit 54 and the translated-sentence generation unit 55 of FIG. 5. That is, 'that orange' is translated as 「あのオレンジ」.

【００７８】次に、ステップＳ８１で代表記号とのマッ
チを行なう。今、図１１により英単語‘ｔｈａｔｏｒ
ａｎｇｅ’はＮＰ（名詞句）なので、代表記号＊ＣＮの
＊Ｎとマッチする。Ｓ９でカウンタ２の値が４になる。
次にステップＳ１０でカウンタ２の値（＝４）はイディ
オムバッファ内の代表記号の数（＝３）よりも多いので
ステップＳ１１に移る。Next, in step S81, the result is matched against the representative symbol. By FIG. 11, the English word string 'that orange' is an NP (noun phrase), so it matches the *N alternative of the representative symbol *CN. In step S9 counter 2 becomes 4. Then in step S10 the value of counter 2 (= 4) is greater than the number of representative symbols in the idiom buffer (= 3), so processing moves to step S11.

【００７９】ステップＳ１１では入力文にマッチしたイ
ディオムを辞書バッファに登録する。登録は英単語列と
その品詞、訳語とその品詞である。英単語列には図１６
に示すイディオムの代表記号を英単語で置き換えた［ｔ
ｈｒｅｅｔｉｍｅｓａｓｂｉｇａｓｔｈａｔ
ｏｒａｎｇｅ］が入り、品詞には中心語が形容詞だった
ので形容詞句が入る。In step S11, the idiom that matched the input sentence is registered in the dictionary buffer. What is registered is the English word string with its part of speech and the translation with its part of speech. The English word string is [three times as big as that orange], obtained by replacing the representative symbols of the idiom of FIG. 16 with English words; the part of speech is adjective phrase, because the central word was an adjective.

【００８０】訳語には図１６の訳語のところの代表記号
をその各単語の訳語で置き換えたものが入る。この場合
は「３倍あのオレンジより大きい」が入る。訳語の品詞
は、イディオムの登録の際に述べたように、訳語の最後
の単語の品詞と活用によって決まり、この場合は「大き
い」なので、形容詞となる。この結果を示したものが図
２２である。The translation field holds the translation template of FIG. 16 with each representative symbol replaced by the translation of the corresponding words; here it is 「３倍あのオレンジより大きい」. As described for idiom registration, the part of speech of the translation is determined by the part of speech and inflection of its last word; here that word is 「大きい」, so the translation is an adjective. FIG. 22 shows the result.
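Filling the translation template amounts to replacing each representative symbol with the translation bound to it. The sketch below shows this substitution for the FIG. 16 template. It is an assumption-laden illustration: `fill_template` is a hypothetical helper, the template is written with half-width symbol characters, the bindings are keyed by the symbol with its inflection tags removed, and the inflection tags themselves are ignored because the noun phrase in this example is not inflected.

```python
import re

# '*' or '**', then one or more category characters, each optionally
# followed by an inflection tag such as [体:].
SYMBOL = re.compile(r"\*\*?((?:[A-Za-z0-9](?:\[[^\]]*\])?)+)")


def fill_template(template: str, bindings: dict) -> str:
    """Replace every representative symbol in template with its bound translation."""
    def substitute(match):
        # Drop inflection tags to obtain the lookup key, e.g. 'C[体:]N' -> 'CN'.
        key = re.sub(r"\[[^\]]*\]", "", match.group(1))
        return bindings[key]
    return SYMBOL.sub(substitute, template)


result = fill_template(
    "*m倍*C[体:]Nより*ad",
    {"m": "３", "CN": "あのオレンジ", "ad": "大きい"},
)
```

Here `result` is the string 「３倍あのオレンジより大きい」 given in the description. A fuller version would also have to apply the requested inflection when the sentence alternative of *C is the one that matched.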

【００８１】次にステップＳ１２でカウンタ１の値を１
増やすので、カウンタ１の値は３になる。ステップＳ１
３でカウンタ１の値が３であり、イディオムバッファの
数が２であるので成功し、辞書引きのすべての処理が終
ったことになる。Next, step S12 increments counter 1, which becomes 3. In step S13, counter 1 is 3 and the number of idiom buffers is 2, so the test succeeds and all dictionary lookup processing is complete.
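The two-counter loop walked through above (steps S3 to S13) can be condensed into a short sketch. It is a simplification under stated assumptions, not the flowchart of FIG. 17: the tag names, the tiny lexicon, the stand-in `parse` function (which knows only the DET NOUN rule), and the function names are all hypothetical, and translation generation (step S7) and registration details (step S11) are reduced to recording the number of the matching buffer.

```python
TERMINAL = {"m": "NUM", "a": "ADJ", "d": "ADV", "3": "PRON", "x": "AUX"}
NON_TERMINAL = {"N": "NP", "C": "S"}
LEXICON = {
    "three": {"NUM"},
    "big": {"ADJ"},
    "that": {"PRON", "DET", "RELPRON"},
    "orange": {"NOUN"},
}


def parse(words):
    """Stand-in for step S6: recognizes only DET NOUN as an NP."""
    if len(words) == 2 and "DET" in LEXICON[words[0]] and "NOUN" in LEXICON[words[1]]:
        return "NP"
    return None


def slot_matches(categories, words):
    """Steps S5/S8, then S80/S6/S81: terminal alternatives first, then non-terminal."""
    terminals = [c for c in categories if c in TERMINAL]
    non_terminals = [c for c in categories if c in NON_TERMINAL]
    if len(words) == 1 and any(TERMINAL[c] in LEXICON[words[0]] for c in terminals):
        return True
    if non_terminals:
        phrase = parse(words)
        return phrase is not None and any(NON_TERMINAL[c] == phrase for c in non_terminals)
    return False


def match_idioms(buffers):
    """Return the 1-based numbers of the idiom buffers that match completely."""
    matched = []
    counter1 = 1                                  # S3
    while counter1 <= len(buffers):               # S13
        slots = buffers[counter1 - 1]
        counter2 = 1                              # S4
        while counter2 <= len(slots):             # S10
            categories, words = slots[counter2 - 1]
            if not slot_matches(categories, words):
                break                             # give up on this candidate
            counter2 += 1                         # S9
        else:
            matched.append(counter1)              # S11: every symbol matched
        counter1 += 1                             # S12
    return matched


buffer1 = [("m", ["three"]), ("ad", ["big"]), ("3", ["that"]), ("x", ["orange"])]
buffer2 = [("m", ["three"]), ("ad", ["big"]), ("CN", ["that", "orange"])]
matched = match_idioms([buffer1, buffer2])
```

For the two idiom buffers of the example, `matched` contains only buffer 2: buffer 1 is abandoned at *x because 'orange' is not an auxiliary verb, and buffer 2 succeeds once 'that orange' has been parsed as a noun phrase.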

【００８２】そしてこの辞書引きされた結果（図２３に
示されたもの）がステップＳ１４の通常の構文解析に渡
される。これ以下の処理は図６の例で示した［Ｔｈｉｓ
ｉｓａｐｅｎ．］の処理と同様であり、入力文全
体の翻訳処理を実行後、最後に日本語文「このりんごは
３倍あのオレンジより大きい」が生成される。The result of this dictionary lookup (shown in FIG. 23) is then passed to the ordinary syntax analysis of step S14. The subsequent processing is the same as that for [This is a pen.] shown in the example of FIG. 6; after the whole input sentence has been translated, the Japanese sentence 「このりんごは３倍あのオレンジより大きい」 is finally generated.

【００８３】以上のように、イディオムの中に存在する
複数の可変部分を代表記号で表わすことにより、イディ
オムを簡単に登録及び検索することができ、さらにイデ
ィオムを含む入力文の適切な翻訳文を生成することがで
きる。As described above, representing the multiple variable parts of an idiom with representative symbols allows idioms to be registered and searched easily, and allows an appropriate translation to be generated for an input sentence containing an idiom.

【００８４】[0084]

【発明の効果】この発明によれば、イディオムにおける
可変部分を代表記号で表わすことにより、イディオム中
に可変部分が複数存在する場合や可変部分に単語だけで
なく句が入る場合でも、イディオムを簡単に登録及び検
索することができる。また、この可変部分の表現形式で
ある代表記号が、１つの表現で、複数の単語の品詞や句
に対応できるように記述されるため、イディオムの可変
部分に対するあらゆるパターンを登録する必要はなく、
登録時間の短縮、翻訳労力の削減および情報記憶容量の
増大の防止ができる。また、イディオム全体の品詞を代
表する中心語に対応する代表記号を指定することによっ
て、イディオムの登録が簡単になり、さらにイディオム
を含む入力文の適切な翻訳文を生成することができる。According to the present invention, representing the variable parts of an idiom with representative symbols allows idioms to be registered and searched easily even when an idiom has several variable parts or when a variable part takes a phrase rather than a single word. Moreover, because a representative symbol, the notation for a variable part, can cover the parts of speech of several words as well as phrases in a single expression, there is no need to register every pattern for the variable part of an idiom; this shortens registration time, reduces translation effort, and prevents growth in the required storage capacity. In addition, designating the representative symbol that corresponds to the central word, which represents the part of speech of the idiom as a whole, simplifies idiom registration and allows an appropriate translation to be generated for an input sentence containing the idiom.

【図面の簡単な説明】[Brief description of the drawings]

【図１】この発明の辞書検索装置の構成を示すブロック
図である。FIG. 1 is a block diagram showing a configuration of a dictionary search device of the present invention.

【図２】機械翻訳の概念を模式的に示す図である。FIG. 2 is a diagram schematically illustrating the concept of machine translation.

【図３】この発明の辞書検索装置の一例を含む機械翻訳
装置のブロック図である。FIG. 3 is a block diagram of a machine translation device including an example of the dictionary search device of the present invention.

【図４】図３に示される翻訳モジュール３５のブロック
図である。FIG. 4 is a block diagram of a translation module 35 shown in FIG.

【図５】翻訳モジュールにおける入力文の解析から翻訳
文の生成までの処理構造を示す図である。FIG. 5 is a diagram showing a processing structure from analysis of an input sentence to generation of a translated sentence in a translation module.

【図６】バッファＡの格納内容を模式的に示す図であ
る。FIG. 6 is a diagram schematically showing contents stored in a buffer A;

【図７】バッファＢの格納内容を模式的に示す図であ
る。FIG. 7 is a diagram schematically showing the contents stored in a buffer B;

【図８】バッファＣの格納内容を模式的に示す図であ
る。FIG. 8 is a diagram schematically showing contents stored in a buffer C;

【図９】バッファＤの格納内容を模式的に示す図であ
る。FIG. 9 is a diagram schematically showing contents stored in a buffer D;

【図１０】バッファＥの格納内容を模式的に示す図であ
る。FIG. 10 is a diagram schematically showing the contents stored in a buffer E;

【図１１】イディオムの登録に使用する代表記号テーブ
ルの例を示す図である。FIG. 11 is a diagram showing an example of a representative symbol table used for registering an idiom.

【図１２】図１１に示した代表記号を用いたイディオム
の登録例を示す図である。FIG. 12 is a diagram showing an example of registration of an idiom using the representative symbols shown in FIG. 11;

【図１３】図１１に示した代表記号を用いたイディオム
の登録例を示す図である。FIG. 13 is a diagram illustrating an example of registration of an idiom using the representative symbols illustrated in FIG. 11;

【図１４】図１１に示した代表記号を用いたイディオム
の登録例を示す図である。FIG. 14 is a diagram illustrating an example of registration of an idiom using the representative symbols illustrated in FIG. 11;

【図１５】図１１に示した代表記号を用いたイディオム
の登録例を示す図である。FIG. 15 is a diagram illustrating an example of registration of an idiom using the representative symbols illustrated in FIG. 11;

【図１６】図１１に示した代表記号を用いたイディオム
の登録例を示す図である。FIG. 16 is a diagram illustrating an example of registration of an idiom using the representative symbol illustrated in FIG. 11;

【図１７】代表記号によって登録されたイディオムと入
力文との同定処理及び訳語出力処理のフローチャートで
ある。FIG. 17 is a flowchart of a process of identifying an idiom registered by a representative symbol and an input sentence and a process of outputting a translated word;

【図１８】構文解析用の文法規則を示す図である。FIG. 18 is a diagram showing a grammar rule for parsing.

【図１９】入力文を辞書引きした後に生成される品詞バ
ッファの内容の一部を示す模式図である。FIG. 19 is a schematic diagram illustrating a part of the contents of a part-of-speech buffer generated after an input sentence is looked up in a dictionary.

【図２０】イディオムの代表記号と単語との対応関係を
記憶したイディオムバッファの内容を示す模式図であ
る。FIG. 20 is a schematic diagram showing the contents of an idiom buffer that stores the correspondence between a representative symbol of an idiom and a word.

【図２１】イディオムの代表記号と単語との対応関係を
記憶したイディオムバッファの内容を示す模式図であ
る。FIG. 21 is a schematic diagram showing the contents of an idiom buffer that stores the correspondence between a representative symbol of an idiom and a word.

【図２２】マッチングに成功した入力文のイディオム部
に関する情報の登録例を示す模式図である。FIG. 22 is a schematic diagram illustrating an example of registration of information regarding an idiom portion of an input sentence that has been successfully matched;

【図２３】イディオム部分の翻訳が終了し入力文のすべ
ての辞書引きが完了した後に生成される結果を示す模式
図である。FIG. 23 is a schematic diagram showing a result generated after the translation of an idiom part is completed and all dictionary lookups of an input sentence are completed.

【符号の説明】[Explanation of symbols]

１ 入力手段 (1: input means)
２ イディオム登録手段 (2: idiom registration means)
３ 辞書引き・形態素解析手段 (3: dictionary lookup / morphological analysis means)
３ａ 品詞抽出部 (3a: part-of-speech extraction unit)
３ｂ 訳語抽出部 (3b: translated word extraction unit)
４ イディオム翻訳手段 (4: idiom translation means)
４ａ イディオム検索部 (4a: idiom search unit)
４ｂ イディオム同定部 (4b: idiom identification unit)
４ｃ イディオム解析部 (4c: idiom analysis unit)
４ｄ イディオム訳語生成部 (4d: idiom translation generation unit)
５ 構文解析手段 (5: syntax analysis means)
６ 構文変換手段 (6: syntax conversion means)
７ 翻訳文生成手段 (7: translated-sentence generation means)
８ 出力手段 (8: output means)
９ 記憶手段 (9: storage means)
９ａ 辞書メモリ (9a: dictionary memory)
９ｂ バッファメモリ (9b: buffer memory)
９ｃ イディオム登録メモリ (9c: idiom registration memory)
３１ メインＣＰＵ (31: main CPU)
３２ メインメモリ (32: main memory)
３３ 表示装置 (33: display device)
３４ キーボード (34: keyboard)
３５ 翻訳モジュール (35: translation module)
３６ 辞書メモリ (36: dictionary memory)
３７ バス (37: bus)

フロントページの続き Continuation of the front page
(72)発明者 鈴木等 大阪府大阪市阿倍野区長池町22番22号 シャープ株式会社内 (72) Inventor: Suzuki (鈴木等), c/o Sharp Corporation, 22-22 Nagaike-cho, Abeno-ku, Osaka-shi, Osaka
(56)参考文献 特開昭58−92063（ＪＰ，Ａ) 特開昭58−40684（ＪＰ，Ａ) (56) References: JP-A-58-92063 (JP, A); JP-A-58-40684 (JP, A)
(58)調査した分野(Int.Cl.⁶，ＤＢ名) G06F 17/20 - 17/28 (58) Fields searched (Int. Cl.⁶, DB name): G06F 17/20 - 17/28

Claims

(57)【特許請求の範囲】(57) [Claims]

【請求項１】文字列および記号を入力する入力手段
と、所定の属性を共有する単語又は単語列の集合を代表
する代表記号を１つ又は複数個持ち、複数の属性が対応
する単語又は単語列の部分を複数の代表記号を複合した
形式で表現したイディオムの見出し語と訳語を登録する
イディオム登録手段と、イディオムの登録と翻訳処理に
必要な辞書及び処理結果を記憶する記憶手段と、入力単
語列を形態素に分解し、かつ文法解析を行う辞書引き・
形態素解析手段と、入力文字列あるいはその一部分と登
録されたイディオムの見出し語とを同定し、同定された
イディオムの見出し語に対応する文字列の訳語を生成す
るイディオム翻訳手段と、構文解析手段と、構文変換手
段と、翻訳文生成手段と、翻訳文を出力する出力手段と
を備え、イディオム登録手段が、複数の代表記号からな
るイディオムを登録する場合に、イディオム全体の品詞
を代表する中心語に対応する代表記号を指定した形式で
登録することを特徴とするイディオム登録機能を持つ機
械翻訳装置。1. A machine translation device having an idiom registration function, comprising: input means for inputting character strings and symbols; idiom registration means for registering the headword and translation of an idiom that has one or more representative symbols, each representing a set of words or word strings sharing a predetermined attribute, and in which a word or word-string part corresponding to a plurality of attributes is expressed in a form combining a plurality of representative symbols; storage means for storing the dictionaries required for idiom registration and translation processing and the processing results; dictionary lookup / morphological analysis means for decomposing an input word string into morphemes and performing grammatical analysis; idiom translation means for matching an input character string, or a part of it, against the headwords of registered idioms and generating a translation of the character string corresponding to the matched idiom headword; syntax analysis means; syntax conversion means; translated-sentence generation means; and output means for outputting a translated sentence, wherein, when registering an idiom consisting of a plurality of representative symbols, the idiom registration means registers it in a form that designates the representative symbol corresponding to the central word, which represents the part of speech of the idiom as a whole.