JPS6368972A

JPS6368972A - Unregistered word processing system

Info

Publication number: JPS6368972A
Application number: JP61211586A
Authority: JP
Inventors: Hiroko Yoshinaka; 吉中　裕子; Atsushi Okajima; 岡島　惇; Tadao Furuya; 古谷　忠雄
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-09-10
Filing date: 1986-09-10
Publication date: 1988-03-28

Abstract

PURPOSE: To allow sentence processing to continue, reduce errors in syntax analysis, and improve the efficiency of document processing by attaching to each unregistered word encountered during sentence processing an attribute inferred from its word structure. CONSTITUTION: When an unregistered word in a sentence is input from an input register 5 to a memory 3, a processor 1 judges whether the unregistered word begins with a prefix held in a prefix table 2. When the longest-match method finds that the head of the word coincides with one of the prefixes, the processor 1 removes the prefix, attaches the prefix's attribute information to the unregistered word in the memory 3, and sets a flag. The processor 1 then searches a dictionary memory 4 using the unregistered word with the prefix removed as a keyword. When the keyword is present and its part-of-speech information coincides with the base part of speech of the prefix held in the memory 3, the unregistered word and the attribute given by the prefix are registered in a registration dictionary 6.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to an unregistered word processing method for natural language processing, by which words not registered in a dictionary can still be subjected to language processing such as machine translation.

[Conventional technology]

In conventional language processing methods, when a given sentence contains a word that is not registered in the dictionary, the attributes of that word cannot be determined, and syntax analysis therefore becomes impossible. Two countermeasures have been devised. In one, grammatically plausible attributes are assigned to the unregistered word one after another, syntax analysis is run each time, and the attribute is changed until the analysis succeeds (Japanese Unexamined Patent Publication No. 58-175074, "Syntax Analysis Method"). In the other, the unregistered word is displayed on a CRT and an operator enters the information for it item by item (Japanese Unexamined Patent Publication No. 58-175076, "Natural Language Processing Device").

[Problem that the invention seeks to solve]

The conventional techniques above give no consideration to the case where a large volume of documents is processed continuously, and the following problems therefore arise in continuous processing.

In the former method, syntax analysis must be repeated every time the attribute is changed, until the estimated attribute matches an attribute for which the analysis is judged successful, so the analysis time increases greatly. In the latter method, the operator must check every attribute to be registered; this input becomes more cumbersome as the number of attribute items grows, which increases the operator's burden. An object of the present invention is to solve these problems without interrupting document processing and to reduce errors in the syntax analysis stage.

[Means for solving problems]

In view of the above problems, the present invention achieves this object as follows. For an unregistered word in a sentence being processed, the attribute of the word is estimated from the information carried by its word structure, using information that is already registered, and is registered in a dictionary together with the word. When the estimate is not unique and several candidate attributes are obtained, all possible attributes are registered in the dictionary together with the word, and a single attribute is chosen later by the disambiguation stage of syntax analysis.
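The idea of registering every candidate attribute and deferring the choice can be pictured with a small sketch. This is only an illustration under stated assumptions: the names `registration_dictionary`, `register`, and `disambiguate` are hypothetical, and the disambiguation stage is reduced here to "the context accepts a set of parts of speech", which the text does not specify.

```python
from typing import Dict, Iterable, Optional, Set

# Stand-in for the registration dictionary: word -> set of candidate attributes.
registration_dictionary: Dict[str, Set[str]] = {}

def register(word: str, attributes: Iterable[str]) -> None:
    """Store every candidate attribute for the word; nothing is discarded here."""
    registration_dictionary.setdefault(word, set()).update(attributes)

def disambiguate(word: str, acceptable: Set[str]) -> Optional[str]:
    """Return the single candidate the context accepts, or None if not unique.

    `acceptable` is a hypothetical stand-in for whatever the syntax analysis
    allows at this position in the sentence.
    """
    remaining = registration_dictionary.get(word, set()) & acceptable
    return next(iter(remaining)) if len(remaining) == 1 else None
```

Keeping the candidates as a set means that a later registration for the same word only adds possibilities and never overwrites earlier ones.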

[Effect]

The present invention operates so that document processing can continue, by attaching to each unregistered word encountered during sentence processing an attribute estimated from its word structure. This reduces failures of syntax analysis caused by unregistered words and improves the efficiency of document processing.

[Example]

An embodiment of the present invention is described below with reference to the drawings. The embodiment is an unregistered word processing method for English text, and part-of-speech information is used as the example of attribute information.

FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention. In this figure, 1 is a processor, 2 is a prefix/suffix table, 3 is an internal memory, 4 is a dictionary memory, 5 is an input register, and 6 is a registration dictionary for words and their attributes. Unregistered word processing follows the flow shown in FIG. 1.

When an unregistered word found in a sentence is input from the input register 5 into the internal memory 3, the processor 1 first judges, as process 101, whether the unregistered word contains a prefix listed in the prefix table 2. As shown in FIG. 3, each entry of the prefix table 2 holds, as attribute information, the prefix itself, the part-of-speech information of the base (the word with the prefix removed), and the part-of-speech information of the derived word (the base with the prefix attached).
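The table layout described here can be pictured as a list of small records. The following is a minimal sketch, assuming Python; the class name `PrefixEntry`, the field names, and the sample rows are hypothetical, since the contents of FIG. 3 are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class PrefixEntry:
    prefix: str       # the prefix string itself
    base_pos: str     # part of speech of the word with the prefix removed
    derived_pos: str  # part of speech of the word with the prefix attached

# Hypothetical rows; the real table is the one shown in FIG. 3.
PREFIX_TABLE: List[PrefixEntry] = [
    PrefixEntry("UN", "adjective", "adjective"),
    PrefixEntry("UN", "adverb", "adverb"),
    PrefixEntry("RE", "verb", "verb"),
]

def entries_for(prefix: str) -> List[PrefixEntry]:
    """Return every table row registered for the given prefix."""
    return [e for e in PREFIX_TABLE if e.prefix == prefix]
```

Allowing several rows per prefix is one way to represent a prefix that attaches to more than one part of speech; whether FIG. 3 does so is not stated.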

The judgment uses the longest-match method: when several prefixes match the head of the word, the longest one is selected. When the head of the word is found in this way to equal one of the predefined prefixes, the processor 1 removes the prefix as process 102. As process 103 it attaches the attribute information of the removed prefix, taken from the prefix table 2, to the unregistered word in the internal memory 3 and sets a flag I1 to indicate that the information has been attached. As process 104 it sets a flag P to indicate that a prefix has been processed. Next, as process 105, the processor 1 uses the unregistered word in the internal memory 3 with the prefix removed as a keyword and searches the dictionary memory 4 for it. If the keyword is found, the processor 1 judges, as process 106, whether the part-of-speech information recorded in the dictionary memory 4 for the matching word agrees with the base part-of-speech information of the prefix attached earlier in the internal memory 3. If the two agree, then as process 107 the processor 1 registers in the registration dictionary 6 the unregistered word together with the derived-word part-of-speech information from the prefix's attribute information in the internal memory 3, as the attribute estimated from the prefix and the keyword. It then sets a flag R and moves to connection point 2 to look for further possible attributes.
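Processes 101 to 107 can be sketched roughly as follows. This is an illustration only: the table and dictionary contents are invented, each prefix is given a single row for brevity, and the flags I1, P, and R are not modelled; the function simply returns the part of speech that process 107 would register.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: prefix -> (base part of speech, derived part of speech).
PREFIX_TABLE: Dict[str, Tuple[str, str]] = {
    "UN": ("adjective", "adjective"),
    "UNDER": ("verb", "verb"),
}
# Hypothetical dictionary memory: word -> part of speech.
DICTIONARY: Dict[str, str] = {"HAPPY": "adjective", "ESTIMATE": "verb"}

def longest_prefix(word: str) -> Optional[str]:
    """Process 101: pick the longest table prefix that the word starts with."""
    matches = [p for p in PREFIX_TABLE
               if word.startswith(p) and len(p) < len(word)]
    return max(matches, key=len) if matches else None

def estimate_from_prefix(word: str) -> Optional[str]:
    """Processes 101 to 107 for one prefix; returns the estimated part of speech."""
    prefix = longest_prefix(word)
    if prefix is None:
        return None
    keyword = word[len(prefix):]                  # process 102: strip the prefix
    base_pos, derived_pos = PREFIX_TABLE[prefix]  # process 103: attach attributes
    found_pos = DICTIONARY.get(keyword)           # process 105: dictionary search
    if found_pos is not None and found_pos == base_pos:  # process 106
        return derived_pos                        # process 107: value to register
    return None
```

For "UNDERESTIMATE" both "UN" and "UNDER" match the head of the word, and the longest-match rule selects "UNDER".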

The flow also branches to connection point 2 when no prefix matching the unregistered word exists in the prefix table 2, when the keyword obtained by removing the prefix does not exist in the dictionary memory 4, or when the part-of-speech information of the keyword found in the dictionary memory 4 does not agree with the base part of speech of the removed prefix held in the internal memory 3. The processor 1 then moves to process 108, which judges whether the unregistered word in the internal memory 3 contains a suffix. As shown in FIG. 4, each entry of the suffix table 2 used here holds, as attribute information, the suffix itself, the part-of-speech information of the base (the word with the suffix removed), the part-of-speech information of the derived word (the base with the suffix attached), and ending processing information. The ending processing information is used to reconstruct the end of the word after the suffix is removed: (A) shows the word ending before processing and (B) shows it after processing.
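One way to picture a suffix entry with its ending processing information is shown below. This is a sketch under assumptions: the rows are invented rather than taken from FIG. 4, and matching on the longest (A) ending is a guess at how the (A)/(B) pair is applied.

```python
from typing import List, NamedTuple, Optional, Tuple

class SuffixEntry(NamedTuple):
    suffix: str       # the suffix as listed in the table
    base_pos: str     # part of speech of the word with the suffix removed
    derived_pos: str  # part of speech of the word with the suffix attached
    before: str       # (A) word ending before processing
    after: str        # (B) word ending after processing

# Hypothetical rows; the actual entries are those of FIG. 4.
SUFFIX_TABLE: List[SuffixEntry] = [
    SuffixEntry("LY", "adjective", "adverb", "BLY", "BLE"),  # ...ABLY -> ...ABLE
    SuffixEntry("LY", "adjective", "adverb", "LY", ""),      # QUICKLY -> QUICK
    SuffixEntry("NESS", "adjective", "noun", "INESS", "Y"),  # HAPPINESS -> HAPPY
    SuffixEntry("NESS", "adjective", "noun", "NESS", ""),    # KINDNESS -> KIND
]

def strip_suffix(word: str) -> Optional[Tuple[str, SuffixEntry]]:
    """Remove the longest matching (A) ending and rebuild the base with (B)."""
    candidates = [e for e in SUFFIX_TABLE
                  if word.endswith(e.before) and len(e.before) < len(word)]
    if not candidates:
        return None
    entry = max(candidates, key=lambda e: len(e.before))
    base = word[: len(word) - len(entry.before)] + entry.after
    return base, entry
```

With these rows, stripping "LY" from "ACCOUNTABLY" yields "ACCOUNTABLE" rather than the non-word "ACCOUNTAB", which is the purpose of the reconstruction step.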

When process 108 finds, by the longest-match method, that the end of the unregistered word equals one of the suffixes predefined in the suffix table 2, the processor 1 removes the suffix as process 109 and, as process 110, reconstructs the end of the word according to the ending processing information in the suffix table 2. Next, as process 111, it attaches the attribute information of the removed suffix, taken from the suffix table 2, to the unregistered word in the internal memory 3 and sets a flag I2 to indicate that the information has been attached. Then, as process 112, it sets a flag S to indicate that a suffix has been processed.

The unregistered word in the internal memory 3 with the suffix removed becomes the new keyword, and as process 113 the processor 1 searches the dictionary memory 4 for it. If the keyword is found, the processor 1 judges, as process 114, whether the part-of-speech information recorded in the dictionary memory 4 for the matching word agrees with the base part-of-speech information of the removed suffix held in the internal memory 3. If the two agree, the processor 1 judges in process 115 whether the flag P is set. If it is set, then in process 116 the processor 1 registers in the registration dictionary 6 the unregistered word together with the derived-word part-of-speech information from the suffix's attribute information in the internal memory 3, as the attribute estimated from the prefix, the suffix, and the keyword; it sets the flag R and branches to connection point 2 to look for further possible attributes. If the flag P is not set, then as process 117 it registers in the registration dictionary 6 the unregistered word together with the derived-word part-of-speech information from the suffix's attribute information in the internal memory 3, as the attribute estimated from the suffix and the keyword; it sets the flag R and branches to connection point 2 to look for further possible attributes.

The flow also branches back to connection point 2 when the keyword does not exist in the dictionary memory 4, or when the attribute information of the keyword and that of the suffix do not agree. The processor 1 again judges by process 108 whether the keyword contains a further suffix and, if so, repeats processes 109 to 117.

When the keyword contains no further suffix, the flow branches to connection point 4 and the processor 1 judges, as process 118, whether a suffix has been removed in the processing so far. If one has, the processor 1 restores the removed suffix as process 119, clears the flag S in process 120, and branches to connection point 2. Process 108 then judges whether the suffix table 2 contains the next-longest suffix, after the one just restored, that matches the new keyword with the suffix restored in the internal memory 3. If such a suffix exists, the processor 1 repeats processes 109 to 117.
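The "restore the suffix, then try the next-longest one" behaviour amounts to visiting the matching suffixes in order of decreasing length. A minimal sketch of that ordering follows; the suffix list is hypothetical and the function name `suffix_candidates` is not from the source.

```python
from typing import Iterator, List

# Hypothetical suffix list standing in for the suffix table 2.
SUFFIXES: List[str] = ["S", "ES", "NESS", "LESS", "LESSNESS"]

def suffix_candidates(word: str) -> Iterator[str]:
    """Yield the suffixes that match the end of the word, longest first.

    Advancing to the next item corresponds to restoring the previous suffix
    (process 119) and asking process 108 for the next-longest match.
    """
    matches = [s for s in SUFFIXES if word.endswith(s) and len(s) < len(word)]
    for suffix in sorted(matches, key=len, reverse=True):
        yield suffix
```

A caller would take one candidate at a time, attempt the dictionary lookup on the resulting keyword, and move on to the next candidate only when that attempt fails.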

If process 108 finds no next-longest suffix after the one previously removed, the processor 1 judges, as process 121, whether the flag P is set. If it is set, the processor 1 clears the flag P as process 122, sets as process 123 a flag P2 that newly indicates that a prefix has been removed, and branches to connection point 1. Process 101 then judges whether the prefix table 2 contains a prefix matching the head of the new keyword in the internal memory 3, to which only the suffix has been restored. If one exists, processes 102 to 125 are repeated.

If none exists, the processor 1 judges, as process 124, from the flag P2 whether a prefix was removed earlier. If one was, process 125 restores the removed prefix to the keyword, process 126 clears the flag P2, and the flow branches to connection point 1. Process 101 then judges whether the prefix table 2 contains the next-longest prefix, after the one just restored, that matches the new keyword with the prefix restored.

If one exists, the processor 1 repeats processes 102 to 126.

If none exists, the processor 1 judges, as process 127, from whether the flag R is set, whether an attribute estimated with the part-of-speech information of a keyword taken into account has already been registered. If the flag R is set, the flow branches to connection point 9 and processing ends.

If the flag R is not set, the processor 1 judges, as process 128, whether the flags I1 and I2 are set. If so, then as process 129 it registers in the registration dictionary 6 the unregistered word together with, as its attribute, the derived-word part-of-speech information from the attribute information of the prefix or suffix attached in the internal memory. If not, then as process 130 the word is presumed to be a proper noun, and attribute information for a proper noun is registered in the registration dictionary 6.

The case where the word "UNACCOUNTABLY" is input as an unregistered word is described below. The word enters the internal memory 3 through the input register 5. As process 101, the processor 1 compares it with the prefix table 2, and the prefix "UN" is selected from the prefix table 2 by the longest-match method.

In process 102, "UN" is removed from the unregistered word "UNACCOUNTABLY" in the internal memory 3.

In process 103, the processor 1 attaches to the internal memory 3 the group of information shown in (1) of FIG. 3 as the attribute information of the prefix "UN", and in process 104 the flag I1 and the flag P are set. In process 105, a dictionary search is made for the keyword "ACCOUNTABLY", obtained by removing the prefix "UN" from the unregistered word in the internal memory 3. If the keyword does not exist in the dictionary memory 4, the flow branches to connection point 2. Process 108 selects the suffix "LY" from the suffix table 2, process 109 removes it from the keyword, and process 110 reconstructs the word ending. In process 111, the group of information shown in (1) of FIG. 4 is attached in the internal memory 3 as the attribute information of the suffix "LY", and the flag I2 and the flag S are set in process 112. As process 113, the processor 1 makes a dictionary search for the second keyword "ACCOUNTABLE", obtained by removing the suffix from the keyword and reconstructing the ending. If this keyword does not exist in the dictionary memory 4, the flow branches to connection point 2 and process 108 compares it with the suffix table 2 again. Processes 109 to 112 then yield, in the internal memory 3, the attribute information group shown in (2) of FIG. 4 as the attribute information of the suffix "ABLE", together with the next keyword "ACCOUNT".

If the keyword "ACCOUNT" exists in the dictionary memory 4 and the part-of-speech information of the keyword agrees with the attribute information of the prefix and the suffixes, then as process 116 the processor 1 can register the attribute of the unregistered word in the registration dictionary 6, from the attribute information of that word in the dictionary memory 4 and the attribute information of the removed prefix or suffix in the internal memory 3. FIG. 5 shows the attribute information of the keyword, prefix, and suffixes and an example of estimating the attribute of the unregistered word. In this figure, the character strings under the keyword, prefix, and suffix show the part-of-speech information of each. The left side of the arrow (→) is the part-of-speech information of the base and the right side is that of the derived word, indicating that attaching the prefix or suffix changes the part of speech of the word from the left side to the right side. If "ACCOUNT" does not exist in the dictionary memory 4, process 129 estimates the attribute of the unregistered word only from the attribute information of the removed prefix or suffix in the internal memory 3 and registers it in the registration dictionary 6.
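The chain from "UNACCOUNTABLY" down to "ACCOUNT" can be traced with a short sketch. Everything here is an assumption made for illustration: the table rows and parts of speech are invented (FIG. 3 to FIG. 5 are not reproduced in this text), and plain recursion is used in place of the flags and connection points of FIG. 1, so this is a simplified stand-in for the flowchart rather than a transcription of it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical prefix table: prefix -> (base POS, derived POS).
PREFIXES: Dict[str, Tuple[str, str]] = {"UN": ("adverb", "adverb")}
# Hypothetical suffix table: ending (A) -> (ending (B), base POS, derived POS).
SUFFIXES: Dict[str, Tuple[str, str, str]] = {
    "BLY": ("BLE", "adjective", "adverb"),   # suffix LY with ending rebuilt
    "ABLE": ("", "verb", "adjective"),
}
# Hypothetical dictionary memory.
DICTIONARY: Dict[str, str] = {"ACCOUNT": "verb"}

def infer_pos(word: str, trace: Optional[List[str]] = None) -> Optional[str]:
    """Return a part of speech for `word` by peeling affixes, longest first."""
    if trace is not None:
        trace.append(word)                  # record each keyword that is tried
    if word in DICTIONARY:
        return DICTIONARY[word]
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        base_pos, derived_pos = PREFIXES[prefix]
        if word.startswith(prefix) and len(word) > len(prefix):
            # The base must carry the part of speech the prefix expects.
            if infer_pos(word[len(prefix):], trace) == base_pos:
                return derived_pos
    for ending in sorted(SUFFIXES, key=len, reverse=True):
        after, base_pos, derived_pos = SUFFIXES[ending]
        if word.endswith(ending) and len(word) > len(ending):
            base = word[:-len(ending)] + after   # rebuild the word ending
            if infer_pos(base, trace) == base_pos:
                return derived_pos
    return None
```

Calling `infer_pos("UNACCOUNTABLY", trace)` with an empty list as `trace` visits the keywords "UNACCOUNTABLY", "ACCOUNTABLY", "ACCOUNTABLE", and "ACCOUNT" in that order, and each affix's expected base part of speech is checked against what the shorter keyword turned out to be.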
If the unregistered word does not exist in the registration dictionary 6, in step 129, the unregistered word is estimated from only the attribute information of the removed prefix or suffix in the internal memory 3, and is registered in the registration dictionary 6.

This embodiment has been described on the assumption that the prefixes and suffixes are grammatically meaningful ones, but in general the surface spelling of the word ending characteristic of a part of speech can also be treated as a factor for attribute estimation. For example, for words ending in "-ATE", several parts of speech such as verb, adjective, and noun may be estimated. In this case, however, no part-of-speech information can be set for the base, so the meaning of the suffix and the like cannot be assigned, but the effect of narrowing down the part of speech is still obtained.
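The ending-based variant needs no base lookup at all, which a very small sketch makes plain. The mapping name `ENDING_POS` and the function name are hypothetical; only the "-ATE" row comes from the text.

```python
from typing import Dict, List

# Ending pattern -> candidate parts of speech. Only "-ATE" is taken from the text.
ENDING_POS: Dict[str, List[str]] = {
    "ATE": ["verb", "adjective", "noun"],
}

def candidates_from_ending(word: str) -> List[str]:
    """Return every part of speech suggested by the spelling of the word ending."""
    for ending in sorted(ENDING_POS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return list(ENDING_POS[ending])
    return []
```

Because no base is looked up, the result only narrows the candidates; choosing among them is left to the later disambiguation during syntax analysis.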

[Effect of the invention]

As described above, according to the present invention, the attribute of an unregistered word can be estimated by means of the dictionary memory storing words and their attributes and the prefix table and suffix table storing prefixes and suffixes with their respective attributes. This eliminates syntax analysis failures caused by words that were unregistered only because a prefix or suffix had been attached. Interruptions of document processing due to the appearance of unregistered words are therefore reduced, and workability and processing efficiency are improved. In addition, because unregistered words are resolved using attribute information and the like registered in advance, the processing is simple and the burden on the operator is reduced.

The present invention is not limited to the attribute information of the embodiment described above; if the attribute information is classified more finely, finer-grained attributes can be assigned.

[Brief explanation of drawings]

FIG. 1 is a flow diagram showing the operation of the unregistered word processing method according to the present invention. FIG. 2 is a block diagram showing some of the components of the processing method. FIG. 3 is an example of a prefix table. FIG. 4 is an example of a suffix table. FIG. 5 is an explanatory diagram showing an example of estimating the attributes of an unregistered word by the processing method. 1... Processor, 2... Prefix/suffix table, 3... Internal memory, 4... Dictionary memory, 5... Input register.

Claims

[Claims]

1. An unregistered word processing method for handling words not registered in a dictionary, that is, unregistered words, when a given natural language sentence is processed using a dictionary in which a plurality of words and their attributes are registered, the method being characterized in that continuation of document processing is made possible by: a step of comparing the head and the tail of the unregistered word with a prefix table and a suffix table registered in advance; a step of deleting from the unregistered word the prefix and suffix that matched it, reconstructing the word ending if necessary, and searching the dictionary again; a step of estimating the attribute of the unregistered word from the attribute information of the prefix, the suffix, and the successfully looked-up word obtained by the preceding steps; a step of registering the plurality of possible attributes in a dictionary together with the unregistered word; and a step of analyzing the sentence while disambiguating among the plurality of attributes.