JPH10171799A

JPH10171799A - Method and device for analyzing name

Info

Publication number: JPH10171799A
Application number: JP8334007A
Authority: JP
Inventors: Shigeto Iwase; 成人岩瀬
Original assignee: N T T COMMUN WEAR KK; Nippon Telegraph and Telephone Corp
Current assignee: N T T COMMUN WEAR KK; Nippon Telegraph and Telephone Corp
Priority date: 1996-12-13
Filing date: 1996-12-13
Publication date: 1998-06-26
Anticipated expiration: 2016-12-13
Also published as: JP3587279B2

Abstract

PROBLEM TO BE SOLVED: To analyze a personal name correctly even when only one of the family name and the given name exists in the name dictionaries, by assigning furigana (a phonetic kana reading) to each kanji character when only one of the family name and the given name, or neither, is found in the family-name dictionary and the given-name dictionary, and by judging whether a solution is correct based on delimiter information that resolves ambiguity in the reading. SOLUTION: A name delimiter analysis unit 10 divides an input full name by referring to a family-name dictionary 20 and a given-name dictionary 30. When only one of the family name and the given name is found in dictionary 20 or 30, a furigana analysis unit 40 analyzes the furigana, and a delimiter position check unit 60 then checks the delimiter positions of the furigana. A reading attribute check unit 70 refers to the attributes of the furigana obtained by the analysis unit 40 and checks the reading of the final character of the family name, the reading of the first character of the given name, the number of characters in the family and given names, the type of reading, and so on.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a name analysis method and apparatus. More particularly, it relates to a name analysis method and apparatus used in customer systems that handle personal names, for tasks such as dividing a personal name entered without a delimiter into a family name and a given name so that records can be searched by either, and for deriving search keys that account for sound changes such as rendaku (sequential voicing) and onbin (euphonic change) by associating furigana with each kanji character.

[0002]

[Prior Art] A method that prepares a dictionary of registered family and given names and outputs solutions in which both the family name and the given name are found in the dictionary is disclosed in, for example, Japanese Unexamined Patent Application Publication No. S62-237567. For the case where only one of the family name and the given name is registered in the dictionary, Japanese Unexamined Patent Application Publication No. H6-161995 discloses a method that determines the delimiter position between the family name and the given name from word-division patterns and word lengths.

[0003]

[Problems to be Solved by the Invention] However, it is difficult to register every kind of family name, together with its reading, in a dictionary: stage names and the like may be written in kana, the number of foreign residents in Japan is increasing (particularly those, such as Chinese nationals, whose names use kanji), and databases contain errors. As for given names, new ones can always be created, so registering every given name in a dictionary is impossible.

[0004] Therefore, the method of determining the delimiter position when only one of the family name and the given name is in the dictionary becomes important. For example, for the full name 石渡隆瑞 (イシワタリユウズイ), suppose that the family-name dictionary contains both 石渡 (イシワタリ) and 石渡 (イシワタ), and that the given name is not in the given-name dictionary. In that case, unless the kanji of the given-name part are matched against the reading and 隆瑞 is found to be read リュウズイ, it cannot be determined that 石渡／隆瑞 (イシワタ／リユウズイ) is the correct answer.

[0005] Furthermore, the delimiter cannot be determined solely from the length or frequency of occurrence of the family or given names found in the dictionary. For example, for 小野寺和 (オノデラカズ), if the dictionary contains only 小野, the given name would be 寺和; but 寺 at the beginning of a word is never read デラ. The delimiter therefore cannot be determined accurately from dictionary membership alone.

[0006] Even if both 小野 and 小野寺 are registered in the dictionary, 小野 occurs ten times more often than 小野寺, so a frequency-based method would still select 小野／寺和. A method that judges by word length alone is not accurate either. For example, for 羽田野里子 (ハタノリコ), if the dictionary contains 羽田 and 羽田野, length alone yields 羽田野／里子, whereas the correct answer is 羽田／野里子.

[0007] The present invention has been made in view of the above points, and its object is to provide a personal name analysis method and apparatus that correctly assign the delimiter between family name and given name, and thus analyze the name correctly, even for a personal name of which only one part is found in the name dictionaries.

[0008]

[Means for Solving the Problems] FIG. 1 is a diagram for explaining the first principle of the present invention. The present invention is a name analysis method that assigns, to a personal name and its furigana entered without a delimiter between family name and given name, the family/given-name delimiter and a character delimiter for each kanji character. The method divides the input full name into a family name and a given name using a family-name dictionary in which family names are registered and a given-name dictionary in which given names are registered (step 1); when only one of the family name and the given name, or neither, is found in the family-name and given-name dictionaries, assigns furigana to each kanji character by matching the furigana against each kanji character using a character dictionary in which readings of characters are registered (step 2); and judges whether a solution is correct based on character delimiter information that resolves ambiguity in the reading (step 3).

[0009] FIG. 2 is a diagram for explaining the second principle of the present invention. In the present invention, when judging in step 3 whether a solution is correct, the family/given-name delimiter is assigned to the kanji and the furigana by matching the character delimiter positions against the family/given-name delimiter position, and any solution whose delimiters do not correspond is rejected (step 3-1).

[0010] Further, in the present invention, attributes of characters are registered in the character dictionary in advance, and in step 3, when only the family name or only the given name of the input character string is found in the dictionaries, solutions judged to be erroneous are rejected by referring to the attributes of the character readings (step 3-2).

[0011] FIG. 3 is a diagram for explaining the third principle of the present invention. In the present invention, frequency-of-occurrence information and attributes of family names and given names are registered in the family-name and given-name dictionaries in advance. In addition to steps 1 through 3-1 and 3-2, when the input character string has multiple solutions in which only one of the family name and the given name is found in the dictionaries, the method evaluates the frequency of the family or given name, the length of the family or given name found in the dictionary, and the attributes of the family or given name found in the dictionary, and outputs the most probable solution (step 3-3).

[0012] The present invention is also a name analysis apparatus that assigns, to a personal name and its furigana entered without a delimiter between family name and given name, the family/given-name delimiter and a character delimiter for each kanji character. The apparatus comprises: a family-name dictionary in which family names are registered; a given-name dictionary in which given names are registered; a character dictionary in which readings of characters are registered; name dividing means for dividing the input full name into a family name and a given name using the family-name and given-name dictionaries; furigana assigning means for assigning furigana to each kanji character, using the character dictionary, by matching the furigana against each kanji character when one or both of the family name and the given name are not found in said character dictionary; and solution rejecting means for assigning the family/given-name delimiter to the kanji and the furigana by matching the character delimiter positions against the family/given-name delimiter position, and rejecting any solution whose delimiters do not correspond.

[0013] The character dictionary includes, in addition to the readings of characters, attributes of those readings, and the solution rejecting means includes error rejecting means for rejecting solutions judged to be erroneous by referring to the reading attributes in the character dictionary when only the family name or only the given name of the input character string is found in the dictionaries.

[0014] The family-name dictionary includes frequency-of-occurrence information and attributes of family names, and the given-name dictionary includes frequency-of-occurrence information and attributes of given names. The apparatus further comprises: length and attribute evaluating means for evaluating, when the input character string has multiple solutions in which only one of the family name and the given name is found in the family-name or given-name dictionary, the frequency of that family or given name and the length and attributes of the family or given name found in the family-name or given-name dictionary; and solution output means for outputting the most probable solution based on the evaluation result of the length and attribute evaluating means.

[0015] As described above, in the present invention, even when a name is not in the name dictionaries, the furigana for each kanji character is analyzed in order to assign character delimiters, and the result of the name dictionary lookup can be checked for correctness based on character delimiter information that resolves ambiguity in the reading (for example, whether 石渡 is read イシワタリ or イシワタ).

[0016] In addition, to exclude delimitations that are impossible in Japanese, the attribute of the reading of each kanji character is obtained and can be checked. Furthermore, by evaluating solution candidates from the length, frequency, and attributes of the family and given names, candidates can be output according to their evaluation values.

[0017] When either the family name or the given name is not in the dictionary, the furigana is analyzed, the reading of each kanji character is obtained, and the correspondence between kanji and furigana is checked. This makes it possible to check solutions whose readings stand in an inclusion relation, such as 石渡 (イシワタリ) and 石渡 (イシワタ).

[0018] Next, the reading attribute of each kanji character is obtained, and checks are made for readings that cannot end a family name, readings that cannot begin a given name, characters read a certain way only in one-character words, characters read a certain way only in words of two or more characters, and so on; contradictory candidates are rejected. This processing rejects solutions such as the one in which 寺 (デラ) of 小野寺和 begins a word.

[0019] Finally, the solution with the best evaluation value is output based on the word frequency and the length of the word found in the dictionary. However, given names such as 一郎 and 太郎 often take one preceding character, as in 恵一郎 and 栄太郎, so they are evaluated with a lowered evaluation value. Thus, according to the present invention, even for a personal name of which only one part is found in the name dictionaries, incorrect solutions can be rejected because the attributes of the furigana are checked. Moreover, because solutions are evaluated by combining word length, frequency, and name attributes, the correct solution can be selected.

[0020]

[Modes for Carrying Out the Invention] FIG. 4 shows the configuration of a first name analysis apparatus of the present invention. The name analysis apparatus shown in the figure comprises a name delimiter analysis unit 10, a family-name dictionary 20, a given-name dictionary 30, a furigana analysis unit 40, a character dictionary 50, and a delimiter position check unit 60.

[0021] The name delimiter analysis unit 10 divides the input full name by referring to the family-name dictionary 20 and the given-name dictionary 30. The family-name dictionary 20 is a dictionary in which family names are registered. Attributes, frequency-of-occurrence information, and the like corresponding to each family name may be registered along with it.

[0022] The given-name dictionary 30 is a dictionary in which given names are registered. Attributes, frequency-of-occurrence information, and the like corresponding to each given name may be registered along with it. The furigana analysis unit 40 matches the kanji of the input full name against the furigana.

[0023] The character dictionary 50 is a dictionary in which the correspondence between kanji and readings is registered. The delimiter position check unit 60 checks the delimiter obtained from the family-name dictionary 20 and the given-name dictionary 30 against the delimiters obtained from the character dictionary 50. FIG. 5 is a flowchart of the operation of the first name analysis apparatus configuration of the present invention.

[0024] Step 101) First, the name delimiter analysis unit 10 analyzes the family/given-name delimiter using the family-name dictionary 20 and the given-name dictionary 30.

Step 102) It is determined whether there is a solution in which both the family name and the given name are found. If so, the process proceeds to step 103; if not, it proceeds to step 104.

[0025] Step 103) If there are multiple solutions in which both the family name and the given name are found, the solution with the largest sum of family-name and given-name frequencies is output.

Step 104) If only one of the family name and the given name is found in the family-name dictionary 20 or the given-name dictionary 30, the furigana analysis unit 40 analyzes the furigana.

[0026] Step 105) Next, the delimiter position check unit 60 checks the delimiter positions of the furigana.

FIG. 6 shows the configuration of a second name analysis apparatus of the present invention. The configuration shown in the figure adds to the configuration of FIG. 4 a reading attribute check unit 70 that checks reading attributes.

[0027] FIG. 7 is a flowchart of the operation of the second name analysis apparatus configuration of the present invention.

Step 201) First, the name delimiter analysis unit 10 analyzes the family/given-name delimiter using the family-name dictionary 20 and the given-name dictionary 30.

[0028] Step 202) It is determined whether there is a solution in which both the family name and the given name are found. If so, the process proceeds to step 203; if not, it proceeds to step 204.

Step 203) If there are multiple solutions in which both the family name and the given name are found, the solution with the largest sum of family-name and given-name frequencies is output.

[0029] Step 204) If only one of the family name and the given name is found in the family-name dictionary 20 or the given-name dictionary 30, the furigana analysis unit 40 analyzes the furigana.

Step 205) Next, the delimiter position check unit 60 checks the delimiter positions of the furigana.

[0030] Step 206) The reading attribute check unit 70 refers to the attributes of the furigana obtained by the furigana analysis unit 40 and checks the reading of the last character of the family name, the reading of the first character of the given name, the number of characters in the family and given names, the type of reading, and so on. In other words, the operation of step 206 is added to the operation shown in FIG. 5.

[0031] FIG. 8 shows the configuration of a third name analysis apparatus of the present invention. The configuration shown in the figure further adds to the configuration of FIG. 6 a solution evaluation and selection unit 80 that evaluates solution candidates by word length, frequency, and attributes and selects the best-rated solution.

[0032] FIG. 9 is a flowchart of the operation of the third name analysis apparatus configuration of the present invention.

Step 301) First, the name delimiter analysis unit 10 analyzes the family/given-name delimiter using the family-name dictionary 20 and the given-name dictionary 30.

[0033] Step 302) It is determined whether there is a solution in which both the family name and the given name are found. If so, the process proceeds to step 303; if not, it proceeds to step 304.

Step 303) If there are multiple solutions in which both the family name and the given name are found, the solution with the largest sum of family-name and given-name frequencies is output.

[0034] Step 304) If only one of the family name and the given name is found in the family-name dictionary 20 or the given-name dictionary 30, the furigana analysis unit 40 analyzes the furigana.

Step 305) Next, the delimiter position check unit 60 checks the delimiter positions of the furigana.

[0035] Step 306) The reading attribute check unit 70 refers to the attributes of the furigana obtained by the furigana analysis unit 40 and checks the reading of the last character of the family name, the reading of the first character of the given name, the number of characters in the family and given names, the type of reading, and so on.

Step 307) It is determined whether multiple solutions still remain; if so, the process proceeds to step 308.

[0036] Step 308) The solution evaluation and selection unit 80 outputs the best solution by combining the length and frequency of the word that is found in either the family-name dictionary 20 or the given-name dictionary 30 with the attributes of the given name.
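The control flow of steps 301 to 308 can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the candidate records, their field names (`surname_known`, `given_known`, `freq_sum`), and the three callables passed in are hypothetical stand-ins for units 60, 70, and 80, not structures defined in the text.

```python
from typing import Callable, Optional

Candidate = dict  # hypothetical record describing one proposed family/given split


def analyze(
    candidates: list,
    boundary_ok: Callable[[Candidate], bool],    # stand-in for check unit 60
    attributes_ok: Callable[[Candidate], bool],  # stand-in for check unit 70
    score: Callable[[Candidate], float],         # stand-in for unit 80
) -> Optional[Candidate]:
    # Steps 302-303: if some split has both parts in the dictionaries,
    # output the one with the largest summed frequency.
    both = [c for c in candidates if c["surname_known"] and c["given_known"]]
    if both:
        return max(both, key=lambda c: c["freq_sum"])
    # Steps 304-305: keep splits consistent with the per-character furigana.
    remaining = [c for c in candidates if boundary_ok(c)]
    # Step 306: drop splits whose readings violate their attributes.
    remaining = [c for c in remaining if attributes_ok(c)]
    if not remaining:
        return None  # no delimiter could be determined
    # Steps 307-308: pick the best-scoring survivor.
    return max(remaining, key=score)
```

Passing the checks in as functions keeps the sketch independent of how each unit is actually realized.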

[0037]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment] The first embodiment is described using the flowchart of FIG. 10, on the basis of FIGS. 4 and 5 described above. The step numbers in FIG. 10 are the same as those in FIG. 5.

[0038] First, the family-name dictionary 20 and the given-name dictionary 30 are searched for the input full name to obtain the family/given-name delimiters (step 101). In this example, the following delimitations are assumed to have been obtained. In what follows, ％ denotes the family/given-name delimiter and ／ denotes a character delimiter.

Solution 1: 石渡％隆瑞　イシワタ％リユウズイ
Solution 2: 石渡％隆瑞　イシワタリ％ユウズイ

Next, if there is exactly one solution in which the full name is found in either the family-name dictionary 20 or the given-name dictionary 30, that solution is output and processing ends. If there are multiple solutions in which the full name is found in both the family-name dictionary 20 and the given-name dictionary 30, the solution with the largest sum of family-name and given-name frequencies is output (step 102).
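The search in step 101 can be pictured as trying every split of the kanji string and of the furigana string and keeping the pairs for which a dictionary entry matches. The following Python sketch assumes, purely for illustration, that each dictionary maps a kanji string to a set of readings; the names `split_candidates`, `SURNAME_DICT`, and `GIVEN_DICT` are hypothetical.

```python
def split_candidates(kanji, kana, surname_dict, given_dict):
    """Return (surname, given, surname_kana, given_kana) tuples for which
    the surname or the given name is a dictionary entry with that reading."""
    results = []
    for i in range(1, len(kanji)):          # every kanji split point
        surname, given = kanji[:i], kanji[i:]
        for j in range(1, len(kana)):       # every furigana split point
            surname_kana, given_kana = kana[:j], kana[j:]
            surname_hit = surname_kana in surname_dict.get(surname, ())
            given_hit = given_kana in given_dict.get(given, ())
            if surname_hit or given_hit:
                results.append((surname, given, surname_kana, given_kana))
    return results


# Illustrative dictionary contents taken from the example in the text.
SURNAME_DICT = {"石渡": {"イシワタ", "イシワタリ"}}
GIVEN_DICT = {}  # 隆瑞 is assumed not to be registered

solutions = split_candidates("石渡隆瑞", "イシワタリユウズイ", SURNAME_DICT, GIVEN_DICT)
# solutions holds the two candidates listed above as Solution 1 and Solution 2
```

Both candidates survive this stage because each surname reading is a valid dictionary entry; telling them apart requires the per-character furigana analysis.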

[0039] If the full name is found in only one of the family-name dictionary 20 and the given-name dictionary 30, the delimiter position check unit 60 checks the delimiter positions of the furigana based on the analysis result of the furigana analysis unit 40 (step 104). For example, for 石渡隆瑞 (イシワタリユウズイ), the family names 石渡 (イシワタリ) and 石渡 (イシワタ) both exist in the dictionary. The result from the furigana analysis unit 40 is 石渡隆瑞 (イシ／ワタ／リユウ／ズイ), so the delimiter position check unit 60 rejects the solution that divides the name as 石渡隆瑞 (イシワタリ／ユウズイ), and 石渡隆瑞 (イシワタ／リユウズイ) is obtained (step 105).
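One way to picture the furigana analysis and the delimiter position check is a small backtracking aligner followed by a boundary comparison. The Python sketch below is an assumption-laden illustration, not the patented implementation: `align`, `boundary_ok`, and the contents of `CHAR_DICT` are hypothetical, and the readings listed are a small illustrative subset.

```python
def align(kanji, kana, char_dict):
    """Return every way to cut `kana` into one reading per kanji character."""
    if not kanji:
        return [[]] if not kana else []
    results = []
    for reading in char_dict.get(kanji[0], ()):
        if kana.startswith(reading):
            for rest in align(kanji[1:], kana[len(reading):], char_dict):
                results.append([reading] + rest)
    return results


def boundary_ok(segments, n_surname_chars, surname_kana):
    """True when the kana covered by the surname's kanji equals the
    surname reading proposed by the dictionary lookup."""
    return "".join(segments[:n_surname_chars]) == surname_kana


# Illustrative character dictionary (kanji -> possible readings).
CHAR_DICT = {
    "石": ["イシ", "セキ"],
    "渡": ["ワタ", "ワタリ", "ト"],
    "隆": ["リユウ", "タカ"],
    "瑞": ["ズイ", "ミズ"],
}

alignments = align("石渡隆瑞", "イシワタリユウズイ", CHAR_DICT)
segments = alignments[0]  # ["イシ", "ワタ", "リユウ", "ズイ"]
keep = boundary_ok(segments, 2, "イシワタ")      # True: boundaries coincide
reject = boundary_ok(segments, 2, "イシワタリ")  # False: split falls inside リユウ
```

With these entries the only consistent alignment is イシ／ワタ／リユウ／ズイ, because no reading of 隆 starts with ユウ, which is why the イシワタリ candidate is discarded.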

[0040] [Second Embodiment] The second embodiment is described using the flowchart of FIG. 11, on the basis of FIGS. 6 and 7 described above. The step numbers in FIG. 11 are the same as those in FIG. 7. FIG. 11 is a flowchart explaining the operation of the second embodiment of the present invention.

[0041] As in the first embodiment, the name delimiter analysis yields two solutions (step 201):

Solution 1: 小野寺％和　オノデラ％カズ
Solution 2: 小野％寺和　オノ％デラカズ

Here, because multiple solutions exist for which the full name is found in both the family-name dictionary 20 and the given-name dictionary 30 (step 202), the furigana analysis unit 40 performs furigana analysis as in the first embodiment (step 204). Suppose, for example, that the following analysis result is obtained:

小野寺和　オ／ノ／デラ／カズ

[0042] Next, when the delimiter position check by the delimiter position check unit 60 yields 小野寺／和 (オノデラ／カズ) and 小野／寺和 (オノ／デラカズ) (step 205), the reading attribute check unit 70 refers to the character dictionary 50 and rejects the solution that divides 小野寺和 (オノデラカズ) as 小野／寺和, because 寺 is not read デラ at the beginning of a word (step 206).

[0043] Similarly, the solution that divides 羽田野里子 (ハタノリコ) as 羽田野／里子 (ハタノ／リコ) is rejected by the reading attribute check unit 70, because 里 is read リ only in words of two or more characters, not counting the given-name suffix 子. FIG. 12 shows an example of the character dictionary 50 referred to by the reading attribute check unit 70.
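The two rejections described for the second embodiment can be sketched as a lookup of per-reading attributes. FIG. 12 is not reproduced here, so the table layout, the flag names `not_word_initial` and `multi_char_word_only`, and the function `word_ok` are all assumptions made for illustration.

```python
# (kanji, reading) -> hypothetical attribute flags; illustrative entries only.
READING_ATTRS = {
    ("寺", "デラ"): {"not_word_initial"},
    ("里", "リ"): {"multi_char_word_only"},
}


def word_ok(kanji_word, readings, is_given_name, attrs=READING_ATTRS):
    """Return False if a per-character reading violates its attributes."""
    # A reading flagged as non-initial may not start the word.
    if "not_word_initial" in attrs.get((kanji_word[0], readings[0]), set()):
        return False
    # Do not count the given-name suffix 子 toward the word length.
    core_len = len(kanji_word)
    if is_given_name and core_len > 1 and kanji_word.endswith("子"):
        core_len -= 1
    for ch, rd in zip(kanji_word, readings):
        flags = attrs.get((ch, rd), set())
        if "multi_char_word_only" in flags and core_len < 2:
            return False
    return True


rejected_terawa = not word_ok("寺和", ["デラ", "カズ"], is_given_name=True)
rejected_riko = not word_ok("里子", ["リ", "コ"], is_given_name=True)
accepted_noriko = word_ok("野里子", ["ノ", "リ", "コ"], is_given_name=True)
```

Under these assumed flags, 寺和 (デラカズ) and 里子 (リコ) are rejected as given names while 野里子 (ノリコ) passes, matching the outcomes stated in the text.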

[0044] [Third Embodiment] Next, a third embodiment of the present invention is described using the flowchart of FIG. 13, on the basis of FIGS. 8 and 9 described above. The step numbers in FIG. 13 are the same as those in FIG. 9. FIG. 13 is a flowchart explaining the operation of the third embodiment of the present invention.

[0045] In the flowchart of FIG. 13, the processing of step 306 may be omitted. If no solution candidates remain in the processing result (up to step 306), it is judged that there is no delimiter. If solutions still remain, the reading attribute check unit 70 outputs the best solution by combining the length and frequency of the word found in the dictionary with the attributes of the given name (words that form part of a longer given name, such as 一郎 and 太郎, are rated lower) (step 306). Experimental results show that, in general, an evaluation function performs better when word length is given priority over frequency, so solutions in which the family or given name found in the dictionary is longer are preferred, and among solutions of equal length the more frequent one is preferred.

[0046] For example, the following evaluation formula is conceivable:

(word length) × 10 + (frequency)

In this formula, (frequency) is the logarithm of the word's frequency, normalized to the range 0 to 10. This normalized word frequency may be registered in the family-name dictionary 20 and the given-name dictionary 30 in advance.
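The evaluation formula can be sketched in a few lines of Python. The function names are hypothetical, and the normalization shown (dividing by the log of the largest count in the dictionary) is an assumption; the text states only that the log frequency is normalized to the range 0 to 10.

```python
import math


def normalized_log_freq(count, max_count):
    """Scale log(count) to 0..10; assumes 1 <= count <= max_count and max_count > 1."""
    return 10.0 * math.log(count) / math.log(max_count)


def score(word_length, normalized_freq):
    """(word length) x 10 + (frequency), as in the evaluation formula."""
    return word_length * 10 + normalized_freq


# A two-character dictionary match with normalized frequency 5 outranks
# a one-character match with normalized frequency 9.
short_match = score(1, 9)   # 19
long_match = score(2, 5)    # 25
best = max([("short", short_match), ("long", long_match)], key=lambda t: t[1])
```

Because the frequency term never exceeds 10, one extra character of dictionary match always outweighs any frequency difference, which is the stated preference for length over frequency.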

[0047] As an example, consider analyzing 森沢繁澄 (モリサワシゲスミ). 森 has frequency 9 and 森沢 has frequency 5, so judging by frequency alone gives 森／沢繁澄. With the evaluation formula above, however, 森 scores 1 × 10 + 9 = 19 and 森沢 scores 2 × 10 + 5 = 25, so the correct answer 森沢／繁澄 is output.

[0048] However, for given names such as 一郎 and 太郎, one more character may precede the name, so the evaluation formula is modified to lower the rating of such solutions. Also, when the given name is found in the given-name dictionary 30 and the family-name part is too long (for example, six characters or more), the solution has very likely been cut at the wrong place and is rejected.

[0049] For example, for a full name entered in kana as アキヤマサダアキ, if only アキ is found in the given-name dictionary 30, the family-name part left after removing that given name is アキヤマサダ. No family name is that long, so the solution is rejected. The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.
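The over-long family name check from the アキヤマサダアキ example can be sketched as follows. The helper names and the constant are hypothetical; the limit of five characters simply encodes the example threshold of "six characters or more" and is not a value fixed by the text.

```python
MAX_SURNAME_LEN = 5  # assumption: six characters or more counts as too long


def remainder_surname(full_kana, given_kana):
    """Family-name part left after removing a given name found at the end."""
    if not full_kana.endswith(given_kana):
        raise ValueError("given name is not a suffix of the full name")
    return full_kana[: len(full_kana) - len(given_kana)]


def plausible_surname(surname, max_len=MAX_SURNAME_LEN):
    """Reject empty remainders and remainders longer than the limit."""
    return 0 < len(surname) <= max_len


surname = remainder_surname("アキヤマサダアキ", "アキ")  # "アキヤマサダ", 6 characters
is_plausible = plausible_surname(surname)               # False, so the split is rejected
```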

[0050]

[Effects of the Invention] As described above, according to the full-name analysis method and apparatus of the present invention, the furigana attributes are checked character by character even for a personal name of which only one part is in the surname and first name dictionaries, or which is in neither dictionary, so the surname/given-name delimiter can be assigned correctly. In addition, because solutions are evaluated by combining word length, frequency, and word attributes, an accurate solution can be output.

[0051] Furthermore, analysis is possible whenever either the surname or the given name is in a dictionary, so the analysis rate improves substantially. For example, even with a surname dictionary and a first name dictionary that each have 95% coverage, the probability that both the surname and the given name are in the dictionaries drops to about 90%. With the method and apparatus of the present invention, analysis is possible if either one is in a dictionary, so coverage rises to 99% or more.
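The coverage figures can be checked with a short calculation. It assumes that whether the surname is in its dictionary is independent of whether the given name is in its dictionary. The text does not state this assumption, but it matches the quoted figures.

```python
surname_coverage = 0.95
given_name_coverage = 0.95

# Both parts must be in their dictionaries.
both_in_dictionary = surname_coverage * given_name_coverage

# At least one part is in its dictionary (the complement of "neither").
at_least_one = 1 - (1 - surname_coverage) * (1 - given_name_coverage)

print(f"{both_in_dictionary:.4f}")  # 0.9025, i.e. about 90%
print(f"{at_least_one:.4f}")        # 0.9975, i.e. 99% or more
```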

[Brief Description of the Drawings]

FIG. 1 is a diagram for explaining the first principle of the present invention.

FIG. 2 is a diagram for explaining the second principle of the present invention.

FIG. 3 is a diagram for explaining the third principle of the present invention.

FIG. 4 is a configuration diagram of the first full-name analysis apparatus of the present invention.

FIG. 5 is a flowchart of the operation of the first full-name analysis apparatus configuration of the present invention.

FIG. 6 is a configuration diagram of the second full-name analysis apparatus of the present invention.

FIG. 7 is a flowchart of the operation of the second full-name analysis apparatus configuration of the present invention.

FIG. 8 is a configuration diagram of the third full-name analysis apparatus of the present invention.

FIG. 9 is a flowchart of the operation of the third full-name analysis apparatus configuration of the present invention.

FIG. 10 is a flowchart explaining the operation of the first embodiment of the present invention.

FIG. 11 is a flowchart explaining the operation of the second embodiment of the present invention.

FIG. 12 is an example of the character dictionary of the second embodiment of the present invention.

FIG. 13 is a flowchart explaining the operation of the third embodiment of the present invention.

[Explanation of Reference Numerals]

10 Full-name delimiter analysis unit
20 Surname dictionary
30 First name dictionary
40 Furigana analysis unit
50 Character dictionary
60 Delimiter position check unit
70 Reading attribute check unit
80 Solution evaluation and selection unit

Claims

[Claims]

[Claim 1] A full-name analysis method for assigning, to a personal name and its furigana that are input without a delimiter between surname and given name, a surname/given-name delimiter and a character delimiter for each kanji character, the method comprising: dividing the input full name into a surname and a given name using a surname dictionary in which surnames are registered and a first name dictionary in which given names are registered; when only one of the surname and the given name is in the surname dictionary or the first name dictionary, or when neither is in those dictionaries, assigning furigana to each kanji character by matching the furigana to each kanji character using a character dictionary in which readings of characters are registered; and determining whether a solution is correct based on character delimiter information for resolving ambiguity in readings.

[Claim 2] The full-name analysis method according to claim 1, wherein, in determining whether the solution is correct, the surname/given-name delimiter is assigned to the kanji and the furigana by matching character delimiter positions against the surname/given-name delimiter position, and a solution whose delimiters do not correspond is rejected.

[Claim 3] The full-name analysis method according to claim 1, wherein attributes of characters are registered in the character dictionary, and, when only the surname or only the given name of the input character string is in a dictionary, a solution judged to be erroneous is rejected by referring to the reading attributes of the characters.

[Claim 4] The full-name analysis method according to claim 1 or 3, wherein appearance frequency information and attributes of surnames and given names are registered in the surname dictionary and the first name dictionary, and, when there are a plurality of solutions for the input character string in which only one of the surname and the given name is in the surname dictionary or the first name dictionary, the frequency of the surname or given name, the length of the surname or given name found in the dictionary, and the attributes of the surname or given name found in the dictionary are evaluated, and the most probable solution is output.

[Claim 5] A full-name analysis apparatus for assigning, to a personal name and its furigana that are input without a delimiter between surname and given name, a surname/given-name delimiter and a character delimiter for each kanji character, the apparatus comprising: a surname dictionary in which surnames are registered; a first name dictionary in which given names are registered; a character dictionary in which readings of characters are registered; full-name dividing means for dividing the input full name into a surname and a given name using the surname dictionary and the first name dictionary; furigana assigning means for assigning furigana to each kanji character, using the character dictionary, by matching the furigana to each kanji character when one or both of the surname and the given name are not in the surname dictionary and the first name dictionary; and solution rejecting means for assigning the surname/given-name delimiter to the kanji and the furigana by matching character delimiter positions against the surname/given-name delimiter position, and rejecting a solution whose delimiters do not correspond.

[Claim 6] The full-name analysis apparatus according to claim 5, wherein the character dictionary includes, in addition to the readings of the characters, attributes of the readings of the characters, and the solution rejecting means includes error rejecting means for rejecting a solution judged to be erroneous by referring to the reading attributes of the characters in the character dictionary when only the surname or only the given name of the input character string is in a dictionary.

[Claim 7] The full-name analysis apparatus according to claim 5 or 6, wherein the surname dictionary includes appearance frequency information and attributes of surnames, and the first name dictionary includes appearance frequency information and attributes of given names, the apparatus further comprising: length/attribute evaluating means for evaluating, when there are a plurality of solutions for the input character string in which only one of the surname and the given name is in the surname dictionary or the first name dictionary, the frequency of the surname or given name and the length and attributes of the surname or given name found in the surname dictionary or the first name dictionary; and solution output means for outputting the most probable solution based on the evaluation result of the length/attribute evaluating means.