JP3747957B2

JP3747957B2 - Connection table editing device

Info

Publication number: JP3747957B2
Application number: JP01060896A
Authority: JP
Inventors: 篤司池野
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-01-25
Filing date: 1996-01-25
Publication date: 2006-02-22
Anticipated expiration: 2016-01-25
Also published as: JPH09204424A

Description

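[0001]
[Technical field of the invention]
The present invention relates to a connection table editing device that edits a connection table for morphological analysis, the table holding preceding words or parts of speech, following words or parts of speech, and connection rules on whether a preceding word or part of speech can connect to a following one.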
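[0002]
[Prior art]
Devices that process natural language sentences such as Japanese sentences (for example, machine translation devices, question answering devices, and computer-aided instruction devices) first perform morphological analysis on the sentence and then carry out the desired operation.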
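[0003]
As a morphological analyzer for Japanese sentences, one composed of a morphological analysis unit (morphological analysis program unit), a Japanese dictionary, a conjugation ending table, and a (per part of speech) connection table has been proposed (see Japanese Examined Patent Publication No. 05-52543).
[0004]
Besides connection tables that only indicate whether a connection is possible, approaches using a connection weight table that can take continuous values have also been proposed (see Japanese Unexamined Patent Publication No. 05-13327).
[0005]
The data in a connection table is described and changed through connection rules that define the connection relations (reference: 松本祐治、黒橋禎夫、宇津呂武仁、妙木裕、長尾真, "Japanese Morphological Analysis System JUMAN Manual version 2.0", February 1993).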
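[0006]
[Problems to be solved by the invention]
To change the output of a morphological analyzer, one can modify either the dictionary or the connection table.
[0007]
Conventionally, it was hardly ever assumed that users would freely change, that is, customize, the dictionary or the connection table. As a result, as described below, no connection table editing device met the requirements for modifying a connection table in particular.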
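[0008]
With a conventional connection table editing device, values and items in the connection table are added or modified by editing data written with one rule (or a few rules) per line, which makes it difficult to see how an entry relates to other data. Below, a format in which one rule or a few rules are written per line is called the rule format.
For example, FIG. 2 shows connection weights between parts of speech written with one rule per line. In FIG. 2, the front part is the first of two connectable parts of speech and the rear part is the second. The line whose front part is a noun and whose rear part is a case particle means that the connection weight between the two parts of speech (noun and case particle) is 0.8. Here the connection weight ranges from 0 to 1, and a larger value means a higher probability that the two parts of speech connect.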
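[0009]
A conventional connection table editing device displays part or all of the data in the format of FIG. 2 as it is, in rule format, for editing.
[0010]
When changing the connection weight between noun and verb, the user may want to refer to the weight between case particle and verb, yet that rule may lie much further down in the data. If the user also wants to refer to the weight between adverbial particle and verb and compare the three rules at once, this is very difficult with a conventional connection table editing device.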
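[0011]
When adding a new item such as a part of speech, it is desirable to search the connection rule groups of existing parts of speech, examine how they differ from the desired rule group, copy a suitable portion, and modify part of it if necessary. Furthermore, to reduce the amount of data, one can check whether the newly created rule group duplicates the rule group of any existing part of speech and consolidate the duplicates into one.
[0012]
For example, suppose a "sa-hen noun" (サ変名詞) is to be added to an existing connection table. Connection rules between the sa-hen noun and every currently defined part of speech must then be created. Creating all of the rules from scratch takes an enormous amount of time, so the user searches the existing rule groups for one that differs little from the desired connection conditions. A sa-hen noun has almost the same connection conditions as a noun, and only its connection weight with the verb "suru" (する) is expected to differ, so the user copies the whole noun rule group and changes only the weight with "suru". The user then checks whether the resulting sa-hen noun rule group matches the rule group of another existing part of speech. If it does, no new rule group needs to be created and the existing data can be used as is. With a conventional connection table editing device, however, it is hard to see at a glance how an entry relates to other entries when writing or modifying the table, and both comparing rule groups and checking for duplicates are difficult, so editing a connection table in this way takes a great deal of effort.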
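[0013]
[Means for solving the problems]
The invention of claim 1 is a connection table editing device that edits a connection table for morphological analysis holding preceding words or parts of speech, following words or parts of speech, and connection rules on whether a preceding word or part of speech can connect to a following one, characterized by comprising the following means.
[0014]
Namely, the device comprises: information reading means for reading the information stored in the connection table for morphological analysis; table creation means for creating, based on the information read by the information reading means, a table of rows and columns by expanding the preceding words or parts of speech into row or column headings, expanding the following words or parts of speech into column or row headings, and placing at each intersection of a row and a column the connection rule on whether the preceding word or part of speech can connect to the following one; display means for displaying the table created by the table creation means; input accepting means for accepting input from a user; edit operation means for editing the table created by the table creation means based on information from the input accepting means; duplicate check means for determining, after the edit operation means has edited the table, whether the connection rule group of the edited row or column duplicates the connection rule group of another row or column; merging means for, when the duplicate check means determines that such a duplicate exists, adding the headings of the other duplicate rows or columns to the heading of one of the duplicate rows or columns and deleting the other rows or columns; format conversion means for converting the table of rows and columns into a format corresponding to the connection table for morphological analysis; and information writing means for writing the converted information to the connection table.
[0015]
In the device of the invention of claim 1, the information reading means reads the information stored in the connection table. Based on that information, the table creation means expands the preceding words or parts of speech into row or column headings, expands the following words or parts of speech into column or row headings, places at each intersection the connection rule on whether the preceding word or part of speech can connect to the following one, and thereby creates a table of rows and columns. The display means displays the table, the input accepting means accepts input from the user, and the edit operation means edits the table based on information from the input accepting means. The format conversion means then converts the table into a format corresponding to the connection table, and the information writing means writes the converted information to the connection table.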
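[0016]
[Embodiments of the invention]
(First embodiment)
FIG. 3 shows the functional configuration of a morphological analyzer equipped with the connection table editing device of the first embodiment. In practice, the morphological analyzer of the first embodiment is implemented on a computer system with a large-capacity auxiliary storage device; its hardware configuration is omitted here. Functionally, as shown in FIG. 3, the morphological analyzer of the first embodiment consists of a morphological analysis unit 31, a connection table 32, a dictionary 33, and a connection table editing device 34.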
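[0017]
The connection table 32 holds values that define whether words (parts of speech) can connect, or their connection weights. It is edited by the connection table editing device 34 and referred to by the morphological analysis unit 31. There may be more than one connection table 32, but the first embodiment assumes that only one exists.
[0018]
The dictionary 33 holds word information expressed as combinations of a headword and a part of speech, and is referred to by the morphological analysis unit 31. There may be more than one dictionary 33, but the first embodiment assumes that only one exists.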
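[0019]
The morphological analysis unit 31 reads a character string and performs morphological analysis on it with reference to the information stored in the connection table 32 and the dictionary 33. Taking Japanese sentences as an example, the unit looks up the dictionary 33 for a word (or words) matching the unprocessed part of the input string. When an inflected word appears, it attaches a conjugation ending using the conjugation ending information held in the dictionary 33 or in the morphological analysis unit 31. It then consults the connection table 32 to decide whether the retrieved word can connect to the preceding and following morpheme candidates. If it can, the word is output as a morpheme candidate; if it cannot, the same processing is applied to the other word candidates. Finally, the resulting morphological analysis is output.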
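[0020]
The connection table editing device 34 accepts user input, displays the rule-format data stored in the connection table 32 in table format, and accepts edits. When changes have been made, it converts the table-format data back to rule format and updates the information stored in the connection table 32.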
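[0021]
As shown in FIG. 1, the connection table editing device 34 consists of a reading/writing unit 11, a conversion unit 12, a data management unit 13, a display unit 14, and an input receiving unit 15. Parts identical or corresponding to those in FIG. 3 are given the same reference numerals.
[0022]
The reading/writing unit 11 reads the rule-format data stored in the connection table 32 on instruction from the data management unit 13 and sends it to the conversion unit 12. It also receives rule-format data from the conversion unit 12 and writes that data to the connection table 32.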
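[0023]
The conversion unit 12 receives rule-format data from the reading/writing unit 11, sorts it according to a predetermined rule, converts it to table format, and sends it to the data management unit 13. It also receives table-format data from the data management unit 13, converts it to rule format, and sends it to the reading/writing unit 11.
[0024]
The display unit 14 receives the table-format data sent from the data management unit 13 and displays it as a table on a display device (not shown).
[0025]
The input receiving unit 15 accepts user input from a user interface device (not shown), analyzes the input, and sends instructions to the data management unit 13 based on the result of the analysis.
[0026]
The data management unit 13 sends the table-format data received from the conversion unit 12 to the display unit 14. On instruction from the input receiving unit 15, it performs the desired edits on the table-format data and sends the edited data to the display unit 14. After all editing has finished, it sends the table-format data to the conversion unit 12.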
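[0027]
As shown in FIG. 4, the data management unit 13 consists of a control unit 41, a search unit 42, an edit protection unit 43, a duplication check unit 44, a merging unit 45, a copying unit 46, a comparison unit 47, and a calculation unit 48. Parts identical or corresponding to those in FIG. 1 are given the same reference numerals.
[0028]
The control unit 41 receives table-format data from the conversion unit 12, sends it to the display unit 14, and, on instruction from the input receiving unit 15, controls the units 42 to 48 in the data management unit 13 as needed to edit the table-format data. The search unit 42, on instruction from the control unit 41, searches the table-format data for the desired data. The edit protection unit 43, on instruction from the control unit 41, makes only the rows or columns containing the data found by the search unit 42 editable and makes all other data uneditable. The duplication check unit 44, on instruction from the control unit 41, checks each time the user edits whether the modified row or column duplicates an existing row or column, and reports the result to the control unit 41. The merging unit 45, when the duplication check unit 44 has found a duplicate and the user has confirmed, merges the modified data into the existing data on instruction from the control unit 41. The copying unit 46, on instruction from the control unit 41, copies a specified portion of the data to another place in the table-format data. The comparison unit 47, on instruction from the control unit 41, examines the differences among several rows or columns of the table-format data and reports them to the control unit 41. The calculation unit 48, on instruction from the control unit 41, performs numerical calculations and reports the results to the control unit 41.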
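[0029]
The operation of the connection table editing device 34 of the first embodiment is described below.
[0030]
First, the basic operation of the connection table editing device 34 in the first embodiment is described in three parts: display, editing, and writing.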
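[0031]
First, the operation by which the connection table editing device 34 reads data from the connection table 32 and displays it is described. The input receiving unit 15 receives a display instruction from the user, analyzes the input, recognizes it as a display instruction, and sends the display instruction to the data management unit 13. The data management unit 13 then sends a data read instruction to the reading/writing unit 11. The reading/writing unit 11 reads the data stored in the connection table 32 (in rule format, as in FIG. 2) and sends it to the conversion unit 12. The conversion unit 12 converts the rule-format data to table-format data and sends it to the data management unit 13, which passes it to the display unit 14. The display unit 14 displays the table-format data as a table on a display device (not shown).
[0032]
FIG. 5 shows an example of the table-format information displayed on the display device at this point. In FIG. 5, "front" indicates the front part and "rear" indicates the rear part. The number shown at each intersection of two parts of speech is the connection weight between them.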
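[0033]
Next, the operation by which the connection table editing device 34 edits the table-format data (the format of FIG. 5) is described. The user issues an editing instruction for the data displayed as a table. The input receiving unit 15 receives the instruction, analyzes the input, and sends the corresponding instruction to the data management unit 13. The data management unit 13 performs the desired edit on the table-format data and sends the edited data to the display unit 14, which displays it as a table on the display device (not shown).
[0034]
Finally, the operation by which the connection table editing device 34 writes the edited data (the format of FIG. 5) to the connection table 32 (the format of FIG. 2) is described. After editing has finished, the user enters a write instruction. The input receiving unit 15 analyzes the input and sends the write instruction to the data management unit 13. The data management unit 13 sends the edited table-format data to the conversion unit 12, which converts it to rule format and sends the rule-format data to the reading/writing unit 11. The reading/writing unit 11 writes the rule-format data to the connection table 32 (deleting, changing, or adding rules).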
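[0035]
That is the basic operation. More detailed operations are described below. First, the operation of performing a series of search, edit protection, modification, and duplicate check processes on the table-format data after data such as FIG. 5 has been displayed is described with reference to the flowchart of FIG. 6.
[0036]
Upon an input from the user meaning "search for the connection between noun and verb" (a form such as "search for places where the connection value with noun is 0.6" is also acceptable), the input receiving unit 15 analyzes the input and sends the corresponding instruction to the data management unit 13. The control unit 41 in the data management unit 13 receives the instruction and instructs the search unit 42, which searches the whole of the table-format data for the noun-verb connection and sends the result to the control unit 41. The control unit 41 then instructs the edit protection unit 43, which makes only the rows or columns found by the search editable, makes all other data uneditable, and returns control to the control unit 41. The control unit 41 sends the resulting data to the display unit 14, which displays the data with the found position at the top (steps 601, 602).
The data displayed at this point looks like FIG. 7. In FIG. 7, "front" indicates the front part and "rear" indicates the rear part. The number shown at each intersection of two parts of speech is the connection weight between them.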
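[0037]
If the user wants to refer to another place, the process returns to step 601 and the search and edit protection operations are performed again; otherwise the process moves on to the next step (step 603).
[0038]
Next, the table-format data is edited (modified) as already described, and the modified row or column is then checked for duplication with existing data (the duplicate check is detailed later). If the user edits again, this modification and duplicate check are repeated (steps 604 to 606).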
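[0039]
That is the operation for modifying existing data. The duplicate check of step 605 is now described in detail with reference to the flowchart of FIG. 8.
[0040]
The control unit 41 instructs the duplication check unit 44 to perform a duplicate check. The duplication check unit 44 checks whether the modified row or column duplicates existing data and, if it does, reports the duplicate location to the control unit 41. When a duplicate exists, the control unit 41 tells the display unit 14 to display the duplicate location, and the display unit 14 displays the data with the duplicate location at the top (steps 801, 802).
If the user, having seen the duplicate, enters an input meaning "unify the duplicate data", the modified row or column among the duplicate data is deleted and the part of speech of the deleted row or column is added to the part of speech of the existing duplicate data (steps 803, 804).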
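[0041]
FIG. 9 shows a display example for step 802. In FIG. 9, "front" indicates the front part and "rear" indicates the rear part. The number shown at each intersection of two parts of speech is the connection weight between them. FIG. 9 assumes that, among the front parts, the data for noun and special noun 1 are duplicates and that the noun row is the one that was modified.
[0042]
FIG. 10 shows a display example after the duplicate of FIG. 9 has been unified in step 804. In FIG. 10 as well, "front" indicates the front part and "rear" indicates the rear part, and the number shown at each intersection of two parts of speech is the connection weight between them. In FIG. 10, the data whose front part is noun has been deleted and the deleted part of speech (noun) has been added to special noun 1.
[0043]
Unifying duplicate rows or columns in this way reduces the amount of data stored in the connection table 32.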
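The duplicate check and unification of steps 801 to 804 can be sketched roughly as follows. This is only an illustration under assumptions: the table is modeled as a dictionary from a heading (a tuple of part-of-speech names) to a row of weights, and the function names `find_duplicate` and `merge_rows` as well as the sample weights are hypothetical, not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

Heading = Tuple[str, ...]          # one or more part-of-speech names sharing a row
Table = Dict[Heading, List[float]]


def find_duplicate(rows: Table, edited: Heading) -> Optional[Heading]:
    """Return the heading of another row whose weights equal the edited row's."""
    for heading, weights in rows.items():
        if heading != edited and weights == rows[edited]:
            return heading
    return None


def merge_rows(rows: Table, edited: Heading, existing: Heading) -> Heading:
    """Delete the edited row and append its heading to the existing row's heading."""
    weights = rows.pop(existing)
    del rows[edited]
    merged = existing + edited
    rows[merged] = weights
    return merged


# Illustrative data only: after an edit, "noun" equals "special_noun_1".
rows: Table = {
    ("noun",): [0.8, 0.6],
    ("special_noun_1",): [0.8, 0.6],
    ("verb",): [0.1, 0.2],
}
duplicate = find_duplicate(rows, ("noun",))
if duplicate is not None:          # in the device, the user confirms first
    merged = merge_rows(rows, ("noun",), duplicate)
```

As in FIG. 10, the edited row disappears and its part of speech is attached to the surviving row, so the table shrinks by one row.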
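[0044]
The data stored in the connection table 32 in this embodiment includes data with one rule per line, as in FIG. 2, and data with several rules per line. As already mentioned, it also includes data that was unified because rows or columns were duplicates.
[0045]
FIG. 11 shows an example of data stored in the connection table 32 with several rules per line. In FIG. 11, the sa-hen noun (a subtype of noun) and the nominal noun suffix (a subtype of suffix) both have a connection weight of 0.5 with verb, while the connection weights between noun or nominal demonstrative (a subtype of demonstrative) and copula or comma (a subtype of special) are not entered (they are omitted).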
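A line holding several rules can be expanded into individual (front, rear) pairs along the following lines. The concrete syntax of FIG. 11 is not reproduced in the text, so the `fronts | rears | weight` layout, the labels, and the default weight used when the value is omitted are all assumptions made for this sketch.

```python
from typing import Dict, Tuple


def expand_line(line: str, default: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Expand 'f1,f2 | r1,r2 | weight' into one entry per (front, rear) pair.

    The weight field may be missing or empty, in which case `default` is used.
    """
    fields = [field.strip() for field in line.split("|")]
    fronts = [name.strip() for name in fields[0].split(",")]
    rears = [name.strip() for name in fields[1].split(",")]
    has_weight = len(fields) > 2 and fields[2] != ""
    weight = float(fields[2]) if has_weight else default
    return {(front, rear): weight for front in fronts for rear in rears}


# Two fronts sharing one weight, and a line whose weight is omitted.
shared = expand_line("noun:sahen,suffix:nominal | verb | 0.5")
omitted = expand_line("noun,demonstrative:nominal | copula,special:comma")
```

The first call yields two rules with weight 0.5; the second yields four rules that all carry the assumed default.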
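[0046]
Next, the operation of the conversion unit 12 when it receives such data from the connection table 32 via the reading/writing unit 11 is described in detail with reference to the flowchart of FIG. 12.
[0047]
The conversion unit 12 receives rule-format data from the reading/writing unit 11, extracts the front parts and the rear parts, sorts them (so that entries sharing a substring are placed near each other) to create a front part list and a rear part list, determines the size of the table from the number of items in the lists, and creates empty table-format data with no values (steps 1201, 1202).
[0048]
It then uses the headings (parts of speech) in the front part list and the rear part list as keys to retrieve the connection weights from the rule-format data and fills them into the table-format data. Where no value has been entered, it fills in a default value (step 1203).
[0049]
Entries sharing a substring are placed near each other because parts of speech that contain the same substring, such as noun (general noun) and the sa-hen noun subtype of noun, are likely to have very similar connection weights, and placing them together makes it easy to compare them while making modifications.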
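Steps 1201 to 1203 can be sketched as below. Plain lexicographic sorting is used here as a simple stand-in for "placing entries that share a substring near each other"; the patent does not state the actual sorting rule, and the function name `rules_to_table`, the labels, and the default value 0.0 are assumptions.

```python
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, str], float]


def rules_to_table(rules: Rules, default: float = 0.0):
    """Build (front headings, rear headings, cells) from rule-format data."""
    # Steps 1201-1202: collect and sort headings, then size an empty table.
    fronts: List[str] = sorted({front for front, _ in rules})
    rears: List[str] = sorted({rear for _, rear in rules})
    cells: List[List[float]] = [[default] * len(rears) for _ in fronts]
    # Step 1203: fill in known weights; unspecified pairs keep the default.
    for i, front in enumerate(fronts):
        for j, rear in enumerate(rears):
            if (front, rear) in rules:
                cells[i][j] = rules[(front, rear)]
    return fronts, rears, cells


rules: Rules = {
    ("noun", "verb"): 0.6,
    ("noun:sahen", "verb"): 0.5,
    ("adverb", "verb"): 0.3,
    ("noun", "case_particle"): 0.8,
}
fronts, rears, cells = rules_to_table(rules)
```

With prefix-style labels, sorting puts "noun" and "noun:sahen" on adjacent rows, which is the effect paragraph [0049] is after.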
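[0050]
The operation of adding a new part of speech to the connection table 32 after the table-format data has been displayed on the display device (not shown) is described with reference to the flowchart of FIG. 13.
[0051]
The input receiving unit 15 receives an instruction from the user to add a new part of speech, analyzes the input, and sends the instruction to the control unit 41 in the data management unit 13. The control unit 41 reserves an area for the new row or column in the table-format data, enters a value only in the part-of-speech heading, leaves the connection weights blank, and sends the data to the display unit 14, which displays the added row or column (step 1301).
[0052]
FIG. 14 shows a display example after step 1301. In FIG. 14, "front" indicates the front part and "rear" indicates the rear part. The number shown at each intersection of two parts of speech is the connection weight between them. FIG. 14 shows the newly added heading (sa-hen noun) with its connection weights still blank (a "-" in FIG. 14 indicates a blank).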
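[0053]
Next, the same search process as steps 601 to 603 of the flowchart of FIG. 6 is performed. Here, however, the found location is not necessarily displayed at the top and can be displayed at any position (step 1302).
[0054]
If several locations are found in step 1302 and the user then instructs the device to display the differences among the data at those locations, the input receiving unit 15 analyzes the input and sends the instruction to the control unit 41 in the data management unit 13. The control unit 41 controls the comparison unit 47, which compares the data at the found locations and reports the result to the control unit 41. The control unit 41 sends the comparison result together with the table-format data to the display unit 14, which displays the differences in addition to the data at the found locations (steps 1303, 1304).
[0055]
If the user, having looked at the differences, instructs the device to copy data from a found location to the newly added row or column, the input receiving unit 15 analyzes the input and sends the instruction to the control unit 41. The control unit 41 controls the copying unit 46, which copies the specified portion of the data at the found location to the specified portion of the newly added row or column. The control unit 41 sends the table-format data after copying to the display unit 14, which displays it as a table (step 1305).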
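The add, compare, and copy flow of steps 1301 to 1305 might look like the following sketch. Rows are keyed by part-of-speech name and blank cells are represented by `None`; the helper names `add_row`, `diff_rows`, and `copy_row` and all weights are hypothetical and serve only to illustrate the sa-hen noun example of paragraph [0012].

```python
from typing import Dict, Iterable, List, Optional

Row = List[Optional[float]]


def add_row(rows: Dict[str, Row], name: str, width: int) -> None:
    """Step 1301: reserve a new row with a heading and blank weights."""
    rows[name] = [None] * width


def diff_rows(rows: Dict[str, Row], a: str, b: str) -> List[int]:
    """Steps 1303-1304: column indices where two rows differ."""
    return [j for j, (x, y) in enumerate(zip(rows[a], rows[b])) if x != y]


def copy_row(rows: Dict[str, Row], src: str, dst: str,
             cols: Optional[Iterable[int]] = None) -> None:
    """Step 1305: copy all (or the selected) columns from src into dst."""
    targets = range(len(rows[src])) if cols is None else cols
    for j in targets:
        rows[dst][j] = rows[src][j]


rears = ["case_particle", "verb:suru", "verb"]
rows: Dict[str, Row] = {"noun": [0.8, 0.1, 0.6], "adverb": [0.0, 0.4, 0.7]}

add_row(rows, "noun:sahen", len(rears))
copy_row(rows, "noun", "noun:sahen")             # start from the noun row
rows["noun:sahen"][rears.index("verb:suru")] = 0.9   # change only "suru"
changed = diff_rows(rows, "noun", "noun:sahen")
```

Copying a similar row and touching a single cell is what makes adding a part of speech cheap compared with writing every rule from scratch.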
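[0056]
After that, the same processing as steps 604 and 605 of the flowchart of FIG. 6 is performed, that is, the data is modified and the modified data is checked for duplicates (step 1306).
[0057]
As described above, in the first embodiment the reading/writing unit 11 reads the rule-format data stored in the connection table 32, the conversion unit 12 converts it into table-format data of rows and columns, the data management unit 13 receives that data and passes it to the display unit 14, and the display unit 14 displays it. When the user enters input, the input receiving unit 15 analyzes it and notifies the data management unit 13 of the resulting instruction; the data management unit 13 performs the desired edit accordingly and passes the edited data to the display unit 14, which displays it as a table. When the user gives a write instruction, the input receiving unit 15 analyzes the input and notifies the data management unit 13, which passes the table-format data to the conversion unit 12; the conversion unit 12 converts it to rule-format data, and the reading/writing unit 11 writes the rule-format data to the connection table 32. As a result, a large amount of information can be shown on the display device at once, and data can be referred to and edited easily.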
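[0058]
Furthermore, because the conversion unit 12 sorts the data, data with very similar connection weights can be displayed close together, and the user can edit while referring to data with similar connection weights.
[0059]
In addition, because the search unit 42 searches for the specified location and the edit protection unit 43 makes only the found location editable and everything else uneditable, the location the user wants to see is displayed automatically and accidental modifications are prevented.
[0060]
Moreover, because the duplication check unit 44 checks for duplicates between the modified data and existing data after each modification, and the merging unit 45 merges the modified data and the existing data into one when a duplicate exists, modification errors are prevented and the amount of data displayed and stored in the connection table 32 is reduced.
[0061]
Finally, because the comparison unit 47 examines the differences among the referenced data and the copying unit 46 copies any portion of the referenced data to another place once the user has considered those differences, the time needed to enter new data is greatly reduced.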
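[0062]
(Other embodiments)
The first embodiment showed a connection table that stores rule-format data, but the invention is applicable whenever the connection table stores data in a format other than table format.
[0063]
In the first embodiment the conversion unit sorts the data, but a sorting unit that sorts data according to a predetermined rule may instead be provided in the data management unit and sort the data on instruction from the user.
[0064]
In the first embodiment the copying unit copies data when a new part of speech is added, but the copying unit may copy data at any time on instruction from the user. Likewise, edit protection, duplicate checking, merging, and comparison may each be performed at any time.
[0065]
Although the first embodiment was applied to Japanese, the invention can also be applied to other languages.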
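[0066]
[Effects of the invention]
As described above, the present invention comprises: information reading means for reading the information stored in the connection table for morphological analysis; table creation means for creating, based on the information read by the information reading means, a table of rows and columns by expanding the preceding words or parts of speech into row or column headings, expanding the following words or parts of speech into column or row headings, and placing at each intersection of a row and a column the connection rule on whether the preceding word or part of speech can connect to the following one; display means for displaying the table created by the table creation means; input accepting means for accepting input from a user; edit operation means for editing the table based on information from the input accepting means; duplicate check means for determining, after the table has been edited, whether the connection rule group of the edited row or column duplicates the connection rule group of another row or column; merging means for, when such a duplicate is found, adding the headings of the other duplicate rows or columns to the heading of one of the duplicate rows or columns and deleting the other rows or columns; format conversion means for converting the table into a format corresponding to the connection table for morphological analysis; and information writing means for writing the converted information to the connection table. Consequently, a large amount of information can be displayed systematically at once, and rule groups can be compared and checked for duplicates, so that information is easier to refer to and edit, editing mistakes are reduced, and the time required for editing is greatly shortened.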
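[Brief description of the drawings]
[FIG. 1] A block diagram showing the functional configuration of the connection table editing device in the first embodiment.
[FIG. 2] A diagram showing an example of connection weights between parts of speech written with one rule per line.
[FIG. 3] A block diagram showing the functional configuration of the morphological analyzer in the first embodiment.
[FIG. 4] A block diagram showing the functional configuration of the data management unit in the first embodiment.
[FIG. 5] A diagram showing a display example of the display unit in the first embodiment.
[FIG. 6] A flowchart showing the flow of the series of search, edit protection, modification, and duplicate check processes in the first embodiment.
[FIG. 7] A diagram showing a display example of the display unit after a search in the first embodiment.
[FIG. 8] A flowchart showing the flow of the series of duplicate check and data unification processes in the first embodiment.
[FIG. 9] A diagram showing a display example of the display unit after a duplicate check in the first embodiment.
[FIG. 10] A diagram showing a display example of the display unit after data unification in the first embodiment.
[FIG. 11] A diagram showing an example of connection weights between parts of speech written with several rules per line.
[FIG. 12] A flowchart showing the flow of processing in the conversion unit in the first embodiment.
[FIG. 13] A flowchart showing the flow of processing when new data is added in the first embodiment.
[FIG. 14] A diagram showing a display example of new data in the first embodiment.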
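[Description of reference numerals]
11 Reading/writing unit
12 Conversion unit
13 Data management unit
14 Display unit
15 Input receiving unit
32 Connection table
34 Connection table editing device
41 Control unit
42 Search unit
43 Edit protection unit
44 Duplication check unit
45 Merging unit
46 Copying unit
47 Comparison unit
48 Calculation unit
[0001]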
BACKGROUND OF THE INVENTION
The present invention relates to a connection table editing device that edits a connection table for morphological analysis, the table holding preceding words or parts of speech, following words or parts of speech, and connection rules on whether a preceding word or part of speech can connect to a following one.
[0002]
[Prior art]
Devices that process natural language sentences such as Japanese sentences (for example, machine translation devices, question answering devices, and computer-aided instruction devices) first perform morphological analysis on the sentence and then carry out the desired operation.
[0003]
As a morphological analyzer for Japanese sentences, one composed of a morphological analysis unit (morphological analysis program unit), a Japanese dictionary, a conjugation ending table, and a (per part of speech) connection table has been proposed (see Japanese Patent Publication No. 05-52543).
[0004]
Besides connection tables that only indicate whether a connection is possible, approaches using a connection weight table that can take continuous values have also been proposed (see Japanese Patent Application Laid-Open No. 05-13327).
[0005]
The data in a connection table is described and changed through connection rules that define the connection relations (reference: Yuji Matsumoto, Sadao Kurohashi, Takehito Utsuro, Hiroshi Myoki, Makoto Nagao, "Japanese Morphological Analysis System JUMAN Instruction Manual version 2.0", February 1993).
[0006]
[Problems to be solved by the invention]
In order to change the output result in the morphological analyzer, it is conceivable to modify the dictionary or the connection table.
[0007]
Conventionally, it was rarely assumed that users would freely change, that is, customize, the dictionary or the connection table. As a result, as described below, no connection table editing device met the requirements for modifying a connection table in particular.
[0008]
With a conventional connection table editing device, values and items in the connection table are added or modified by editing data written with one rule (or a few rules) per line, which makes it difficult to see how an entry relates to other data. Hereinafter, a format in which one rule or a few rules are written per line is referred to as the rule format.
For example, FIG. 2 shows an example of connection weights between parts of speech described in the form of one rule per line. In FIG. 2, the front part indicates the previous part of speech when there are two connectable parts of speech, and the rear part indicates the subsequent part of speech when there are two connectable parts of speech. Taking a line in which the part is a noun and the trailing part is a case particle, for example, it means that the connection weight between both parts of speech (noun and case particle) is 0.8. Here, it is assumed that the connection weight ranges from 0 to 1, and that the larger the numerical value, the higher the probability that both parts of speech are connected.
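To make the rule format concrete, the following sketch parses one-rule-per-line text into a lookup keyed by (front part, rear part). FIG. 2 itself is not reproduced here, so the whitespace-separated layout and the part-of-speech labels are assumptions; only the noun and case particle weight of 0.8 comes from the description, and the other weights are illustrative.

```python
from typing import Dict, Tuple

Rules = Dict[Tuple[str, str], float]


def parse_rules(text: str) -> Rules:
    """Parse lines of the assumed form '<front POS> <rear POS> <weight>'."""
    rules: Rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        front, rear, weight_text = line.split()
        weight = float(weight_text)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"connection weight out of range: {line!r}")
        rules[(front, rear)] = weight
    return rules


SAMPLE = """\
noun case_particle 0.8
noun verb 0.6
case_particle verb 0.9
"""
rules = parse_rules(SAMPLE)
```

In a flat list like this, related rules such as noun/verb and case particle/verb can end up far apart, which is the difficulty the following paragraphs describe.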
[0009]
In the conventional connection table editing apparatus, some or all of the data in the format shown in FIG. 2 is displayed in the rule format for editing.
[0010]
When changing the connection weight between noun and verb, the user may want to refer to the weight between case particle and verb, yet that rule may lie much further down in the data. If the user also wants to refer to the weight between adverbial particle and verb and compare the three rules at once, this is very difficult with a conventional connection table editing device.
[0011]
When adding a new item such as a part of speech, it is desirable to search the connection rule groups of existing parts of speech, examine how they differ from the desired rule group, copy a suitable portion, and modify part of it if necessary. Furthermore, to reduce the amount of data, one can check whether the newly created rule group duplicates the rule group of any existing part of speech and consolidate the duplicates into one.
[0012]
For example, suppose a "sa-hen noun" is to be added to an existing connection table. Connection rules between the sa-hen noun and every currently defined part of speech must then be created. Creating all of the rules from scratch takes an enormous amount of time, so the user searches the existing rule groups for one that differs little from the desired connection conditions. A sa-hen noun has almost the same connection conditions as a noun, and only its connection weight with the verb "suru" is expected to differ, so the user copies the whole noun rule group and changes only the weight with "suru". The user then checks whether the resulting sa-hen noun rule group matches the rule group of another existing part of speech. If it does, no new rule group needs to be created and the existing data can be used as is. With a conventional connection table editing device, however, it is hard to see at a glance how an entry relates to other entries when writing or modifying the table, and both comparing rule groups and checking for duplicates are difficult, so editing a connection table in this way takes a great deal of effort.
[0013]
[Means for Solving the Problems]
The invention of claim 1 is a connection table editing device that edits a connection table for morphological analysis holding preceding words or parts of speech, following words or parts of speech, and connection rules on whether a preceding word or part of speech can connect to a following one, characterized by comprising the following means.
[0014]
Namely, the device comprises: information reading means for reading the information stored in the connection table for morphological analysis; table creation means for creating, based on the information read by the information reading means, a table of rows and columns by expanding the preceding words or parts of speech into row or column headings, expanding the following words or parts of speech into column or row headings, and placing at each intersection of a row and a column the connection rule on whether the preceding word or part of speech can connect to the following one; display means for displaying the table created by the table creation means; input accepting means for accepting input from a user; edit operation means for editing the table created by the table creation means based on information from the input accepting means; duplicate check means for determining, after the edit operation means has edited the table, whether the connection rule group of the edited row or column duplicates the connection rule group of another row or column; merging means for, when the duplicate check means determines that such a duplicate exists, adding the headings of the other duplicate rows or columns to the heading of one of the duplicate rows or columns and deleting the other rows or columns; format conversion means for converting the table of rows and columns into a format corresponding to the connection table for morphological analysis; and information writing means for writing the converted information to the connection table.
[0015]
In the device of the invention of claim 1, the information reading means reads the information stored in the connection table. Based on that information, the table creation means expands the preceding words or parts of speech into row or column headings, expands the following words or parts of speech into column or row headings, places at each intersection the connection rule on whether the preceding word or part of speech can connect to the following one, and thereby creates a table of rows and columns. The display means displays the table, the input accepting means accepts input from the user, and the edit operation means edits the table based on information from the input accepting means. The format conversion means then converts the table into a format corresponding to the connection table, and the information writing means writes the converted information to the connection table.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIG. 3 shows the functional configuration of a morphological analyzer equipped with the connection table editing device of the first embodiment. In practice, the morphological analyzer of the first embodiment is implemented on a computer system with a large-capacity auxiliary storage device; its hardware configuration is omitted here. Functionally, as shown in FIG. 3, the morphological analyzer of the first embodiment consists of a morphological analysis unit 31, a connection table 32, a dictionary 33, and a connection table editing device 34.
[0017]
The connection table 32 holds values that define whether words (parts of speech) can connect, or their connection weights. It is edited by the connection table editing device 34 and referred to by the morphological analysis unit 31. There may be more than one connection table 32, but the first embodiment assumes that only one exists.
[0018]
The dictionary 33 holds information on words expressed by combinations of headings and parts of speech, and is referred to by the morpheme analysis unit 31. Although there may be a plurality of dictionaries 33, it is assumed that only one dictionary 33 exists in the first embodiment.
[0019]
The morphological analysis unit 31 reads a character string and performs morphological analysis on it with reference to the information stored in the connection table 32 and the dictionary 33. Taking Japanese sentences as an example, the unit looks up the dictionary 33 for a word (or words) matching the unprocessed part of the input string. When an inflected word appears, it attaches a conjugation ending using the conjugation ending information held in the dictionary 33 or in the morphological analysis unit 31. It then consults the connection table 32 to decide whether the retrieved word can connect to the preceding and following morpheme candidates. If it can, the word is output as a morpheme candidate; if it cannot, the same processing is applied to the other word candidates. Finally, the resulting morphological analysis is output.
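The way the connection table constrains the analysis can be illustrated with a toy segmenter. This is a deliberately simplified sketch, not the analyzer of the cited prior art: it skips conjugation ending handling, treats any weight above zero as "connectable", and uses a hypothetical romanized dictionary and hypothetical weights.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: surface form -> possible parts of speech.
DICTIONARY: Dict[str, List[str]] = {
    "neko": ["noun"],
    "ga": ["case_particle"],
    "naku": ["verb"],
}
# Hypothetical connection table: (front POS, rear POS) -> weight.
CONNECTION: Dict[Tuple[str, str], float] = {
    ("noun", "case_particle"): 0.8,
    ("case_particle", "verb"): 0.9,
}

Morpheme = Tuple[str, str]


def analyze(text: str, prev_pos: Optional[str] = None) -> List[List[Morpheme]]:
    """Return every segmentation whose adjacent parts of speech can connect."""
    if not text:
        return [[]]
    results: List[List[Morpheme]] = []
    for end in range(1, len(text) + 1):
        surface = text[:end]
        for pos in DICTIONARY.get(surface, []):
            connectable = prev_pos is None or CONNECTION.get((prev_pos, pos), 0.0) > 0.0
            if not connectable:
                continue  # rejected by the connection table; try other candidates
            for rest in analyze(text[end:], pos):
                results.append([(surface, pos)] + rest)
    return results


accepted = analyze("nekoganaku")
rejected = analyze("ganeko")
```

Changing a single entry in `CONNECTION` changes which segmentations survive, which is why editing the connection table alters the analyzer's output.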
[0020]
The connection table editing device 34 accepts user input, displays the rule-format data stored in the connection table 32 in table format, and accepts edits. When changes have been made, it converts the table-format data back to rule format and updates the information stored in the connection table 32.
[0021]
As shown in FIG. 1, the connection table editing device 34 consists of a reading/writing unit 11, a conversion unit 12, a data management unit 13, a display unit 14, and an input receiving unit 15. Parts identical or corresponding to those in FIG. 3 are given the same reference numerals.
[0022]
The reading/writing unit 11 reads the rule-format data stored in the connection table 32 on instruction from the data management unit 13 and sends it to the conversion unit 12. It also receives rule-format data from the conversion unit 12 and writes that data to the connection table 32.
[0023]
The conversion unit 12 receives the data in the rule format from the reading / writing unit 11 and arranges it according to a predetermined rule, converts it into a table format and sends it to the data management unit 13, and also sends the data in the table format from the data management unit 13. Data is received, converted into a rule format, and sent to the reading / writing unit 11.
[0024]
The display unit 14 receives tabular data sent from the data management unit 13 and displays it in a tabular format on a display device (not shown).
[0025]
The input receiving unit 15 receives user input from a user interface device (not shown), analyzes the input content, and sends an instruction to the data management unit 13 based on the analysis result.
[0026]
The data management unit 13 sends the tabular data received from the conversion unit 12 to the display unit 14, receives an instruction from the input receiving unit 15, performs desired editing on the tabular data, and edits the edited table. The format data is sent to the display unit 14, and after the editing is completed, the tabular data is sent to the conversion unit 12.
[0027]
As shown in FIG. 4, the data management unit 13 consists of a control unit 41, a search unit 42, an edit protection unit 43, a duplication check unit 44, a merging unit 45, a copying unit 46, a comparison unit 47, and a calculation unit 48. Parts identical or corresponding to those in FIG. 1 are given the same reference numerals.
[0028]
The control unit 41 receives table-format data from the conversion unit 12, sends it to the display unit 14, and, on instruction from the input receiving unit 15, controls the units 42 to 48 in the data management unit 13 as needed to edit the table-format data. The search unit 42, on instruction from the control unit 41, searches the table-format data for the desired data. The edit protection unit 43, on instruction from the control unit 41, makes only the rows or columns containing the data found by the search unit 42 editable and makes all other data uneditable. The duplication check unit 44, on instruction from the control unit 41, checks each time the user edits whether the modified row or column duplicates an existing row or column, and reports the result to the control unit 41. The merging unit 45, when the duplication check unit 44 has found a duplicate and the user has confirmed, merges the modified data into the existing data on instruction from the control unit 41. The copying unit 46, on instruction from the control unit 41, copies a specified portion of the data to another place in the table-format data. The comparison unit 47, on instruction from the control unit 41, examines the differences among several rows or columns of the table-format data and reports them to the control unit 41. The calculation unit 48, on instruction from the control unit 41, performs numerical calculations and reports the results to the control unit 41.
[0029]
The operation of the connection table editing device 34 according to the first embodiment will be described below.
[0030]
First, the basic operation of the connection table editing apparatus 34 in the first embodiment will be described by dividing it into three parts: display, editing, and writing.
[0031]
First, the operation when the connection table editing device 34 reads and displays data from the connection table 32 will be described. The input receiving unit 15 receives an input of a display instruction from the user, analyzes the input content, obtains that the input content is a display instruction, sends the display instruction to the data management unit 13, and receives the display instruction The data management unit 13 sends a data read instruction to the reading / writing unit 11, and the reading / writing unit 11 reads the data (rule format as shown in FIG. 2) stored in the connection table 32 and converts the information into the conversion unit 12. The conversion unit 12 converts the data in the rule format into tabular data and sends it to the data management unit 13. The data management unit 13 sends the data in the table format to the display unit 14, and the display unit 14 manages the data. The tabular data from the unit 13 is displayed in tabular format on a display device (not shown).
[0032]
FIG. 5 shows an example of tabular information displayed on a display device (not shown). In FIG. 5, the front shows the front part and the back shows the rear part. The number displayed at the intersection between each part of speech is the connection weight between both parts of speech.
[0033]
Next, the operation by which the connection table editing device 34 edits the table-format data (the format of FIG. 5) is described. The user issues an editing instruction for the data displayed as a table. The input receiving unit 15 receives the instruction, analyzes the input, and sends the corresponding instruction to the data management unit 13. The data management unit 13 performs the desired edit on the table-format data and sends the edited data to the display unit 14, which displays it as a table on a display device (not shown).
[0034]
Finally, the operation by which the connection table editing device 34 writes the edited data (the format of FIG. 5) to the connection table 32 (the format of FIG. 2) is described. After editing has finished, the user enters a write instruction. The input receiving unit 15 analyzes the input and sends the write instruction to the data management unit 13. The data management unit 13 sends the edited table-format data to the conversion unit 12, which converts it to rule format and sends the rule-format data to the reading/writing unit 11. The reading/writing unit 11 writes the rule-format data to the connection table 32 (deleting, changing, or adding rules).
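The conversion from table format back to rule format before writing can be sketched as follows. The one-rule-per-line text layout, the use of `None` for a blank cell, and the function name `table_to_rules` are assumptions for illustration; the patent does not specify how blank cells are written out.

```python
from typing import List, Optional


def table_to_rules(fronts: List[str], rears: List[str],
                   cells: List[List[Optional[float]]]) -> List[str]:
    """Emit one '<front> <rear> <weight>' line per filled cell, row by row."""
    lines: List[str] = []
    for i, front in enumerate(fronts):
        for j, rear in enumerate(rears):
            weight = cells[i][j]
            if weight is None:
                continue  # blank cell: no rule is written (assumed behavior)
            lines.append(f"{front} {rear} {weight}")
    return lines


fronts = ["noun", "case_particle"]
rears = ["case_particle", "verb"]
cells = [[0.8, 0.6], [None, 0.9]]
rule_lines = table_to_rules(fronts, rears, cells)
```

Joining `rule_lines` with newlines gives text in the same assumed layout that a reader of the rule format would parse again.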
[0035]
That is the basic operation. More detailed operations are described below. First, the operation of performing a series of search, edit protection, modification, and duplicate check processes on the table-format data after data such as FIG. 5 has been displayed is described with reference to the flowchart of FIG. 6.
[0036]
Upon an input from the user meaning "search for the connection between noun and verb" (a form such as "search for places where the connection value with noun is 0.6" is also acceptable), the input receiving unit 15 analyzes the input and sends the corresponding instruction to the data management unit 13. The control unit 41 in the data management unit 13 receives the instruction and instructs the search unit 42, which searches the whole of the table-format data for the noun-verb connection and sends the result to the control unit 41. The control unit 41 then instructs the edit protection unit 43, which makes only the rows or columns found by the search editable, makes all other data uneditable, and returns control to the control unit 41. The control unit 41 sends the resulting data to the display unit 14, which displays the data with the found position at the top (steps 601, 602).
The data displayed at this time is as shown in FIG. In FIG. 7, the front shows the front part, and the back shows the rear part. The number displayed at the intersection between each part of speech is the connection weight between both parts of speech.
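The search and edit protection steps (steps 601 and 602) can be illustrated with a small sketch. It assumes row-level protection only and an exact match on headings; the class `ProtectedTable`, its method names, and the weights are hypothetical, chosen only to show the idea of making the searched row editable and everything else read-only.

```python
from typing import Dict, Optional, Set, Tuple


class ProtectedTable:
    """Tabular connection weights with row-level edit protection (illustrative)."""

    def __init__(self, weights: Dict[str, Dict[str, float]]) -> None:
        self.weights = weights
        self.editable_rows: Set[str] = set(weights)  # everything editable at first

    def search(self, front: str, rear: str) -> Optional[Tuple[str, str]]:
        """Return the cell coordinates for a front/rear pair, or None if absent."""
        if rear in self.weights.get(front, {}):
            return (front, rear)
        return None

    def protect_all_but(self, front: str) -> None:
        """Leave only the searched row editable."""
        self.editable_rows = {front}

    def set_weight(self, front: str, rear: str, value: float) -> None:
        """Change a weight, refusing edits to protected rows."""
        if front not in self.editable_rows:
            raise PermissionError(f"row '{front}' is protected")
        self.weights[front][rear] = value


table = ProtectedTable({
    "noun": {"verb": 0.6, "case particle": 0.8},
    "case particle": {"verb": 0.7, "case particle": 0.1},
})
hit = table.search("noun", "verb")
if hit is not None:
    table.protect_all_but(hit[0])
table.set_weight("noun", "verb", 0.5)  # allowed: this is the searched row
```

Raising an error on a protected cell is one possible design; a display-oriented editor could equally grey out the protected cells and ignore input there.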
[0037]
If the user wishes to refer to yet another location, the process returns to step 601 and the search and edit protection operations are performed again. Otherwise, the process proceeds to the subsequent steps (step 603).
[0038]
Subsequently, the tabular data is edited (corrected) as described above, and it is then checked whether the corrected row or column data duplicates existing data (details of the duplication check are described later). If the user edits again, the correction and duplication check operations are repeated (steps 604 to 606).
[0039]
The operation for correcting the existing data is as described above. The details of the duplication check operation in step 605 will be described with reference to the flowchart of FIG.
[0040]
The control unit 41 instructs the duplication check unit 44 to perform a duplication check, and in response the duplication check unit 44 checks whether the corrected row or column data duplicates existing data. If there is a duplicate, the control unit 41 instructs the display unit 14 to display the duplicated portion, and the display unit 14 displays the data with the duplicated portion at the top (steps 801 and 802).
When the user, having seen the duplicate, makes an input meaning "unify the duplicated data", the corrected row or column among the duplicated data is deleted, and the part of speech of the deleted row or column is added to the part-of-speech heading of the existing duplicated data (steps 803 and 804).
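The duplication check and unification of steps 801 to 804 can be sketched as follows, assuming rows are dictionaries of weights and that two rows count as duplicates only when every weight is equal. The function names, the comma-joined merged heading, and the weights are hypothetical.

```python
from typing import Dict, Optional

Row = Dict[str, float]


def find_duplicate(table: Dict[str, Row], edited: str) -> Optional[str]:
    """Return the heading of an existing row identical to the edited row, if any."""
    for heading, row in table.items():
        if heading != edited and row == table[edited]:
            return heading
    return None


def merge_rows(table: Dict[str, Row], edited: str, existing: str) -> Dict[str, Row]:
    """Drop the edited row and append its heading to the existing row's heading."""
    merged: Dict[str, Row] = {}
    for heading, row in table.items():
        if heading == edited:
            continue  # the corrected row is the one that is deleted
        key = f"{existing}, {edited}" if heading == existing else heading
        merged[key] = row
    return merged


# Illustrative weights: the corrected "noun" row now equals "special noun 1".
table = {
    "noun": {"verb": 0.6, "case particle": 0.8},
    "special noun 1": {"verb": 0.6, "case particle": 0.8},
    "verb": {"verb": 0.2, "case particle": 0.1},
}
duplicate = find_duplicate(table, "noun")
if duplicate is not None:
    table = merge_rows(table, "noun", duplicate)
```

In the device described here the merge happens only after the user confirms it; the sketch merges unconditionally to keep the example short.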
[0041]
FIG. 9 shows a display example for step 802. In FIG. 9, "front" denotes the front part and "rear" denotes the rear part. The number displayed at the intersection of two parts of speech is the connection weight between them. In FIG. 9, it is assumed that, among the parts of speech of the front part, the data of the noun and of special noun 1 are duplicated, and that the noun row is the one that was corrected.
[0042]
FIG. 10 shows a display example after the duplicated data shown in FIG. 9 has been unified in step 804. In FIG. 10 as well, "front" denotes the front part and "rear" denotes the rear part, and the number displayed at the intersection of two parts of speech is the connection weight between them. In FIG. 10, the noun data among the parts of speech of the front part has been deleted, and the deleted part of speech (noun) has been added to the heading of special noun 1.
[0043]
In this way, unifying the data of duplicated rows or columns reduces the amount of data stored in the connection table 32.
[0044]
The data stored in the connection table 32 in the present embodiment includes data in the one-rule-per-line format shown in FIG. 2 as well as data in which several rules are written on one line. Furthermore, as already mentioned, some data has been unified because the row or column data was duplicated.
[0045]
FIG. 11 shows an example of data in the several-rules-per-line format stored in the connection table 32. In FIG. 11, the entries for the noun suffix and the nominal noun suffix indicate that their connection weight with the verb is 0.5, and the entries for the formal noun, the determiner, and the special punctuation marks indicate that no connection weight value has been entered (the value is omitted).
[0046]
Next, details of the operation of the conversion unit 12 that has received the data stored in the connection table 32 via the read / write unit 11 will be described with reference to the flowchart of FIG.
[0047]
The conversion unit 12 receives the rule format data from the reading / writing unit 11, extracts the front parts and the rear parts, and arranges them (so that entries sharing partial character strings are placed close to each other) to create a front part list and a rear part list. It then determines the size of the table from the number of items in the lists and creates empty tabular data containing no values (steps 1201 and 1202).
[0048]
Subsequently, using the headings (parts of speech) in the front part list and the rear part list as keys, the connection weight values are extracted from the rule format data and filled into the tabular data. For entries with no value, a default value is filled in (step 1203).
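Steps 1201 to 1203 can be sketched as below, under several assumptions: a rule is a (front, rear, weight) triple in which `None` stands for an omitted weight, the default value is 0.0 (the text does not state it), and plain lexicographic sorting stands in for the arrangement step. The name `rules_to_table` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str, Optional[float]]  # (front, rear, weight or None if omitted)
DEFAULT_WEIGHT = 0.0  # assumed default; the source does not give its value


def rules_to_table(
    rules: List[Rule], default: float = DEFAULT_WEIGHT
) -> Dict[str, Dict[str, float]]:
    """Build a front-by-rear table from rules, filling missing weights with a default."""
    # Steps 1201-1202: collect the headings and create an empty table of that size.
    fronts = sorted({front for front, _, _ in rules})
    rears = sorted({rear for _, rear, _ in rules})
    table: Dict[str, Dict[str, Optional[float]]] = {
        front: {rear: None for rear in rears} for front in fronts
    }
    # Step 1203: fill in the weights that the rules actually specify.
    for front, rear, weight in rules:
        table[front][rear] = weight
    # Step 1203 (continued): anything still empty receives the default value.
    return {
        front: {rear: default if w is None else w for rear, w in row.items()}
        for front, row in table.items()
    }


rules: List[Rule] = [
    ("noun", "case particle", 0.8),
    ("noun", "verb", 0.6),
    ("case particle", "verb", None),  # weight omitted in the rule data
]
table = rules_to_table(rules)
```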
[0049]
Parts of speech whose names contain the same partial character string, such as the noun (common noun) and the sa-variant noun, are arranged close to each other because their connection weights can be expected to be very similar, which makes it easy to correct them while comparing the parts of speech with each other.
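The text does not say how headings with shared substrings are brought together. One possible approach, shown purely as an assumption, is a greedy ordering that always places next the heading sharing the longest common substring with the one just placed. The helper names are hypothetical; `difflib.SequenceMatcher` is part of the Python standard library.

```python
from difflib import SequenceMatcher
from typing import List


def shared_length(a: str, b: str) -> int:
    """Length of the longest substring common to both headings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def arrange(headings: List[str]) -> List[str]:
    """Greedy ordering: each heading is followed by its most similar remaining one."""
    remaining = list(headings)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        best = max(remaining, key=lambda h: shared_length(last, h))
        remaining.remove(best)
        ordered.append(best)
    return ordered


headings = ["noun", "verb", "sa-variant noun", "case particle"]
ordered = arrange(headings)
print(ordered)
```

With these headings, "sa-variant noun" is placed directly after "noun" because the two share the substring "noun". A greedy pass does not guarantee the best overall ordering; it is only meant to show the kind of grouping described.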
[0050]
The operation when a new part of speech is added to the connection table 32 after the tabular data is displayed on a display device (not shown) will be described with reference to the flowchart of FIG.
[0051]
When the user gives an instruction to add a new part of speech, the input receiving unit 15 analyzes the input and sends the instruction to the control unit 41 in the data management unit 13. The control unit 41 reserves an area for the row or column to be added in the tabular data, enters only the part-of-speech heading, leaves the connection weight portion blank, and sends the data to the display unit 14. The display unit 14 displays the data portion of the added row or column (step 1301).
[0052]
FIG. 14 shows a display example after the processing of step 1301. In FIG. 14, "front" denotes the front part and "rear" denotes the rear part. The number displayed at the intersection of two parts of speech is the connection weight between them. In FIG. 14, the newly added heading (sa-variant noun) is displayed and its connection weights are left blank ("-" in FIG. 14 indicates a blank).
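Step 1301 and the blank display of FIG. 14 can be sketched as follows, assuming blank weights are represented by `None` and rendered as "-". The function names, the " | " separator, and the weights are hypothetical.

```python
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, Optional[float]]]


def add_front_part(table: Table, rears: List[str], heading: str) -> None:
    """Reserve a new row: heading filled in, every connection weight left blank."""
    if heading in table:
        raise ValueError(f"'{heading}' already exists")
    table[heading] = {rear: None for rear in rears}


def render_row(heading: str, row: Dict[str, Optional[float]]) -> str:
    """Format one row for display, showing '-' for blank weights."""
    cells = ["-" if weight is None else f"{weight:.1f}" for weight in row.values()]
    return " | ".join([heading] + cells)


rears = ["verb", "case particle"]
table: Table = {"noun": {"verb": 0.6, "case particle": 0.8}}
add_front_part(table, rears, "sa-variant noun")
line = render_row("sa-variant noun", table["sa-variant noun"])
print(line)
```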
[0053]
Next, search processing similar to the processing in steps 601 to 603 in the flowchart of FIG. 6 is performed. However, the search location is not necessarily displayed at the top of the display, but can be displayed at an arbitrary location (step 1302).
[0054]
In step 1302, when the user gives an instruction to search for a plurality of locations and to display the differences between the data at those locations, the input receiving unit 15 analyzes the input and sends the instruction to display the differences to the control unit 41 in the data management unit 13. The control unit 41 controls the comparison unit 47, which compares the data at the plurality of search locations and reports the comparison result to the control unit 41. The control unit 41 sends the tabular data together with the comparison result to the display unit 14, and the display unit 14 displays the differences between the data in addition to the data at the search locations (steps 1303 and 1304).
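The comparison of steps 1303 and 1304 amounts to listing the cells in which two searched rows disagree. A minimal sketch, assuming rows are dictionaries keyed by rear part; the name `diff_rows` and the weights are hypothetical.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, Optional[float]]


def diff_rows(
    table: Dict[str, Row], first: str, second: str
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Map each rear part whose weight differs to the pair (first, second) of weights."""
    rears = sorted(table[first].keys() | table[second].keys())
    return {
        rear: (table[first].get(rear), table[second].get(rear))
        for rear in rears
        if table[first].get(rear) != table[second].get(rear)
    }


# Illustrative weights: the two rows differ only in their weight with "verb".
table = {
    "noun": {"verb": 0.6, "case particle": 0.8},
    "special noun 1": {"verb": 0.3, "case particle": 0.8},
}
differences = diff_rows(table, "noun", "special noun 1")
print(differences)
```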
[0055]
When the user, having referred to the differences between the data, gives an instruction to copy data from a search location to the newly added row or column, the input receiving unit 15 analyzes the input and sends the instruction to the control unit 41. The control unit 41 controls the copying unit 46, which copies the designated portion of the data at the search location to the designated portion of the newly added row or column. The control unit 41 then sends the tabular data after copying to the display unit 14, and the display unit 14 displays the data in tabular format (step 1305).
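The copy of step 1305 can be illustrated with the sa-variant noun case from the background discussion: copy the noun row, then change only the weight with the verb "suru". This is a sketch under assumed data structures; the name `copy_cells` and every weight shown are hypothetical.

```python
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, Optional[float]]]


def copy_cells(
    table: Table, source: str, target: str, rears: Optional[List[str]] = None
) -> None:
    """Copy the selected cells (all cells if rears is None) from one row to another."""
    selected = list(table[source]) if rears is None else rears
    for rear in selected:
        table[target][rear] = table[source][rear]


# Illustrative weights; the new row starts out blank.
table: Table = {
    "noun": {"verb": 0.6, "case particle": 0.8, "verb 'suru'": 0.2},
    "sa-variant noun": {"verb": None, "case particle": None, "verb 'suru'": None},
}
copy_cells(table, "noun", "sa-variant noun")
# Only the weight with the verb "suru" is then corrected by hand.
table["sa-variant noun"]["verb 'suru'"] = 0.9
```

Passing a list of rear parts copies only those cells, which corresponds to copying a designated portion rather than the whole row.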
[0056]
Thereafter, the same processing as steps 604 to 605 in the flowchart of FIG. 6 is performed to perform data correction processing and post-correction duplication check processing (step 1306).
[0057]
As described above, according to the first embodiment, the reading / writing unit 11 reads out the rule format data stored in the connection table 32, and the conversion unit 12 converts the read data into tabular data consisting of rows and columns. The data management unit 13 receives the data and passes it to the display unit 14, which displays the tabular data. When input is received from the user, the input receiving unit 15 analyzes the input and notifies the data management unit 13 of the resulting instruction; the data management unit 13 performs the desired editing in accordance with the instruction and passes the edited data to the display unit 14, which displays it in tabular format. When the user gives a write instruction, the input receiving unit 15 analyzes the input and sends the resulting instruction to the data management unit 13, the data management unit 13 passes the tabular data to the conversion unit 12, the conversion unit 12 converts the tabular data into rule format data, and the reading / writing unit 11 writes the rule format data over the data in the connection table 32. As a result, a large amount of information can be displayed on the display device at once, and data can be referenced and edited easily.
[0058]
Further, since the conversion unit 12 arranges the data, data having similar connection weights can be displayed close together, and editing can be performed while referring to data having similar connection weights.
[0059]
In addition, since the search unit 42 searches for the designated location and the edit protection unit 43 sets only the search location to be editable and all other portions to be uneditable, the location to be corrected can be displayed automatically and erroneous corrections can be prevented.
[0060]
In addition, after the data is corrected, the duplication check unit 44 examines whether the corrected data duplicates existing data, and when it does, the merging unit 45 merges the corrected data and the existing data into one. Correction errors can thus be prevented, and the amount of data displayed and the amount of data stored in the connection table 32 can be reduced.
[0061]
Furthermore, since the comparison unit 47 examines the differences between the referenced data and, once the user has considered those differences, the copying unit 46 copies an arbitrary part of the referenced data to another location, the time required for data input can be greatly reduced.
[0062]
(Other embodiments)
In the first embodiment, rule format data is stored in the connection table. However, the present invention is applicable whenever data in a format other than the tabular format is stored in the connection table.
[0063]
In the first embodiment, the data is arranged by the conversion unit. However, the data management unit may instead be provided with an alignment unit that arranges the data according to a predetermined rule, and the alignment unit may perform the arrangement.
[0064]
Furthermore, in the first embodiment, the copying unit copies data when a new part of speech is added. However, the copying unit may copy data at an arbitrary timing according to an instruction from the user. Similarly, the edit protection, duplication check, and merge and compare processes may be performed at arbitrary timings.
[0065]
In the first embodiment, the application to Japanese is shown, but the present invention can also be applied to other languages.
[0066]
[Effects of the invention]
As described above, the present invention comprises: information reading means for reading information stored in the connection table for morphological analysis; table creation means for creating a table composed of rows and columns by, based on the information read by the information reading means, expanding the front words or parts of speech into row or column headings, expanding the rear words or parts of speech into column or row headings, and placing at the intersection of each row and column the connection rule concerning whether the front word or part of speech can connect to the rear word or part of speech; display means for displaying the table composed of rows and columns created by the table creation means; input receiving means for receiving input from the user; editing operation means for editing the table created by the table creation means based on information from the input receiving means; duplication check means for determining, after the table is edited by the editing operation means, whether the connection rule group of the row or column relating to the edited location duplicates the connection rule group of another row or column; merging means for, when the duplication check means determines that such duplication exists, adding the headings of the other duplicated rows or columns to the heading of one of the duplicated rows or columns and deleting the other rows or columns; format conversion means for converting the table composed of rows and columns into a format corresponding to the connection table for morphological analysis; and information writing means for writing the format-converted information to the connection table.
Consequently, a large amount of information can be displayed systematically at once, and rules can be compared and checked for duplication. This makes it easy to refer to and edit information, thereby reducing erroneous editing and greatly shortening the time required for editing.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a functional configuration of a connection table editing apparatus according to a first embodiment.
FIG. 2 is a diagram showing an example of connection weights between parts of speech described in the one-rule-per-line format.
FIG. 3 is a block diagram showing a functional configuration of the morphological analyzer according to the first embodiment.
FIG. 4 is a block diagram illustrating a functional configuration of a calculation unit according to the first embodiment.
FIG. 5 is a diagram showing a display example of a display unit in the first embodiment.
FIG. 6 is a flowchart showing a flow of a series of processing of search, edit protection, correction, and duplication check in the first embodiment.
FIG. 7 is a diagram showing a display example of the display unit after searching in the first embodiment.
FIG. 8 is a flowchart showing a flow of a series of processing of duplication check and data unification in the first embodiment.
FIG. 9 is a diagram showing a display example of the display unit after the duplication check in the first embodiment.
FIG. 10 is a diagram showing a display example of the display unit after data unification in the first embodiment.
FIG. 11 is a diagram showing an example of connection weights between parts of speech described in the several-rules-per-line format.
FIG. 12 is a flowchart illustrating a processing flow of a conversion unit according to the first embodiment.
FIG. 13 is a flowchart showing the flow of processing when adding new data in the first embodiment.
FIG. 14 is a diagram showing a display example of new data in the first embodiment.
[Explanation of symbols]
11 Reading / writing unit
12 Conversion unit
13 Data management unit
14 Display unit
15 Input receiving unit
32 Connection table
34 Connection table editing device
41 Control unit
42 Search unit
43 Edit protection unit
44 Duplication check unit
45 Merging unit
46 Copying unit
47 Comparison unit
48 Calculation unit

Claims

A connection table editing device that edits a connection table for morphological analysis, the connection table holding front words or parts of speech, rear words or parts of speech, and connection rules concerning whether a front word or part of speech can connect to a rear word or part of speech, the device comprising:
information reading means for reading information stored in the connection table for morphological analysis;
table creation means for creating a table composed of rows and columns by, based on the information read by the information reading means, expanding the front words or parts of speech into row or column headings, expanding the rear words or parts of speech into column or row headings, and placing at the intersection of each row and column the connection rule concerning whether the front word or part of speech can connect to the rear word or part of speech;
display means for displaying the table composed of rows and columns created by the table creation means;
input receiving means for receiving input from a user;
editing operation means for editing the table created by the table creation means based on information from the input receiving means;
duplication check means for determining, after the table is edited by the editing operation means, whether the connection rule group of the row or column relating to the edited location duplicates the connection rule group of another row or column;
merging means for, when the duplication check means determines that the connection rule group of the row or column relating to the edited location duplicates the connection rule group of another row or column, adding the headings of the other duplicated rows or columns to the heading of one of the duplicated rows or columns and deleting the other rows or columns;
format conversion means for converting the table composed of rows and columns into a format corresponding to the connection table for morphological analysis; and
information writing means for writing the format-converted information to the connection table.

The connection table editing device according to claim 1, wherein, when creating the table based on the information from the information reading means, the table creation means arranges the words or parts of speech used as the headings of the rows or columns so that words or parts of speech whose character strings partially match are placed close to each other.

The connection table editing device according to claim 1 or 2, wherein the editing operation means comprises:
a search unit that, when a word or part of speech is newly added, receives from the input receiving means search instruction information entered by the user in relation to the word or part of speech being newly added, and searches the table for the connection rules corresponding to the search instruction information; and
a copying unit that copies, from among the connection rules found by the search unit, the connection rules designated by the user as the connection rules of the word or part of speech being newly added.

The connection table editing device according to any one of claims 1 to 3, wherein the editing operation means comprises an edit protection unit that sets arbitrary parts of the table composed of rows and columns as editable parts and uneditable parts.