JP2004264960A

JP2004264960A - Example-based sentence translation device and computer program

Info

Publication number: JP2004264960A
Application number: JP2003052639A
Authority: JP
Inventors: Mitsuo Shimohata; 光夫下畑; Eiichiro Sumida; 英一郎隅田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2003-02-28
Filing date: 2003-02-28
Publication date: 2004-09-24

Abstract

PROBLEM TO BE SOLVED: To appropriately convert a natural conversational sentence into another sentence at low cost.

SOLUTION: A sentence translation device 30 includes: an example corpus 40; a modality identification section 50 that identifies the modality of a sentence; a content word list extraction section 46 that extracts the content word list of a sentence; a similar example retrieval section 54 that retrieves an example sentence similar to the input sentence from the example sentences of the example corpus 40, based on the content word list extracted from the input sentence and the modality information of the input sentence, and on the content word list extracted from each example sentence in the example corpus 40 and the modality information of each example sentence; and a similar word replacement section 58 that corrects the corresponding sentence of the similar example sentence in accordance with the difference between the input sentence and the similar example sentence, and then outputs the corrected corresponding sentence.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は用例ベースの文変換装置に関し、特に、自然な対話文を対象に、意味を大きく誤る事なく、文の大意を伝える事が可能な用例ベースの機械翻訳装置に代表される文変換装置に関する。
【０００２】
【従来の技術】
コンピュータを用いて翻訳を行なういわゆる機械翻訳に関しては、様々な手法が試みられている。その代表的なもので、多くの機械翻訳システムに採用されているのは、人手または半自動で作成した翻訳規則を利用したルールベースのものである。しかし、ルールベースの機械翻訳では、翻訳のための規則の抽出と、翻訳に要する辞書の構築とに、多大な人手をかける必要があり、開発工程が大きくなるという問題がある。また、言語ごとに翻訳のための規則が異なるため、ある言語の組合せについての機械翻訳システムを開発しても、他の言語の組合せに応用できる可能性が低いという問題もある。
【０００３】
これに対し、用例コーパスを用いた機械翻訳を用いる事により、少ない開発工程で精度の高い翻訳を実現しようとする研究が行なわれている。ここで、「用例コーパス」とは、一般的には、元言語文と、元言語文に対応する目的言語文との対を多数含む用例データベースの事をいう。
【０００４】
用例コーパスを用いた機械翻訳には、大きく分けて次の二つがある。
【０００５】
（１）統計的機械翻訳（Ｓｔａｔｉｓｔｉｃａｌ　Ｍａｃｈｉｎｅ　Ｔｒａｎｓｌａｔｉｏｎ。略して「ＳＭＴ」と呼ぶ。）
ＳＭＴでは、用例コーパスと対訳辞書とから一旦翻訳モデルを学習し、実行時にそのモデルに従って最善の翻訳を検索する。
【０００６】
（２）用例翻訳（Ｅｘａｍｐｌｅ−Ｂａｓｅｄ　Ｍａｃｈｉｎｅ　Ｔｒａｎｓｌａｔｉｏｎ。略して「ＥＢＭＴ」と呼ぶ。）
ＥＢＭＴは、用例コーパスを直接に翻訳に用いる。ＥＢＭＴに関するものとして、後掲の特許文献１に記載のものがある。特許文献１に記載された機械翻訳装置では、次の様にして入力文を翻訳している。すなわち、用例コーパスの中から入力文に最も類似する用例を検索する。検索された用例の元言語文中の単語であって、入力文と異なるものを差分部分とする。検索された用例の目的言語文中のこの差分部分に対応する部分を変数に置換える事により、翻訳パターンを生成する。そして、入力文中にこの翻訳パターン内の変数に対応する元言語の単語が存在する場合には、当該元言語の単語に対応する目的言語の単語を対訳辞書に基づいて取得し、翻訳パターン中の変数を、得られた単語で置換する。
【０００７】
【特許文献１】
特開２００３−６１９３号公報（第１図、段落００２６〜００３１）
【発明が解決しようとする課題】
しかし、特許文献１に記載された技術では、自然な対話を適切に訳す事が難しいか、またはそのためのコストが高くなるという問題がある。これは、自然な対話を収録する用例コーパスを作成するコストが高い事に起因する。書き言葉などの場合には、比較的簡単に用例を収集する事ができるが、自然な対話ではそうした事が困難なためである。
【０００８】
自然な対話を翻訳するために話し言葉からなる用例コーパスを作成しようとすれば、そのためのコストが高くなる。コストを下げようとすれば用例コーパスに十分なデータ量を確保できず、適切な翻訳を行なう事が難しい。十分なデータ量を確保するために書き言葉コーパスを導入すると、自然な対話を翻訳しようとする場合に入力文と用例コーパス内の用例との文体が異なり、適切な翻訳を行なう事が困難になるという問題がある。従って、自然な対話を、小さなコストで適切に翻訳する事が可能な用例ベースの機械翻訳装置が求められている。
【０００９】
さらに、こうした従来の機械翻訳は、全般的に、翻訳に失敗すると非常に悪い翻訳しか得られないという問題がある。この問題は、入力文が長くなると顕著になる。このような事が生じないよう、仮に翻訳に失敗した場合であっても、得られる訳により入力文の大意が理解できるような出力が得られるような機械翻訳装置があれば有用である。また、こうした問題は、機械翻訳装置に限定されず、用例ベースで文を変換または検索するための装置一般に関連して生ずるものである。
【００１０】
それゆえに本発明は、自然な対話文を、小さなコストで適切に他の文に変換する事が可能な用例ベースの文変換装置を提供する事を目的とする。
【００１１】
本発明の他の目的は、元言語の自然な対話文を、小さなコストで適切に目的言語の文に翻訳する事が可能な用例ベースの文変換装置を提供する事である。
【００１２】
本発明のさらに他の目的は、自然な対話文を、小さなコストで適切に他の言語の文に翻訳する事が可能で、翻訳に失敗した場合であっても大意を誤りなく伝える事ができる、用例ベースの文変換装置を提供する事を目的とする。
【００１３】
【課題を解決するための手段】
前述の様に、従来の機械翻訳装置などの文変換装置では、文の変換に失敗したときには非常に悪い翻訳結果しか得られないか、全く翻訳が得られない事が一般的である。これは、たとえば従来の機械翻訳装置では入力文をその細部まで正確に翻訳しようとする事を目的としている事が原因と考えられる。そのため、翻訳の過程で誤りがあると、その影響が非常に大きくなる。
【００１４】
しかし、状況によっては、入力文の細部まで正確に翻訳するのではなく、その大意を伝えればよいという場合もある。たとえば対話の翻訳などの場合である。特に、音声認識の出力からリアルタイムで対話の翻訳を行なう場合などには、音声以外にも情報を伝える手段があるため、入力文を細部まで正確に翻訳する必要性は小さい。
【００１５】
そこで、本発明では、以下に述べる構成とする事により、機械翻訳装置などの文変換装置に付随するこうした問題を解決した。
【００１６】
すなわち、本発明の第１の局面に係る用例ベースの文変換装置は、各々が、用例文と、用例と所定の関係にある対応文とからなる複数の文ペアを含む用例コーパスと、文のモダリティを識別してモダリティ情報を出力するための手段と、文の内容語リストを抽出するための手段と、入力文から抽出される内容語リストおよび入力文のモダリティ情報、ならびに用例コーパス中の各用例文から抽出される内容語リストおよび各用例文のモダリティ情報に基づいて、用例コーパスの用例文の中から入力文の類似用例文を検索するための手段と、入力文と類似用例文との間の相違に従って、類似用例文の対応文を修正して出力するための手段とを含む。
【００１７】
好ましくは、検索するための手段は、用例コーパス中で、入力文のモダリティと一致するモダリティを持つ用例文を検索するための手段と、用例文を検索するための手段により検索された用例文のうち、入力文から抽出される内容語リストと所定の関係を有する用例文を選択するための手段と、選択するための手段により選択された用例文の各々について予め定められた算出式により類似度を算出し、最も高い類似度の用例文を入力文の類似用例文として選択するための手段とを含む。
【００１８】
さらに好ましくは、用例文を選択するための手段は、用例文を検索するための手段により検索された用例文の各々の内容語リストによって、入力文の内容語リストの主要領域と非主要領域とを決定するための手段と、検索された用例文のうち、その内容語リストと、当該用例文の内容語リストによって決定された入力文の内容語リストの主要領域とに同一の内容語が存在するものを選択するための手段とを含む。
【００１９】
より好ましくは、類似用例文として選択するための手段は、類義語情報を格納したシソーラスと、シソーラスを参照して、（１）各用例文と、主要領域とに存在する同一内容語の数、（２）各用例文と、非主要領域とに存在する同一内容語の数、（３）各用例文と、主要領域に存在する類義内容語の数、（４）各用例文と、非主要領域に存在する類義内容語の数、（５）各用例文と、入力文とに共通する機能語の数、および（６）各用例文と、入力文とで異なる機能語の数に基づいて、予め定められた算定式により類似度を算出するための手段と、類似度を算出するための手段により最も高い類似度が算出された用例文を類似用例文として選択するための手段とを含む。
【００２０】
好ましくは、類似度を算出するための手段は、（１）〜（６）の値に対してこの順で大から小となる様に予め割当てられた重みを乗算し、その和を類似度として算出するための手段を含む。
【００２１】
さらに好ましくは、用例コーパスの文ペアの各々は、所定の元言語の用例文と、当該元言語の用例文に対応する目的言語の文とを含む。
【００２２】
より好ましくは、用例コーパスの文ペアの各々は、所定の元言語の用例文と、当該元言語の文に対応する目的言語の文とを含み、用例ベースの文変換装置はさらに、元言語の類義内容語情報を格納したシソーラスと、元言語と目的言語との対訳辞書と、シソーラスを参照して、入力文と類似用例文とに含まれる類義内容語のペアを検出するための手段と、対訳辞書を参照して類義内容語のペアを目的言語の単語に翻訳するための手段と、類似用例文の対応文を、翻訳された類義内容語のペアを用いて修正するための手段とを含む。
【００２３】
好ましくは、用例ベースの文変換装置はさらに、モダリティ情報を出力するための手段および文の内容語リストを抽出するための手段を制御して、用例コーパス中の各用例文についてモダリティ情報および文の内容語リストを作成するための手段と、作成するための手段により作成されたモダリティ情報および文の内容語リストを用例コーパス中の対応する用例文に付す様に用例コーパスを更新するための手段とを含む。
【００２４】
本発明の第２の局面に係るコンピュータプログラムは、コンピュータにより実行されると、当該コンピュータを、上記したいずれかの用例ベースの文変換装置として動作させる。
【００２５】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施の形態の機械翻訳装置について詳細に説明する。なお、以下で参照する図面において、同じ部品には同じ参照符号および名称を付してある。それらの機能も同一である。従って、それらについての詳細な説明を繰返す事はしない。
【００２６】
本実施の形態の装置では、入力文の大意を翻訳する様にし、細部まで精密に翻訳する事は目的としない。対話を対象とした翻訳では、細かい意味が欠落しても大意さえ伝われば対話の進行にほとんど問題がない事が多いためである。そのために、本実施の形態では、用例コーパスを用い、かつ用例コーパスから入力文と類似した用例文（類似用例文と呼ぶ。）を検索するための方法に特徴がある。
【００２７】
最初に、以下の本実施の形態に関する説明で使用される用語について定義する。なお、これら用語の詳細についてはさらに後に述べる。また、以下の定義はあくまで本実施の形態での定義であり、実施の形態によっては別の定義に従うものを用いる事も可能である。
【００２８】
＜類似用例文＞
類似用例文とは、翻訳に用いられる用例コーパスのうち、入力文と大意を共有する文の事をいう。より具体的には、以下に説明するモダリティおよび内容語リストに基づいて決定される。類似用例文は、入力文と正確に等しくなくともよい。
【００２９】
＜モダリティ＞
モダリティとは、文内の主観的表現一般の部分の事をいう。法性、法範疇とも呼ばれる。たとえば、平叙、疑問、依頼、否定、などの分類がその例である。
【００３０】
＜内容語＞
内容語とは、一般に事物の名称，性質、動作、状況などを表現する語の事をさす。以下の説明では、名詞、形容詞、動詞、副詞を内容語とする。
【００３１】
＜語の正規形＞
語の正規形とは、語形変化する語を終止形などの一つの形に統合したり、送り仮名などの標記の些細な異なりをまとめたりした形の事をいう。
【００３２】
＜内容語リスト＞
内容語リストとは、文から抽出された内容語の正規形を、文中でのそれらの出現順に従って並べたものをいう。
【００３３】
＜入力文の主要領域＞
入力文の主要領域とは、入力文と用例文との類似度を算出する際に、「比較すべき用例文の内容語リストに含まれる内容語の数がｎ個であるときの、入力文の内容語リストの最後のｎ語」の事をいう。
【００３４】
＜シソーラス＞
シソーラスとは、単語を、単語間の意味的関係によって整理した情報が収録されているデータの事をいう。
【００３５】
＜単語の「同一」と「類義」＞
２単語が「同一」であるとは、その２単語が全く同一で一致する事を指す。２単語が「類義」であるとは，シソーラスにおいてその２単語が類似した意味を有する関係（類義関係）にあるとして定義されている事を指す。
【００３６】
［モジュール構成］
図１に、本発明の実施の形態に係る用例ベースの機械翻訳装置３０のモジュール構成をブロック図形式で示す。図１を参照して、この機械翻訳装置３０は、用例コーパス４０と、信号入力部６４から与えられる選択信号に応答して、入力文４２と用例コーパス４０中の用例とのいずれかを選択する選択部４４と、選択部４４から出力される文から内容語リストを抽出するための内容語リスト抽出部４６と、選択部４４から出力される文についてそのモダリティを識別し、モダリティ情報を出力するためのモダリティ識別部５０とを含む。
【００３７】
機械翻訳装置３０はさらに、内容語リスト抽出部４６の出力を受ける入力と第１の出力７０および第２の出力７２とを有し、信号入力部６４から与えられる選択信号の値に応じて、内容語リスト抽出部４６から与えられる内容語リストを第１の出力７０または第２の出力７２に出力する選択部４８を含む。機械翻訳装置３０はまた、モダリティ識別部５０の出力を受ける入力と第１の出力７４および第２の出力７６とを有し、信号入力部６４から与えられる選択信号の値に応じて、モダリティ識別部５０から与えられるモダリティ情報を第１の出力７４または第２の出力７６に出力する選択部５２を含む。
【００３８】
機械翻訳装置３０はさらに、元言語の単語の類義語情報を含むシソーラス５６と、選択部４８の第１の出力７０、選択部５２の第１の出力７４、および用例コーパス４０に接続され、選択部４８から与えられる内容語リスト、選択部５２から与えられるモダリティ情報、およびシソーラス５６を参照して、入力文４２の類似用例文およびその対訳文を用例コーパス４０中から検索するための類似用例検索部５４とを含む。
【００３９】
機械翻訳装置３０はさらに、元言語および目的言語の対訳辞書６０と、類似用例検索部５４の出力および対訳辞書６０に接続された類義語置換部５８を含む。類義語置換部５８は、類似用例検索部５４から、入力文４２と、入力文４２に対する類似用例文と、類似用例文の対訳文とを受けて、対訳辞書６０を参照して、入力文４２およびその類似用例文の間の相違に基づいて、類似用例文の対訳文を修正し翻訳文６２として出力するためのものである。
【００４０】
機械翻訳装置３０はさらに、選択部４８の第２の出力７２および選択部５２の第２の出力７６に接続され、内容語リスト抽出部４６が用例コーパス４０の用例文から抽出した内容語リストと、モダリティ識別部５０が用例コーパス４０の同じ用例文から抽出したモダリティ情報とに基づいて、用例コーパス４０中の当該用例文にその内容語リストおよびモダリティ情報とを付加する様に用例コーパス４０を更新するためのコーパス更新部６６を含む。
【００４１】
信号入力部６４を介して与えられる選択信号は、入力文４２の内容語リストおよびモダリティ情報を算出する場合には第１の値をとり、用例コーパス４０中の用例文の各々について内容語リストおよびモダリティ情報を算出する処理の場合には第２の値をとる。
【００４２】
選択信号が第１の値をとるとき、選択部４４は入力文４２を選択して内容語リスト抽出部４６およびモダリティ識別部５０に与える。このとき、選択部４８および選択部５２は、それぞれ、入力文４２に関連した内容語リストおよびモダリティ情報を類似用例検索部５４に与える。
【００４３】
選択信号が第２の値をとるとき、選択部４４は用例コーパス４０から与えられる用例文を選択して内容語リスト抽出部４６およびモダリティ識別部５０に与える。このとき選択部４８および選択部５２はそれぞれ、用例コーパス４０中の用例文から得られた内容語リストおよびモダリティ情報をコーパス更新部６６に与える。
【００４４】
図２に、用例コーパス４０の内容の一例を示す。図２を参照して、用例コーパス４０は、元言語の用例文およびその目的言語の対訳文からなる文ペアを多数含む。
【００４５】
モダリティ識別部５０が行なう文のモダリティの識別方法について説明する。モダリティ識別部５０は、文の表層的特徴を用いて文のモダリティを識別する。たとえば、元言語として日本語を採用した場合には、モダリティは特有の文末表現で識別できる。図３に、日本語に関するモダリティと各モダリティに特有の文末表現とを表形式で示す。たとえば、文末が「か」または「ね」という終助詞で終わる文のモダリティは「疑問」である。同様に「いただけませんか」「いただけないでしょうか」「いないでしょうか」「お願い」または「ください」などという文末表現を持つ文のモダリティは「依頼」である。「ない」「ません」という助動詞で終わる文のモダリティは「否定」である。これ以外にも各モダリティに特有の表現があり得るが、ここには全ては挙げていない。本実施の形態では、図３の文末表現に該当しない文のモダリティはすべて「平叙」と判定する。図４に、文例とそのモダリティの例を示す。
【００４６】
図５に、コーパス更新部６６によって更新された後の用例コーパス４０の内容の例を示す。図２に示した各文ペアごとに、元言語の用例文のモダリティ情報と、元言語の用例文から抽出した内容語リストとが追加されている。
【００４７】
図１に示す類似用例検索部５４は、用例コーパス４０中の各用例文のうち、以下の二つの条件を満たす文の中から入力文４２に最も類似した文を検索する。
（条件１）モダリティが入力文４２のモダリティと一致する。
（条件２）入力文４２の主要領域中に存在する内容語と同一の内容語を含む内容語リストをもつ。
【００４８】
図６に、図１に示す類似用例検索部５４の詳細な機能ブロック図を示す。図６を参照して、類似用例検索部５４は、用例コーパス４０に含まれる用例文のうち、入力文４２のモダリティと一致するものを抽出するための同一モダリティ用例文抽出部８０と、同一モダリティ用例文抽出部８０により抽出された用例文のうち、対応する内容語リストが所定の条件を満足しているもののみを選択するための内容語条件検査部８２と、内容語条件検査部８２により選択された用例文の各々について後述する方法により入力文４２との類似度を示すスコアを付与し、最高スコアを持つ用例文のみを選択するためのスコア計算・選択部８４とを含む。
【００４９】
同一モダリティ用例文抽出部８０の機能については上述した通りであり、そのための構成も明らかである。
【００５０】
図７に、内容語条件検査部８２が行なう内容語リストの条件の検査方法について示す。図７を参照して、上段に示す様に入力文４２については内容語リスト抽出部４６により内容語リストが作成される。また、用例コーパス４０に含まれる各用例文については、予め内容語リスト抽出部４６により内容語リストが作成されコーパス更新部６６により各用例文に付与されている。
【００５１】
内容語条件検査部８２は、前述した条件１を満足する用例文のうち、さらに条件２を満足する用例文を選択するためのものである。この条件で使用される「入力文の主要領域」の定義は既に述べた通りである。
【００５２】
図７の上段に示す通り、第１の用例文の内容語リストは３語である。従って、入力文４２と第１の用例文とが類似しているか否かを判定する際の入力文４２の主要領域は、その内容語のリストの末尾３語（うるさい、部屋、換える）である。これらのうち「部屋」「換える」という２つの語１００が第１の用例文の内容語リストにも含まれている。従って第１の用例文は条件２を満たす。
【００５３】
第２の用例文の内容語リストは２語である。従って、入力文４２と第２の用例文とが類似しているか否かを判定する際の入力文４２の主要領域は、その内容語リストの末尾２語（部屋、換える）である。この中には、第２の用例文の内容語リスト（隣、うるさい）にも存在する語はない。従って、第２の用例文は入力文４２とは類似していないと判定される。
【００５４】
内容語条件検査部８２は以上の様にして入力文４２と類似している用例文を入力文４２とモダリティが一致しているものの中から選択する。
【００５５】
スコア計算・選択部８４は、内容語条件検査部８２から出力される条件１、２を満足する用例文の各々に対して、図８に示す条件に従ってスコアを付与する。図８では、上から順に重要な要因を挙げ、各要因ごとに重みを乗じて加算する。この例では、上位に挙げられた要因が、スコアの計算結果の大部分を支配する。もちろん、重みの定め方は図８に示したものに限定されない。複数の要因がほぼ同じ重みを持つような設定方法もあり得る。
【００５６】
図９に、図１に示す類義語置換部５８のより詳細な機能ブロック図を示す。図９を参照して、類義語置換部５８は、入力文４２と、類似用例検索部５４により選択された類似用例文との間の類義語ペアを抽出し、類義語ペアの数が０か否かを示す判定信号と、抽出された類義語ペアとを出力するための類義語抽出部１２０を含む。判定信号は、類義語ペアの数が０の場合には第１の値、０以外のときには第２の値をとるものとする。
【００５７】
類義語置換部５８はさらに、類似用例検索部５４から用例文およびその対訳文を受ける様に接続され、さらに対訳辞書６０に接続され、類義語抽出部１２０からの判定信号が第２の値であるときに、類義語抽出部１２０から与えられた類義語ペアを対訳辞書６０を用いて翻訳し、用例文の対訳文中の、当該類義語に該当する個所を類義語の訳語で置換して出力する処理を全ての類義語ペアに対して行ない、結果として得られた訳文を出力するための類義語置換部１２４と、用例文の対訳文を受ける第１の入力、および類義語置換部１２４の出力を受ける第２の入力を持ち、類義語抽出部１２０の出力する判定信号が第１の値のときには対訳文を、第２の値のときには類義語置換部１２４の出力を、それぞれ選択して出力するための選択部１２６とを含む。
【００５８】
［動作］
この機械翻訳装置３０は以下の様にして動作する。動作は大きく分けて第１および第２の二つの局面に分かれる。第１の局面では用例コーパス４０中の各用例文に対して、モダリティ情報と内容語リストとを付与する作業を行ない、第２の局面では、入力文４２に対し、用例コーパス４０を用いた翻訳を行なう。
【００５９】
第１の局面では、信号入力部６４に与える選択信号の値を前述した第２の値とする。このとき、図１に示す選択部４４は用例コーパス４０の出力を選択する。内容語リスト抽出部４６は、選択部４４から与えられた用例文ごとに内容語リストを抽出し、選択部４８に与える。選択部４８は内容語リスト抽出部４６の出力をコーパス更新部６６に与える。モダリティ識別部５０は、選択部４４から与えられた用例文ごとにモダリティを識別し、モダリティ情報を選択部５２に与える。選択部５２はモダリティ識別部５０の出力をコーパス更新部６６に与える。コーパス更新部６６は、内容語リスト抽出部４６から与えられた内容語リストおよび選択部５２から与えられたモダリティ情報を、用例コーパス４０中の該当する用例文に付加する。
【００６０】
用例コーパス４０の全ての用例文に対して上記した処理が終了すると、第１の局面の処理は終了する。
【００６１】
第２の局面では、信号入力部６４からの選択信号を第１の値とする。選択部４４は、入力文４２を選択して内容語リスト抽出部４６およびモダリティ識別部５０に与える。内容語リスト抽出部４６は、入力文４２から内容語リストを抽出し、選択部４８に与える。選択部４８は、この内容語リストを類似用例検索部５４に与える。モダリティ識別部５０は、選択部４４から与えられた入力文４２のモダリティを識別し，モダリティ情報を類似用例検索部５４に与える。
【００６２】
類似用例検索部５４の同一モダリティ用例文抽出部８０（図６を参照されたい。）は、用例コーパス４０の用例文の中から、入力文４２のモダリティと同一のモダリティを持つ文（これは通常複数個存在する。）を抽出し内容語条件検査部８２に与える。内容語条件検査部８２は、同一モダリティ用例文抽出部８０から与えられた文のうち、前述した内容語リストに関する条件１，２を満足するもののみを選択してスコア計算・選択部８４に与える。スコア計算・選択部８４は、内容語条件検査部８２から与えられた文の各々について図８に示した算定式に基づきスコアを計算し、最高スコアを示した用例文（これが「類似用例文」である。）を類義語置換部５８に出力する。
【００６３】
類義語置換部５８の類義語抽出部１２０（図９を参照されたい。）は、類似用例文の中に、入力文中の語と類義関係にある語がいくつあるかを判定する。もしも類義関係にある語がなにもない場合には、類義語抽出部１２０は前述した通り第１の値の信号を類義語置換部１２４および選択部１２６に出力する。類義語置換部１２４は、この信号が第１の値の場合には動作しない。選択部１２６は、類似用例文の対訳文を選択して出力する。
【００６４】
類義関係にある語がある場合には、類義語抽出部１２０は第２の値の信号を類義語置換部１２４および選択部１２６に与える。類義語抽出部１２０はさらに、入力文と類似用例文との間の類義内容語ペアとを類義語置換部１２４に与える。
【００６５】
類義語置換部１２４は、類義語ペアを対訳辞書６０を用いて翻訳する。翻訳された類義語ペアを目的言語置換情報と呼ぶ。類義語置換部１２４はさらに、類似用例文の対訳文中の単語を、目的言語置換情報に従って置換する。より具体的には類義語置換部１２４は、類似用例文の対訳文中で、目的言語置換情報にある単語を探す。見つかった単語を、目的言語置換情報中の訳語で置換する。類義語置換部１２４は、この処理を全ての類義語ペアについて繰返し、結果として得られた文を選択部１２６に与える。
【００６６】
選択部１２６は、類義語抽出部１２０から与えられる信号が第２の値であるので、類義語置換部１２４の出力を選択し出力する。その結果、修正後の対訳文が選択部１２６の出力に得られる。
【００６７】
図１０に示す例を用いて類義語置換部５８の動作について再度説明する。この例では、入力文が「ツインは満室ですよ。」という文であり、用例コーパス４０から得られたその類似用例文が「シングルは満室ですよ。」という文であり、その対訳文は図２および図１０に示す通り「Ａｌｌ　ｔｈｅ　ｓｉｎｇｌｅｓ　ａｒｅ　ｆｕｌｌ．」であるものとする。この入力文と類似用例文との間では、入力文の「ツイン」という単語１４０と、類似用例文中の「シングル」という単語１４２とが類義関係にある。類義語抽出部１２０はこの類義語ペアを抽出し類義語置換部１２４に与える。
【００６８】
類義語置換部１２４は、対訳辞書６０を参照して、この類義語ペアを翻訳する。この場合、「ｓｉｎｇｌｅｓ」と「ｔｗｉｎｓ」という目的言語置換情報が得られる。類義語置換部１２４は、この目的言語置換情報を用いて、類似用例文の対訳文の中の「ｓｉｎｇｌｅｓ」という単語１４４を「ｔｗｉｎｓ」という単語１４６で置換して、修正後の対訳文として出力する。
【００６９】
［コンピュータを用いた実現］
上記した機械翻訳装置は、一般的なコンピュータハードウェアおよびその上で動作するソフトウェアにより実現できる。特に、携帯可能なコンピュータにより実現する事により、たとえば旅行会話など、話し言葉が主として使用される場面で翻訳を効率的に行なう事ができる。図１１に、そうした機械翻訳装置の外観を示す。
【００７０】
図１１を参照して、この機械翻訳装置１６０は、いわゆるＰＤＡ（Ｐｅｒｓｏｎａｌ　Ｄｉｇｉｔａｌ　Ａｓｓｉｓｔａｎｔ）１７０と、このＰＤＡ１７０に装着可能なメモリカード１８０とを含む。ＰＤＡ１７０は、情報の表示と、スタイラスペンなどによる入力装置とが一体となった表示パネル１７２と、各種の操作に用いる複数のボタン１７６と、表示パネル１７２に表示されるカーソル位置を移動する際などに操作する十字キー１７８とを含む。
【００７１】
図１２に、ＰＤＡ１７０のハードウェアブロック図を示す。図１２を参照して、ＰＤＡ１７０は、ボタン１７６、および十字キー１７８からの入力を受ける様に接続されたＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ）１８６と、ＣＰＵ１８６および表示パネル１７２が接続されたバス１９６と、いずれもバス１９６に接続された、ブートアッププログラム、オペレーティングシステム（ＯＳ）プログラムなどを格納したＲＯＭ（Ｒｅａｄ−Ｏｎｌｙ　Ｍｅｍｏｒｙ）１８８、本発明に係る機械翻訳プログラムが作業用に使用するＲＡＭ（Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）１９０、およびメモリカード１８０が装着されるメモリカードインタフェース１９２とを含む。
【００７２】
メモリカード１８０は、図１に示した用例コーパス４０、シソーラス５６、対訳辞書６０、および機械翻訳プログラムを格納している。この機械翻訳プログラムをＣＰＵ１８６が実行する事により、上記した機械翻訳装置を実現する事ができる。
【００７３】
機械翻訳装置は、上記した通り、用例コーパス４０の更新という第１の局面と、更新された用例コーパス４０を用いた翻訳という第２の局面という二つの局面で動作する。図１３に、用例コーパス４０の更新プログラムの概略のフローチャートを示す。このプログラムは、メモリカード１８０がＰＤＡ１７０に装着されたときに表示パネル１７２に表示されるメニューから起動する事ができる。このプログラムを起動する事は、図１において信号入力部６４を介して与えられる選択信号の値を第２の値に設定する事に相当する。
【００７４】
図１３を参照して、用例コーパス４０の更新処理では、最初に用例コーパス４０の各用例文のモダリティを識別する（ステップ２００）。モダリティの識別の基準は図３〜図５を参照して前述した通りである。
【００７５】
続いてステップ２０２で、用例コーパス４０の各用例文から内容語リストを抽出する。内容語リストについては、図５を参照して前述した通りである。
【００７６】
さらにステップ２０４で、用例コーパス４０中の各用例文のモダリティとその内容語リストとを、各用例文を含む文ペアと関連付けて用例コーパス４０に記録する。記録後の用例コーパス４０の内容は図５に示した通りである。
【００７７】
以上で用例コーパス４０の更新処理は終了する。この処理は、用例コーパス４０が最初に導入されたとき、または用例コーパス４０に格納された用例文の内容に変更があったときに、それぞれ一回だけ行なえばよい。
【００７８】
図１４に、第２の局面の機械翻訳を実現するためのプログラムのフローチャートを示す。このプログラムは、入力文が与えられる事により起動するものとする。図１４を参照して、このプログラムでは、まず入力文のモダリティを識別する（ステップ２２０）。続いて、入力文から内容語リストを抽出する（ステップ２２２）。
【００７９】
続くステップ２２４では、用例コーパス４０から、以下の二つの条件に適合する用例文を検索する。すなわち、
（１）入力文とモダリティが一致する。
（２）入力文の主要領域に同一の内容語が存在する。
このうち、主要領域については、図７を参照して説明した通りである。
【００８０】
ステップ２２６では、ステップ２２４で検索された各用例文について、図８を参照して説明した方法により入力文との類似度を示すスコアを計算する。そして、最高のスコアを持つ文を類似用例文として選択する（ステップ２２８）。
【００８１】
ステップ２３０では、最高スコア文の対訳文を、類義語情報に従って修正する。修正後の対訳文を翻訳文として出力する（ステップ２３２）。
【００８２】
図１５に、図１４のステップ２３０で行なわれる対訳文修正処理のフローチャートを示す。図１５を参照して、対訳文修正処理では、まず入力文と類似用例文との間の類義語ペアの数Ｍを算出する（ステップ２５０）。続いてステップ２５２で、以下の繰返し処理のための繰返し制御変数ｉを０に設定する。続いてステップ２５４でｉに１を加算し、さらに加算後のｉがステップ２５０で求めた類義語ペアの数Ｍを超えたか否かを判定する（ステップ２５６）。ｉがＭを超えていれば処理を終了する。ｉがＭを超えていなければ制御はステップ２５８に進む。
【００８３】
ステップ２５８では、ｉ番目の類義語ペアを構成する（入力文と類似用例文の）単語の各々を、対訳辞書６０を用いて翻訳する。この結果、類義語ペアが翻訳されて目的言語置換情報が得られる。
【００８４】
続いてステップ２６０で、類似用例文の対訳文の中で、ｉ番目の類義語の訳語に相当する単語または単語群を、目的言語置換情報にしたがって置換する。置換の後、制御はステップ２５４に戻る。
【００８５】
こうして、ステップ２５４からステップ２６０までの処理をｉ＝１〜Ｍまで繰返す事により、類似用例文の対訳文中で、入力文中の単語と類義語関係にある語に相当する部分が、入力文中の単語の訳語で置換され、最終的に入力文の翻訳文が得られる。仮に入力文と類似用例文との間で類義語関係にあるものが存在しない場合には、上記したＭの値は０となる。従って、図１５のステップ２５６の判定が最初に行なわれたときに処理を終了する事になり、類似用例文の対訳文がそのまま出力される。
【００８６】
以上の様に本実施の形態の装置によれば、用例ベースの機械翻訳において、大まかに意味が等しい文を検索し、その対訳文を利用して翻訳が行なわれる。従来の機械翻訳と異なり、長い入力文に対しても翻訳を出力する能力が高いという特徴がある。また、本実施の形態の装置での翻訳には、入力文と用例コーパスとの間で文体が異なっていても翻訳が得られる確率が高いという特徴もある。文体の違いは、機能語によるところが大きいが、本実施の形態では内容語に基づいて類似用例文の検索を行なっているため、文体の違いの影響を受けにくいのである。
【００８７】
なお、本実施の形態では元言語として日本語の場合について説明した。それに伴い、モダリティを判定するためには文末の表現という文の表層的特徴を用いた。また入力文の主要領域の決定には、文の末尾の何語かを用いる様にした。しかし、本発明はそうした実施の形態には限定されない。モダリティまたは主要領域の判定には、文の表層的特徴を用いるだけでなく、文の深層構造を用いる様にしてもよい。そのために、たとえば、構文解析の結果を利用する様にしてもよい。また、モダリティの間で何らかの変形規則、例えば平叙文から疑問への変換などがあれば、異なるモダリティの文を互いに類似した文として取り扱ってもよい。
【００８８】
また、上記した実施の形態では、用例コーパス４０の更新を行なえる様にした。しかし、最初からモダリティ情報および内容語リストを含んだ用例コーパスを用いるのであれば、上記した処理は必要ない。そうした用例コーパスを用いて直ちに翻訳を行なう事ができる。
【００８９】
さらに、上記した実施の形態では、類義語ペアの翻訳に対訳辞書６０を用いた。しかし、対訳辞書６０は、元言語の単語と、目的言語の単語とをペアにして格納してあるため、たとえば数字を含んだ語については、数字ごとに対訳を格納しておく必要がある。そのため、対訳辞書６０の容量を節約する必要があれば、数字を含む語中の数字部分については何らかの記号で代替した形で対訳辞書に格納しておき、翻訳時に必要に応じて数字のみを入れ替える様にしておく事が考えられる。同様の事が、たとえば曜日、月などについてもいえる。
【００９０】
本実施の形態では、入力文と大まかな意味を共有する類似用例文を検索する様にし、類似用例文を検索した後にその対訳文を翻訳のために使用する。しかし、本発明は翻訳だけに適用可能なわけではない。本実施の形態のシステムは、たとえば質問応答システムにも応用する事ができる。
【００９１】
例えば、用例コーパスとして、質問文とそれに対する回答という文ペアからなるものを用いる事ができる。この場合、質問として与えられた入力文と大まかな意味を共有する質問文を用例コーパスから検索し、その質問文に対応する回答を、もとの質問に対する回答を提示する際に使用する事ができる。従って、本発明の「用例コーパス」とは、元言語の文とその対訳文との対からなるものには限定されず、入力文と比較される対象となる文と、その文に対応して予め準備された文との対からなるもの一般をも含む。
【００９２】
今回開示された実施の形態は単に例示であって、本発明が上記した実施の形態のみに制限されるわけではない。本発明の範囲は、発明の詳細な説明の記載を参酌した上で、特許請求の範囲の各請求項によって示され、そこに記載された文言と均等の意味および範囲内でのすべての変更を含む。
【図面の簡単な説明】
【図１】本発明の一実施の形態に係る機械翻訳装置のブロック図である。
【図２】用例コーパス４０の構成を示す図である。
【図３】モダリティの決定方法を説明するための図である。
【図４】モダリティの例を説明するための図である。
【図５】モダリティ情報および内容語リストが付与された後の用例コーパス４０の構成を示す図である。
【図６】類似用例検索部５４のより詳細なブロック図である。
【図７】入力文の主要領域の決定方法を説明するための模式図である。
【図８】スコア計算の際の重みの例を説明するための図である。
【図９】類義語置換部５８のより詳細なブロック図である。
【図１０】入力文と検索された用例文とに基づいて、対訳文を修正する過程を説明するための図である。
【図１１】本発明の一実施の形態に係るＰＤＡの外観を示す図である。
【図１２】図１１に示すＰＤＡのハードウェアブロック図である。
【図１３】コーパスの更新処理のフローチャートである。
【図１４】翻訳処理の全体フローチャートである。
【図１５】対訳文修正処理のフローチャートである。
【符号の説明】
３０機械翻訳装置、４０用例コーパス、４２入力文、４４、４８、５２選択部、４６内容語リスト抽出部、５０モダリティ識別部、５６シソーラス、５８類義語置換部、６０対訳辞書、６２翻訳文、６６コーパス更新部、８０同一モダリティ用例文抽出部、８２内容語条件検査部、８４スコア計算・選択部

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an example-based sentence conversion device, and more particularly to a sentence conversion device, typified by an example-based machine translation device, that can convey the gist of a natural dialogue sentence without seriously mistaking its meaning.
[0002]
[Prior art]
Various techniques have been tried for machine translation, that is, translation performed by a computer. The typical approach, adopted in many machine translation systems, is rule-based and uses translation rules created manually or semi-automatically. However, rule-based machine translation requires a great deal of manual labor to extract the translation rules and to build the dictionaries needed for translation, so the development effort becomes large. Furthermore, because translation rules differ from language to language, a machine translation system developed for one language pair is unlikely to be applicable to other language pairs.
[0003]
In response, research has been conducted on achieving high-accuracy translation with a small development effort by using machine translation based on an example corpus. Here, an “example corpus” generally refers to an example database containing many pairs of a source language sentence and the corresponding target language sentence.
[0004]
Machine translation using an example corpus is roughly divided into the following two types.
[0005]
(1) Statistical Machine Translation (abbreviated as “SMT”)
In SMT, a translation model is first learned from an example corpus and a bilingual dictionary, and at run time the best translation is searched for according to that model.
[0006]
(2) Example-Based Machine Translation (referred to as "EBMT" for short)
EBMT uses the example corpus directly for translation. One EBMT technique is described in Patent Document 1, cited below. The machine translation device described in Patent Document 1 translates an input sentence as follows. First, the example most similar to the input sentence is retrieved from the example corpus. Words in the source language sentence of the retrieved example that differ from the input sentence are treated as the difference portion. A translation pattern is generated by replacing the part of the example's target language sentence that corresponds to this difference portion with a variable. Then, if the input sentence contains a source language word corresponding to a variable in the translation pattern, the target language word corresponding to that source language word is obtained from a bilingual dictionary, and the variable in the translation pattern is replaced with the word thus obtained.
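The pattern-based procedure attributed to Patent Document 1 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the function name, the position-by-position comparison of equal-length word lists, the `alignment` table, and the toy dictionary are all hypothetical and are not taken from that publication.

```python
def translate_with_pattern(input_words, example_src, example_tgt, alignment, dictionary):
    """Toy EBMT step: example words that differ from the input act as variables.

    alignment maps a source word of the example to its word in the example's
    target sentence (hypothetical; the text does not say how this is obtained).
    dictionary maps a source word to a target word (toy bilingual dictionary).
    """
    output = list(example_tgt)
    for src_word, in_word in zip(example_src, input_words):
        if src_word == in_word:
            continue  # no difference at this position
        slot = alignment.get(src_word)
        if slot in output and in_word in dictionary:
            # Fill the variable slot with the dictionary translation of the input word.
            output[output.index(slot)] = dictionary[in_word]
    return output
```

With a toy example pair such as "コーヒーをください" / "Coffee please", an input that differs only in the noun would have that noun's translation substituted into the slot.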
[0007]
[Patent Document 1]
JP-A-2003-6193 (FIG. 1, paragraphs 0026 to 0031)
[Problems to be solved by the invention]
However, with the technique described in Patent Document 1, it is difficult to translate natural dialogue appropriately, or doing so is costly. This is because creating an example corpus that records natural dialogue is expensive. Examples of written language can be collected relatively easily, but this is difficult for natural dialogue.
[0008]
Creating an example corpus of spoken language in order to translate natural dialogue is costly. If the cost is kept down, the example corpus cannot hold a sufficient amount of data, and appropriate translation becomes difficult. If a written-language corpus is introduced to secure a sufficient amount of data, the style of the input sentence differs from that of the examples in the corpus when natural dialogue is translated, which again makes appropriate translation difficult. There is therefore a need for an example-based machine translation device that can translate natural dialogue appropriately at low cost.
[0009]
Furthermore, conventional machine translation in general has the problem that, when translation fails, only a very poor translation is obtained. This problem becomes more pronounced as the input sentence becomes longer. To avoid this, it would be useful to have a machine translation device whose output still lets the reader understand the gist of the input sentence even when the translation fails. This problem is not limited to machine translation devices; it arises generally in devices that convert or retrieve sentences on an example basis.
[0010]
Therefore, an object of the present invention is to provide an example-based sentence conversion device capable of appropriately converting a natural dialogue sentence into another sentence at a small cost.
[0011]
Another object of the present invention is to provide an example-based sentence conversion device capable of appropriately translating a natural dialogue sentence in a source language into a target language sentence at low cost.
[0012]
Still another object of the present invention is to provide an example-based sentence conversion device that can appropriately translate a natural dialogue sentence into a sentence in another language at low cost, and that can convey the gist without error even when the translation fails.
[0013]
[Means for Solving the Problems]
As described above, when a conventional sentence conversion device such as a machine translation device fails to convert a sentence, it generally produces only a very poor translation or no translation at all. One likely cause is that conventional machine translation devices aim to translate the input sentence accurately down to its details. As a result, any error in the translation process has a very large effect.
[0014]
However, depending on the situation, it is sometimes enough to convey the gist of the input sentence rather than translating it accurately down to its details. Translation of dialogue is one such case. In particular, when dialogue is translated in real time from the output of speech recognition, there are means of conveying information other than speech, so there is little need to translate the input sentence accurately in every detail.
[0015]
Therefore, the present invention solves such a problem associated with a sentence conversion device such as a machine translation device by adopting a configuration described below.
[0016]
Specifically, an example-based sentence conversion device according to a first aspect of the present invention includes: an example corpus containing a plurality of sentence pairs, each consisting of an example sentence and a corresponding sentence having a predetermined relationship with the example sentence; means for identifying the modality of a sentence and outputting modality information; means for extracting a content word list of a sentence; means for searching the example sentences of the example corpus for an example sentence similar to an input sentence, based on the content word list extracted from the input sentence and the modality information of the input sentence, and on the content word list extracted from each example sentence in the example corpus and the modality information of each example sentence; and means for correcting the corresponding sentence of the similar example sentence in accordance with the difference between the input sentence and the similar example sentence, and outputting the result.
[0017]
Preferably, the means for searching includes: means for searching the example corpus for example sentences whose modality matches the modality of the input sentence; means for selecting, from among the example sentences found by the means for searching for example sentences, those having a predetermined relationship with the content word list extracted from the input sentence; and means for calculating a similarity for each of the selected example sentences using a predetermined calculation formula and selecting the example sentence with the highest similarity as the similar example sentence of the input sentence.
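The three-stage search described in this paragraph (modality filter, content word filter, highest score) might be sketched as below. The dictionary keys `"modality"` and `"content_words"`, and the two caller-supplied functions, are assumptions made for this illustration; the text does not prescribe any particular data layout.

```python
def find_similar_example(input_modality, input_words, corpus, passes_word_condition, score):
    """Return the best-scoring example, or None if no example passes both filters.

    corpus: list of dicts with hypothetical keys "modality" and "content_words".
    passes_word_condition(input_words, example_words) -> bool  (assumed callback)
    score(input_words, example_words) -> number                (assumed callback)
    """
    # Stage 1: keep only examples whose modality matches the input sentence.
    same_modality = [ex for ex in corpus if ex["modality"] == input_modality]
    # Stage 2: keep only examples whose content word list satisfies the condition.
    candidates = [ex for ex in same_modality
                  if passes_word_condition(input_words, ex["content_words"])]
    if not candidates:
        return None
    # Stage 3: pick the example with the highest similarity score.
    return max(candidates, key=lambda ex: score(input_words, ex["content_words"]))
```

Passing the condition and the scoring formula in as functions keeps the sketch independent of how either is actually defined.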
[0018]
More preferably, the means for selecting example sentences includes: means for determining a main area and a non-main area of the content word list of the input sentence according to the content word list of each example sentence found by the means for searching for example sentences; and means for selecting, from among the found example sentences, those for which an identical content word exists both in the example sentence's content word list and in the main area of the input sentence's content word list as determined by that example sentence's content word list.
[0019]
More preferably, the means for selecting the similar example sentence includes: a thesaurus storing synonym information; means for calculating the similarity by a predetermined formula, with reference to the thesaurus, based on (1) the number of identical content words present in both the example sentence and the main area, (2) the number of identical content words present in both the example sentence and the non-main area, (3) the number of synonymous content words present in the example sentence and the main area, (4) the number of synonymous content words present in the example sentence and the non-main area, (5) the number of function words common to the example sentence and the input sentence, and (6) the number of function words that differ between the example sentence and the input sentence; and means for selecting, as the similar example sentence, the example sentence for which the means for calculating the similarity calculated the highest similarity.
[0020]
Preferably, the means for calculating the similarity includes means for multiplying the values of (1) to (6) by weights assigned in advance so as to decrease in that order, and calculating the sum of the products as the similarity.
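The weighted sum over the six counts can be illustrated with a short sketch. The weight values below are purely hypothetical (the actual values of FIG. 8 are not reproduced in this text), and treating factor (6) as a penalty with a negative weight is an assumption of this sketch; the only property taken from the description is that the weights decrease from (1) to (6).

```python
# Hypothetical weights, largest first; not the values of FIG. 8.
# The negative sign for factor (6), differing function words, is an assumption.
WEIGHTS = (10000, 1000, 100, 10, 1, -1)

def similarity(counts, weights=WEIGHTS):
    """Weighted sum of the six counts (1)-(6), in the order listed in the text."""
    if len(counts) != len(weights):
        raise ValueError("expected one count per weight")
    return sum(w * c for w, c in zip(weights, counts))
```

With widely spaced weights like these, the higher-ranked factors dominate the result; more evenly spaced weights would let several factors contribute comparably.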
[0021]
More preferably, each sentence pair of the example corpus includes an example sentence in a predetermined source language and a target language sentence corresponding to that example sentence.
[0022]
More preferably, each sentence pair in the example corpus includes an example sentence in a predetermined source language and a target language sentence corresponding to that sentence, and the example-based sentence conversion device further includes: a thesaurus storing synonymous content word information for the source language; a bilingual dictionary for the source language and the target language; means for detecting, with reference to the thesaurus, pairs of synonymous content words contained in the input sentence and the similar example sentence; means for translating the synonymous content word pairs into target language words with reference to the bilingual dictionary; and means for correcting the corresponding sentence of the similar example sentence using the translated synonymous content word pairs.
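A minimal sketch of this correction step follows, using the twin/single example that the description gives in paragraphs [0067] and [0068]. The representation of the thesaurus as a set of two-word `frozenset`s, the word-level dictionary (including the plural forms), and the pre-tokenized translation are assumptions made for illustration only.

```python
def replace_synonyms(input_words, example_words, example_translation, thesaurus, dictionary):
    """Swap translated synonyms into the example's translation (toy word-level version).

    thesaurus: set of frozensets, each holding two source words defined as synonymous.
    dictionary: source word -> target word. Both are hypothetical toy structures.
    """
    result = list(example_translation)
    for in_word in input_words:
        for ex_word in example_words:
            if in_word != ex_word and frozenset((in_word, ex_word)) in thesaurus:
                # Translate both members of the synonym pair.
                old, new = dictionary.get(ex_word), dictionary.get(in_word)
                if old is not None and new is not None:
                    # Replace the example word's translation with the input word's.
                    result = [new if w == old else w for w in result]
    return result
```

When no synonym pair is found, the example's translation is returned unchanged, which matches the case where the corresponding sentence is output as it is.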
[0023]
Preferably, the example-based sentence conversion device further includes: means for controlling the means for outputting modality information and the means for extracting the content word list so as to create modality information and a content word list for each example sentence in the example corpus; and means for updating the example corpus so that the modality information and content word list thus created are attached to the corresponding example sentence in the example corpus.
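The corpus update described here amounts to a one-time annotation pass, sketched below. The dictionary keys `"source"`, `"modality"`, and `"content_words"` and the two analysis functions passed in are hypothetical names chosen for this illustration.

```python
def annotate_corpus(corpus, identify_modality, extract_content_words):
    """Attach modality and content word list to every sentence pair, in place.

    corpus: list of dicts, each with a hypothetical "source" key holding the
    example sentence. The two callbacks stand in for the modality identification
    and content word list extraction means.
    """
    for pair in corpus:
        pair["modality"] = identify_modality(pair["source"])
        pair["content_words"] = extract_content_words(pair["source"])
    return corpus
```

Running this pass once in advance means that only the input sentence needs to be analyzed at translation time.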
[0024]
The computer program according to the second aspect of the present invention, when executed by a computer, causes the computer to operate as any of the example-based sentence conversion apparatuses described above.
[0025]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a machine translation apparatus according to an embodiment of the present invention will be described in detail with reference to the drawings. In the drawings referred to below, the same components have the same reference numerals and names. Their functions are the same. Therefore, detailed description thereof will not be repeated.
[0026]
The device of the present embodiment translates the gist of the input sentence and does not aim to translate it precisely in every detail. This is because, in translation of dialogue, the dialogue can usually proceed with almost no problem as long as the gist is conveyed, even if fine shades of meaning are lost. To that end, the present embodiment uses an example corpus and is characterized by its method of searching the example corpus for an example sentence similar to the input sentence (called a similar example sentence).
[0027]
First, the terms used in the following description of the present embodiment are defined. Details of these terms are given later. The following definitions apply only to the present embodiment; other embodiments may use different definitions.
[0028]
<Similar example sentence>
A similar example sentence is a sentence in the example corpus used for translation that shares the gist of the input sentence. More specifically, it is determined based on the modality and the content word list described below. A similar example sentence need not be exactly identical to the input sentence.
[0029]
<Modality>
Modality refers broadly to the subjective expression in a sentence. It is also called mood or modal category. Examples are classifications such as declarative, question, request, and negation.
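For Japanese as the source language, paragraph [0045] of the description identifies modality from sentence-final expressions and treats every other sentence as declarative. A rough sketch of such a rule table is given below. The function name, the English labels, the punctuation stripping, and the rule ordering are assumptions of this sketch; the ending lists follow the examples quoted in [0045], which the description itself says are not exhaustive.

```python
# Sentence-final expressions per modality, following the examples in [0045].
# Order matters: request endings such as "いただけませんか" also end in "か",
# so requests are checked before questions (an assumption of this sketch).
MODALITY_ENDINGS = [
    ("request", ("いただけませんか", "いただけないでしょうか", "いないでしょうか",
                 "お願い", "ください")),
    ("question", ("か", "ね")),
    ("negation", ("ない", "ません")),
]

def identify_modality(sentence):
    """Classify a Japanese sentence by its ending; the default is declarative."""
    # Drop trailing punctuation and spaces before looking at the ending.
    body = sentence.rstrip("。．.？?！! 　")
    for modality, endings in MODALITY_ENDINGS:
        if body.endswith(endings):
            return modality
    return "declarative"
```

Because only surface endings are examined, no syntactic analysis is needed, at the price of misclassifying sentences whose endings are ambiguous.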
[0030]
<Content word>
A content word generally refers to a word that expresses the name, property, action, state, or the like of a thing. In the following description, nouns, adjectives, verbs, and adverbs are treated as content words.
[0031]
<Normal form of word>
The normal form of a word is a form in which an inflected word is unified into a single form, such as the terminal form, and in which trivial differences in notation, such as differences in okurigana, are also unified.
[0032]
<Content word list>
The content word list is a list of normal forms of content words extracted from a sentence arranged in the order of appearance in the sentence.
[0033]
<Main area of input sentence>
When calculating the similarity between the input sentence and an example sentence, the main area of the input sentence is the last n words of the content word list of the input sentence, where n is the number of content words included in the content word list of the example sentence to be compared.
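The definition of the main area can be illustrated with a minimal Python sketch. The function name `main_area` and the sample content word list are assumptions made for illustration only.

```python
def main_area(input_content_words, n):
    """Return the last n words of the input sentence's content word list.

    n is the number of content words in the example sentence being compared.
    """
    if n <= 0:
        # Guard needed because a slice starting at -0 would return everything.
        return []
    return list(input_content_words[-n:])


# Hypothetical content word list of an input sentence.
input_words = ["adjacent", "noisy", "room", "change"]
print(main_area(input_words, 3))
print(main_area(input_words, 2))
```

Because the main area depends on n, the same input sentence has a different main area for each example sentence it is compared with.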
[0034]
<Thesaurus>
The thesaurus refers to data in which information in which words are arranged according to the semantic relationship between words is recorded.
[0035]
<The words "identical" and "synonyms">
Two words are "identical" when they match exactly. Two words are "synonymous" when the thesaurus defines them as having a similar meaning (a synonymous relationship).
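The distinction between identical and synonymous words can be sketched as follows. The thesaurus layout (each word mapped to a category label), the sample entries, and the choice to treat identical words as not synonymous are all assumptions for illustration; the embodiment does not specify the internal format of thesaurus 56.

```python
# Hypothetical thesaurus: each word maps to a semantic category label.
THESAURUS = {
    "single": "room_type",
    "twin": "room_type",
    "room": "facility",
}


def is_identical(a, b):
    """Two words are identical when they match exactly."""
    return a == b


def is_synonymous(a, b, thesaurus=THESAURUS):
    """Two different words are synonymous when they share a thesaurus category."""
    if a == b:
        return False  # identical words are handled separately (assumption)
    category = thesaurus.get(a)
    return category is not None and category == thesaurus.get(b)


print(is_synonymous("single", "twin"))
print(is_synonymous("single", "room"))
```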
[0036]
[Module configuration]
FIG. 1 is a block diagram showing a module configuration of an example-based machine translation apparatus 30 according to an embodiment of the present invention. Referring to FIG. 1, machine translation apparatus 30 includes: an example corpus 40; a selection unit 44 for selecting either input sentence 42 or an example sentence in example corpus 40 in response to a selection signal provided from signal input unit 64; a content word list extraction unit 46 for extracting a content word list from the sentence output from the selection unit 44; and a modality identification unit 50 for identifying the modality of the sentence output from the selection unit 44 and outputting modality information.
[0037]
The machine translation device 30 further includes a selection unit 48 that has an input for receiving the output of the content word list extraction unit 46, a first output 70, and a second output 72, and that outputs the content word list provided from the content word list extraction unit 46 to the first output 70 or the second output 72 according to the value of the selection signal provided from the signal input unit 64. The machine translation device 30 also includes a selection unit 52 that has an input for receiving the output of the modality identification unit 50, a first output 74, and a second output 76, and that outputs the modality information provided from the modality identification unit 50 to the first output 74 or the second output 76 according to the value of the selection signal provided from the signal input unit 64.
[0038]
The machine translation device 30 further includes: a thesaurus 56 containing synonym information for words in the original language; and a similar example search unit 54 that is connected to the first output 70 of the selection unit 48, the first output 74 of the selection unit 52, the example corpus 40, and the thesaurus 56, and that searches the example corpus 40 for a similar example sentence of the input sentence 42 and its bilingual sentence with reference to the content word list given from the selection unit 48, the modality information given from the selection unit 52, and the thesaurus 56.
[0039]
The machine translation device 30 further includes a bilingual dictionary 60 of the source language and the target language, and a synonym replacement unit 58 connected to the output of the similar example search unit 54 and to the bilingual dictionary 60. The synonym replacement unit 58 receives the input sentence 42, the similar example sentence for the input sentence 42, and the bilingual sentence of the similar example sentence from the similar example search unit 54, refers to the bilingual dictionary 60, corrects the bilingual sentence of the similar example sentence based on the difference between the input sentence 42 and the similar example sentence, and outputs the result as a translated sentence 62.
[0040]
The machine translation device 30 further includes a corpus updating unit 66 that is connected to the second output 72 of the selection unit 48 and the second output 76 of the selection unit 52, and that updates the example corpus 40 by adding, to each example sentence in the example corpus 40, the content word list extracted from that example sentence by the content word list extraction unit 46 and the modality information identified for the same example sentence by the modality identification unit 50.
[0041]
The selection signal given via the signal input unit 64 takes a first value when the content word list and the modality information of the input sentence 42 are to be calculated, and takes a second value when a content word list and modality information are to be calculated for each of the example sentences in the example corpus 40.
[0042]
When the selection signal takes the first value, the selection unit 44 selects the input sentence 42 and gives it to the content word list extraction unit 46 and the modality identification unit 50. At this time, the selection unit 48 and the selection unit 52 provide the content word list and the modality information related to the input sentence 42 to the similar example search unit 54, respectively.
[0043]
When the selection signal takes the second value, the selection unit 44 selects an example sentence provided from the example corpus 40 and provides the selected example sentence to the content word list extraction unit 46 and the modality identification unit 50. At this time, the selecting unit 48 and the selecting unit 52 provide the corpus updating unit 66 with the content word list and the modality information obtained from the example sentences in the example corpus 40, respectively.
[0044]
FIG. 2 shows an example of the contents of the example corpus 40. Referring to FIG. 2, example corpus 40 includes a large number of sentence pairs including an example sentence in the original language and a bilingual sentence in the target language.
[0045]
A method of identifying the modality of a sentence performed by the modality identification unit 50 will be described. The modality identification unit 50 identifies the modality of a sentence using the surface features of the sentence. For example, if Japanese is adopted as the original language, the modality can be identified by a specific end-of-sentence expression. FIG. 3 shows, in table form, the modalities for Japanese and the end-of-sentence expressions specific to each modality. For example, the modality of a sentence ending with the final particle "ka" or "ne" is "question". Similarly, the modality of a sentence having an end-of-sentence expression corresponding to "can you please", "isn't it?", or "please" is "request". The modality of a sentence ending with a negating auxiliary verb ("no") is "denial". There may be other expressions specific to each modality, but they are not all listed here. In the present embodiment, the modality of any sentence that does not match an end-of-sentence expression in FIG. 3 is determined to be "declarative". FIG. 4 shows examples of sentences and their modalities.
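A minimal Python sketch of this kind of surface-based identification follows. It works on romanized text for readability. Only the particles "ka" and "ne" come from the description above; the other endings ("kudasai", "kuremasenka", "masen", "nai"), the table name `MODALITY_ENDINGS`, and the order of checks are placeholders assumed for illustration and do not reproduce FIG. 3.

```python
# Hypothetical end-of-sentence expressions per modality (romanized).
# Longer, more specific endings are listed first so that, for example,
# a request ending in "...masenka" is not classified by its final "ka".
MODALITY_ENDINGS = [
    ("request", ("kudasai", "kuremasenka")),
    ("question", ("ka", "ne")),
    ("denial", ("masen", "nai")),
]


def identify_modality(sentence):
    """Classify a sentence by its end-of-sentence expression."""
    # Drop trailing punctuation and normalize case before matching.
    text = sentence.strip().rstrip("?.!。？！").strip().lower()
    for modality, endings in MODALITY_ENDINGS:
        if text.endswith(endings):  # str.endswith accepts a tuple
            return modality
    return "declarative"  # default when no listed ending matches


print(identify_modality("heya wo kaete kudasai"))
print(identify_modality("manshitsu desu"))
```

A table-driven design keeps the rule set easy to extend, which matters because the list of end-of-sentence expressions is stated to be incomplete.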
[0046]
FIG. 5 shows an example of the contents of the example corpus 40 after being updated by the corpus updating unit 66. For each sentence pair shown in FIG. 2, modality information of the example sentence in the original language and a content word list extracted from the example sentence in the original language are added.
[0047]
The similar example search unit 54 illustrated in FIG. 1 searches the example sentences in the example corpus 40 for a sentence that most closely resembles the input sentence 42 from sentences that satisfy the following two conditions.
(Condition 1) The modality matches the modality of the input sentence 42.
(Condition 2) Its content word list includes a content word that is identical to one existing in the main area of the input sentence 42.
[0048]
FIG. 6 shows a detailed functional block diagram of the similar example search unit 54 shown in FIG. 1. Referring to FIG. 6, similar example search unit 54 includes: a same-modality example sentence extraction unit 80 for extracting, from among the example sentences included in example corpus 40, those whose modality matches that of input sentence 42; a content word condition inspection unit 82 for selecting, from the example sentences extracted by the same-modality example sentence extraction unit 80, only those whose content word list satisfies a predetermined condition; and a score calculation / selection unit 84 for assigning, by a method described later, a score indicating the degree of similarity to the input sentence 42 to each of the example sentences selected by the content word condition inspection unit 82 and selecting only the example sentence having the highest score.
[0049]
The function of the same modality example sentence extracting unit 80 is as described above, and the configuration for that is also clear.
[0050]
FIG. 7 shows a method of checking the condition of the content word list performed by the content word condition checking unit 82. Referring to FIG. 7, a content word list is created by content word list extraction unit 46 for input sentence 42 as shown in the upper part. For each example sentence included in the example corpus 40, a content word list is created in advance by the content word list extraction unit 46, and is added to each example sentence by the corpus update unit 66.
[0051]
The content word condition checking unit 82 selects the example sentences that satisfy condition 2 from among the example sentences that satisfy condition 1 described above. The definition of the "main area of the input sentence" used in this condition is as described above.
[0052]
As shown in the upper part of FIG. 7, the content word list of the first example sentence contains three words. Therefore, when determining whether or not the input sentence 42 is similar to the first example sentence, the main area of the input sentence 42 is the last three words (noisy, room, change) of its content word list. Of these, the two words 100, "room" and "change", are also included in the content word list of the first example sentence. Therefore, the first example sentence satisfies condition 2.
[0053]
The content word list of the second example sentence is two words. Therefore, the main area of the input sentence 42 when determining whether or not the input sentence 42 is similar to the second example sentence is the last two words (room, change) of the content word list. Among them, there is no word that is also present in the content word list (adjacent, noisy) of the second example sentence. Therefore, it is determined that the second example sentence is not similar to the input sentence 42.
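The check described for the two example sentences can be sketched in Python as follows. The sketch assumes that sharing at least one content word with the main area is sufficient for condition 2. The full input content word list and the third word of the first example sentence are hypothetical, since only the words quoted above are given in the text.

```python
def satisfies_condition_2(input_words, example_words):
    """Check whether an example shares a content word with the main area.

    The main area is the last n words of the input content word list,
    where n is the length of the example's content word list.
    """
    n = len(example_words)
    if n == 0:
        return False
    area = input_words[-n:]
    return any(word in example_words for word in area)


# Hypothetical data modeled on the walk-through above.
input_words = ["adjacent", "noisy", "room", "change"]
first_example = ["different", "room", "change"]   # three words, two shared
second_example = ["adjacent", "noisy"]            # two words, none in main area

print(satisfies_condition_2(input_words, first_example))
print(satisfies_condition_2(input_words, second_example))
```

Note that the second example shares words with the input sentence as a whole, yet fails the check because neither word falls within the last two words of the input content word list.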
[0054]
The content word condition checking unit 82 selects an example sentence similar to the input sentence 42 from those whose modalities match the input sentence 42 as described above.
[0055]
The score calculation / selection unit 84 assigns a score to each of the example sentences that satisfy the conditions 1 and 2 output from the content word condition inspection unit 82 according to the conditions shown in FIG. 8. In FIG. 8, the factors are listed from the top in order of importance, and the score is the sum of the factors, each multiplied by its weight. In this example, the top factors govern most of the score calculation results. Of course, the method of determining the weights is not limited to that shown in FIG. 8; for example, a plurality of factors may be given substantially the same weight.
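The weighted-sum scoring can be sketched as below. The factors and weights of FIG. 8 are not reproduced here: the three unnamed factors, the weights 100, 10, and 1, and the function names are assumptions chosen only to show how steeply decreasing weights let the top factor dominate.

```python
# Hypothetical weights, most important factor first.
WEIGHTS = (100.0, 10.0, 1.0)


def score(factors, weights=WEIGHTS):
    """Weighted sum of factor values; factors are ordered by importance."""
    if len(factors) != len(weights):
        raise ValueError("one weight is required per factor")
    return sum(w * f for w, f in zip(weights, factors))


def select_best(candidates, weights=WEIGHTS):
    """Return the id of the candidate with the highest score.

    candidates: list of (example_id, factors) tuples.
    """
    return max(candidates, key=lambda c: score(c[1], weights))[0]


candidates = [("example_a", (0, 9, 9)), ("example_b", (1, 0, 0))]
print(select_best(candidates))
```

With these weights a single point in the first factor outweighs nine points in each of the lower factors. Passing nearly equal weights instead gives the alternative setting mentioned above, in which several factors contribute about equally.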
[0056]
FIG. 9 shows a more detailed functional block diagram of the synonym replacement section 58 shown in FIG. 1. Referring to FIG. 9, synonym replacement section 58 includes a synonym extraction unit 120 that extracts synonym pairs between input sentence 42 and the similar example sentence selected by similar example search section 54, determines whether or not the number of synonym pairs is 0, and outputs a determination signal indicating the result together with the extracted synonym pairs. The determination signal takes a first value when the number of synonym pairs is 0, and takes a second value when the number is not 0.
[0057]
The synonym replacement section 58 further includes: a synonym replacement unit 124 that is connected to receive the similar example sentence and its bilingual sentence from the similar example search unit 54, is further connected to the bilingual dictionary 60, and, when the determination signal from the synonym extraction unit 120 is the second value, translates each synonym pair given from the synonym extraction unit 120 using the bilingual dictionary 60, replaces the portion of the bilingual sentence corresponding to the synonym with the translation of the synonym, repeats this for every synonym pair, and outputs the resulting translation; and a selection unit 126 that has a first input for receiving the bilingual sentence of the similar example sentence and a second input for receiving the output of the synonym replacement unit 124, and that selects and outputs the bilingual sentence when the determination signal output from the synonym extraction unit 120 is the first value and the output of the synonym replacement unit 124 when the determination signal is the second value.
[0058]
[Operation]
This machine translator 30 operates as follows. The operation is roughly divided into a first aspect and a second aspect. In the first aspect, modality information and a content word list are added to each example sentence in the example corpus 40. In the second aspect, the input sentence 42 is translated using the example corpus 40.
[0059]
In the first aspect, the value of the selection signal given to the signal input unit 64 is the second value described above. At this time, the selection unit 44 shown in FIG. 1 selects the output of the example corpus 40. The content word list extraction unit 46 extracts a content word list for each example sentence provided from the selection unit 44 and supplies the content word list to the selection unit 48. The selection unit 48 supplies the output of the content word list extraction unit 46 to the corpus updating unit 66. The modality identification unit 50 identifies the modality of each example sentence provided from the selection unit 44 and provides modality information to the selection unit 52. The selection unit 52 gives the output of the modality identification unit 50 to the corpus updating unit 66. The corpus updating unit 66 adds the content word list provided from the selection unit 48 and the modality information provided from the selection unit 52 to the corresponding example sentence in the example corpus 40.
[0060]
When the above-described processing is completed for all example sentences in the example corpus 40, the processing of the first aspect ends.
[0061]
In the second aspect, the selection signal from the signal input unit 64 is set to the first value. The selection unit 44 selects the input sentence 42 and gives it to the content word list extraction unit 46 and the modality identification unit 50. The content word list extraction unit 46 extracts a content word list from the input sentence 42 and supplies the content word list to the selection unit 48. The selection unit 48 gives this content word list to the similar example search unit 54. The modality identification unit 50 identifies the modality of the input sentence 42 given from the selection unit 44 and provides the modality information to the selection unit 52, which gives it to the similar example search unit 54.
[0062]
The same-modality example sentence extraction unit 80 of the similar example search unit 54 (see FIG. 6) extracts, from among the example sentences of the example corpus 40, the sentences having the same modality as the input sentence 42 (there are usually several of these) and gives them to the content word condition checking unit 82. The content word condition checking unit 82 selects, from the sentences provided from the same-modality example sentence extraction unit 80, only those that also satisfy the above-described condition 2 relating to the content word list, so that the selected sentences satisfy both conditions 1 and 2, and gives them to the score calculation / selection unit 84. The score calculation / selection unit 84 calculates a score for each of the sentences provided from the content word condition inspection unit 82 based on the calculation formula shown in FIG. 8, and outputs the example sentence with the highest score (the "similar example sentence") to the synonym replacement unit 58.
[0063]
The synonym extraction unit 120 (see FIG. 9) of the synonym replacement unit 58 determines how many words in the similar example sentence have a synonymous relationship with a word in the input sentence. If there is no synonymous word, the synonym extraction unit 120 outputs the signal of the first value to the synonym replacement unit 124 and the selection unit 126 as described above. The synonym replacement unit 124 does not operate when the signal has the first value. The selection unit 126 selects and outputs the bilingual sentence of the similar example sentence.
[0064]
If there is a synonymous word, the synonym extraction unit 120 provides a signal of the second value to the synonym replacement unit 124 and the selection unit 126. The synonym extraction unit 120 further provides each pair of synonymous content words between the input sentence and the similar example sentence to the synonym replacement unit 124.
[0065]
The synonym replacement unit 124 translates the synonym pair using the bilingual dictionary 60. The translated synonym pair is called target language replacement information. The synonym replacement unit 124 further replaces the word in the bilingual sentence of the similar example sentence according to the target language replacement information. More specifically, the synonym replacement unit 124 searches for a word in the target language replacement information in the bilingual sentence of the similar example sentence. The found word is replaced with the translated word in the target language replacement information. The synonym replacement unit 124 repeats this process for all synonym pairs, and gives the resulting sentence to the selection unit 126.
[0066]
The selection unit 126 selects and outputs the output of the synonym replacement unit 124 because the signal provided from the synonym extraction unit 120 is the second value. As a result, the corrected bilingual sentence is obtained as an output of the selection unit 126.
[0067]
The operation of the synonym replacement unit 58 will be described again using the example shown in FIG. 10. In this example, the input sentence is the sentence "Twin is full." The similar example sentence obtained from the example corpus 40 is the sentence "Single is full." As shown in FIG. 2 and FIG. 10, its bilingual sentence is assumed to be "All the singles are full." Between this input sentence and the similar example sentence, the word 140 "twin" of the input sentence and the word 142 "single" in the similar example sentence have a synonymous relationship. The synonym extraction unit 120 extracts this synonym pair and supplies it to the synonym replacement unit 124.
[0068]
The synonym replacement unit 124 translates the synonym pair with reference to the bilingual dictionary 60. In this case, the target language replacement information "singles" and "twins" is obtained. Using the target language replacement information, the synonym replacement unit 124 replaces the word "singles" 144 in the bilingual sentence of the similar example sentence with the word "twins" 146 and outputs the corrected bilingual sentence.
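The replacement in this example can be sketched in Python as follows. The function name `correct_translation`, the dictionary layout, and the use of whole-word regular expression matching are assumptions for illustration. The source-language words are written as English glosses ("twin", "single") in place of the original Japanese words.

```python
import re


def correct_translation(translation, synonym_pairs, bilingual_dict):
    """Replace translated example words with translated input words.

    synonym_pairs: list of (input_word, example_word) tuples.
    bilingual_dict: maps a source-language word to its target-language word.
    """
    corrected = translation
    for input_word, example_word in synonym_pairs:
        old = bilingual_dict.get(example_word)
        new = bilingual_dict.get(input_word)
        if old is None or new is None:
            continue  # no dictionary entry: leave the sentence unchanged
        # Whole-word match; a function replacement avoids escape handling.
        pattern = r"\b" + re.escape(old) + r"\b"
        corrected = re.sub(pattern, lambda _m, repl=new: repl, corrected)
    return corrected


bilingual_dict = {"single": "singles", "twin": "twins"}
result = correct_translation(
    "All the singles are full.", [("twin", "single")], bilingual_dict
)
print(result)
```

When the list of synonym pairs is empty, the loop body never runs and the bilingual sentence is returned unchanged, which corresponds to the case in which the selection unit 126 outputs the bilingual sentence as it is.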
[0069]
[Realization using computer]
The above-described machine translation apparatus can be realized by general computer hardware and software operating on the computer hardware. In particular, by implementing the apparatus on a portable computer, translation can be performed efficiently in situations where spoken language is mainly used, such as travel conversation. FIG. 11 shows the appearance of such a machine translation device.
[0070]
Referring to FIG. 11, machine translation apparatus 160 includes a so-called PDA (Personal Digital Assistant) 170 and a memory card 180 that can be mounted on PDA 170. The PDA 170 includes a display panel 172 in which an information display and an input device operated with a stylus pen or the like are integrated, a plurality of buttons 176 used for various operations, and a cross key 178 operated to move a cursor position displayed on the display panel 172.
[0071]
FIG. 12 is a hardware block diagram of the PDA 170. Referring to FIG. 12, PDA 170 includes: a CPU (Central Processing Unit) 186 connected to receive input from button 176 and cross key 178; a bus 196 to which CPU 186 and display panel 172 are connected; a ROM (Read-Only Memory) 188, connected to the bus 196, storing a boot-up program, an operating system (OS) program, and the like; a RAM (Random Access Memory) 190 used as a work area by the machine translation program according to the present invention; and a memory card interface 192 to which the memory card 180 is attached.
[0072]
The memory card 180 stores the example corpus 40, the thesaurus 56, and the bilingual dictionary 60 shown in FIG. 1, together with the machine translation program. When the CPU 186 executes this machine translation program, the machine translation device described above is realized.
[0073]
As described above, the machine translation device operates in two aspects, that is, the first aspect of updating the example corpus 40 and the second aspect of translation using the updated example corpus 40. FIG. 13 shows a schematic flowchart of the update program of the example corpus 40. This program can be started from a menu displayed on the display panel 172 when the memory card 180 is inserted into the PDA 170. Activating this program corresponds to setting the value of the selection signal provided via the signal input unit 64 in FIG. 1 to the second value.
[0074]
Referring to FIG. 13, in the updating process of example corpus 40, first, the modality of each example sentence in example corpus 40 is identified (step 200). The criteria for modality identification are as described above with reference to FIGS. 3 and 4.
[0075]
Subsequently, in step 202, a content word list is extracted from each example sentence of the example corpus 40. The content word list is as described above with reference to FIG.
[0076]
Further, in step 204, the modality of each example sentence in the example corpus 40 and its content word list are recorded in the example corpus 40 in association with a sentence pair including each example sentence. The contents of the example corpus 40 after recording are as shown in FIG.
[0077]
Thus, the updating process of the example corpus 40 ends. This process may be performed only once when the example corpus 40 is first introduced or when the content of the example sentence stored in the example corpus 40 is changed.
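The update process of steps 200 to 204 can be sketched in Python as follows. The record type `SentencePair`, the function `update_corpus`, and the two toy stand-ins for modality identification and content word extraction are assumptions for illustration. A real implementation would need morphological analysis to extract content words. The sketch processes one example sentence at a time, which yields the same result as performing each step over the whole corpus in turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SentencePair:
    example: str                      # example sentence (original language)
    translation: str                  # bilingual sentence (target language)
    modality: Optional[str] = None    # filled in by the update process
    content_words: List[str] = field(default_factory=list)


def update_corpus(
    corpus: List[SentencePair],
    identify_modality: Callable[[str], str],
    extract_content_words: Callable[[str], List[str]],
) -> List[SentencePair]:
    """Annotate every sentence pair with modality and content word list."""
    for pair in corpus:
        modality = identify_modality(pair.example)      # step 200
        words = extract_content_words(pair.example)     # step 202
        pair.modality = modality                        # step 204: record
        pair.content_words = words                      #   with the pair
    return corpus


# Toy stand-ins (hypothetical), using English glosses as the "original" text.
CONTENT_WORDS = {"single", "twin", "full", "room", "change"}


def toy_modality(sentence: str) -> str:
    return "question" if sentence.endswith("?") else "declarative"


def toy_content_words(sentence: str) -> List[str]:
    tokens = sentence.rstrip("?.!").lower().split()
    return [t for t in tokens if t in CONTENT_WORDS]


corpus = update_corpus(
    [SentencePair("Single is full.", "All the singles are full.")],
    toy_modality,
    toy_content_words,
)
print(corpus[0])
```

Storing the annotations with each pair means the second aspect never has to analyze the example sentences again, which is why the update needs to run only when the corpus changes.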
[0078]
FIG. 14 shows a flowchart of a program for implementing the machine translation of the second aspect. This program is started when an input sentence is given. Referring to FIG. 14, the program first identifies the modality of the input sentence (step 220). Subsequently, a content word list is extracted from the input sentence (step 222).
[0079]
In the following step 224, the example corpus 40 is searched for example sentences that meet the following two conditions:
(1) The modality of the example sentence matches that of the input sentence.
(2) The content word list of the example sentence contains a content word that also exists in the main area of the input sentence.
Here, the main area is as described with reference to FIG. 7.
[0080]
In step 226, for each example sentence found in step 224, a score indicating the degree of similarity with the input sentence is calculated by the method described with reference to FIG. 8. Then, the sentence having the highest score is selected as the similar example sentence (step 228).
[0081]
In step 230, the bilingual sentence of the sentence with the highest score is corrected according to the synonym information. The corrected bilingual sentence is output as the translated sentence (step 232).
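The flow from step 224 to step 232 can be sketched end to end in Python. The sketch assumes that steps 220 and 222 have already produced the modality and content word list of the input sentence. The score used here (the number of content words shared with the main area) is a stand-in for the weighted calculation of FIG. 8, and the corpus layout, synonym set, and second corpus entry are hypothetical.

```python
import re


def translate(input_modality, input_words, corpus, synonym_pairs, bilingual_dict):
    """Return a corrected bilingual sentence, or None if no example qualifies.

    corpus: list of dicts with keys "modality", "content_words", "translation".
    synonym_pairs: set of two-word frozensets treated as synonymous.
    """
    # Step 224: keep examples that satisfy both conditions.
    candidates = []
    for entry in corpus:
        if entry["modality"] != input_modality:
            continue
        n = len(entry["content_words"])
        area = input_words[-n:] if n else []
        shared = sum(1 for w in area if w in entry["content_words"])
        if shared:
            candidates.append((shared, entry))
    if not candidates:
        return None

    # Steps 226-228: stand-in score, then pick the highest-scoring example.
    best = max(candidates, key=lambda c: c[0])[1]

    # Step 230: replace the translation of each synonymous example word.
    result = best["translation"]
    for in_word in input_words:
        for ex_word in best["content_words"]:
            if frozenset((in_word, ex_word)) in synonym_pairs:
                old, new = bilingual_dict[ex_word], bilingual_dict[in_word]
                pattern = r"\b" + re.escape(old) + r"\b"
                result = re.sub(pattern, lambda _m, repl=new: repl, result)
    return result  # step 232


corpus = [
    {"modality": "question", "content_words": ["twin", "full"],
     "translation": "Are the twins full?"},
    {"modality": "declarative", "content_words": ["single", "full"],
     "translation": "All the singles are full."},
]
print(translate("declarative", ["twin", "full"], corpus,
                {frozenset(("twin", "single"))},
                {"twin": "twins", "single": "singles"}))
```

In this run the first entry shares both content words with the input but is excluded because its modality differs, so the second entry is selected and then corrected.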
[0082]
FIG. 15 shows a flowchart of the bilingual sentence correction process performed in step 230 of FIG. 14. Referring to FIG. 15, in the bilingual sentence correction process, first, the number M of synonym pairs between the input sentence and the similar example sentence is calculated (step 250). Subsequently, in step 252, a repetition control variable i for the following repetition processing is set to 0. Subsequently, 1 is added to i in step 254, and it is determined whether or not i after the addition exceeds the number M of synonym pairs obtained in step 250 (step 256). If i exceeds M, the process ends. If i does not exceed M, control proceeds to step 258.
[0083]
In step 258, each of the words (of the input sentence and the similar example sentence) constituting the i-th synonym pair is translated using the bilingual dictionary 60. As a result, the synonym pair is translated to obtain target language replacement information.
[0084]
Subsequently, in step 260, in the bilingual sentence of the similar example sentence, a word or a group of words corresponding to the translation of the i-th synonym is replaced according to the target language replacement information. After the replacement, control returns to step 254.
[0085]
In this manner, by repeating the processing from step 254 to step 260 for i = 1 to M, each portion of the bilingual sentence of the similar example sentence that corresponds to a word having a synonymous relationship with a word in the input sentence is replaced with the translation of that word in the input sentence, and a translation of the input sentence is finally obtained. If there is no synonymous relationship between the input sentence and the similar example sentence, the value of M is 0. Therefore, the process ends at the first determination in step 256 of FIG. 15, and the bilingual sentence of the similar example sentence is output as it is.
[0086]
As described above, according to the apparatus of the present embodiment, example-based machine translation is performed by searching for an example sentence having roughly the same meaning as the input sentence and using its bilingual sentence. Unlike conventional machine translation, the apparatus is highly capable of outputting a translation even for a long input sentence. Further, a translation is likely to be obtained even if the style of the input sentence differs from that of the example corpus. Differences in style are largely attributable to function words, but in the present embodiment similar example sentences are searched for based on content words, so the search is less affected by differences in style.
[0087]
In the present embodiment, the case where the original language is Japanese has been described. Accordingly, to determine the modality, we used the surface features of the sentence, the expression at the end of the sentence. In addition, several words at the end of the sentence are used to determine the main area of the input sentence. However, the present invention is not limited to such an embodiment. For the determination of the modality or the main area, not only the surface features of the sentence but also the deep structure of the sentence may be used. For this purpose, for example, the result of the syntax analysis may be used. Also, if there is some transformation rule between modalities, for example, conversion from declarative sentence to question, sentences of different modalities may be treated as sentences similar to each other.
[0088]
In the above-described embodiment, the example corpus 40 can be updated. However, if the example corpus including the modality information and the content word list is used from the beginning, the above processing is not necessary. The translation can be performed immediately using such an example corpus.
[0089]
Further, in the above-described embodiment, the bilingual dictionary 60 is used for translating a synonym pair. However, since the bilingual dictionary 60 stores the words in the original language and the words in the target language as a pair, it is necessary to store a bilingual translation for each number, for example, for words that include numbers. Therefore, if it is necessary to save the capacity of the bilingual dictionary 60, the numeral portion in the word including the number is stored in the bilingual dictionary in a form substituted by some symbol, and only the numeral is replaced as needed at the time of translation. It is conceivable to do so. The same can be said of, for example, the day of the week and the month.
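The numeral substitution idea can be sketched in Python as follows. The placeholder symbol "#", the two sample dictionary entries, and the function name `lookup` are assumptions for illustration; the embodiment does not specify which symbol or entry format is used.

```python
import re

# Hypothetical dictionary in which "#" stands for the numeral portion,
# so a single entry covers every number.
BILINGUAL_DICT = {
    "#円": "# yen",
    "#号室": "room #",
}


def lookup(word, dictionary=BILINGUAL_DICT):
    """Translate a word, restoring its numeral after a template lookup."""
    if word in dictionary:
        return dictionary[word]            # ordinary entry without numerals
    match = re.search(r"\d+", word)
    if match is None:
        return None                        # unknown word without a numeral
    # Replace the numeral with the placeholder and look up the template.
    key = word[:match.start()] + "#" + word[match.end():]
    template = dictionary.get(key)
    if template is None:
        return None
    # Put the original numeral back into the translated template.
    return template.replace("#", match.group())


print(lookup("500円"))
print(lookup("203号室"))
```

The same template approach could cover days of the week or months, although those would need a small mapping table rather than direct substitution of the digits.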
[0090]
In the present embodiment, a similar example sentence that shares a general meaning with an input sentence is searched, and after searching for a similar example sentence, the bilingual sentence is used for translation. However, the invention is not only applicable to translation. The system of the present embodiment can be applied to, for example, a question answering system.
[0091]
For example, sentence pairs each consisting of a question sentence and its answer can be used as the example corpus. In this case, a question sentence that shares its gist with the input sentence given as a question can be retrieved from the example corpus, and the answer corresponding to that question sentence can be used when presenting an answer to the original question. Therefore, the "example corpus" of the present invention is not limited to pairs of a sentence in the original language and its translation in the target language; it covers, in general, sentence pairs prepared in advance, each consisting of a sentence to be compared with the input sentence and a sentence corresponding to it.
[0092]
The embodiment disclosed this time is merely an example, and the present invention is not limited to the above-described embodiment. The scope of the present invention is indicated by each of the claims, read in light of the detailed description of the invention, and includes all changes within the meaning and range equivalent to the wording of the claims.
[Brief description of the drawings]
FIG. 1 is a block diagram of a machine translation device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a configuration of an example corpus 40;
FIG. 3 is a diagram for explaining a method of determining a modality.
FIG. 4 is a diagram illustrating an example of a modality.
FIG. 5 is a diagram showing a configuration of an example corpus 40 after modality information and a content word list are added.
FIG. 6 is a more detailed block diagram of a similar example search unit 54;
FIG. 7 is a schematic diagram for explaining a method for determining a main area of an input sentence.
FIG. 8 is a diagram for explaining an example of weights at the time of score calculation.
FIG. 9 is a more detailed block diagram of the synonym replacement section 58;
FIG. 10 is a diagram for explaining a process of correcting a bilingual sentence based on an input sentence and a searched example sentence.
FIG. 11 is a diagram showing an appearance of a PDA according to one embodiment of the present invention.
FIG. 12 is a hardware block diagram of the PDA shown in FIG. 11;
FIG. 13 is a flowchart of a corpus update process.
FIG. 14 is an overall flowchart of a translation process.
FIG. 15 is a flowchart of a bilingual sentence correction process.
[Explanation of symbols]
30 machine translation device, 40 example corpus, 42 input sentence, 44, 48, 52 selection unit, 46 content word list extraction unit, 50 modality identification unit, 56 thesaurus, 58 synonym replacement unit, 60 bilingual dictionary, 62 translated sentence, 66 corpus update unit, 80 same-modality example sentence extraction unit, 82 content word condition inspection unit, 84 score calculation/selection unit

Claims

An example-based sentence conversion device comprising:
an example corpus including a plurality of sentence pairs, each consisting of an example sentence and a corresponding sentence having a predetermined relationship with the example sentence;
means for identifying the modality of a sentence and outputting modality information;
means for extracting a content word list of a sentence;
means for searching the example sentences of the example corpus for a similar example sentence of an input sentence, based on the content word list extracted from the input sentence and the modality information of the input sentence, and on the content word list extracted from each example sentence in the example corpus and the modality information of each example sentence; and
means for correcting the corresponding sentence of the similar example sentence according to the difference between the input sentence and the similar example sentence, and outputting the corrected sentence.

The example-based sentence conversion device according to claim 1, wherein the means for searching includes:
means for searching the example corpus for example sentences having a modality that matches the modality of the input sentence;
means for selecting, from among the example sentences found by the means for searching for example sentences, example sentences having a predetermined relationship with the content word list extracted from the input sentence; and
means for calculating a similarity according to a predetermined calculation formula for each of the example sentences selected by the means for selecting, and selecting the example sentence with the highest similarity as the similar example sentence of the input sentence.
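
The three-stage search recited above (modality match, content word condition, highest similarity) can be illustrated with a minimal sketch. It is not the claimed implementation: the `Example` record, the `retrieve` and `overlap` names, the modality labels, the toy corpus, and the simplified content word condition (at least one shared content word) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str           # example sentence
    target: str           # corresponding sentence (placeholder text here)
    modality: str         # modality label (hypothetical values)
    content_words: tuple  # content word list of the example sentence


def retrieve(input_modality, input_words, corpus, similarity):
    """Return the most similar example sentence, or None if none qualifies."""
    # Stage 1: keep examples whose modality matches that of the input sentence.
    same_modality = [ex for ex in corpus if ex.modality == input_modality]
    # Stage 2: keep examples meeting the content word condition.
    # Assumption: the condition is simplified to "shares at least one content word".
    candidates = [
        ex for ex in same_modality
        if set(ex.content_words) & set(input_words)
    ]
    if not candidates:
        return None
    # Stage 3: score each candidate and select the highest similarity.
    return max(candidates, key=lambda ex: similarity(input_words, ex))


def overlap(input_words, ex):
    """Toy similarity: number of shared content words (assumption)."""
    return len(set(input_words) & set(ex.content_words))


# Toy corpus; all sentences and labels are invented for illustration.
corpus = [
    Example("Where is the station?", "<target A>", "question", ("station",)),
    Example("I go to the station.", "<target B>", "statement", ("go", "station")),
    Example("Where is the hotel?", "<target C>", "question", ("hotel",)),
]

best = retrieve("question", ["station", "bus"], corpus, overlap)
```

Filtering on modality first means that a statement sharing more content words with the input can never outrank a question when the input is a question.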

The example-based sentence conversion device according to claim 2, wherein the means for selecting example sentences includes:
means for determining a main area and a non-main area of the content word list of the input sentence, based on the content word list of each of the example sentences found by the means for searching for example sentences; and
means for selecting, from among the found example sentences, those for which an identical content word exists both in the content word list of the example sentence and in the main area of the content word list of the input sentence as determined by the content word list of that example sentence.

The example-based sentence conversion device according to claim 3, wherein the means for selecting as the similar example sentence includes:
a thesaurus storing synonym information;
means for calculating, with reference to the thesaurus, a similarity by a predetermined calculation formula based on
(1) the number of identical content words present in both each example sentence and the main area,
(2) the number of identical content words present in both each example sentence and the non-main area,
(3) the number of synonymous content words between each example sentence and the main area,
(4) the number of synonymous content words between each example sentence and the non-main area,
(5) the number of function words common to each example sentence and the input sentence, and
(6) the number of function words that differ between each example sentence and the input sentence; and
means for selecting, as the similar example sentence, the example sentence for which the highest similarity is calculated by the means for calculating the similarity.

The example-based sentence conversion device according to claim 4, wherein the means for calculating the similarity includes means for multiplying the values of (1) to (6) by weights assigned in advance so as to decrease from large to small in this order, and calculating the sum of the products as the similarity.
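
The weighted sum over the six counts (1) to (6) can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_similarity` and the weight values in `WEIGHTS` are hypothetical and are not taken from the specification. In particular, giving the last weight a negative sign, so that differing function words lower the score, is an assumption.

```python
def weighted_similarity(counts, weights):
    """Sum of the six counts (1)-(6), each multiplied by its weight."""
    if len(counts) != 6 or len(weights) != 6:
        raise ValueError("exactly six counts and six weights are expected")
    # The weights must decrease from (1) to (6), as the claim requires.
    if any(a <= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly decreasing")
    return sum(c * w for c, w in zip(counts, weights))


# Hypothetical weights, largest for (1) and smallest for (6).
WEIGHTS = (10.0, 6.0, 4.0, 2.0, 1.0, -1.0)

# Counts (1)..(6) for one candidate example sentence (invented values).
score = weighted_similarity((2, 1, 1, 0, 3, 1), WEIGHTS)
# 2*10 + 1*6 + 1*4 + 0*2 + 3*1 + 1*(-1) = 32.0
```

Because the weights decrease in the order (1) to (6), an identical content word in the main area contributes more to the score than a synonym or a shared function word.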

The example-based sentence conversion device according to any one of claims 1 to 5, wherein each of the sentence pairs in the example corpus includes an example sentence in a predetermined source language and a sentence in a target language corresponding to the source-language example sentence.

The example-based sentence conversion device according to claim 1, wherein each of the sentence pairs in the example corpus includes an example sentence in a predetermined source language and a sentence in a target language corresponding to the source-language sentence,
the example-based sentence conversion device further comprising:
a thesaurus storing synonymous content word information for the source language;
a bilingual dictionary between the source language and the target language;
means for detecting, with reference to the thesaurus, pairs of synonymous content words contained in the input sentence and the similar example sentence;
means for translating each pair of synonymous content words into words of the target language with reference to the bilingual dictionary; and
means for correcting the corresponding sentence of the similar example sentence using the translated pair of synonymous content words.
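
The correction step recited above can be sketched as follows, assuming that "correcting" means replacing the target-language word for the example's content word with the target-language word for the synonymous input word. The function name `correct_translation`, the dictionary layouts, and the English-French toy data are hypothetical and serve only as illustration.

```python
def correct_translation(input_words, example_words, target_sentence,
                        thesaurus, bilingual):
    """Replace translated example words with translated synonymous input words."""
    corrected = target_sentence
    for ex_word in example_words:
        if ex_word in input_words:
            continue  # identical content word: nothing to correct
        for in_word in input_words:
            if in_word in example_words:
                continue
            # Detect a synonymous content word pair via the thesaurus.
            if in_word in thesaurus.get(ex_word, set()):
                # Translate both members of the pair with the bilingual dictionary.
                old = bilingual.get(ex_word)
                new = bilingual.get(in_word)
                if old and new and old in corrected:
                    corrected = corrected.replace(old, new, 1)
                break
    return corrected


# Toy resources, invented for illustration only.
thesaurus = {"hotel": {"inn"}, "inn": {"hotel"}}
bilingual = {"hotel": "l'hôtel", "inn": "l'auberge"}

result = correct_translation(
    input_words=["inn"],           # from the input "Where is the inn?"
    example_words=["hotel"],       # from the example "Where is the hotel?"
    target_sentence="Où est l'hôtel ?",
    thesaurus=thesaurus,
    bilingual=bilingual,
)
# result == "Où est l'auberge ?"
```

If either word of a pair is missing from the bilingual dictionary, this sketch leaves the corresponding sentence unchanged rather than guessing.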

The example-based sentence conversion device according to any one of claims 1 to 7, further comprising:
means for controlling the means for outputting the modality information and the means for extracting the content word list of a sentence so as to create modality information and a content word list for each example sentence in the example corpus; and
means for updating the example corpus so that the modality information and the content word list created by the means for creating are attached to the corresponding example sentence in the example corpus.

A computer program which, when executed by a computer, causes the computer to operate as the example-based sentence conversion device according to any one of claims 1 to 8.