JP3850742B2

JP3850742B2 - Language model adaptation method

Info

Publication number: JP3850742B2
Application number: JP2002047047A
Authority: JP
Inventors: 秀治中嶋; 博史山本; 太郎渡辺
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2002-02-22
Filing date: 2002-02-22
Publication date: 2006-11-29
Anticipated expiration: 2022-02-22
Also published as: JP2003248496A

Description

[0001]
[Technical field of the invention]
The present invention relates to a method for adapting a language model.
[0002]
[Prior art]
Adapting a language model of a given language to a target task requires a corpus, even a small one, that belongs to the target task and is written in the language of the language model being adapted. For task adaptation of the language models of a multilingual spoken-language speech translator, a small target-task corpus is needed for each language. Collecting such corpora is costly and difficult.
[0003]
Conventionally, therefore, language models have been adapted using a small amount of collected corpus data (see Reference [1]) or using information retrieved from the WWW (World Wide Web) (see Reference [2]).
[0004]
Reference [1]: A. I. Rudnicky: "Language Modeling with Limited Domain Data," Proc. of the ARPA Spoken Language Systems Technology Workshop, pp. 66-69 (1995).
Reference [2]: A. Berger, et al.: "Just-In-Time Language Modeling," Proc. of the ICASSP, pp. 705-708 (1998).
[0005]
These approaches targeted dictation and, in most cases, collected and used written text, which exists in far larger quantities than spoken language. In fields where data are hard to collect, such as dictation of medical findings and man-machine interfaces, a CFG (Context Free Grammar) describing the conversations of the task was written by hand, and language models were adapted using data generated artificially from that CFG (see References [3] and [4]).
[0006]
Reference [3]: Nobuyasu Ito, Shiho Ogino, Hitoshi Niijima: "Task adaptation of N-gram model using grammar," Proc. of the 4th Annual Meeting of the Association for Natural Language Processing, pp. 610-613 (1998).
Reference [4]: Y. Wang, et al ed., "A Unified Context-Free Grammar and N-gram Model for Spoken Language Processing," Proc. of ICASSP, 2000.
[0007]
[Problems to be solved by the invention]
An object of the present invention is to provide a language model adaptation method that can create a language model adapted to a new task by using a monolingual corpus of the new task written in a language other than the language of the language model to be adapted.
[0008]
[Means for Solving the Problems]
The invention according to claim 1 is a language model adaptation method for creating a language model of a first language adapted to a new task when the tasks of a speech translator are expanded, the method comprising: a first step of creating a second monolingual corpus of the new task written in the first language by translating, with a machine translator, a first monolingual corpus of the new task written in a second language other than the first language into the first language; and a second step of adapting the language model based on the second monolingual corpus of the new task created in the first step.
[0009]
The invention according to claim 2 is the language model adaptation method according to claim 1, wherein the second step comprises a step of creating a language model of the first language adapted to the new task from the second monolingual corpus of the new task, written in the first language and created in the first step, and a monolingual corpus of a general task written in the first language.
[0010]
[Embodiments of the invention]
Embodiments of the present invention will be described below with reference to the drawings.
[0011]
[1] Outline of the present invention
Adapting a language model of a given language to a target task requires a corpus, even a small one, that belongs to the target task and is written in the language of the language model being adapted. For task adaptation of the language models of a multilingual spoken-language speech translator, a small target-task corpus is needed for each language. Collecting such corpora is costly and difficult.
[0012]
In the present invention, a language model is adapted based on a monolingual corpus of the new target task written in a language other than the language of the language model to be adapted. Specifically, that monolingual corpus is translated by a machine translator to create a pseudo corpus of the target task written in the language of the language model to be adapted, and the statistical language model is adapted using this pseudo corpus.
[0013]
The translation knowledge of a machine translator can be expected to retain connection information between adjacent words. The pseudo corpus obtained by translating the monolingual corpus of the target task may contain errors at the level of whole sentences, but its topics and sentence style are considered to match the target task, and appropriate word order is considered to be reflected in local contexts on the order of adjacent words. The pseudo corpus can therefore be expected to be usable as a corpus for language model adaptation.
[0014]
[2] Language model adaptation method
First, the problem of language model adaptation is clarified. Here, the source language is Language2 (for example, Japanese) and the target language is Language1 (for example, English).
[0015]
As shown in FIG. 1, the problem of language model adaptation in the general sense is to create a language model adapted to the target task using data of a general task (General Task) and data of the target task (Target Task). This requires a corpus of the target task, even a small one.
[0016]
In contrast, the problem of language model adaptation in the present invention is as shown in FIG. 2. That is, when the language model of the target language (Language1) in FIG. 2 is adapted and no language data T_L1 exists for the target task (Target Task), substitute language data T'_L1 is created by translation from the data T_L2 of that task in another language, and the language model is adapted using T'_L1.
[0017]
The translator used to create the language data T'_L1 of the target language (Language1) holds information on the local word order of the target language in its translation knowledge. Translation results with relatively correct word order can therefore be expected at roughly the level of adjacent words. As a result, the created language data T'_L1 can be expected to yield the information on connectivity between adjacent words that a language model needs.
[0018]
There are various adaptation methods, such as MAP adaptation (see reference [5]) and linear combination of models (see reference [1] above). In this embodiment, linear combination of models is used.
[0019]
Reference [5]: H. Masataki, et al.: "Task Adaptation using MAP Estimation in N-gram Language Modeling," Proc. of the ICASSP, 1997, pp. 783-786.
[0020]
FIG. 3 shows the procedure of the language model adaptation method according to the present invention.
Here, a method of adapting a language model when generating a language model of a first language adapted to a target task (new task) will be described.
Step 1: Create a machine translator using the bilingual corpus of the general task. A language model LM_G for the general task (see FIG. 2), built from the general task alone, is also created at this point.
[0021]
Step 2: Using the machine translator created in Step 1, translate the monolingual corpus T_L2 (see FIG. 2) of the target task, written in the second language other than the first language, to create a pseudo monolingual corpus T'_L1 (see FIG. 2) of the target task written in the first language.
[0022]
Step 3: Using the pseudo monolingual corpus T'_L1 written in the first language and created in Step 2, create a language model LM_T'L1 (see FIG. 2) for language model adaptation.
[0023]
Step 4: Combine the general-task language model LM_G created in Step 1 with the adaptation language model LM_T'L1 created in Step 3 to create a language model of the first language adapted to the target task.
[0024]
Although the machine translator used here is one created from the bilingual corpus of the general task, a machine translator created by another method may be used.
[0025]
Likewise, although the general-task language model LM_G that is combined with the adaptation language model LM_T'L1 is the one created when the machine translator is built in Step 1, a model created by another method may be used.
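The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the `translate` callable is a hypothetical stand-in for any machine translator, unigram models replace N-gram models for brevity, and the weight `lam` is a hypothetical fixed value rather than an estimated one.

```python
from collections import Counter
from typing import Callable, Dict, List

Sentence = List[str]


def train_unigram(corpus: List[Sentence]) -> Dict[str, float]:
    """Maximum-likelihood unigram model (a stand-in for an N-gram model)."""
    counts = Counter(w for sent in corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def adapt_language_model(
    general_l1: List[Sentence],                 # general-task corpus in Language1
    target_l2: List[Sentence],                  # target-task corpus T_L2 in Language2
    translate: Callable[[Sentence], Sentence],  # hypothetical machine translator
    lam: float = 0.5,                           # hypothetical interpolation weight
) -> Dict[str, float]:
    # Step 1 (language-model part): general-task model LM_G.
    lm_g = train_unigram(general_l1)
    # Step 2: pseudo corpus T'_L1 obtained by machine translation.
    pseudo_l1 = [translate(s) for s in target_l2]
    # Step 3: adaptation model LM_T'L1 built from the pseudo corpus.
    lm_t = train_unigram(pseudo_l1)
    # Step 4: linear combination of the two models.
    vocab = set(lm_g) | set(lm_t)
    return {w: lam * lm_t.get(w, 0.0) + (1 - lam) * lm_g.get(w, 0.0) for w in vocab}


# Toy usage with a dictionary lookup as the "translator" (illustrative only).
toy_dict = {"搭乗": "boarding", "券": "pass"}
adapted = adapt_language_model(
    general_l1=[["the", "menu", "please"]],
    target_l2=[["搭乗", "券"]],
    translate=lambda s: [toy_dict.get(w, w) for w in s],
)
```

Because both component models are proper distributions and the weights sum to one, the combined model is also a proper distribution over the union vocabulary.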
[0026]
[3] Machine translator
[0027]
As described above, the language model adaptation method according to the present invention uses a machine translator to create the data for language model adaptation. Various machine translators can be used, but the evaluation experiment described later uses a statistical machine translator, whose translation principle is explicit. An overview of that statistical machine translator is therefore given here.
[0028]
[3-1] Overview of the statistical machine translator
The translator used in the evaluation experiment is a statistical translator based on IBM Model 4 of Reference [6].
[0029]
Reference [6]: P. Brown, et al., "The mathematics of statistical machine translation: Parameter estimation," Computational Linguistics, 19(2), pp. 263-311 (1993).
[0030]
Translation can be regarded as a decoding problem over a noisy channel. For example, for translation from a Japanese sentence (J) to an English sentence (E), the problem of obtaining the most likely translation result (denoted E^* here) is expressed by the following equation (1).
[0031]
\[ E^{*} = \arg\max_{E} P(E \mid J) \qquad (1) \]
[0032]
Normally, this equation is transformed by Bayes' rule as shown in the following equation (2). When a single Japanese sentence is fixed as the input, the denominator is a constant, so the translation result E^* is determined from the numerator alone.
[0033]
\[ E^{*} = \arg\max_{E} \frac{P(J \mid E)\, P(E)}{P(J)} \qquad (2) \]
[0034]
Here, P(J | E) in the numerator is called the "translation model" and P(E) is called the "language model". An N-gram over adjacent words is often used as the language model.
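Equation (2) can be illustrated by scoring a small, fixed list of candidate translations. The sketch below is only an illustration: the candidate list, the probability tables, and the function name `decode` are all invented here, and a real statistical decoder searches a far larger space than an explicit list.

```python
import math
from typing import Dict, List, Tuple


def decode(
    candidates: List[str],
    translation_model: Dict[Tuple[str, str], float],  # P(J | E), keyed by (J, E)
    language_model: Dict[str, float],                 # P(E)
    source: str,
) -> str:
    """Return argmax_E P(J | E) * P(E); P(J) is constant for a fixed J."""

    def log_score(e: str) -> float:
        p_j_given_e = translation_model.get((source, e), 0.0)
        p_e = language_model.get(e, 0.0)
        if p_j_given_e == 0.0 or p_e == 0.0:
            return -math.inf
        return math.log(p_j_given_e) + math.log(p_e)

    return max(candidates, key=log_score)


# Toy numbers, invented purely for illustration.
J = "搭乗券 を 見せて ください"
tm = {
    (J, "please show me your boarding pass"): 0.4,
    (J, "boarding pass show please"): 0.5,
}
lm = {
    "please show me your boarding pass": 0.01,
    "boarding pass show please": 0.0001,
}
best = decode(list(lm), tm, lm, J)
```

In this toy case the second candidate has the higher translation-model probability, but the language model favours the fluent word order, so the first candidate is selected. This is the sense in which P(E) constrains word order.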
[0035]
All parameters of these models are estimated statistically from the bilingual training corpus of the general task. Using these models, the translator selects translation words and arranges the word order under the constraint of P(E) to produce a translation. Therefore, even when the input belongs to a new task, provided the model parameters are estimated correctly, the output translations can be expected to contain relatively correct word sequences in local contexts and to be usable as data for language model adaptation.
[0036]
In the present invention, a language model adapted to the task is created by linearly combining a language model for adaptation, built from text generated by the machine translator, with a language model built from the general-task corpus (P(E) in this embodiment).
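For the word 3-gram models used in the evaluation experiment, this linear combination can be written as below. The symbol λ and the subscripts are notation introduced here for illustration; the source names the models LM_G and LM_T'L1 but does not give this formula.

```latex
P_{\mathrm{adapt}}(w_i \mid w_{i-2}, w_{i-1})
  = \lambda \, P_{T'}(w_i \mid w_{i-2}, w_{i-1})
  + (1 - \lambda) \, P_{G}(w_i \mid w_{i-2}, w_{i-1}),
  \qquad 0 \le \lambda \le 1
```

Here P_{T'} is the model built from the machine-translated pseudo corpus and P_G is the general-task model. Setting λ = 0 recovers the unadapted general-task model.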
[0037]
[4] Evaluation experiment
[0038]
The effectiveness of the present invention is confirmed below through the reduction in test-set perplexity on the target task in the target language. The data and experimental conditions used are described first.
[0039]
[4-1] General and target task data
This experiment uses a Japanese-English parallel corpus (hereinafter the "phrase book corpus"). The sentences are conversational expressions used when traveling.
[0040]
These expressions have been classified manually in advance into several categories, mainly by scene, such as "airport", "in the airplane", and "restaurant". Examples of the categories are shown in Table 1.
[0041]
[Table 1]

[0042]
Conversational expressions in the "airport" category were set as the target task of the evaluation experiment, and the remaining categories were set as the general task. The breakdown is shown in Table 2.
[0043]
[Table 2]

[0044]
To investigate the relationship between the size of the target-task corpus and the effect of adaptation, target-task corpora of several sizes were prepared. In Table 2, "General" is the general task; "t1000", "t2000", and "t4739" are three target-task corpora of different sizes; and "test" is the evaluation corpus for the target task. The totals of sentences and words were counted on the English side.
[0045]
The morphologically analyzed Japanese morpheme sequences of "t1000", "t2000", and "t4739" are input to the machine translator. The English sentences output as translation results (already segmented into words) are used as the adaptation data T'_L1 (see FIG. 2).
[0046]
[4-2] Experimental procedure
Here, the language model is adapted using the text created by the translator, and the performance of the resulting model is evaluated by the reduction in perplexity.
[0047]
First, assuming the case where the translator produces perfect translations (English in this experiment), the reduction in perplexity is examined when the language model is adapted using the translations (English sentences here) in the phrase book corpus that correspond to each of the three target-task corpora.
[0048]
Next, the effect (the effect of the present invention) of using the data translated by the translator for adaptation of the language model is investigated.
[0049]
First, the translation model and the language model of the statistical machine translator are created according to the procedure described in the above embodiment. That is, the translator and the language model LM_G (see FIG. 2) of the target language (English in this experiment) are created using only the bilingual corpus of the general task (both languages of "General").
[0050]
Next, only the monolingual corpus of the target task (Japanese t1000, t2000, and t4739 in this experiment) is translated by the machine translator, and the results are used to create a language model LM_T for the target task in the target language (English in this experiment), where the subscript T stands for t1000, t2000, or t4739.
[0051]
Finally, the adapted language model is created by linearly combining the two language models LM_G and LM_T. In this experiment, word 3-grams were used as the language model, and the coefficients of the linear combination were determined by deleted interpolation (see Reference [1] above).
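The source does not spell out how the coefficients were computed. One common way to realise this kind of estimation is expectation-maximization (EM) over held-out data, and the sketch below assumes that approach. The function name `estimate_lambda`, the held-out probabilities, and the iteration count are all hypothetical.

```python
from typing import List, Tuple


def estimate_lambda(
    heldout_probs: List[Tuple[float, float]],  # (P_T(w|h), P_G(w|h)) per held-out token
    iterations: int = 50,
    lam: float = 0.5,
) -> float:
    """EM estimate of the weight placed on the target-task model."""
    for _ in range(iterations):
        # E-step: posterior probability that each token came from the target model.
        posteriors = []
        for p_t, p_g in heldout_probs:
            mix = lam * p_t + (1.0 - lam) * p_g
            posteriors.append(lam * p_t / mix if mix > 0.0 else 0.0)
        # M-step: the new weight is the average posterior.
        lam = sum(posteriors) / len(posteriors)
    return lam


# Toy held-out probabilities, invented for illustration.
probs = [(0.20, 0.05), (0.10, 0.02), (0.01, 0.08), (0.30, 0.10)]
lam_hat = estimate_lambda(probs)
```

Each EM iteration cannot decrease the held-out likelihood of the mixture, so the returned weight fits the held-out tokens at least as well as the starting value of 0.5.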
[0052]
This experiment was conducted under the condition that the machine translator contains no pairs of a word and its translation for words that appear only in the target task, that is, the new adaptation destination of the language model.
[0053]
[4-3] Experimental results and discussion
[0054]
All results below are given as the word perplexity (PP value) on the target-task test set in the target language (English for the "airport" task in this experiment), or as its reduction rate.
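For reference, word perplexity and its relative reduction can be computed as in the sketch below. It uses the standard definition of perplexity. The `prob` callable, both function names, and the toy data are assumptions made for illustration, and whether the original experiment counted sentence-boundary tokens is not stated in the source.

```python
import math
from typing import Callable, List, Sequence


def perplexity(
    test_sentences: List[Sequence[str]],
    prob: Callable[[str, Sequence[str]], float],  # prob(word, history) -> P(word | history)
) -> float:
    """Word perplexity: exp of the negative mean log-probability per word."""
    log_sum, n_words = 0.0, 0
    for sent in test_sentences:
        for i, w in enumerate(sent):
            log_sum += math.log(prob(w, sent[:i]))
            n_words += 1
    return math.exp(-log_sum / n_words)


def reduction_rate(pp_before: float, pp_after: float) -> float:
    """Relative PP reduction in percent."""
    return 100.0 * (pp_before - pp_after) / pp_before


# Toy check: a model that assigns 1/8 to every word has perplexity 8.
uniform = lambda w, history: 1.0 / 8
pp = perplexity([["a", "b", "c"], ["d", "e"]], uniform)
```

The model passed as `prob` must assign non-zero probability to every test word (for example through smoothing), because `math.log(0.0)` raises an error.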
[0055]
[4-3-1] Using the ideal translations of the parallel corpus (upper bound on performance)
To confirm the upper bound on performance, that is, how much PP reduction could be obtained if target-task data could be collected, the PP value was computed for adaptation using the translations in the parallel corpus.
[0056]
The results are shown in rows 2 to 4 ("General + t <size>") of Table 3. Row 1 ("General only") of Table 3 is the PP value of the language model built only from the bilingual corpus of the general task.
[0057]
[Table 3]

[0058]
As can be seen, the PP value of the language model created by adaptation, which combines the general-task language model with the target-task language model, is smaller than that of the general-task model alone.
[0059]
In FIG. 4, solid line A is a graph showing the change in PP as the bilingual corpus of the general task, which does not include the target-task corpus, grows. Broken line B is a graph showing the change in PP when the translations (English sentences) in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task (the PP values for General + t <size> in Table 3). The horizontal axis is the number of sentences (corpus size) and the vertical axis is the PP value.
[0060]
FIG. 4 shows that adapting with target-task data (broken line) reduces the PP value more than collecting a further doubling of the general-task corpus would (extension of the solid line to the right).
[0061]
From these two observations, language model adaptation was found to be the more effective approach on this data.
[0062]
These results show that, in the setting of this experiment, if translation were 100% successful, the PP value on the open test set of the target task would decrease as the amount of target-task data increases, and the PP value could potentially be lowered by up to nearly 38.0% in relative terms.
[0063]
[4-3-2] Using translated text
Next, instead of ideal translations, target-task data actually produced by the machine translator was used for adaptation. The PP values computed for each combination are shown in Table 4.
[0064]
[Table 4]

[0065]
Table 4 shows that using text for the new target task created by the machine translator for language model adaptation yields a 13% reduction in PP value relative to the PP value before adaptation.
Solid line B in FIG. 5 corresponds to the broken line in FIG. 4 and is a graph showing the change in PP when the translations (English sentences) in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task. Broken line C is a graph showing the change in PP when machine translations (English sentences) of the Japanese sentences in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task.
[0066]
[Effect of the invention]
According to the present invention, a language model adapted to a new task can be created by using a monolingual corpus of the new task written in a language other than the language of the language model to be adapted. As a result, small corpora need not be collected for each language, which eliminates that cost.
[Brief description of the drawings]
FIG. 1 is a schematic diagram for explaining a problem of adaptation of a language model in a general sense.
FIG. 2 is a schematic diagram for explaining a problem of adaptation of a language model in the present invention.
FIG. 3 is a flowchart illustrating a procedure of a language model adaptation method according to the present invention.
FIG. 4 is a graph showing the change in PP as the bilingual corpus of the general task, which does not include the target-task corpus, grows, and the change in PP when the translations (English sentences) in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task.
FIG. 5 is a graph showing the change in PP when the translations (English sentences) in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task, and the change in PP when machine translations (English sentences) of the Japanese sentences in the bilingual corpus of the target task are successively added to the bilingual corpus of the general task.

Claims

A language model adaptation method for creating a language model of a first language adapted to a new task when the tasks of a speech translator are expanded, the method comprising:
a first step of creating a second monolingual corpus of the new task written in the first language by translating, with a machine translator, a first monolingual corpus of the new task written in a second language other than the first language into the first language; and
a second step of adapting the language model based on the second monolingual corpus of the new task created in the first step.

The language model adaptation method according to claim 1, wherein the second step comprises a step of creating the language model of the first language adapted to the new task from the second monolingual corpus of the new task, written in the first language and created in the first step, and a monolingual corpus of a general task written in the first language.