JP5137588B2 - Language model generation apparatus and speech recognition apparatus

Info

Publication number: JP5137588B2
Application number: JP2008002194A
Authority: JP
Inventors: 啓恭伍井; 利行花沢; 知弘岩崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2008-01-09
Filing date: 2008-01-09
Publication date: 2013-02-06
Anticipated expiration: 2028-01-09
Also published as: JP2009163109A

Description

The present invention relates to language model generation apparatuses, speech recognition apparatuses, and the like, and in particular to a language model generation apparatus that uses a statistical language model and to a speech recognition apparatus that uses that language model.

Analysis techniques based on natural-language statistics are applied to many kinds of document processing. For example, they are useful as a means of converting operator call audio at a call center into text by speech recognition, and further improvement in recognition accuracy is desired. The technical terms used below follow Prior Art Document 1: Kiyohiro Shikano, Katsunobu Itou, Tatsuya Kawahara, Kazuya Takeda, and Mikio Yamamoto, "Speech Recognition System", Ohmsha, Ltd., May 15, 2001 (hereinafter Textbook 1); Prior Art Document 2: Kenji Kita and Jun'ichi Tsujii, "Probabilistic Language Model", University of Tokyo Press, November 25, 1999 (hereinafter Textbook 2); Prior Art Document 3: Seiichi Nakagawa, "Speech Recognition by Probabilistic Models", The Institute of Electronics, Information and Communication Engineers, July 1, 1988 (hereinafter Textbook 3); or Prior Art Document 4: Makoto Nagao, "Natural Language Processing", Iwanami Shoten, April 26, 1996 (hereinafter Textbook 4).

To recognize speech accurately, methods that use N-grams as the language model have attracted attention (see Textbooks 1 to 4). Obtaining reliable statistics, however, requires building the N-gram table from a large corpus, so N-gram compression schemes have been proposed to deal with the resulting growth of the table. Conversely, because N-grams are learned from a corpus, reliable statistics cannot be obtained unless a sufficiently large corpus is available.

As language models, word N-gram models, basically word 2-grams or word 3-grams, are widely used. The calculation of the language likelihood with a word N-gram is described here. First, the language likelihood log P(W1, W2, ..., WL) of a word sequence W1, W2, ..., WL is expressed with conditional probabilities by Equation (1).

The conditional probability P{Wi | W1, W2, ..., W(i-1)} on the right-hand side of Equation (1) is the probability that the word Wi occurs after the preceding word sequence W1, W2, ..., W(i-1). The word N-gram model approximates this preceding word sequence by its last N-1 words. In a word 2-gram, which approximates the preceding sequence by one word, the probability is given by the approximation in Equation (2).

Similarly, in a word 3-gram, which approximates the preceding sequence by two words, the probability is given by the approximation in Equation (3).
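The images for Equations (1) to (3) are not reproduced in this text. From the definitions given in the surrounding paragraphs, they would presumably take the following standard form. This is a reconstruction under that assumption, not the published typesetting.

```latex
\begin{align}
\log P(W_1, W_2, \ldots, W_L) &= \sum_{i=1}^{L} \log P(W_i \mid W_1, W_2, \ldots, W_{i-1}) \tag{1} \\
P(W_i \mid W_1, W_2, \ldots, W_{i-1}) &\approx P(W_i \mid W_{i-1}) \tag{2} \\
P(W_i \mid W_1, W_2, \ldots, W_{i-1}) &\approx P(W_i \mid W_{i-2}, W_{i-1}) \tag{3}
\end{align}
```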

During speech recognition, an acoustic likelihood is obtained for each word sequence candidate under recognition using an acoustic model such as an HMM (Hidden Markov Model), which is a probabilistic model of word speech. A language likelihood is then obtained as described above, and the word sequence candidates are ranked by an overall likelihood that is the weighted sum of the two.

There are many variations of the N-gram model. Two conventional techniques that are particularly relevant to the present invention are described below.
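The ranking by overall likelihood mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the sentence-start token `<s>`, the probability floor for unseen 2-grams, and the form "acoustic log-likelihood plus weight times language log-likelihood" are all assumptions made for the example.

```python
import math


def bigram_log_likelihood(words, bigram_prob, floor=1e-6):
    """Sum of log P(W_i | W_{i-1}), i.e. the 2-gram approximation of the
    language likelihood. `floor` is an assumed stand-in for smoothing."""
    history = "<s>"  # assumed sentence-start token
    total = 0.0
    for word in words:
        total += math.log(bigram_prob.get((history, word), floor))
        history = word
    return total


def rank_candidates(candidates, bigram_prob, lm_weight=0.5):
    """Rank (word_list, acoustic_log_likelihood) candidates by the
    weighted sum of acoustic and language log-likelihoods."""
    scored = []
    for words, acoustic_ll in candidates:
        language_ll = bigram_log_likelihood(words, bigram_prob)
        scored.append((acoustic_ll + lm_weight * language_ll, words))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]
```

With equal acoustic scores, the candidate whose word chain is better supported by the 2-gram table is ranked first.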

A first variation of the N-gram model computes probabilities by grouping words that share a common property into classes (see, for example, Patent Document 1). This technique is hereinafter also called the "first conventional example". In the class N-gram model of the first conventional example, the word N-gram is approximated using classes as in Equation (4) (for N = 2), where Ci denotes the class into which word Wi has been grouped.
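The image for Equation (4) is not reproduced in this text. Assuming the usual class 2-gram formulation, in which a class transition probability is multiplied by the probability of the word within its class, it would read as follows. The exact form used in Patent Document 1 may differ.

```latex
\begin{equation}
P(W_i \mid W_{i-1}) \approx P(C_i \mid C_{i-1}) \, P(W_i \mid C_i) \tag{4}
\end{equation}
```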

Obtaining the language likelihood through classes is effective against the problem of insufficient data, in which the language likelihood is inaccurate for word sequences with little training data. However, although the class N-gram model helps when the corpus is too small, its linguistic constraint is weaker than that of a word N-gram.

To address this problem, there is a technique that uses a class N-gram as the upper layer and applies a word N-gram in the lower layer (see Patent Document 2). This technique is hereinafter also called the "second conventional example".

In the second conventional example, when the television program title "太陽を撃て" ("Shoot the Sun") appears in the sentence "明日の太陽を撃てを録画" ("record tomorrow's Shoot the Sun"), the probability is approximated as in Equation (5). Because the context before and after the program title (denoted <title>) is then expressed by P(<title> | の) and P(を | <title>), the problem of insufficient data can be handled. In addition, because the program title is represented as a word sequence, the recognition dictionary is said to stay small while high recognition performance is maintained.

Patent Document 1: JP 2000-259175 A
Patent Document 2: International Publication No. WO 2004/034378

To solve the problem of insufficient data (the sparseness of word chains), in which the language likelihood is inaccurate for word sequences with little training data, a technique is used that groups morpheme sequences such as personal names, addresses, and product names into classes. Classing can be realized, for example, by manually rewriting the example sentence "三菱太郎様のご住所は神奈川県鎌倉市大船の" ("Mr. Taro Mitsubishi's address is Ofuna, Kamakura City, Kanagawa Prefecture ...") as "<姓><名>様のご住所は<県><市><町>の" ("<surname><given name>'s address is <prefecture><city><town> ..."), that is, by replacing each meaningful part with a special expression string enclosed in <> and treating each such string as an individual class. Building an N-gram model that treats these parts as classes covers the possible words, but, as noted above, the linguistic constraint becomes weaker than that of a word N-gram.

This is explained with an example. FIG. 1 shows an example of an N-gram chain between a prefecture class and a city/ward/county class. The prefecture class contains prefecture-name morphemes such as "神奈川県" (Kanagawa Prefecture) and "東京都" (Tokyo), and the city/ward/county class registers morphemes such as "横浜市" (Yokohama City) and "新宿区" (Shinjuku Ward). With this class N-gram chain, however, incorrect morpheme chains such as "神奈川県/新宿区" and "東京都/横浜市" are permitted in addition to correct addresses, so the linguistic constraint of the class N-gram is weaker than that of a morpheme N-gram.
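The weakening of the constraint can be illustrated with a toy comparison. The sketch below assumes a two-sentence training set and the usual class 2-gram form P(C_i | C_{i-1}) P(W_i | C_i); the data, the class labels, and the function names are hypothetical and chosen only to mirror the example of FIG. 1.

```python
# Hypothetical toy training data: only correct (prefecture, city) chains.
pairs = [("神奈川県", "横浜市"), ("東京都", "新宿区")]
word_class = {"神奈川県": "<県>", "東京都": "<県>",
              "横浜市": "<市>", "新宿区": "<市>"}


def word_bigram(prev, word):
    """Relative-frequency estimate of P(word | prev) over morphemes."""
    count_prev = sum(1 for p, _ in pairs if p == prev)
    count_pair = sum(1 for p, w in pairs if (p, w) == (prev, word))
    return count_pair / count_prev if count_prev else 0.0


def class_bigram(prev, word):
    """P(C_i | C_{i-1}) * P(W_i | C_i) estimated from the same data."""
    c_prev, c_word = word_class[prev], word_class[word]
    class_pairs = [(word_class[p], word_class[w]) for p, w in pairs]
    n_prev = sum(1 for cp, _ in class_pairs if cp == c_prev)
    n_pair = sum(1 for cp, cw in class_pairs if (cp, cw) == (c_prev, c_word))
    tokens = [t for pair in pairs for t in pair]
    members = [t for t in tokens if word_class[t] == c_word]
    return (n_pair / n_prev) * (members.count(word) / len(members))
```

In this toy setting the morpheme 2-gram gives the incorrect chain "神奈川県/新宿区" probability 0, whereas the class 2-gram gives it the same probability as the correct chain "神奈川県/横浜市".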

Furthermore, in the method that uses a class N-gram as the upper layer and applies a word N-gram in the lower layer, the word N-grams of the lower layer are merged into a class N-gram in the upper layer, so the chain statistics between lower-layer words and upper-layer words cannot be estimated reliably.

This is explained with a concrete example. Consider the case where the address expression "鎌倉市の大船" (Ofuna in Kamakura City) appears in the sentence "県内鎌倉市の大船です" ("It is Ofuna in Kamakura City, within the prefecture"). From the context, the prefecture referred to by "県内" (within the prefecture) is Kanagawa Prefecture, but in the second conventional example the sentence is approximated as in Equation (6).

As a result, the context before and after the address expression represented by <住所> (<address>) is separated into P(<住所> | 県内) and P(です | <住所>), so connections other than 県内-鎌倉市 (for example, 県内-新宿区) are given the same probability.

The present invention has been made to solve these problems.

The language model generation apparatus according to the present invention is an N-gram language model generation apparatus that generates, from a corpus, an N-gram language model based on morphemes and classes, and comprises:
a first corpus in which example sentences for language model generation are partially sequenced as morphemes and classes;
a second corpus in which chain examples of sets of morphemes belonging to pre-created classes are described as morpheme sequences;
a special expression expansion unit that, for each class in the first corpus, substitutes the morphemes, among a series of morphemes in the second corpus, whose part of speech is the same class as in the series of classes of the first corpus, thereby generating and outputting expanded morpheme sequences;
an integrated corpus that stores the processing results of the special expression expansion unit; and
an N-gram dictionary generation unit that receives the expanded morphemes stored in the integrated corpus and outputs a word N-gram model.

The speech recognition apparatus according to the present invention comprises:
a first corpus in which example sentences for language model generation are partially sequenced as morphemes and classes;
a second corpus in which chain examples of sets of morphemes belonging to pre-created classes are described as morpheme sequences;
a special expression expansion unit that, for each class in the first corpus, substitutes the morphemes, among a series of morphemes in the second corpus, whose part of speech is the same class as in the series of classes of the first corpus, thereby generating and outputting expanded morpheme sequences;
an integrated corpus that stores the processing results of the special expression expansion unit;
an N-gram dictionary generation unit that receives the expanded morphemes stored in the integrated corpus, generates a word N-gram model, and outputs for this word N-gram two different back-off coefficients, one for the case where the preceding and following morphemes are of the same kind of expression (both classed special expressions, or both non-special expressions) and one for the case where they are of different kinds;
a word class N-gram that stores the word N-gram model generated by the N-gram dictionary generation unit and the two different back-off coefficients;
a speech recognition unit that, when recognizing speech captured by a speech input unit, selects the back-off coefficient stored in the word class N-gram according to the preceding morpheme; and
a data output unit that outputs the speech recognition result.

According to the language model generation apparatus of the present invention, the word sequence expansion means embeds and expands the morpheme sequences of the second corpus, in which chain examples of sets of morphemes belonging to pre-created classes are described as morpheme sequences, into the class sequences of the first corpus, in which example sentences for language model generation are partially sequenced as morphemes and classes. This prevents the low accuracy of the language likelihood for word sequences with little training data, and also prevents the linguistic constraint from becoming weaker than that of a word N-gram.

According to the speech recognition apparatus of the other invention, a word N-gram is generated from the integrated corpus, which is the processing result of the word sequence expansion means that embeds and expands the morpheme sequences of the second corpus (chain examples of sets of morphemes belonging to pre-created classes, described as morpheme sequences) into the class sequences of the first corpus (example sentences for language model generation, partially sequenced as morphemes and classes). This word N-gram stores two different back-off coefficients, one for the case where the preceding and following morphemes are of the same kind of expression (both special or both non-special) and one for the case where they are of different kinds.
When the speech recognition unit recognizes speech captured by the speech input unit, it selects the back-off coefficient stored in the word N-gram according to the preceding morpheme. This prevents the low accuracy of the language likelihood for word sequences with little training data, prevents the linguistic constraint from becoming weaker than that of a word N-gram, and further reduces the mixing of general morphemes into a morpheme chain of special expressions.

Embodiment 1.
FIG. 2 is a block diagram of the language model generation apparatus according to Embodiment 1 of the present invention, described below. A first corpus 101 and a second corpus 102 are input to a special expression expansion unit 103. The special expression expansion unit 103 expands each classed portion in the first corpus 101 into the corresponding portion of the separately input second corpus 102 and records the result in an integrated corpus 104. An N-gram dictionary generation unit 105 receives the integrated corpus 104 and outputs a word N-gram 106.

The operation of the language model generation apparatus configured as above is described next.
The first corpus 101 is a storage device holding example sentences created by call transcribers. Each sentence is preferably stored already divided into units corresponding to words (hereinafter, morphemes), but sentences may also be stored as they are; in that case a filter such as a morphological analyzer is applied to divide them into morphemes when they are output from the first corpus 101. FIG. 3 shows an example. The sentences are stored as morpheme chains, one unit per utterance segment. Each morpheme is a triple of surface form, reading, and part of speech. Where a classed portion occurs in a morpheme chain, the class string is stored in all three fields of the triple, so that the sentence is partially sequenced as morphemes and classes.

The second corpus 102 is a storage device in which example sentences are stored in advance by the system developer, users, or the like. Each sentence is preferably stored already divided into units corresponding to words (hereinafter, morphemes), but sentences may also be stored as they are; in that case a filter such as a morphological analyzer is applied to divide them into morphemes when they are output from the second corpus 102. FIG. 4 shows an example of an address-name corpus. Each address name consists of a three-element chain of prefecture name, city/ward/county name, and town/village name. As in the first corpus, each morpheme is a triple of surface form, reading, and part of speech, so that chain examples of sets of morphemes belonging to classes are described as morpheme sequences.

The special expression expansion unit 103 reads the first corpus 101 and the second corpus 102, replaces each series of morphemes in the first corpus 101 whose class parts of speech match with a series of morphemes from the second corpus 102, and outputs every resulting pattern. If, however, a specific morpheme chain such as "県内, けんない, 名詞" (kennai, "within the prefecture", noun) or "県内, けんない, 名詞 / の, の, 助詞" (the same followed by the particle の) matches, the prefecture name is restricted to 神奈川県 (Kanagawa Prefecture) when the series of morphemes is substituted and output to the integrated corpus 104. The value obtained by dividing by the number of substitutions performed, that is, the reciprocal of the number of expanded sequences, is also output to the integrated corpus 104 as the occurrence count. In addition, a morpheme outside the classed morphemes that is chained between classed morphemes has its part of speech enclosed in <> and is treated as a special expression morpheme. FIG. 5 shows an example. In the case of FIG. 5, "県内" was replaced with "神奈川県", so the occurrence count is 1/3.

The restriction is expressed as a rule stating that, when the first corpus contains the expression given in the quotation marks 「」 after "specific morpheme chain", the expression to be expanded must contain, as a restriction on the second corpus, the expression given in the quotation marks 「」 after "restriction". For example, Expression (7) is used. Parentheses () enclose an optional morpheme chain.

The processing of the special expression expansion unit 103 is now described in detail with reference to the flowchart in FIG. 6. The unit reads morpheme sequences from the first corpus 101 one at a time and stores each as the first morpheme sequence (ST1002). To count the expanded morpheme sequences output to the integrated corpus 104, it clears the morpheme sequence output count to 0 (ST1003). It then reads morpheme sequences from the second corpus 102 one at a time and stores each as the second morpheme sequence (ST1004). Next, it checks whether the first morpheme sequence contains a class morpheme chain (a chain of morphemes enclosed in <>). If YES, processing proceeds to ST1006; if NO, to ST1014 (ST1005).

In ST1014, since no class was expanded, the morpheme sequence output count is set to 1, the first morpheme sequence is output to the integrated corpus, and processing proceeds to ST1012 (ST1014). In ST1006, the unit checks whether the first morpheme sequence contains the prescribed chain. The prescribed chain is a specific morpheme chain; in this embodiment it is the morpheme "県内, けんない, 名詞". If YES, processing proceeds to ST1007; if NO, to ST1008 (ST1006). Next, the unit checks whether the second morpheme sequence contains the corresponding chain, which here is the morpheme "神奈川県, かながわけん, <県>". If YES, processing proceeds to ST1008; if NO, it returns to ST1004 (ST1007).
With this configuration, when the first morpheme sequence contains "県内, けんない, 名詞", only second morpheme sequences containing the morpheme "神奈川県, かながわけん, <県>" are used.

Next, an expanded morpheme sequence is generated by replacing each classed morpheme in the first morpheme sequence with the morpheme of the same class part of speech in the second morpheme sequence. For example, applying the second morpheme sequence "神奈川県, かながわけん, <県> / 愛甲郡, あいこうぐん, <市> / 愛川町, あいかわまち, <町>" to the first morpheme sequence "<県>, <県>, <県> / <市>, <市>, <市> / の, の, 助詞 / <町>, <町>, <町> / です, です, 助動詞" yields the expanded morpheme sequence "神奈川県, かながわけん, <県> / 愛甲郡, あいこうぐん, <市> / の, の, 助詞 / 愛川町, あいかわまち, <町>".

Next, the part of speech of each general morpheme lying between classed morphemes in the expanded morpheme sequence is converted to a class, and the sequence is output to the integrated corpus 104. In the example above, "の, の, 助詞" is a general morpheme. Because it is adjacent to "<市>" and "<町>", it is classed as "の, の, <助詞>" and output to the integrated corpus 104 (ST1009). To count the morpheme sequences expanded and output to the integrated corpus 104, 1 is added to the morpheme sequence output count (ST1010). The unit then checks whether all morpheme sequences of the second corpus 102 have been processed. If YES, processing proceeds to ST1012; if NO, it returns to ST1004 (ST1011). To distribute the count over the individual expanded morpheme sequences, the reciprocal of the morpheme sequence output count is stored as the occurrence count of each morpheme sequence (ST1012). Finally, the unit checks whether all morpheme sequences of the first corpus 101 have been processed. If YES, processing proceeds to ST1015 and ends; if NO, it returns to ST1002 (ST1013).
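The flow of FIG. 6 described above can be sketched roughly as follows. This is an illustration under several assumptions, not the patented implementation: morphemes are modelled as (surface, reading, part-of-speech) tuples, a restriction rule is modelled as a hypothetical (trigger morpheme, required morpheme) pair, the trigger morpheme itself is left in place, and only a single general morpheme between two classed morphemes is re-classed. All function names are invented for the example.

```python
from fractions import Fraction


def is_class_slot(m):
    # A classed slot stores the class string in all three fields.
    return m[0] == m[1] == m[2] and m[2].startswith("<")


def has_class_pos(m):
    return m[2].startswith("<") and m[2].endswith(">")


def classify_between(seq):
    """ST1009: enclose in <> the part of speech of a general morpheme
    that lies directly between two classed morphemes."""
    out = list(seq)
    for i in range(1, len(seq) - 1):
        if (not has_class_pos(seq[i]) and has_class_pos(seq[i - 1])
                and has_class_pos(seq[i + 1])):
            surface, reading, pos = seq[i]
            out[i] = (surface, reading, "<" + pos + ">")
    return out


def expand(first_corpus, second_corpus, rules=()):
    """Return the integrated corpus as (sequence, occurrence count) pairs."""
    integrated = []
    for first in first_corpus:                                # ST1002
        if not any(is_class_slot(m) for m in first):          # ST1005
            integrated.append((list(first), Fraction(1)))     # ST1014
            continue
        # ST1006: morphemes the second sequence must contain, if any.
        required = [req for trigger, req in rules if trigger in first]
        outputs = []                                          # ST1003
        for second in second_corpus:                          # ST1004
            if any(req not in second for req in required):    # ST1007
                continue
            by_class = {m[2]: m for m in second}
            seq = [by_class.get(m[2], m) if is_class_slot(m) else m
                   for m in first]                            # ST1008
            outputs.append(classify_between(seq))             # ST1009, ST1010
        for seq in outputs:                                   # ST1012
            integrated.append((seq, Fraction(1, len(outputs))))
    return integrated
```

Using `Fraction` keeps the occurrence counts exact (for example 1/3), which makes it easy to check that the counts of all expansions of one example sentence sum to 1.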

The N-gram dictionary generation unit 105 receives the integrated corpus 104 and outputs the word N-gram 106. N-gram occurrences are counted using the occurrence counts stored in the integrated corpus.
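Counting with the stored occurrence counts amounts to accumulating fractional weights instead of adding 1 per occurrence. A minimal sketch, assuming the integrated corpus is a list of (morpheme sequence, occurrence count) pairs and that N-grams are keyed by surface form (both are assumptions of this example):

```python
from collections import defaultdict
from fractions import Fraction


def count_ngrams(integrated, n=2):
    """Accumulate weighted N-gram counts over the integrated corpus."""
    counts = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for seq, weight in integrated:
        surfaces = [m[0] for m in seq]
        for i in range(len(surfaces) - n + 1):
            counts[tuple(surfaces[i:i + n])] += weight
    return counts
```

An example sentence that was expanded into k sequences therefore contributes a total weight of 1 to the chains shared by all k expansions, the same as a sentence that was not expanded.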

Embodiment 2.
FIG. 7 is a block diagram of the language model generation apparatus according to Embodiment 2 of the present invention, described below. The difference from Embodiment 1 is that the second corpus 102 is input to a structure determination unit 701 and only specific morpheme patterns are passed on to the special expression expansion unit 103, so that only appropriate patterns are expanded.

Only the parts of the operation of the language model generation apparatus configured as above that differ from Embodiment 1 are described.
The structure determination unit 701 searches all morpheme sequences stored in the second corpus 102 and examines the parent-child relationships between adjacent morphemes. (Here, two adjacent morphemes are said to be parent and child when, for example, two or more city-name morphemes such as "鎌倉市" (Kamakura City) and "藤沢市" (Fujisawa City) follow the morpheme "神奈川県" (Kanagawa Prefecture), while the only prefecture-name morpheme preceding "鎌倉市" and "藤沢市" is "神奈川県".) Classed parts of speech with no parent-child relationship are picked up as NG part-of-speech pairs, and when the NG part-of-speech pair count exceeds a certain proportion of all part-of-speech pairs, the morpheme sequences containing those part-of-speech pairs are removed. With this configuration, only morpheme sequences with a strong hierarchical structure are input to the special expression expansion unit 103. Unnecessary morpheme sequences are not expanded, which improves memory efficiency.

The parent-child determination algorithm of the structure determination unit 701 is described in detail with reference to the flowchart in FIG. 8. The unit reads the second corpus 102 and stores all morpheme sequences (ST2002). It stores every pair of adjacent morphemes in the stored sequences (ST2003) and takes out one morpheme pair (ST2004). Using the preceding morpheme of that pair as a key, it searches all pairs for ones whose following morpheme differs from the following morpheme of the pair (ST2005). If such a pair exists, a parent-child relationship may hold, so the parent-child flag is set to 1 in ST2007 (ST2006, ST2007).

Similarly, using the following morpheme as a key, it searches all pairs for ones whose preceding morpheme differs from the preceding morpheme of the pair (ST2008). If such a pair exists, a reverse parent-child relationship (child-parent relationship) may hold, so the child-parent flag is set to 1 in ST2010 (ST2009, ST2010). Next, it checks whether both the parent-child flag and the child-parent flag are 1 (ST2011). If both are 1, no hierarchical relationship holds, so the parts of speech of the preceding and following morphemes of the pair are stored as an NG part-of-speech pair and the NG part-of-speech pair count is incremented (ST2012). The unit then checks whether all morpheme pairs have been processed. If YES, processing ends; if NO, it returns to ST2004 (ST2013). With this configuration, classed parts of speech with no parent-child relationship can be picked up as NG part-of-speech pairs.
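The determination of FIG. 8 can be sketched as follows. The per-pair flag logic follows the description above; how the "certain proportion" is measured is not specified in the text, so this sketch assumes it is the share of NG occurrences among all occurrences of the same part-of-speech pair, with a hypothetical default threshold of 0.5. Morphemes are (surface, reading, part-of-speech) tuples and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def find_ng_pos_pairs(corpus, threshold=0.5):
    """Return part-of-speech pairs whose adjacent morphemes mostly lack a
    one-directional (parent-child) relationship."""
    pairs = [(a, b) for seq in corpus for a, b in zip(seq, seq[1:])]  # ST2003
    followers, preceders = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        followers[a].add(b)
        preceders[b].add(a)
    total, ng = Counter(), Counter()
    for a, b in pairs:                            # ST2004
        parent_child = len(followers[a]) > 1      # ST2005 to ST2007
        child_parent = len(preceders[b]) > 1      # ST2008 to ST2010
        pos_pair = (a[2], b[2])
        total[pos_pair] += 1
        if parent_child and child_parent:         # ST2011, ST2012
            ng[pos_pair] += 1
    return {p for p in ng if ng[p] / total[p] > threshold}


def filter_corpus(corpus, ng_pos_pairs):
    """Drop morpheme sequences that contain an NG part-of-speech pair."""
    return [seq for seq in corpus
            if not any((a[2], b[2]) in ng_pos_pairs
                       for a, b in zip(seq, seq[1:]))]
```

Because `followers[a]` always contains the following morpheme of the pair itself, `len(followers[a]) > 1` is equivalent to "some other following morpheme exists for the same preceding morpheme", and likewise for `preceders`.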

Embodiment 3.
FIG. 9 is a block diagram of a speech recognition apparatus according to Embodiment 3 of the present invention, which is described below. The language model generation part differs from Embodiment 1 in that Embodiment 3 stores two back-off coefficients: one for the case where the preceding and following morphemes are connected within the same kind of expression (both special expressions or both non-special expressions), and one for the case where they are connected across different kinds of expression. For speech recognition, the textbooks cited above introduce the technique of backing off to the 2-gram when a 3-gram does not exist, and to the 1-gram when a 2-gram does not exist. In general, the back-off coefficient for back-off smoothing described in those textbooks is stored in the entry of the lower-order N-gram. The major difference here is that the N-gram dictionary generation unit 105 instead stores two back-off coefficients, one for same-kind connections and one for different-kind connections. By making the back-off coefficient large when the preceding morpheme is of the same kind and small when it is of a different kind, mixing errors can be reduced.
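As a rough illustration of selecting between two back-off coefficients, the sketch below backs off from a 2-gram to a 1-gram. The function name `bigram_prob`, the dictionary layout, the `is_special` predicate, and all probability values are hypothetical; the source does not specify how the coefficients are estimated, and this sketch does not renormalize the resulting distribution.

```python
def bigram_prob(prev, word, bigrams, unigrams, backoff, is_special):
    """Probability of `word` given `prev` with kind-dependent back-off.

    bigrams:  {(prev, word): probability}
    unigrams: {word: probability}
    backoff:  {prev: (alpha_same_kind, alpha_different_kind)}, stored on
              the lower-order entry of the history (assumed layout)
    is_special: predicate telling whether a morpheme is a special expression
    """
    if (prev, word) in bigrams:
        return bigrams[(prev, word)]
    # The 2-gram is missing: back off to the 1-gram and pick the
    # coefficient according to whether both morphemes are of the same kind.
    alpha_same, alpha_diff = backoff[prev]
    same_kind = is_special(prev) == is_special(word)
    return (alpha_same if same_kind else alpha_diff) * unigrams[word]


if __name__ == "__main__":
    special = {"Tokyo-to", "Nishitokyo-shi"}       # hypothetical class members
    unigrams = {"Nishitokyo-shi": 0.25, "ni": 0.25}
    backoff = {"Tokyo-to": (0.5, 0.125)}           # (same kind, different kind)
    args = ({}, unigrams, backoff, lambda m: m in special)
    print(bigram_prob("Tokyo-to", "Nishitokyo-shi", *args))  # 0.125
    print(bigram_prob("Tokyo-to", "ni", *args))              # 0.03125
```

With equal 1-gram probabilities, the special-expression continuation scores higher than the general morpheme after backing off, which is the effect the embodiment aims for.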

Based on the word N-gram 106, the speech recognition unit 108 recognizes the speech captured by the speech input unit 107. At this time, the back-off coefficient is selected according to the preceding morpheme. The speech recognition result is sent to the data output unit 109, which outputs it. The result may be presented to the user on a display, or an application may be connected downstream of the output unit so that the recognition result can be used by other applications.

In this embodiment, two back-off coefficients are stored in the language model and selected according to two chain types, special-expression chains and non-special-expression chains. With more than two chain types, two back-off coefficients may instead be stored by distinguishing same-kind chains from different-kind chains. Multiple back-off coefficients may also be stored according to the combination of chain types. Furthermore, instead of storing two back-off coefficients in the language model, the speech recognition process may scale the back-off coefficient by a fixed ratio based on the types of the preceding and following morphemes, increasing it when the preceding morpheme is of the same kind and decreasing it when it is of a different kind. This gives coarser control than the scheme described above, but improves memory efficiency. In this way, mixing of general morphemes into a morpheme chain of a special expression can be reduced. For example, misrecognizing "東京都／西東京市／ひばりが丘" (Tokyo-to / Nishitokyo-shi / Hibarigaoka) as the similar-sounding "東京都／に／死闘／教師／ひばりが丘" (Tokyo-to / ni / shitō, "death struggle" / kyōshi, "teacher" / Hibarigaoka) becomes less likely.
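The memory-saving variant, in which a single stored coefficient is scaled at recognition time, might look like the following sketch. The function name `scaled_backoff` and the ratio of 2.0 are assumptions chosen for illustration; the source says only that the coefficient is increased or decreased by a fixed ratio.

```python
def scaled_backoff(alpha, prev_is_special, word_is_special, ratio=2.0):
    """Scale one stored back-off coefficient by a fixed ratio.

    alpha: the single back-off coefficient stored in the language model
    ratio: hypothetical fixed scaling factor (not given in the source)
    """
    if prev_is_special == word_is_special:
        return alpha * ratio  # same kind of expression: larger coefficient
    return alpha / ratio      # different kinds: smaller coefficient


if __name__ == "__main__":
    # After the special expression "Tokyo-to", another special expression
    # is favoured over a general morpheme such as "ni".
    print(scaled_backoff(0.25, True, True))   # 0.5
    print(scaled_backoff(0.25, True, False))  # 0.125
```

Only one coefficient per entry is stored in this variant, at the cost of applying the same ratio to every history.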

The language model generation apparatus of the present invention, and the speech recognition apparatus using it, can improve the speech recognition rate and are applicable, for example, to speech recognition systems for call recording equipment.

FIG. 1 is an explanatory diagram of an example of N-gram chains between the prefecture class and the city/ward/county class.
FIG. 2 is a block diagram of the language model creation apparatus according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram of an example of morphological analysis of example sentences output from the first corpus.
FIG. 4 is an explanatory diagram of an example of morphological analysis of example sentences output from the second corpus.
FIG. 5 is an explanatory diagram of an example of expanded morpheme strings output from the special expression expansion unit.
FIG. 6 is a flowchart showing the processing operation of the special expression expansion unit.
FIG. 7 is a block diagram of the language model creation apparatus according to Embodiment 2 of the present invention.
FIG. 8 is a flowchart showing the parent-child determination algorithm of the structure determination unit.
FIG. 9 is a block diagram of the speech recognition apparatus according to Embodiment 3 of the present invention.

Explanation of symbols

101: first corpus; 102: second corpus; 103: special expression expansion unit; 104: integrated corpus; 105: N-gram dictionary generation unit; 106: word N-gram; 107: speech input unit; 108: speech recognition unit; 109: data output unit; 701: structure determination unit.

Claims

An N-gram language model generation apparatus that generates, from corpora, an N-gram language model based on morphemes and classes, the apparatus comprising:
a first corpus in which example sentences for language model generation are sequenced partly as morphemes and partly as classes;
a second corpus in which chain examples of sets of morphemes belonging to previously created classes are described as morpheme strings;
a special expression expansion unit that generates and outputs expanded morpheme strings by substituting, for each class in the first corpus, those morphemes in the morpheme sequences of the second corpus whose part of speech is of the same class as the class sequence of the first corpus;
an integrated corpus that stores the processing results of the special expression expansion unit; and
an N-gram dictionary generation unit that receives the expanded morphemes stored in the integrated corpus and outputs a word N-gram model.

The language model generation apparatus according to claim 1, further comprising structure determination means that determines, from the relationship between the preceding-connected morpheme and the following-connected morpheme of a morpheme in the second corpus, the hierarchical structure of the part-of-speech pair formed by that morpheme and its preceding-connected morpheme, and that, when the part-of-speech pair does not form a hierarchical structure and the pair exceeds a predetermined proportion of all part-of-speech pairs in the second corpus, deletes the morpheme strings containing that part-of-speech pair,
wherein the special expression expansion unit, when generating the expanded morpheme strings, substitutes into the classes of the first corpus the morphemes of the second corpus from which morpheme strings have been deleted by the structure determination means.

The language model generation apparatus according to claim 1 or 2, wherein the N-gram dictionary generation unit generates word N-grams from the expanded morpheme strings that the special expression expansion unit generates by substituting, for the classes of the first corpus, morphemes of the second corpus whose part of speech is of the same class as the class of the first corpus, and stores in the word N-grams, as two different back-off coefficients, the back-off coefficients used when backing off during speech recognition: one for the case where the preceding-connected and following-connected morphemes are connected within the same kind of expression (both classed special expressions, or both other, non-special expressions), and one for the case where they are connected across different kinds of expression.

A speech recognition apparatus comprising:
a first corpus in which example sentences for language model generation are sequenced partly as morphemes and partly as classes;
a second corpus in which chain examples of sets of morphemes belonging to previously created classes are described as morpheme strings;
a special expression expansion unit that generates and outputs expanded morpheme strings by substituting, for each class in the first corpus, those morphemes in the morpheme sequences of the second corpus whose part of speech is of the same class as the class sequence of the first corpus;
an integrated corpus that stores the processing results of the special expression expansion unit;
an N-gram dictionary generation unit that receives the expanded morphemes stored in the integrated corpus, generates a word N-gram model, and outputs for the word N-grams two different back-off coefficients, one for the case where the preceding-connected and following-connected morphemes are connected within the same kind of expression (both classed special expressions, or both other, non-special expressions) and one for the case where they are connected across different kinds of expression;
a word class N-gram that stores the word N-gram model generated by the N-gram dictionary generation unit and the two different back-off coefficients;
a speech recognition unit that, when recognizing speech captured by a speech input unit, performs recognition by selecting, according to the preceding morpheme, a back-off coefficient stored in the word class N-gram; and
a data output unit that outputs the speech recognition result.