JP2001242886A

JP2001242886A - Speech recognition device, morpheme analyzer, kana kanji converter and its method and recording medium with recorded program

Info

Publication number: JP2001242886A
Application number: JP2000051475A
Authority: JP
Inventors: Hirotaka Goi; 啓恭伍井; Yuzo Maruta; 裕三丸田; Yoshiharu Abe; 芳春阿部
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-02-28
Filing date: 2000-02-28
Publication date: 2001-09-07
Anticipated expiration: 2020-02-28
Also published as: JP3935655B2

Abstract

PROBLEM TO BE SOLVED: To realize a speech recognition device that can apply a strong constraint while keeping the order of the n-gram small, and a stronger constraint when the order is the same. SOLUTION: A phoneme probability calculating means 2 calculates the phoneme occurrence probability corresponding to each phoneme of the input speech and generates phoneme string candidates. A word probability calculating means 9 calculates the word occurrence probability of each word candidate corresponding to a phoneme string candidate by referring to phoneme n-grams 7 and 8, which store phoneme strings of the target language, the word notation strings corresponding to those phoneme strings, and occurrence probabilities, classified by topic. An output means 6 outputs the word string candidates that are similar to the input speech, calculated using the phoneme occurrence probabilities and the word occurrence probabilities.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention

The present invention relates to a speech recognition apparatus, a morphological analysis apparatus, and a kana-kanji conversion apparatus that use natural-language statistics to perform speech recognition, morphological analysis, or kana-kanji conversion based on n-grams, which are the occurrence probabilities of consecutive characters or words in a target language; to the corresponding speech recognition method, morphological analysis method, and kana-kanji conversion method; and to recording media on which programs for these are recorded. In particular, it relates to improving analysis accuracy by handling n-gram statistics separately for each topic.

[0002]

Description of the Related Art

Analysis techniques using natural-language statistics are applied to many kinds of document processing. For example, Japanese input by speech recognition is useful as a means of document input, and higher recognition accuracy is desired. To recognize speech accurately, attention has been paid to methods that use natural-language statistics as the language model, namely n-grams, which are the occurrence probabilities of consecutive characters or words in the target language. However, because the constraint imposed by an n-gram depends on its order n, the constraint weakens as n becomes smaller. Conversely, increasing the order n causes the serious problem that the table for counting frequencies becomes huge, and a very large collection of example sentences is needed to obtain reliable statistics. As a compression scheme for coping with the growth of such n-gram tables in speech recognition, the one disclosed in, for example, Japanese National Publication (Tokuhyo) No. Hei 10-501078 has been proposed.

[0003]

A conventional analysis technique using natural-language statistics is described below. FIG. 26 is a block diagram showing a configuration example of a speech recognition apparatus to which a conventional analysis method is applied, for obtaining the recognition result 「３階の先生」 ("the teacher on the third floor") from the utterance "saNkainoseNsee". In the figure, 1 is a microphone, 2 is a phoneme probability calculating means, 3 is a word predicting means, 4 is an n-gram table (in this case a 3-gram table), 5 is a RAM for storing information, and 6 is an output means.

[0004]

The generation of word string candidates is described below. A word string candidate is obtained by calculating the word string W that maximizes the word string probability P(W|Y), where W is the uttered word string and Y is the phoneme string. This probability P(W|Y) is given by the following equation (1).

[0005]

[Math. 1]
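The equation image referenced here is not reproduced in this text. Judging from the surrounding description (P(Y) appears on the right side and is common to all W, leaving P(Y|W)P(W) to be maximized), equation (1) is presumably Bayes' rule. The following is a reconstruction under that assumption, not a copy of the original figure:

```latex
P(W \mid Y) = \frac{P(Y \mid W)\, P(W)}{P(Y)} \tag{1}
```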

[0006]

To generate word string candidates, it suffices to find the word string W that maximizes P(W|Y), as stated above. The probability P(Y) on the right side of equation (1) is common to all word strings W and can therefore be omitted, so it suffices to find the word string W that maximizes P(Y|W)P(W). Here P(Y|W) is the probability that the phoneme string Y appears given the word string W, and P(W) is the probability that the word string W appears.

[0007]

Here, when the phoneme string Y corresponding to the word string W at times t = 1, 2, ..., L is defined by the following equation (2), the appearance probability P(Y|W) of the phoneme string Y can be calculated by the following equation (3) from the phoneme probabilities P(Y_i), which are the appearance probabilities of the individual phonemes Y_i shown in equation (2).

[0008]

[Math. 2]
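The equation image for equations (2) and (3) is not reproduced in this text. A plausible reconstruction from the description is given below. The sequence form of (2) follows directly from "times t = 1, 2, ..., L"; the product form of (3) is an assumption (it treats the per-phoneme probabilities as independent), since the text only says that P(Y|W) is calculated from the P(Y_i):

```latex
Y = Y_1 Y_2 \cdots Y_L \tag{2}

P(Y \mid W) = \prod_{i=1}^{L} P(Y_i) \tag{3}
```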

[0009]

When a word string W consisting of m words is defined by the following equation (4), the appearance probability P(W) of the word string W is approximated, independently of the phoneme probabilities of the phonemes Y_i in equation (2), by the following equation (5), which uses word 3-gram probabilities. In equation (5), when i is 1 or 2, the symbol (#) is substituted for w_{i-1} and w_{i-2}.

[0010]

[Math. 3]
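The equation image for equations (4) and (5) is not reproduced in this text. Based on the description (a word string of m words, approximated with word 3-gram probabilities, with (#) standing in for missing history words), a reconstruction would be the standard trigram approximation shown below. It is offered as a sketch consistent with the text, not as the original figure:

```latex
W = w_1 w_2 \cdots w_m \tag{4}

P(W) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-2},\, w_{i-1}) \tag{5}
```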

[0011]

By the above calculation, for those phoneme string candidates for which a word sequence exists in the 3-gram index, the word string W that maximizes the word string probability P(W|Y) is calculated. The appearance probability of each word is calculated from the frequency values stored in advance in the word 3-gram table 4 shown in FIG. 26.
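As an illustrative sketch only (not part of the patent text), the 3-gram approximation of P(W) described above can be written as follows. The table contents are hypothetical, and the sketch stores probabilities directly, whereas the text describes a table of frequency values from which probabilities are derived. The name `trigram_probability` is likewise an assumption.

```python
from math import prod

BOS = "#"  # marker used in place of missing history words when i is 1 or 2


def trigram_probability(words, trigram_table):
    """Approximate P(W) as the product of P(w_i | w_{i-2}, w_{i-1})."""
    padded = [BOS, BOS] + list(words)
    return prod(
        # A 3-gram missing from the table contributes probability 0.
        trigram_table.get((padded[i - 2], padded[i - 1], padded[i]), 0.0)
        for i in range(2, len(padded))
    )


# Hypothetical probabilities, for illustration only.
table = {
    ("#", "#", "３階"): 0.01,
    ("#", "３階", "の"): 0.5,
    ("３階", "の", "先生"): 0.2,
}

p = trigram_probability(["３階", "の", "先生"], table)
p_unseen = trigram_probability(["３階", "の", "先制"], table)
```

A word string is only scored above zero when every one of its 3-grams is present in the table, which mirrors the restriction to candidates "for which a word sequence exists in the 3-gram index".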

[0012]

The calculated word string W is output from the output means 6 as the recognition result.

[0013]

Next, the operation is described. FIG. 27 is a flowchart outlining the speech recognition operation of the conventional speech recognition apparatus described above. The speech recognition process starts in step ST1 when the user speaks into the microphone 1. When the uttered speech is input in step ST2, the microphone 1 converts the input speech into an electric signal in step ST3. Next, in step ST4, the phoneme probability calculating means 2 A/D-converts and quantizes the electric signal from the microphone 1, performs spectrum analysis, concatenates the recognition results separated into syllable units, and stores the result in the RAM 5 as phoneme string candidates.

[0014]

The word predicting means 3 then takes one phoneme string candidate from the RAM 5 in step ST5 and initializes the leading word string. Next, in step ST6, it uses this as a search key to retrieve the corresponding 3-gram information from the 3-gram table 4, and in step ST7 it calculates the probability of the three-word chain from the retrieved 3-gram information. Based on the probability obtained in this way, the word string W with the highest probability for that phoneme string candidate is stored in the RAM 5 in step ST8.

[0015]

Next, in step ST9, the above calculation is performed for all phoneme string candidates stored in the RAM 5, and the word string W and phoneme string candidate with the highest probability are selected and output from the output means 6. The process then proceeds to step ST10, where this series of speech recognition processing ends.

[0016]

In this way, the word string W with a high probability of being similar to the utterance is obtained.
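The selection performed in steps ST5 to ST9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: `best_hypothesis`, the candidate list, the romanized word labels, and all probability values are hypothetical, and the language-model lookup is reduced to a dictionary from a phoneme string to (word string, P(W)) pairs.

```python
def best_hypothesis(candidates, language_model):
    """Return (phonemes, words, score) with the highest P(Y|W) * P(W)."""
    best = None
    for phonemes, acoustic in candidates:              # ST5: take one candidate
        for words, p_w in language_model.get(phonemes, []):  # ST6/ST7: look up, score
            score = acoustic * p_w
            if best is None or score > best[2]:        # ST8/ST9: keep the best so far
                best = (phonemes, words, score)
    return best


# Hypothetical phoneme string candidates with their probabilities P(Y|W).
candidates = [("saNkainoseNsee", 0.9), ("saNkainoseNse", 0.1)]

# Hypothetical word strings and P(W) for each phoneme string.
language_model = {
    "saNkainoseNsee": [(("sankai", "no", "sensei"), 0.001)],
    "saNkainoseNse": [(("sankai", "no", "sense"), 0.002)],
}

best = best_hypothesis(candidates, language_model)
```

Here the second candidate has the larger P(W), but the first wins once the phoneme probability is multiplied in, which is the point of maximizing the product rather than either factor alone.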

[0017]

Besides the above-mentioned Japanese National Publication No. Hei 10-501078, documents containing descriptions related to conventional speech recognition apparatuses include Japanese Patent Laid-Open No. 62-19899, which uses a topic obtained from a recognition result in the next recognition; Japanese Patent Laid-Open No. 63-219067, which selects a dictionary using a topic extracted by dictionary search and thereby improves search accuracy; and Japanese Patent Laid-Open No. 6-342298, which uses syntactic constraints to impose stronger constraints than earlier n-gram models.

[0018]

Problems to be Solved by the Invention

Because the conventional speech recognition apparatus is configured as described above, increasing the order of the n-gram strengthens the language constraint but makes the n-gram table 4 huge, and enough example sentences are then needed to gather statistics that fill such a huge table in practice. Reducing the order of the n-gram, on the other hand, weakens the language constraint and lowers analysis accuracy.

For example, suppose the phrase 「３号アーチの先制」 (sangou aachi no sensei; roughly, "scoring first with a third home run") is input to this speech recognition apparatus. Suppose further that the n-gram statistics are extracted from, for example, a large volume of newspaper data, that the order n is 2, and that, for simplicity, the morpheme boundaries are 「さんごう・あーち・の・せんせい」. Even assuming that everything up to 「３号アーチの」 is analyzed correctly, for the "sensei" that follows 「の」 the statistics of the newspaper as a whole give a higher probability to 「先生」 ("teacher"), so a recognition error such as 「３号アーチの先生」 becomes likely. Increasing the order n makes a correct result more likely, but the n-gram table 4 and the required collection of example sentences then become huge, as noted above.
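The failure described above can be illustrated with a small sketch. The bigram probabilities below are hypothetical values chosen to reproduce the situation in the text (a general newspaper corpus in which 「先生」 follows 「の」 more often than 「先制」 does); `pick_homophone` is an assumed helper name.

```python
# Hypothetical bigram probabilities P(next | previous) from a general corpus.
general_bigram = {
    ("の", "先生"): 0.004,  # "teacher"
    ("の", "先制"): 0.001,  # "scoring first"
}


def pick_homophone(previous, candidates, bigram):
    """Choose the candidate word with the highest bigram probability."""
    return max(candidates, key=lambda w: bigram.get((previous, w), 0.0))


# Both candidates are read "sensei"; with n = 2 only the preceding の is visible.
choice = pick_homophone("の", ["先制", "先生"], general_bigram)
```

With only one word of history, nothing in the bigram distinguishes a baseball context from any other, so the globally more frequent word is selected.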

[0019]

The present invention has been made to solve the above problems. Its object is to obtain an apparatus that performs speech recognition, morphological analysis, or kana-kanji conversion and that can apply a strong constraint while keeping the order of the n-gram small, and a stronger constraint for the same order, together with the corresponding methods and recording media on which their programs are recorded.

[0020]

Means for Solving the Problems

In the speech recognition apparatus according to the present invention, a phoneme probability calculating means calculates the phoneme occurrence probability corresponding to each phoneme of the input speech and generates phoneme string candidates; a word probability calculating means calculates the word occurrence probability of each word candidate corresponding to a phoneme string candidate by referring to a phoneme n-gram that stores phoneme strings of the target language, the word notation strings corresponding to the phoneme strings, and occurrence probabilities; and an output means outputs word string candidates similar to the input speech, calculated using the phoneme occurrence probabilities and the word occurrence probabilities. The words in the phoneme n-gram used for this purpose are classified according to their respective topics.

[0021]

In the speech recognition apparatus according to the present invention, the word probability calculating means, when calculating word string candidates, makes all the topics in the phoneme n-gram corresponding to one continuous utterance match.
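One way to read the topic-matching requirement is that a word string is scored entirely within one topic's table, never by mixing entries from different topics. The sketch below illustrates that reading with bigrams; it is an assumption-laden illustration, not the patented procedure. The function names, the word segmentation, and every probability value are hypothetical.

```python
from math import prod

BOS = "#"  # sentence-start marker


def score_by_topic(words, topic_bigrams):
    """Score a word string once per topic, using only that topic's table."""
    padded = [BOS] + list(words)
    return {
        topic: prod(table.get(pair, 0.0) for pair in zip(padded, padded[1:]))
        for topic, table in topic_bigrams.items()
    }


def best_with_consistent_topic(candidates, topic_bigrams):
    """Return (topic, words, score) maximizing the single-topic score."""
    scored = [
        (score, topic, tuple(words))
        for words in candidates
        for topic, score in score_by_topic(words, topic_bigrams).items()
    ]
    score, topic, words = max(scored)
    return topic, words, score


# Hypothetical per-topic bigram probabilities.
topic_bigrams = {
    "baseball": {("#", "３号"): 0.01, ("３号", "アーチ"): 0.5, ("アーチ", "の"): 0.4,
                 ("の", "先制"): 0.05, ("の", "先生"): 0.001},
    "general": {("#", "３号"): 0.001, ("３号", "アーチ"): 0.01, ("アーチ", "の"): 0.4,
                ("の", "先制"): 0.001, ("の", "先生"): 0.004},
}
candidates = [["３号", "アーチ", "の", "先制"], ["３号", "アーチ", "の", "先生"]]

topic, words, score = best_with_consistent_topic(candidates, topic_bigrams)
```

In this toy data the general table still prefers 「先生」 after 「の」, but because the earlier words fit the baseball table far better, the baseball reading 「先制」 wins for the utterance as a whole while the n-gram order stays at 2.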

[0022]

In the speech recognition apparatus according to the present invention, the word probability calculating means sets a probability weight for each topic.
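This passage does not say how the per-topic weight enters the calculation. Purely as an assumption for illustration, the sketch below applies it as a multiplicative factor on each topic's score; `apply_topic_weights` and all values are hypothetical.

```python
def apply_topic_weights(topic_scores, topic_weights):
    """Scale each topic's score by its weight (default weight 1.0)."""
    return {
        topic: topic_weights.get(topic, 1.0) * score
        for topic, score in topic_scores.items()
    }


# Hypothetical unweighted scores and weights.
scores = {"baseball": 1e-4, "general": 2e-4}
weights = {"baseball": 3.0, "general": 1.0}

weighted = apply_topic_weights(scores, weights)
preferred_before = max(scores, key=scores.get)
preferred_after = max(weighted, key=weighted.get)
```

Under this assumed form, raising a topic's weight lets that topic's reading overtake another whose raw score is slightly higher.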

[0023]

In the morphological analysis apparatus according to the present invention, a morpheme probability calculating means calculates the word occurrence probability of each word candidate corresponding to an input character string of mixed kana and kanji by referring to a kanji n-gram that stores mixed kana-kanji character strings, the word notation strings corresponding to those character strings, and occurrence probabilities; and an output means outputs word string candidates that fit the input mixed kana-kanji character string, calculated using the obtained word occurrence probabilities. The words in the kanji n-gram used for this purpose are classified according to their respective topics.

[0024]

In the morphological analysis apparatus according to the present invention, the morpheme probability calculating means, when calculating word string candidates, makes all the topics in the kanji n-gram corresponding to one continuous mixed kana-kanji character string match.

[0025]

In the morphological analysis apparatus according to the present invention, the morpheme probability calculating means sets a probability weight for each topic.

[0026]

In the kana-kanji conversion apparatus according to the present invention, a kanji probability calculating means calculates the word occurrence probability of each word candidate corresponding to an input kana character string by referring to a kana n-gram that stores kana character strings, the word notation strings corresponding to the kana character strings, and occurrence probabilities; and an output means outputs word string candidates that fit the input kana character string, calculated using those word occurrence probabilities. The words in the kana n-gram used for this purpose are classified according to their respective topics.

[0027]

In the kana-kanji conversion apparatus according to the present invention, the kanji probability calculating means, when calculating word string candidates, makes all the topics in the kana n-gram corresponding to one continuous kana character string match.

[0028]

In the kana-kanji conversion apparatus according to the present invention, the kanji probability calculating means sets a probability weight for each topic.

[0029]

The speech recognition method according to the present invention calculates the phoneme occurrence probability corresponding to each phoneme converted from the captured speech; refers to a phoneme n-gram that stores phoneme strings of the target language, the word notation strings corresponding to the phoneme strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to a phoneme string candidate; and uses these probabilities to calculate word string candidates similar to the input speech.

[0030]

The morphological analysis method according to the present invention refers to a kanji n-gram that stores mixed kana-kanji character strings, the word notation strings corresponding to those character strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to the input mixed kana-kanji character string; and uses that probability to calculate word string candidates that fit the input mixed kana-kanji character string.

[0031]

The kana-kanji conversion method according to the present invention refers to a kana n-gram that stores kana character strings of the target language, the word notation strings corresponding to the kana character strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to the input kana character string; and uses that probability to calculate word string candidates that fit the input kana character string.

[0032]

A recording medium according to the present invention has recorded on it, in computer-readable form, a program for causing a computer to execute a speech recognition method that calculates the phoneme occurrence probability corresponding to each phoneme converted from the captured speech; refers to a phoneme n-gram that stores phoneme strings of the target language, the word notation strings corresponding to the phoneme strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to a phoneme string candidate; and uses these probabilities to calculate word string candidates similar to the input speech.

[0033]

A recording medium according to the present invention has recorded on it, in computer-readable form, a program for causing a computer to execute a morphological analysis method that refers to a kanji n-gram that stores mixed kana-kanji character strings, the word notation strings corresponding to those character strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to the input mixed kana-kanji character string; and uses that probability to calculate word string candidates that fit the input mixed kana-kanji character string.

[0034]

A recording medium according to the present invention has recorded on it, in computer-readable form, a program for causing a computer to execute a kana-kanji conversion method that refers to a kana n-gram that stores kana character strings of the target language, the word notation strings corresponding to the kana character strings, and occurrence probabilities, with the stored words classified by topic; calculates the word occurrence probability of each word candidate corresponding to the input kana character string; and uses that probability to calculate word string candidates that fit the input kana character string.

[0035]

Embodiments of the Invention

An embodiment of the present invention is described below.

Embodiment 1

FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 1 of the present invention. In the figure, 1 is a microphone serving as input means for inputting speech, and 2 is a phoneme probability calculating means that converts the speech signal input from the microphone 1 into phonemes, calculates the phoneme occurrence probability corresponding to each phoneme, and generates phoneme string candidates. These are equivalent to the conventional components bearing the same reference numerals in FIG. 26.

[0036]

7 and 8 are phoneme n-grams that store phoneme strings of the target language, the word notation strings corresponding to the phoneme strings, and occurrence probabilities. Within these phoneme n-grams, words are classified according to their respective topics. As examples, phoneme n-gram 7 is a baseball-topic phoneme n-gram that stores entries for the topic of baseball, and phoneme n-gram 8 is a general-topic phoneme n-gram that stores entries for general topics. 9 is a word probability calculating means that refers to the baseball-topic phoneme n-gram 7 and the general-topic phoneme n-gram 8 and calculates the word occurrence probability of each word candidate corresponding to the phoneme string candidates output by the phoneme probability calculating means 2.

[0037]

5 is a RAM that stores information produced during processing, and 6 is an output means that uses the phoneme occurrence probabilities calculated by the phoneme probability calculating means 2 and the word occurrence probabilities calculated by the word probability calculating means 9 to obtain and output word string candidates similar to the speech input from the microphone 1. The RAM 5 and the output means 6 are also equivalent to the conventional components bearing the same reference numerals in FIG. 26.

[0038]

The generation of word string candidates is described below. In Embodiment 1, as in the conventional case, a word string candidate is obtained by calculating the word string W that maximizes the probability P(W|Y) given by equation (1) in the description of the conventional speech recognition apparatus, where W is the uttered word string and Y is the phoneme string. Because it suffices to find the word string W that maximizes P(W|Y), the probability P(Y) on the right side of equation (1), which is common to all word strings W, can be omitted, and it suffices to find the word string W that maximizes P(Y|W)P(W).

[0039]

When the phoneme string Y corresponding to the word string W at times t = 1, 2, ..., L is defined by equation (2) in the description of the conventional speech recognition apparatus, the appearance probability P(Y|W) of the phoneme string Y can be calculated by equation (3) of that description from the phoneme probabilities P(Y_i), which are the appearance probabilities of the individual phonemes Y_i of the phoneme string Y. When the word string W consisting of m words is defined by equation (4) of that description, the appearance probability P(W) of the word string W can be obtained from the following equation (6), independently of the phoneme probabilities P(Y_i). In equation (6), n is the order of the phoneme n-gram.

[0040]

[Math. 4]

[0041]

By the above calculation, for those phoneme string candidates for which a word sequence exists in the baseball-topic phoneme n-gram 7 or the general-topic phoneme n-gram 8, the word string W that maximizes the word string probability P(W|Y) is calculated. The search over combinations may be performed at high speed using, for example, the Viterbi or stack decoding methods described in Seiichi Nakagawa, "Speech Recognition by Stochastic Model" (「確率モデルによる音声認識」), and the probabilities may be handled as log probabilities so that the calculation can be carried out as a summation. The appearance probability of each word uses the values stored in advance in the baseball-topic phoneme n-gram 7 and the general-topic phoneme n-gram 8.
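The remark about log probabilities can be made concrete with a short sketch: taking logarithms turns the product of probabilities into a sum, which preserves the ranking of hypotheses and avoids underflow for long word strings. The function name and the sample values are hypothetical.

```python
import math


def log_score(probabilities):
    """Sum of log probabilities; equals the log of their product."""
    if any(p <= 0.0 for p in probabilities):
        return float("-inf")  # a zero probability rules the hypothesis out
    return sum(math.log(p) for p in probabilities)


# Hypothetical factors: one acoustic likelihood followed by word probabilities.
hypothesis_a = [0.9, 0.01, 0.5, 0.2]
hypothesis_b = [0.1, 0.01, 0.5, 0.4]

score_a = log_score(hypothesis_a)
score_b = log_score(hypothesis_b)
```

Because the logarithm is monotonic, comparing `score_a` and `score_b` selects the same hypothesis as comparing the original products.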

[0042]

FIG. 2 is an explanatory diagram showing an example sentence analyzed by this speech recognition apparatus; in the figure, 10 is the example sentence. FIG. 3 is an explanatory diagram showing a concrete example of the phoneme n-gram used to analyze example sentence 10; in the figure, 11 is the phoneme n-gram. The phoneme n-gram 11 holds the baseball-topic phoneme n-gram 7 and the general-topic phoneme n-gram 8.

[0043]

As shown in FIG. 3, the baseball-topic phoneme n-gram 7 and the general-topic phoneme n-gram 8 in the phoneme n-gram 11 each contain 2-grams and 1-grams, and the phoneme string at the head of each entry serves as the search key. For 2-grams, a preceding morpheme, a following morpheme, and a probability are recorded for each key phoneme string. The probability recorded here is the probability that the following morpheme is connected after the preceding morpheme, that is, the occurrence probability of the 2-gram. For 1-grams, the morpheme connected directly next (the following morpheme) and a probability are recorded for each key phoneme string; this 1-gram probability is the occurrence probability of the morpheme itself. A morpheme is represented as a set consisting of notation, phoneme notation, headword reading, and part of speech.
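The table layout described above might be modeled as in the following sketch. It is an illustration only: the class and field names are invented, the probability values are hypothetical, and the choice of key for a 2-gram (here the concatenated phoneme notation of both morphemes) is an assumption, since the text states only that the leading phoneme string is the search key and FIG. 3 is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    notation: str   # written form
    phonemes: str   # phoneme notation
    reading: str    # headword reading
    pos: str        # part of speech


@dataclass(frozen=True)
class BigramEntry:
    preceding: Morpheme
    following: Morpheme
    probability: float  # probability that `following` comes after `preceding`


@dataclass(frozen=True)
class UnigramEntry:
    morpheme: Morpheme
    probability: float  # occurrence probability of the morpheme itself


def lookup_bigrams(table, topic, key):
    """Return the 2-gram entries stored under a phoneme-string key for one topic."""
    return table.get(topic, {}).get("bigram", {}).get(key, [])


# Hypothetical entries for the baseball topic.
no = Morpheme("の", "no", "の", "particle")
sensei = Morpheme("先制", "seNsee", "せんせい", "noun")
table = {
    "baseball": {
        "bigram": {"noseNsee": [BigramEntry(no, sensei, 0.05)]},
        "unigram": {"seNsee": [UnigramEntry(sensei, 0.002)]},
    },
}

entries = lookup_bigrams(table, "baseball", "noseNsee")
```

Keeping one sub-table per topic means a lookup never mixes entries from different topics unless the caller asks for more than one topic explicitly.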

[0044]

The calculated word string W is output from the output means 6 as the recognition result.

[0045]

Next, the operation is described. FIG. 4 is a flowchart outlining the recognition processing in the speech recognition apparatus according to Embodiment 1. The speech recognition process starts in step ST101 when the user speaks into the microphone 1. When the uttered speech is input in step ST102, the microphone 1 converts the input speech into an electric signal in step ST103 and captures it as analog data.

[0046] Next, in step ST104, the phoneme probability calculating means 2 A/D-converts and quantizes the analog data captured by the microphone 1, performs spectral analysis, and outputs the recognition result, separated into syllable units, as phoneme string candidates. The details of this processing follow various well-known techniques, such as those described in Seiichi Nakagawa, "Speech Recognition by Stochastic Models", and are therefore omitted here. A phoneme string candidate expresses, as a probability value, the likelihood of each phoneme corresponding to the analog data captured from the microphone 1. It is output as a pair consisting of a phoneme chain and the acoustic likelihood of that chain, and is stored in the RAM 5. This acoustic likelihood is the maximum value of the occurrence probability P(Y|W) of the phoneme string Y.

[0047] In Embodiment 1, assume that the following phoneme chains and chain acoustic likelihoods are output:
#saNgooaacinoseNsee# 0.9
#saNgooacinoseNse# 0.1

[0048] The acoustic likelihood may be expressed as a log probability or the like instead of a probability, and the phoneme chains may be stored using an efficient representation such as a lattice.
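
As a rough illustration of the log-probability option mentioned here, the following Python sketch converts the two example acoustic likelihoods to natural-log values, so that combining them with a language likelihood becomes an addition. The names `candidates`, `log_candidates`, and `combine_log` are hypothetical, and the choice of the natural logarithm is an assumption; the description does not fix a base.

```python
import math

# Phoneme chains and acoustic likelihoods from the worked example.
candidates = [
    ("#saNgooaacinoseNsee#", 0.9),
    ("#saNgooacinoseNse#", 0.1),
]

# The same candidates with log probabilities instead of probabilities.
log_candidates = [(chain, math.log(p)) for chain, p in candidates]


def combine_log(acoustic_log, language_log):
    """In the log domain, the product of two likelihoods becomes a sum."""
    return acoustic_log + language_log
```

Working in the log domain also avoids numerical underflow when many small probabilities are multiplied along a long word string.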

[0049] Next, in step ST105, the word probability calculating means 9 retrieves from the RAM 5 one phoneme string candidate and its acoustic likelihood, as output by the phoneme probability calculating means 2, and performs initialization. In this initialization, the null word "{# # # sentence-start}" and its probability value "1" are stored in the RAM 5 as the initial language likelihood of the preceding word string candidate. Here, "#saNgooaacinoseNsee#" is retrieved first as the phoneme string candidate.

[0050] Next, in step ST106, the word probability calculating means 9 checks whether every preceding word string candidate has reached the final phoneme of the phoneme string candidate. If all have, processing moves to step ST112, described later. If not, the processing from step ST107 onward is performed.

[0051] In step ST107, one preceding word string candidate is retrieved from the RAM 5. In Embodiment 1, "{# # # sentence-start}" is retrieved first as the preceding word string candidate.

[0052] Next, in step ST108, the phoneme n-gram 11 is searched using the phoneme string information of the preceding word string candidate. In Embodiment 1, the initial preceding word string "{# # # sentence-start}" is looked up first. A check is then made as to whether there is a following word that prefix-matches the portion of the phoneme string candidate after the retrieved preceding word. If no following word prefix-matches, processing returns to step ST106. If one does, processing proceeds to step ST109 and beyond.

[0053] Here, in Embodiment 1, the phoneme n-gram 11 is searched for words that follow the preceding word string "{# # # sentence-start}": words whose phoneme string partially matches the beginning of "saNgooaa…", which follows "#", are retrieved as following words. Among the 2-grams, "#saNgoo" prefix-matches the phoneme string "#saNgooaa…", so the following morpheme of this 2-gram, "baseball: 3号 saNgoo さんごう noun", becomes one candidate following word. The 1-gram "{baseball: 3号 saNgoo さんごう noun}" also prefix-matches the subsequent phoneme string and becomes a candidate. In addition, "{general: 3号 saNgoo さんごう noun}" becomes a candidate.
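
The prefix-match lookup of step ST108 can be sketched in Python as below. This is a minimal sketch under assumptions: `prefix_matches` and `BIGRAM_KEYS` are hypothetical names, the table values are reduced to plain labels, and the "#seNsee" entry is invented only to show a non-matching key.

```python
# 2-gram keys (preceding phonemes + following phonemes) mapped to a label
# for the following morpheme. "#seNsee" is a hypothetical non-matching entry.
BIGRAM_KEYS = {
    "#saNgoo": "baseball: 3号 saNgoo さんごう noun",
    "#seNsee": "general: 先生 seNsee せんせい noun",
}


def prefix_matches(table, phonemes):
    """Return the entries whose key is a prefix of the phoneme string."""
    return {key: value for key, value in table.items()
            if phonemes.startswith(key)}


matches = prefix_matches(BIGRAM_KEYS, "#saNgooaacinoseNsee#")
```

Only the "#saNgoo" key survives, which mirrors how the text selects "3号" as a candidate following word for the sentence-start null word.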

[0054] In Embodiment 1, partial matching is used to simplify the explanation. For similarity search against ambiguous phoneme chains, other techniques may be used, such as DP matching or the method shown in Abe et al., "A two-stage search method considering the tendency of differences between the first-stage optimal solution and the correct answer", 音講論 (Acoustical Society of Japan meeting proceedings), 1-R-15, 1998.9.

[0055] In step ST109, the likelihood is calculated in the same way for each following word and stored in the RAM 5, and each following word is appended to the preceding word string, which is stored in the RAM 5 as a new preceding word string. For a 2-gram, the topic is required to be the same as that of the preceding morpheme. For a 1-gram, there is no connection to the preceding morpheme, so a change of topic is allowed.

[0056] In Embodiment 1, the preceding word string "{# # # sentence-start}" is replaced with "{baseball: # # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}". The language likelihood is calculated by the following equation (7) from the probability 1 of the preceding word string "{# # # sentence-start}" and the probability 0.01 of the 2-gram "{baseball: # # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}" in the baseball-topic phoneme n-gram 7.

[0057] (probability of preceding word string) × (n-gram probability) = 1 × 0.01 = 0.01 … (7)
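
The extension of step ST109, together with equation (7), might look like the following Python sketch. The names `Hypothesis` and `extend` are hypothetical, and representing the not-yet-assigned topic of the sentence-start null word as `None` is a simplifying assumption of this sketch rather than something stated in the description.

```python
from typing import NamedTuple, Optional, Tuple


class Hypothesis(NamedTuple):
    words: Tuple[Tuple[Optional[str], str], ...]  # (topic, surface) pairs
    likelihood: float                             # language likelihood so far


def extend(hyp, topic, surface, prob, order):
    """Append a following word, or return None if the topic rule forbids it.

    order == 2: the topic must equal that of the preceding morpheme.
    order == 1: no connection is used, so a topic change is allowed.
    """
    prev_topic = hyp.words[-1][0]
    if order == 2 and prev_topic is not None and prev_topic != topic:
        return None
    # Equation (7): preceding word string probability x n-gram probability.
    return Hypothesis(hyp.words + ((topic, surface),), hyp.likelihood * prob)


start = Hypothesis(((None, "#"),), 1.0)
first = extend(start, "baseball", "3号", 0.01, order=2)
```

With the example values, `first.likelihood` is 1 × 0.01, as in equation (7). A later 2-gram from the general topic would be rejected, while a general-topic 1-gram would still be accepted.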

[0058] Next, in step ST110, a check is made as to whether the entire phoneme string has been matched to a word string. If it has, processing proceeds to step ST111, where the maximum likelihood and the preceding word string of the solution are stored in the RAM 5. Processing then returns to step ST106 to check whether every preceding word string candidate has reached the final phoneme of the phoneme string candidate. If the entire phoneme string has not been matched, processing returns directly to step ST106 and the same check is performed.

[0059] If step ST106 determines that every preceding word string candidate has reached the final phoneme of the phoneme string candidate, processing moves to step ST112, which checks whether matching words have been obtained for all phoneme string candidates. If they have not, processing returns to step ST105 and the same processing is repeated. If matching words have been obtained for all phoneme string candidates, the processing from step ST113 onward is performed.

[0060] In Embodiment 1, the above processing yields preceding word string candidates for the phoneme string candidate in the order "{# # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}, {baseball: アーチ aaci あーち noun}, {baseball: の no の particle}, …".

[0061] In step ST113, the word string of the solution with the maximum likelihood stored in the RAM 5 is read out. The maximum likelihood is approximated by the maximum value of the product of the language likelihood and the acoustic likelihood. In Embodiment 1, as a result of the calculation, the phoneme string candidate "#saNgooacinoseNsee#" is discarded because no corresponding phoneme n-gram exists. For the phoneme string candidate "#saNgooaacinoseNsee#", the speech recognition result "{# # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}, {baseball: アーチ aaci あーち common noun}, {baseball: の no の conjunctive particle}, {baseball: 先制 seNsee せんせい sa-hen noun}" is obtained. The maximum likelihood, taken as the maximum of the word string probability P(W) given by equation (6) above, is 5.4 × 10^-9 (acoustic likelihood 0.9; language likelihood 6 × 10^-9).
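
The selection in step ST113 amounts to taking the hypothesis with the largest product of acoustic and language likelihood. The Python sketch below uses the values from the text for "3号アーチの先制"; the competing hypothesis and its language likelihood are hypothetical, added only so that the maximum is taken over more than one entry. The function name `best_hypothesis` is likewise an assumption.

```python
# (surface string, acoustic likelihood, language likelihood)
hypotheses = [
    ("3号アーチの先制", 0.9, 6e-9),   # values from the worked example
    ("3号アーチの先生", 0.9, 1e-9),   # hypothetical competitor
]


def best_hypothesis(hyps):
    """Return (surface, score) maximizing acoustic x language likelihood."""
    surface, acoustic, language = max(hyps, key=lambda h: h[1] * h[2])
    return surface, acoustic * language


surface, score = best_hypothesis(hypotheses)
```

Here `score` is 0.9 × 6 × 10^-9, which matches the 5.4 × 10^-9 stated in the text up to floating-point rounding.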

[0062] Next, in step ST114, only the surface notation is extracted from the word string of the solution read from the RAM 5 and is output from the output means 6, after which processing proceeds to step ST115 and this series of speech recognition processes ends. In this way, Embodiment 1 yields "3号アーチの先制" as the recognition result.

[0063] As described above, according to Embodiment 1, speech recognition is performed with statistics gathered separately for each topic. Therefore, even though the 2-gram probability of "の先生" is locally higher than that of "の先制", the input is recognized as "の先制". An n-gram with strong language constraints can thus be constructed without increasing the order of the n-gram, and a high-accuracy speech recognition apparatus can be built. Although two topics are handled in this embodiment, the apparatus may be configured to handle three or more topics.

[0064] Embodiment 2. Although not specifically considered in Embodiment 1, the word probability calculating means may be configured so that, when word string candidates are calculated, the topics in the phoneme n-grams for a series of speech all match. FIG. 5 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.

[0065] In the figure, 1 is a microphone, 2 is a phoneme probability calculating means, 5 is a RAM, and 6 is an output means. These are equivalent to the parts of Embodiment 1 shown with the same reference numerals in FIG. 1. Reference numeral 12 denotes a word probability calculating means corresponding to the one denoted by 9 in FIG. 1, but it differs in that it is configured so that, when word string candidates are calculated, the topics in the phoneme n-grams for a series of speech all match. Reference numerals 13 and 14 denote a baseball-topic phoneme n-gram and a general-topic phoneme n-gram corresponding to those denoted by 7 and 8 in FIG. 1. In this case, only 2-grams are used and 1-grams are not.

[0066] FIG. 6 is an explanatory diagram showing a specific example of the phoneme n-gram. In the figure, 15 is the phoneme n-gram, in which the baseball-topic phoneme n-gram 13 and the general-topic phoneme n-gram 14 are recorded. As noted above, the baseball-topic phoneme n-gram 13 and the general-topic phoneme n-gram 14 of this phoneme n-gram 15 each use only 2-grams, in which a preceding morpheme, a following morpheme, and a probability are recorded for each key phoneme string.

[0067] Next, the operation will be described. FIG. 7 is a flowchart showing the outline of the operation of the speech recognition apparatus according to Embodiment 2 configured in this way. In Embodiment 2 as well, steps ST101 through ST107 are first performed exactly as in Embodiment 1. When one preceding word string candidate has been retrieved from the RAM 5 in step ST107, the word probability calculating means 12, in step ST120, searches the phoneme n-gram 15 using the phoneme string information of the preceding word string candidate and checks whether there is a following word that prefix-matches. In Embodiment 1, both 2-grams and 1-grams were used in the baseball-topic phoneme n-gram 7 and the general-topic phoneme n-gram 8 of the phoneme n-gram 11. In Embodiment 2, matching is performed using the phoneme n-gram 15, in which the baseball-topic phoneme n-gram 13 and the general-topic phoneme n-gram 14 each contain 2-grams only. If the check finds a following word that prefix-matches, processing moves to step ST109 and then continues through step ST115 as in Embodiment 1.

[0068] As described above, according to Embodiment 2, the word probability calculating means 12 checks for matches using only the 2-grams of the phoneme n-gram 15, so the series of morphemes for a single utterance all belong to the same topic. This prevents other topics from being mixed into an utterance.
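
Why a 2-gram-only table keeps one topic per utterance can be seen in a small Python sketch. Everything here is illustrative: the table `BIGRAMS_ONLY`, the generator `expand`, and the shortened input are hypothetical, and probabilities are left out to keep the focus on the topic constraint.

```python
# (topic, preceding phonemes, following surface, following phonemes)
BIGRAMS_ONLY = [
    ("baseball", "#", "3号", "saNgoo"),
    ("baseball", "saNgoo", "アーチ", "aaci"),
    ("general", "#", "3号", "saNgoo"),
]


def expand(phonemes, pos=0, prev="#", topic=None, path=()):
    """Yield (topic, surfaces) for every full cover of `phonemes`.

    Each step must use a 2-gram of the topic chosen at the first step,
    so a completed path never mixes topics.
    """
    if pos == len(phonemes):
        yield topic, list(path)
        return
    for t, p, surface, ph in BIGRAMS_ONLY:
        same_topic = topic is None or t == topic
        if p == prev and same_topic and phonemes.startswith(ph, pos):
            yield from expand(phonemes, pos + len(ph), ph, t, path + (surface,))


results = list(expand("saNgooaaci"))
```

In this toy table the general topic has no 2-gram continuing from "saNgoo" to "aaci", so the only complete path is the baseball one, and no 1-gram is available to switch topics midway.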

[0069] Embodiment 3. Embodiments 1 and 2 did not specifically consider adjusting the weight of the probabilities for each topic in speech recognition, but the apparatus may be made to allow such per-topic weight adjustment. FIG. 8 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 3 of the present invention. In the figure, 1 is a microphone, 2 is a phoneme probability calculating means, 5 is a RAM, 6 is an output means, 13 is a baseball-topic phoneme n-gram, and 14 is a general-topic phoneme n-gram. These are equivalent to the parts of Embodiment 2 shown with the same reference numerals in FIG. 5. Reference numeral 16 denotes a word probability calculating means corresponding to the one denoted by reference numeral 3 in FIG. 5, but it differs in that the probability weight can be adjusted for each topic.

[0070] Next, the operation will be described. FIG. 9 is a flowchart showing the outline of the operation of the speech recognition apparatus according to Embodiment 3 configured in this way. In Embodiment 3 as well, steps ST101 through ST107 and step ST120 are first performed exactly as in Embodiment 2. In step ST120, the 2-gram-only phoneme n-gram 15 is used to check whether there is a following word that prefix-matches. If there is none, processing returns to step ST106. If there is one, processing proceeds to step ST130. In step ST130, the word probability calculating means 16 calculates the likelihood of each following word with a weight applied per field, stores it in the RAM 5, appends the following word to the preceding word string, and stores the result in the RAM 5 as a new preceding word string. Steps ST110 through ST115 then proceed as in Embodiment 2.

[0071] As described above, according to Embodiment 3, the word probability calculating means 16 is configured to apply a per-topic weight to the 2-gram probabilities, so the occurrence probability can be adjusted for each topic.
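
One way to read the per-topic weighting of step ST130 is as a multiplicative factor on the 2-gram probability. The description does not specify the form of the weight, so the multiplicative form, the weight values, and the names `TOPIC_WEIGHTS` and `weighted_likelihood` in this Python sketch are all assumptions.

```python
# Hypothetical per-topic weights; the description gives no concrete values.
TOPIC_WEIGHTS = {"baseball": 1.0, "general": 0.5}


def weighted_likelihood(prev_likelihood, bigram_prob, topic,
                        weights=TOPIC_WEIGHTS):
    """Extend a likelihood with a 2-gram probability scaled by a topic weight."""
    return prev_likelihood * weights[topic] * bigram_prob


baseball_score = weighted_likelihood(1.0, 0.01, "baseball")
general_score = weighted_likelihood(1.0, 0.01, "general")
```

With these placeholder weights, two 2-grams of equal probability are ranked so that the baseball reading is preferred, which is the kind of per-topic adjustment the embodiment describes.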

[0072] Embodiment 4. Embodiments 1 to 3 described speech analysis apparatuses, but a morphological analysis apparatus can also be built by constructing kanji n-grams. FIG. 10 is a block diagram showing the configuration of a morphological analysis apparatus according to Embodiment 4 of the present invention.

[0073] In the figure, 17 is a file input device serving as input means for inputting a mixed kana-kanji character string (an input file). Reference numerals 18 and 19 denote kanji n-grams that store mixed kana-kanji character strings, the word notation strings corresponding to them, and occurrence probabilities. Within these kanji n-grams, words are classified by topic: kanji n-gram 18 is illustrated as a baseball-topic kanji n-gram storing entries for the baseball topic, and kanji n-gram 19 as a general-topic kanji n-gram storing entries for general topics. Reference numeral 20 denotes a morpheme probability calculating means that refers to the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 to calculate the word occurrence probability of each word candidate corresponding to the mixed kana-kanji character string output by the file input device 17. Reference numeral 5 denotes a RAM that stores information during processing, and 21 denotes an output means that outputs the word string candidates, obtained using the word occurrence probabilities calculated by the morpheme probability calculating means 20, that fit the character string input from the file input device 17.

[0074] The generation of word string candidates is described below. In Embodiment 4, a word string candidate is generated by finding the W that maximizes the word string occurrence probability P(W). Here, W is the input word string. When the m-word string W is determined by equation (4) above, the word string occurrence probability P(W) is obtained from equation (6) above, using the probabilities in the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19.

[0075] By the calculation described above, the W that maximizes the word string probability P(W) is found among the word strings present in the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19. The combinations may be computed quickly using, for example, the Viterbi method described in Makoto Nagao, "Natural Language Processing", and the probabilities may be expressed as log probabilities so that the formula can be computed as a sum. The occurrence probability of each word is calculated from the probability values stored in advance in the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19.
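
The combination of a Viterbi-style search with log probabilities can be sketched in Python as follows. This is a deliberately simplified sketch, not the procedure of the embodiment: it scores segmentations with 1-gram probabilities only and ignores topics, and the lexicon probabilities and the name `viterbi_segment` are hypothetical.

```python
import math


def viterbi_segment(text, lexicon):
    """Return (log probability, words) of the best segmentation, or None.

    lexicon maps a surface string to a 1-gram probability. best[i] holds the
    best log score of any segmentation of text[:i] and the start index of
    its last word.
    """
    n = len(text)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in lexicon and best[start][0] > -math.inf:
                # Log probabilities turn the product into a sum.
                score = best[start][0] + math.log(lexicon[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    words, pos = [], n
    while pos > 0:
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return best[n][0], words[::-1]


# Hypothetical 1-gram probabilities for illustration only.
LEXICON = {"3号": 0.01, "3": 0.001, "号": 0.001,
           "アーチ": 0.1, "の": 0.2, "せんせい": 0.05}
result = viterbi_segment("3号アーチのせんせい", LEXICON)
```

Because each position keeps only its best-scoring predecessor, the search avoids enumerating every segmentation, which is the speed advantage the text attributes to the Viterbi method.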

[0076] FIG. 11 is an explanatory diagram showing a specific example of a kanji n-gram created from the example sentence 10 shown in FIG. 2. In the figure, 22 is the kanji n-gram, in which the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 are recorded.

[0077] As shown in FIG. 11, the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 within this kanji n-gram 22 each contain 2-grams and 1-grams, and the leading kanji string serves as the search key. For a 2-gram, a preceding morpheme, a following morpheme, and a probability are recorded for each key kanji string. The probability recorded here is the probability that the following morpheme is connected after the preceding morpheme, and corresponds to the occurrence probability of that 2-gram. For a 1-gram, the following morpheme that directly follows and a probability are recorded for each key phoneme string. This 1-gram probability is the occurrence probability of the morpheme itself. A morpheme is represented as a tuple of surface notation, phonemic notation, headword reading, and part of speech.

[0078] The calculated word string W is output from the output means 21 as the recognition result.

[0079] Next, the operation will be described. FIG. 12 is a flowchart showing the outline of the analysis process in the morphological analysis apparatus according to Embodiment 4. The morphological analysis process starts in step ST201 when a mixed kana-kanji character string is input from the file input device 17. In step ST202, the file input device 17 reads the input mixed kana-kanji character string and passes it to the morpheme probability calculating means 20. On receiving the string read by the file input device 17, the morpheme probability calculating means 20 stores it in the RAM 5 in step ST203. In Embodiment 4, assume that the following mixed kana-kanji character string is input:
3号アーチのせんせい

[0080] Next, in step ST204, the morpheme probability calculating means 20 retrieves the kanji string candidate stored in the RAM 5 in step ST203 and performs initialization. In this initialization, the null word "{# # # sentence-start}" and its probability value "1" are stored in the RAM 5 as the initial value of the preceding word string candidate. Here, therefore, "#3号アーチのせんせい#" is retrieved first as the kanji string candidate. Then, in step ST205, the morpheme probability calculating means 20 checks whether every preceding word string candidate has reached the final kanji of the kanji string candidate. If all have, processing moves to step ST211. If not, processing proceeds to step ST206.

[0081] In step ST206, the morpheme probability calculating means 20 retrieves one preceding word string candidate from the RAM 5. In Embodiment 4, "{# # # sentence-start}" is retrieved first as the preceding word string candidate. Next, in step ST207, the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 are searched using the kanji string information of the preceding word string candidate, and a check is made as to whether there is a following word that prefix-matches the portion of the kanji string candidate after the retrieved preceding word. If no following word prefix-matches, processing returns to step ST205. If one does, processing proceeds to step ST208.

[0082] Accordingly, in Embodiment 4, the initial preceding word string "{# # # sentence-start}" is looked up first. The baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 are then searched for words that follow this preceding word string: words whose kanji string partially matches the beginning of "3号ア…", which follows "#", are retrieved as following words. Among the 2-grams, "#3号" prefix-matches the kanji string "#3号ア…", so the following morpheme of this 2-gram, "baseball: 3号 saNgoo さんごう noun", becomes one candidate following word. The 1-gram "{baseball: 3号 saNgoo さんごう noun}" also prefix-matches the subsequent kanji string and becomes a candidate. In addition, "{general: 3号 saNgoo さんごう noun}" becomes a candidate.

[0083] In step ST208, the likelihood is calculated for each following word and stored in the RAM 5, and the following word is appended to the preceding word string. For a 2-gram, the topic is required to be the same as that of the preceding morpheme. For a 1-gram, there is no connection, so a change of topic is allowed. The preceding word string with the following word appended is stored in the RAM 5 as a new preceding word string. In Embodiment 4, the preceding word string "{# # # sentence-start}" is replaced with "{baseball: # # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}". The language likelihood is calculated by equation (7) above from the probability 1 of the preceding word string "{# # # sentence-start}" and the probability 0.01 of the baseball-topic 2-gram "{#}, {3号}".

[0084] Next, in step ST209, a check is made as to whether the entire kanji string has been matched to the preceding word string. If it has, processing proceeds to step ST210, where the maximum likelihood and the preceding word string of the solution are stored in the RAM 5. Processing then returns to step ST205 to check whether every preceding word string candidate has reached the final word of the kanji string candidate. If the entire kanji string has not been matched, processing returns directly to step ST205 and the same check is performed.

[0085] In Embodiment 4, the above processing yields preceding word string candidates for the kanji string candidate in the order "{# # # sentence-start}, {baseball: 3号 saNgoo さんごう noun}, {baseball: アーチ aaci あーち noun}, {baseball: の no の particle}, …".

【００８６】ステップＳＴ２０５ですべての先行単語列
候補が漢字列候補の末端の単語と対応していると判定さ
れた場合には、ステップＳＴ２１１に進んでＲＡＭ５に
記憶してある最大尤度を持つ解の単語列を読み出す。こ
こで、最大尤度は言語尤度と音響尤度の積の最大値であ
る。この実施の形態４では漢字列候補「＃３号アーチの
先制＃」に対して、「｛＃＃＃文頭｝、｛３号
ｓａＮｇｏｏさんごう名詞｝、｛アーチａａｃ
ｉあーち名詞｝、｛のｎｏの接続助詞｝、
｛先制ｓｅＮｓｅｅせんせいサ変名詞｝」の形態
素解析結果が、また最大尤度が前述の式（６）で求めら
れる単語列確率Ｐ（Ｗ）中の最大値より、５.４×１０
^−９（音響尤度；０.９、言語尤度；６×１０^−９）と
得られる。If it is determined in step ST205 that all preceding word string candidates have reached the final word of the kanji string candidate, the process proceeds to step ST211, where the word string of the solution having the maximum likelihood stored in the RAM 5 is read out. Here, the maximum likelihood is the maximum value of the product of the language likelihood and the acoustic likelihood. In Embodiment 4, for the kanji string candidate "#3号アーチの先制#", the morphological analysis result "{# # # sentence-head}, {3号 saNgoo さんごう noun}, {アーチ aaci あーち noun}, {の no の connecting particle}, {先制 seNsee せんせい sa-hen noun}" is obtained, and the maximum likelihood, taken as the largest of the word string probabilities P(W) given by equation (6) above, is 5.4 × 10⁻⁹ (acoustic likelihood: 0.9; language likelihood: 6 × 10⁻⁹).
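
The selection of the maximum-likelihood solution can be illustrated with a minimal Python sketch. It assumes, as the paragraph above states, that the overall likelihood is the product of the acoustic and language likelihoods. The `Candidate` class, the `best_candidate` helper, and the second (competing) candidate with its score are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    words: tuple      # surface forms of the morphemes in the solution
    acoustic: float   # acoustic likelihood
    language: float   # language likelihood

    @property
    def likelihood(self) -> float:
        # Overall likelihood is the product of the two scores.
        return self.acoustic * self.language


def best_candidate(candidates):
    """Return the candidate with the largest acoustic x language product."""
    return max(candidates, key=lambda c: c.likelihood)


candidates = [
    # Scores quoted in the text for the winning analysis.
    Candidate(("3号", "アーチ", "の", "先制"), acoustic=0.9, language=6e-9),
    # Hypothetical competitor, invented for illustration only.
    Candidate(("3号", "アーチ", "の", "先生"), acoustic=0.9, language=1e-10),
]
best = best_candidate(candidates)
print(best.words, best.likelihood)
```

With these values the first candidate wins, and its product 0.9 × 6 × 10⁻⁹ matches the 5.4 × 10⁻⁹ quoted above up to floating-point rounding.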

【００８７】次にステップＳＴ２１２において、ＲＡＭ
５から読み出した解の形態素列を取り出し、それを出力
手段２１から出力した後、ステップＳＴ２１３に進んで
この一連の形態素解析処理を終了する。このようにし
て、この実施の形態４では解析結果として、「｛３号
さんごう名詞｝、｛アーチあーち名詞｝、｛のの
接続助詞｝、｛せんせいせんせいサ変名詞｝」が
得られる。Next, in step ST212, the morpheme string of the solution read from the RAM 5 is extracted and output from the output means 21, and the process then proceeds to step ST213, where this series of morphological analysis processing ends. In this way, Embodiment 4 yields the analysis result "{3号 さんごう noun}, {アーチ あーち noun}, {の の connecting particle}, {せんせい せんせい sa-hen noun}".

【００８８】以上のように、この実施の形態４によれ
ば、話題を分離して統計量をとって形態素解析を行って
いるので、部分的には「のせんせい」という曖昧な表
記でも「先制」の意味で品詞がサ変であることが算出で
き、ｎグラムの次数を大きくすることなく言語制約の強
いｎグラムを構成することができ、高精度な形態素解析
装置を構築できるという効果が得られる。なお、本実施
例では２つの話題を扱ったが、３つ以上の話題を扱うよ
うに構成しても良い。As described above, according to Embodiment 4, morphological analysis is performed with statistics collected separately for each topic, so even a locally ambiguous notation such as "のせんせい" can be determined to mean "先制" (taking the lead) with the part of speech sa-hen (verbal noun). An n-gram with strong linguistic constraints can therefore be constructed without increasing the order of the n-gram, and a highly accurate morphological analyzer can be built. Although this embodiment handles two topics, it may be configured to handle three or more.

【００８９】実施の形態５．なお、上記実施の形態４で
は、特に考慮していなかったが、単語列候補の算出時
に、一連の仮名漢字混じり文字列に対する漢字ｎグラム
中の話題がすべて一致するように形態素確率算出手段を
構成してもよい。図１３はそのようなこの発明の実施の
形態５による形態素解析装置の構成を示すブロック図で
ある。Embodiment 5. Although Embodiment 4 above did not specifically address this point, the morpheme probability calculating means may be configured so that, when word string candidates are calculated, all topics in the kanji n-gram for one kana-kanji mixed character string match. FIG. 13 is a block diagram showing the configuration of a morphological analyzer according to such an Embodiment 5 of the present invention.

【００９０】図において、５はＲＡＭ、１７はファイル
入力装置、２１は出力手段であり、これらは図１０に同
一符号を付して示した実施の形態４のそれらと同等の部
分である。２３は図１０に符号１８を付して示したもの
に相当する形態素確率算出手段であるが、単語列候補の
算出時に、一連の仮名漢字混じり文字列に対する漢字ｎ
グラム中の話題がすべて一致するように構成されている
点で異なっている。２４、２５は図１０に符号１８、１
９を付して示したものに相当する、野球話題の漢字ｎグ
ラムおよび一般話題の漢字ｎグラムであるが、この場合
には２グラムのみが用いられ、１グラムは用いられてい
ない。In the figure, 5 is a RAM, 17 is a file input device, and 21 is an output means; these are equivalent to the parts of Embodiment 4 given the same reference numerals in FIG. 10. Reference numeral 23 denotes a morpheme probability calculating means corresponding to the one denoted by reference numeral 18 in FIG. 10, but it differs in that, when word string candidates are calculated, it is configured so that all topics in the kanji n-gram for one kana-kanji mixed character string match. Reference numerals 24 and 25 denote the baseball-topic kanji n-gram and the general-topic kanji n-gram corresponding to those denoted by reference numerals 18 and 19 in FIG. 10; in this case only 2-grams are used and 1-grams are not.

【００９１】ここで、図１４は漢字ｎグラムの具体例を
示す説明図である。図において、２６はその漢字ｎグラ
ムであり、この漢字ｎグラム２６は野球話題の漢字ｎグ
ラム２４と一般話題の漢字ｎグラム２５とが記録されて
いる。前述のように、この漢字ｎグラム２６の野球話題
ｎグラム２４と一般話題ｎグラム２５には、それぞれキ
ーとなる各漢字列に対して、前接形態素、後接形態素、
および確率が記録された２グラムのみが用いられてい
る。FIG. 14 is an explanatory diagram showing a specific example of the kanji n-gram. In the figure, 26 is that kanji n-gram, in which the baseball-topic kanji n-gram 24 and the general-topic kanji n-gram 25 are recorded. As noted above, the baseball-topic n-gram 24 and the general-topic n-gram 25 of the kanji n-gram 26 use only 2-grams, in which a preceding morpheme, a following morpheme, and a probability are recorded for each key kanji string.

【００９２】次に動作について説明する。図１５はこの
ように構成された実施の形態５による形態素解析装置の
概略動作の流れを示すフローチャートである。この実施
の形態５においても、まず、ステップＳＴ２０１からス
テップＳＴ２０６において、実施の形態４の場合と全く
同様の処理が行われる。ステップＳＴ２０６にてＲＡＭ
５から先行単語列候補の１つが取り出されると、形態素
確率算出手段２３はステップＳＴ２２０において、漢字
ｎグラム２６を先行単語列候補の漢字列情報によって検
索し、前方一致する後方単語があるか否かのチェックを
する。そのとき、実施の形態４では、漢字ｎグラム２２
の野球話題の漢字ｎグラム１８と一般話題の漢字ｎグラ
ム１９は、それぞれ２グラムと１グラムの双方が用いら
れていたが、この実施の形態５では、野球話題の漢字ｎ
グラム２４と一般話題の漢字ｎグラム２５が、それぞれ
２グラムのみの漢字ｎグラム２６を用いて一致検出を行
っている。チェックの結果、前方一致した後方単語があ
る場合にはステップＳＴ２０８に分岐して、以下ステッ
プＳＴ２１３まで、実施の形態４と同様に処理を進め
る。Next, the operation will be described. FIG. 15 is a flowchart outlining the operation of the morphological analyzer according to Embodiment 5 configured as above. In Embodiment 5 as well, steps ST201 to ST206 are first performed exactly as in Embodiment 4. When one preceding word string candidate has been taken out of the RAM 5 in step ST206, the morpheme probability calculating means 23 searches the kanji n-gram 26 in step ST220 using the kanji string information of that candidate and checks whether any following word prefix-matches. In Embodiment 4, the baseball-topic kanji n-gram 18 and the general-topic kanji n-gram 19 of the kanji n-gram 22 each used both 2-grams and 1-grams; in Embodiment 5, matching is performed with the kanji n-gram 26, in which the baseball-topic kanji n-gram 24 and the general-topic kanji n-gram 25 each contain only 2-grams. If the check finds a prefix-matching following word, the process branches to step ST208 and then proceeds through step ST213 as in Embodiment 4.
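
Why restricting the search to 2-grams keeps one analysis inside one topic can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `BIGRAMS` table layout and the `start_chains`, `extend`, and `expand` helpers are hypothetical, and every probability except the 0.01 of the baseball-topic 2-gram from the sentence head to 3号 is invented. The sketch assumes each chain takes its topic from its first 2-gram.

```python
# Hypothetical 2-gram tables: topic -> {(preceding, following): probability}.
BIGRAMS = {
    "baseball": {("#", "3号"): 0.01, ("3号", "アーチ"): 0.02},
    "general": {("#", "3号"): 0.001, ("3号", "室"): 0.01},
}


def start_chains():
    """One initial chain per topic, each starting at the sentence head '#'."""
    return [(topic, ("#",), 1.0) for topic in BIGRAMS]


def extend(chain):
    """Extend a chain by one word, using only 2-grams of the chain's own topic."""
    topic, words, prob = chain
    return [
        (topic, words + (following,), prob * p)
        for (preceding, following), p in BIGRAMS[topic].items()
        if preceding == words[-1]
    ]


def expand(chains, steps):
    """Apply `extend` to every chain `steps` times."""
    for _ in range(steps):
        chains = [longer for chain in chains for longer in extend(chain)]
    return chains


chains = expand(start_chains(), 2)
for topic, words, prob in chains:
    print(topic, words, prob)
```

Because no 1-gram is available to restart a chain without a preceding morpheme, every chain ends in the topic it started in.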

【００９３】以上のように、この実施の形態５によれ
ば、形態素確率算出手段２３は漢字ｎグラム２６の２グ
ラムのみを用いて一致を検査しているので、１つの仮名
漢字混じり文字列に対する一連の形態素は同一の話題の
形態素となるため、他の話題が交ざることを防止するこ
とができるという効果が得られる。As described above, according to Embodiment 5, the morpheme probability calculating means 23 checks for matches using only the 2-grams of the kanji n-gram 26. The series of morphemes for one kana-kanji mixed character string therefore all belong to the same topic, which prevents other topics from being mixed in.

【００９４】実施の形態６．なお、上記実施の形態４お
よび実施の形態５では、形態素解析において、話題ごと
の確率の重み調整については特に考慮していなかった
が、話題ごとに確率の重みの調整を可能にするように形
態素確率算出手段を構成してもよい。図１６はそのよう
なこの発明の実施の形態６による形態素解析装置の構成
を示すブロック図である。図において、５はＲＡＭ、１
７はファイル入力装置、２１は出力手段、２４、２５は
野球話題および一般話題の漢字ｎグラムであり、これら
は図１３に同一符号を付して示した実施の形態５のそれ
らと同等の部分である。２７は図１３に符号２３を付し
て示したものに相当する形態素確率算出手段であるが、
話題ごとに確率の重みを調整可能に構成されている点で
異なっている。Embodiment 6. Although Embodiments 4 and 5 above gave no particular consideration to adjusting the probability weight of each topic in morphological analysis, the morpheme probability calculating means may be configured so that the probability weight can be adjusted for each topic. FIG. 16 is a block diagram showing the configuration of a morphological analyzer according to such an Embodiment 6 of the present invention. In the figure, 5 is a RAM, 17 is a file input device, 21 is an output means, and 24 and 25 are the baseball-topic and general-topic kanji n-grams; these are equivalent to the parts of Embodiment 5 given the same reference numerals in FIG. 13. Reference numeral 27 denotes a morpheme probability calculating means corresponding to the one denoted by reference numeral 23 in FIG. 13, but it differs in that the probability weight can be adjusted for each topic.

【００９５】次に動作について説明する。図１７はこの
ように構成された実施の形態６による形態素解析装置の
概略動作の流れを示すフローチャートである。この実施
の形態６においても、まず、ステップＳＴ２０１からス
テップＳＴ２０６、およびステップＳＴ２２０におい
て、実施の形態５の場合と全く同様の処理が行われる。
ステップＳＴ２２０における２グラムのみの漢字ｎグラ
ム２６を用いた、前方一致する後方単語があるか否のチ
ェックの結果、前方一致した後方単語がない場合にはス
テップＳＴ２０５に戻り、前方一致する後方単語がある
場合にはステップＳＴ２３０に進む。ステップＳＴ２３
０では形態素確率算出手段２７が、後方単語のそれぞれ
について分野別に重み付けを行って尤度を計算し、それ
をＲＡＭ５に記憶するとともに、先行単語列に後方単語
を接続してゆき、新たに先行単語列としてＲＡＭ５に記
憶する。以下ステップＳＴ２０９からステップＳＴ２１
３まで、実施の形態５と同様に処理を進める。Next, the operation will be described. FIG. 17 is a flowchart outlining the operation of the morphological analyzer according to Embodiment 6 configured as above. In Embodiment 6 as well, steps ST201 to ST206 and step ST220 are first performed exactly as in Embodiment 5. Step ST220 checks, using the kanji n-gram 26 that contains only 2-grams, whether any following word prefix-matches; if none does, the process returns to step ST205, and if one does, the process proceeds to step ST230. In step ST230, the morpheme probability calculating means 27 calculates the likelihood of each following word with a weight applied according to its field (topic), stores it in the RAM 5, appends the following word to the preceding word string, and stores the result in the RAM 5 as a new preceding word string. Steps ST209 to ST213 then proceed as in Embodiment 5.
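
The topic-specific weighting of step ST230 can be sketched as below. The text does not state how the weight enters the likelihood, so this sketch assumes a simple multiplicative weight on the 2-gram probability. The `TOPIC_WEIGHTS` values and the `weighted_likelihood` function are hypothetical.

```python
# Hypothetical per-topic weights; the patent gives neither values nor the exact formula.
TOPIC_WEIGHTS = {"baseball": 2.0, "general": 1.0}


def weighted_likelihood(prev_likelihood, bigram_prob, topic, weights=TOPIC_WEIGHTS):
    """Extend a chain's likelihood by a 2-gram probability scaled by its topic weight.

    Assumption: the weight is applied as a plain multiplier.
    """
    return prev_likelihood * weights[topic] * bigram_prob


# The same 2-gram probability scores higher under the more heavily weighted topic.
print(weighted_likelihood(1.0, 0.01, "baseball"))
print(weighted_likelihood(1.0, 0.01, "general"))
```

Raising one topic's weight favors analyses from that topic without rebuilding the n-gram tables, which is the kind of adjustment this embodiment describes.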

【００９６】以上のように、この実施の形態６によれ
ば、２グラムの確率の重みを話題別にかけるように形態
素確率算出手段２７を構成しているので、話題別に出現
確率の調節が可能になるという効果が得られる。As described above, according to Embodiment 6, the morpheme probability calculating means 27 is configured to weight the 2-gram probabilities by topic, so the occurrence probability can be adjusted for each topic.

【００９７】実施の形態７．なお、上記実施の形態１〜
実施の形態６では音声解析装置、あるいは形態素解析装
置に関するものについて説明したが、仮名ｎグラムを構
成することにより仮名漢字変換装置を構築することも可
能である。図１８はそのようなこの発明の実施の形態７
による仮名漢字変換装置の構成を示すブロック図であ
る。Embodiment 7. Embodiments 1 to 6 above described a speech analysis device or a morphological analyzer, but a kana-kanji conversion device can also be built by constructing kana n-grams. FIG. 18 is a block diagram showing the configuration of a kana-kanji conversion device according to such an Embodiment 7 of the present invention.

【００９８】図において、２８は入力文の仮名文字列を
入力する入力手段としてのキーボードである。２９、３
０は仮名文字列と、仮名文字列に対応する単語表記列
と、生起確率とを記憶する仮名ｎグラムであり、この仮
名ｎグラム中では単語が、それぞれ話題に対応して分類
されており、仮名ｎグラム２９としては野球の話題につ
いて記憶した野球話題の仮名ｎグラムについて、仮名ｎ
グラム３０としては一般の話題について記憶した一般話
題の仮名ｎグラムについてそれぞれ例示されている。３
１はこれら野球話題の仮名ｎグラム２９および一般話題
の仮名ｎグラム３０を参照して、キーボード２８が出力
する仮名文字列に対応する各単語候補の単語生起確率を
算出する漢字確率算出手段である。５は処理過程の情報
を記憶するＲＡＭであり、３２は漢字確率算出手段３１
で算出された単語生起確率を用いて求めた、キーボード
２８より入力された仮名文字列に適合する単語列候補を
求めて出力する出力手段である。In the figure, 28 is a keyboard serving as input means for entering the kana character string of an input sentence. Reference numerals 29 and 30 denote kana n-grams that store kana character strings, the word notation strings corresponding to them, and occurrence probabilities. Within these kana n-grams the words are classified by topic: kana n-gram 29 is illustrated as the baseball-topic kana n-gram, storing entries for the baseball topic, and kana n-gram 30 as the general-topic kana n-gram, storing entries for general topics. Reference numeral 31 denotes a kanji probability calculating means that refers to the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30 and calculates the word occurrence probability of each word candidate corresponding to the kana character string output by the keyboard 28. Reference numeral 5 denotes a RAM that stores information produced during processing, and 32 denotes an output means that uses the word occurrence probabilities calculated by the kanji probability calculating means 31 to find and output the word string candidates that fit the kana character string entered from the keyboard 28.

【００９９】以下、単語列候補の生成について説明す
る。この実施の形態４における単語列候補の生成は、単
語列の出現確率Ｐ（Ｗ）を最大にするＷを算出すること
で得られる。このとき、Ｗは入力された単語列である。
また、単語列の出現確率Ｐ（Ｗ）は、ｍ語の単語列Ｗが
前述の式（４）で決定されるとき、前述の式（６）から
求める。なお、その際には野球話題の仮名ｎグラム２
９、一般話題の仮名ｎグラム３０の確率が使用される。The generation of word string candidates is described below. In Embodiment 4, a word string candidate is generated by calculating the W that maximizes the word string occurrence probability P(W). Here, W is the input word string. When the m-word string W is defined by equation (4) above, its occurrence probability P(W) is obtained from equation (6) above, using the probabilities in the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30.

【０１００】上述した計算により、野球話題の仮名ｎグ
ラム２９及び一般話題の仮名ｎグラム３０に単語の列が
存在するものについて、単語列確率Ｐ（Ｗ）を最大にす
るＷを算出する。なお、組み合わせの計算については、
例えば、長尾真著：「自然言語処理」に示されるＶｉｔ
ｅｒｂｉ方法を用いて高速に行ってもよいし、また、確
率を対数確率として計算式を総和で計算可能としてもよ
い。それぞれの単語の出現確率は単語の野球話題の仮名
ｎグラム２９と一般話題の仮名ｎグラム３０に予め記憶
してある確率値をもとに算出する。By the calculation described above, the W that maximizes the word string probability P(W) is calculated over the word strings that exist in the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30. The combinations may be computed at high speed using, for example, the Viterbi method described in Makoto Nagao, "Natural Language Processing" (自然言語処理), or the probabilities may be handled as log probabilities so that the expression can be computed as a sum. The occurrence probability of each word is calculated from the probability values stored in advance in the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30.
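
The remark about log probabilities can be made concrete with a short Python sketch. It shows only that ranking word strings by a sum of log probabilities gives the same order as ranking by the product of probabilities. The two probability lists are hypothetical, and the sketch does not reproduce equations (4) and (6).

```python
import math


def product_score(probs):
    """Word string probability as a plain product of n-gram probabilities."""
    score = 1.0
    for p in probs:
        score *= p
    return score


def log_score(probs):
    """The same quantity in the log domain: a sum of log probabilities."""
    return sum(math.log(p) for p in probs)


# Hypothetical n-gram probabilities along two competing word strings.
chain_a = [0.01, 0.02, 0.3]
chain_b = [0.001, 0.5, 0.3]

# Both comparisons agree on which word string is more probable.
print(product_score(chain_a) > product_score(chain_b))
print(log_score(chain_a) > log_score(chain_b))
```

Summing logs also avoids the floating-point underflow that a long product of small probabilities can cause.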

【０１０１】ここで、図１９は図２に示した例文１０を
もとに作成した仮名ｎグラムの具体例を示す説明図であ
る。図において、３３がその仮名ｎグラムであり、この
仮名ｎグラム３３には野球話題の仮名ｎグラム２９と一
般話題の仮名ｎグラム３０とが記録されている。FIG. 19 is an explanatory diagram showing a specific example of a kana n-gram created from the example sentences 10 shown in FIG. 2. In the figure, 33 is that kana n-gram, in which the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30 are recorded.

【０１０２】図１９に示すように、この仮名ｎグラム３
３内の野球話題の仮名ｎグラム２９と一般話題の仮名ｎ
グラム３０には、それぞれ２グラムと１グラムがあり、
先頭の仮名文字列が検索のためのキーとなっている。２
グラムではキーとなる各仮名文字列に対して、前接形態
素、後接形態素、および確率が記録されている。ここで
記録されている確率は、前接形態素の次に後接形態素の
接続する確率であり、その２グラムの生起確率に該当す
る。また、１グラムではキーとなる各音韻列に対して、
直接次に連接する後接続形態素と確率が記録されてい
る。この１グラムの確率はその形態素自身の生起確率で
ある。なお、形態素は表記、音素表記、見出し読み、お
よび品詞の組であらわされる。As shown in FIG. 19, the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30 in the kana n-gram 33 each contain 2-grams and 1-grams, and the leading kana character string serves as the search key. For a 2-gram, a preceding morpheme, a following morpheme, and a probability are recorded for each key kana character string. The recorded probability is the probability that the following morpheme comes directly after the preceding morpheme, that is, the occurrence probability of that 2-gram. For a 1-gram, the following morpheme that connects directly next and its probability are recorded for each key phoneme string; this 1-gram probability is the occurrence probability of the morpheme itself. A morpheme is represented as a set of notation, phoneme notation, headword reading, and part of speech.
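
One possible in-memory layout for such a topic-separated kana n-gram is sketched below. The class names `Morpheme` and `Entry`, the `KANA_NGRAM` dictionary, and the `order` helper are hypothetical. Only the 0.01 probability of the baseball-topic 2-gram comes from the text; the other probabilities are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Morpheme:
    surface: str   # notation, e.g. "3号"
    phoneme: str   # phoneme notation, e.g. "saNgoo"
    reading: str   # headword reading, e.g. "さんごう"
    pos: str       # part of speech


@dataclass(frozen=True)
class Entry:
    key: str                        # kana string used as the search key
    preceding: Optional[Morpheme]   # None for a 1-gram
    following: Morpheme
    prob: float                     # occurrence probability of this n-gram


HEAD = Morpheme("#", "#", "#", "sentence-head")
SANGO = Morpheme("3号", "saNgoo", "さんごう", "noun")

# topic -> list of entries (a real table would be indexed by key for fast lookup).
KANA_NGRAM = {
    "baseball": [
        Entry("#さんごう", HEAD, SANGO, 0.01),   # 2-gram, probability from the text
        Entry("さんごう", None, SANGO, 0.001),   # 1-gram, hypothetical probability
    ],
    "general": [
        Entry("さんごう", None, SANGO, 0.0005),  # 1-gram, hypothetical probability
    ],
}


def order(entry: Entry) -> int:
    """Return 1 for a 1-gram entry and 2 for a 2-gram entry."""
    return 1 if entry.preceding is None else 2
```

Keeping the topic as the outer key means the same morpheme can carry a different probability in each topic, which is the basis of the topic-specific statistics.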

【０１０３】算出した単語列Ｗを認識結果として出力手
段３２より出力する。The output means 32 outputs the calculated word string W as a recognition result.

【０１０４】次に動作について説明する。ここで、図２
０はこの実施の形態７による仮名漢字変換装置における
変換処理の概略動作の流れを示すフローチャートであ
る。この仮名漢字変換の処理はステップＳＴ３０１にお
いて、キーボード２８が操作されることによって処理が
開始される。キーボード２８の操作によって入力された
仮名文字列は、ステップＳＴ３０２で漢字確率算出手段
３１に取り込まれ、ステップＳＴ３０３において、ＲＡ
Ｍ５にこれを記憶する。この実施の形態７では、仮名文
字列として以下が入力されたと仮定する。さんごうあー
ちのせんせい Next, the operation will be described. FIG. 20 is a flowchart outlining the conversion processing in the kana-kanji conversion device according to Embodiment 7. The kana-kanji conversion processing starts in step ST301 when the keyboard 28 is operated. The kana character string entered on the keyboard 28 is taken into the kanji probability calculating means 31 in step ST302 and stored in the RAM 5 in step ST303. In Embodiment 7, the following kana character string is assumed to have been entered: さんごうあーちのせんせい

【０１０５】次にステップＳＴ３０４において、漢字確
率算出手段３１はステップＳＴ３０３でＲＡＭ５に記憶
させた仮名文字列を取り出すとともに、初期化処理をす
る。この初期化処理では、ヌル単語「｛＃＃＃文
頭｝」とその確率値「１」を先行単語列候補の初期値と
してＲＡＭ５に記憶する。従って、ここでは、仮名文字
列として、「＃さんごうあーちのせんせい＃」が、まず
取り出される。漢字確率算出手段３１はさらにステップ
ＳＴ３０５において、すべての先行単語列候補が仮名文
字列の末端の仮名と対応したか否かをチェックし、すべ
て対応していれば処理をステップＳＴ３１１に移し、対
応していなければ処理をステップＳＴ３０６に進める。Next, in step ST304, the kanji probability calculating means 31 takes out the kana character string stored in the RAM 5 in step ST303 and performs initialization. In this initialization, the null word "{# # # sentence-head}" and its probability value "1" are stored in the RAM 5 as the initial value of the preceding word string candidates. Accordingly, "#さんごうあーちのせんせい#" is the kana character string taken out first. In step ST305, the kanji probability calculating means 31 then checks whether all preceding word string candidates have reached the final kana of the kana character string; if all have, the process moves to step ST311, and otherwise it proceeds to step ST306.

【０１０６】ステップＳＴ３０６では、漢字確率算出手
段３１はＲＡＭ５から先行単語列候補を１つ取り出す。
この実施の形態７では、最初に「｛＃＃＃文
頭｝」が先行単語列候補として取り出される。次にステ
ップＳＴ３０７において、野球話題の仮名ｎグラム２９
および一般話題の仮名ｎグラム３０を、先行単語列候補
の仮名列情報により検索し、検索した先行単語以降の仮
名列候補の部分列に、前方一致する後方単語があるか否
かのチェックをする。チェックの結果、前方一致した後
方単語が無い場合には、ステップＳＴ３０５に処理を戻
し、前方一致した後方単語がある場合には、ステップＳ
Ｔ３０８に処理を進める。In step ST306, the kanji probability calculating means 31 takes one preceding word string candidate out of the RAM 5. In Embodiment 7, "{# # # sentence-head}" is taken out first as the preceding word string candidate. Next, in step ST307, the baseball-topic kana n-gram 29 and the general-topic kana n-gram 30 are searched using the kana string information of the preceding word string candidate, and it is checked whether any following word prefix-matches the part of the kana string candidate that comes after the retrieved preceding word. If no following word prefix-matches, the process returns to step ST305; if one does, the process proceeds to step ST308.

【０１０７】従って、この実施の形態７の場合には、初
期の先行単語列である「｛＃＃＃文頭｝」をまず検
索する。そして、この検索した先行単語列「｛＃＃
＃文頭｝」の後方単語として、野球話題の仮名ｎグラム
２９と一般話題の仮名ｎグラム３０を検索し、「＃」に
後続する「さんごうあー…」の先頭からの仮名文字列が
部分一致する単語を検索して後方単語とする。２グラム
では「＃さんごう」が「＃さんごうあー…」の仮名文字
列と前方一致するので、この２グラムの後接形態素「野
球：３号ｓａＮｇｏｏさんごう名詞」を後方単語
の候補の１つとする。また、１グラムの「｛野球：３号
ｓａＮｇｏｏさんごう名詞｝」は後方の仮名文字
列に前方一致するのでこれも候補とする。さらに「｛一
般：３号ｓａＮｇｏｏさんごう名詞｝」も候補と
なる。In Embodiment 7, therefore, the initial preceding word string "{# # # sentence-head}" is retrieved first. The baseball-topic kana n-gram 29 and the general-topic kana n-gram 30 are then searched for words that can follow this preceding word string "{# # # sentence-head}": words matching the kana character string that starts at the beginning of "さんごうあー…", which follows "#", are retrieved as following words. Among the 2-grams, "#さんごう" prefix-matches the kana character string "#さんごうあー…", so the following morpheme of that 2-gram, "baseball: 3号 saNgoo さんごう noun", becomes one candidate following word. The 1-gram "{baseball: 3号 saNgoo さんごう noun}" also prefix-matches the kana character string that follows, so it is a candidate as well, and so is "{general: 3号 saNgoo さんごう noun}".
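
The prefix-match lookup of step ST307 can be sketched as follows. The `TABLE` layout, its probabilities, and the `find_following` function are hypothetical simplifications: each entry is reduced to a kana key and a probability, and 2-gram context is omitted.

```python
def find_following(table, remaining):
    """Return (topic, key, prob) for every entry whose key is a prefix of `remaining`."""
    hits = []
    for topic, entries in table.items():
        for key, prob in entries:
            if remaining.startswith(key):
                hits.append((topic, key, prob))
    return hits


# Hypothetical table: topic -> [(kana key, probability)].
TABLE = {
    "baseball": [("さんごう", 0.001), ("あーち", 0.002)],
    "general": [("さんごう", 0.0005), ("せんせい", 0.003)],
}

# Kana that still has to be covered after the sentence head "#".
hits = find_following(TABLE, "さんごうあーちのせんせい")
print(hits)
```

Both topics contribute a candidate for "さんごう", mirroring the baseball and general candidates named in the paragraph above. Keys such as "あーち" do not match because they are not at the start of the remaining string.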

【０１０８】ステップＳＴ３０８では、後方単語それぞ
れについて尤度を計算し、ＲＡＭ５に記憶するととも
に、先行単語列に後方単語を接続してゆき、新たに先行
単語列としてＲＡＭ５にこれを記憶する。この実施の形
態７では、先行単語列「｛＃＃＃文頭｝」を「｛野
球：＃＃＃文頭｝、｛野球：３号ｓａＮｇｏｏ
さんごう名詞｝」に置き換える。言語尤度は、先行
単語列「｛＃＃＃文頭｝」の確率１と、野球話題の
「｛野球：＃＃＃文頭｝、｛野球：３号ｓａＮ
ｇｏｏさんごう名詞｝」の２グラムの確率０.０１
から前述の式（７）で計算される。In step ST308, the likelihood of each following word is calculated and stored in the RAM 5; the following word is appended to the preceding word string, and the result is stored in the RAM 5 as a new preceding word string. In Embodiment 7, the preceding word string "{# # # sentence-head}" is replaced with "{baseball: # # # sentence-head}, {baseball: 3号 saNgoo さんごう noun}". The language likelihood is calculated by equation (7) above from the probability 1 of the preceding word string "{# # # sentence-head}" and the probability 0.01 of the baseball-topic 2-gram "{baseball: # # # sentence-head}, {baseball: 3号 saNgoo さんごう noun}".
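
The update of step ST308 can be sketched as below. Equation (7) is not reproduced in this part of the text, so the sketch assumes it multiplies the likelihood of the preceding word string by the 2-gram probability. The `extend` function and the tuple layout are hypothetical.

```python
def extend(preceding, following, bigram_prob):
    """Append `following` to a (words, likelihood) pair.

    Assumption: the new likelihood is the old one times the 2-gram probability.
    """
    words, likelihood = preceding
    return (words + [following], likelihood * bigram_prob)


initial = (["#"], 1.0)                 # null word at the sentence head, probability 1
extended = extend(initial, "3号", 0.01)  # baseball-topic 2-gram probability from the text
print(extended)
```

Under that assumption, the example values above give 1 × 0.01 = 0.01 as the language likelihood of the extended word string.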

【０１０９】次にステップＳＴ３０９において、仮名文
字列全体が先行単語列に対応したか否かのチェックを行
い、対応していればステップＳＴ３１０に進んで、最大
尤度および解の先行単語列をＲＡＭ５に記憶した後、処
理をステップＳＴ３０５に戻し、すべての先行単語列候
補が仮名文字列候補の末端の仮名と対応したか否かをチ
ェックする。一方、対応していなければ、そのまま処理
をステップＳＴ３０５に戻して上記チェックを行う。Next, in step ST309, it is checked whether the preceding word string covers the entire kana character string. If it does, the process proceeds to step ST310, where the maximum likelihood and the preceding word string of the solution are stored in the RAM 5; the process then returns to step ST305, which checks whether all preceding word string candidates have reached the final kana of the kana character string candidate. If it does not, the process returns directly to step ST305 and that check is performed.
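
The loop formed by steps ST304 to ST311 can be sketched as a small exhaustive search. This is a simplified illustration under stated assumptions: the `LEXICON` and its probabilities are hypothetical, topics and 2-gram context are omitted, and each word simply multiplies the running likelihood. The step labels in the comments are approximate correspondences, not a line-by-line transcription of the flowchart.

```python
def convert(kana, lexicon):
    """Extend candidates word by word and return the most likely full cover."""
    pending = [([], 0, 1.0)]   # (words, kana consumed, likelihood); cf. ST304
    best = (None, 0.0)
    while pending:                                  # cf. ST305
        words, pos, like = pending.pop()            # cf. ST306
        for reading, surface, prob in lexicon:      # cf. ST307
            if not kana.startswith(reading, pos):
                continue
            cand = (words + [surface], pos + len(reading), like * prob)  # cf. ST308
            if cand[1] == len(kana):                # cf. ST309: whole string covered
                if cand[2] > best[1]:               # cf. ST310: keep the maximum
                    best = (cand[0], cand[2])
            else:
                pending.append(cand)
    return best                                     # cf. ST311


# Hypothetical lexicon: (reading, surface, probability).
LEXICON = [
    ("さんごう", "3号", 0.1),
    ("あーち", "アーチ", 0.1),
    ("の", "の", 0.5),
    ("せんせい", "先制", 0.02),
    ("せんせい", "先生", 0.01),
]

words, likelihood = convert("さんごうあーちのせんせい", LEXICON)
print(words, likelihood)
```

With this toy lexicon the search prefers 先制 over 先生 only because of the invented probabilities. In the patent, that preference comes from the topic-specific n-gram statistics.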

【０１１０】この実施の形態７では、以上の処理によ
り、仮名列候補に対応して、「｛＃＃＃文頭｝、
｛野球：３号ｓａＮｇｏｏさんごう名詞｝、｛野
球：アーチａａｃｉあーち名詞｝、｛野球：の
ｎｏの助詞｝、…」の順に先行単語列候補が得られ
る。In Embodiment 7, the above processing yields preceding word string candidates for the kana string candidate in the order "{# # # sentence-head}, {baseball: 3号 saNgoo さんごう noun}, {baseball: アーチ aaci あーち noun}, {baseball: の no の particle}, ...".

【０１１１】ステップＳＴ３０５ですべての先行単語列
候補が仮名文字列候補の末端の仮名と対応していると判
定された場合には、ステップＳＴ３１１に進んでＲＡＭ
５に記憶してある最大尤度を持つ解の単語列を読み出
す。ここで、最大尤度は言語尤度と音響尤度の積の最大
値である。この実施の形態７では仮名文字列候補「＃さ
んごうあーちのせんせい＃」に対して、「｛＃＃＃
文頭｝、｛３号ｓａＮｇｏｏさんごう名詞｝、
｛アーチａａｃｉあーち普通名詞｝、｛のｎｏ
の接続助詞｝、｛先制ｓｅＮｓｅｅせんせいサ
変名詞｝」が、また最大尤度が前述の式（６）で求めら
れる単語列確率Ｐ（Ｗ）中の最大値より、５.４×１０
^−９（音響尤度；０.９、言語尤度；６×１０^−９）と
得られる。If it is determined in step ST305 that all preceding word string candidates have reached the final kana of the kana character string candidate, the process proceeds to step ST311, where the word string of the solution having the maximum likelihood stored in the RAM 5 is read out. Here, the maximum likelihood is the maximum value of the product of the language likelihood and the acoustic likelihood. In Embodiment 7, for the kana character string candidate "#さんごうあーちのせんせい#", the result "{# # # sentence-head}, {3号 saNgoo さんごう noun}, {アーチ aaci あーち common noun}, {の no の connecting particle}, {先制 seNsee せんせい sa-hen noun}" is obtained, and the maximum likelihood, taken as the largest of the word string probabilities P(W) given by equation (6) above, is 5.4 × 10⁻⁹ (acoustic likelihood: 0.9; language likelihood: 6 × 10⁻⁹).

【０１１２】次にステップＳＴ３１２において、このＲ
ＡＭ５から読み出した解の単語列を出力手段３２から出
力した後、ステップＳＴ３１３に進んでこの一連の形態
素解析処理を終了する。このようにして、この実施の形
態７では仮名漢字変換結果として、「３号アーチの先
制」が得られる。Next, in step ST312, the word string of the solution read from the RAM 5 is output from the output means 32, and the process then proceeds to step ST313, where this series of morphological analysis processing ends. In this way, Embodiment 7 yields "3号アーチの先制" as the kana-kanji conversion result.

【０１１３】以上のように、この実施の形態７によれ
ば、話題を分離して統計量をとって仮名漢字変換を行っ
ているので、ｎグラムの次数を大きくすることなく言語
制約の強いｎグラムを構成することができ、高精度な仮
名漢字変換装置を構築できるという効果が得られる。な
お、本実施例では２つの話題を扱ったが、３つ以上の話
題を扱うように構成しても良い。As described above, according to Embodiment 7, kana-kanji conversion is performed with statistics collected separately for each topic, so an n-gram with strong linguistic constraints can be constructed without increasing the order of the n-gram, and a highly accurate kana-kanji conversion device can be built. Although this embodiment handles two topics, it may be configured to handle three or more.

【０１１４】実施の形態８．なお、上記実施の形態７で
は、特に考慮していなかったが、仮名漢字の変換時に、
一連の仮名文字列に対する仮名ｎグラム中の話題がすべ
て一致するように漢字確率算出手段を構成してもよい。
図２１はそのようなこの発明の実施の形態８による仮名
漢字変換装置の構成を示すブロック図である。Embodiment 8. Although Embodiment 7 above did not specifically address this point, the kanji probability calculating means may be configured so that, when kana is converted to kanji, all topics in the kana n-gram for one kana character string match. FIG. 21 is a block diagram showing the configuration of a kana-kanji conversion device according to such an Embodiment 8 of the present invention.

【０１１５】図において、５はＲＡＭ、２８はキーボー
ド、３２は出力手段であり、これらは図１８に同一符号
を付して示した実施の形態７のそれらと同等の部分であ
る。３４は図１８に符号２９を付して示したものに相当
する漢字確率算出手段であるが、単語列候補の算出時
に、一連の仮名文字列に対する仮名ｎグラム中の話題が
すべて一致するように構成されている点で異なってい
る。３５、３６は図１８に符号２９、３０を付して示し
たものに相当する、野球話題の仮名ｎグラムおよび一般
話題の仮名ｎグラムであるが、この場合には２グラムの
みが用いられ、１グラムは用いられていない。In the figure, 5 is a RAM, 28 is a keyboard, and 32 is an output means; these are equivalent to the parts of Embodiment 7 given the same reference numerals in FIG. 18. Reference numeral 34 denotes a kanji probability calculating means corresponding to the one denoted by reference numeral 29 in FIG. 18, but it differs in that, when word string candidates are calculated, it is configured so that all topics in the kana n-gram for one kana character string match. Reference numerals 35 and 36 denote the baseball-topic kana n-gram and the general-topic kana n-gram corresponding to those denoted by reference numerals 29 and 30 in FIG. 18; in this case only 2-grams are used and 1-grams are not.

【０１１６】ここで、図２２は仮名ｎグラムの具体例を
示す説明図である。図において、３７はその仮名ｎグラ
ムであり、この仮名ｎグラム３７は野球話題の仮名ｎグ
ラム３５と一般話題の仮名ｎグラム３６とが記録されて
いる。前述のように、この仮名ｎグラム３７の野球話題
の仮名ｎグラム３５と一般話題の仮名ｎグラム３６に
は、それぞれキーとなる各仮名文字列に対して、前接形
態素、後接形態素、および確率が記録された２グラムの
みが用いられている。FIG. 22 is an explanatory diagram showing a specific example of the kana n-gram. In the figure, 37 is that kana n-gram, in which the baseball-topic kana n-gram 35 and the general-topic kana n-gram 36 are recorded. As noted above, the baseball-topic kana n-gram 35 and the general-topic kana n-gram 36 of the kana n-gram 37 use only 2-grams, in which a preceding morpheme, a following morpheme, and a probability are recorded for each key kana character string.

【０１１７】次に動作について説明する。図２３はこの
ように構成された実施の形態８による仮名漢字変換装置
の概略動作の流れを示すフローチャートである。この実
施の形態８においても、まず、ステップＳＴ３０１から
ステップＳＴ３０６において、実施の形態７の場合と全
く同様の処理が行われる。ステップＳＴ３０６にてＲＡ
Ｍ５から先行単語列候補を１つが取り出されると、漢字
確率計算手段３４はステップＳＴ３２０において、仮名
ｎグラム３７を先行単語列候補の仮名列情報によって検
索し、前方一致する後方単語があるか否かのチェックを
する。そのとき、実施の形態７では、仮名ｎグラム３３
の野球話題の仮名ｎグラム３０と一般話題の仮名ｎグラ
ム３１は、それぞれ２グラムと１グラムの双方が用いら
れていたが、この実施の形態８では、野球話題の仮名ｎ
グラム３５と一般話題の仮名ｎグラム３６が、それぞれ
２グラムのみの仮名ｎグラム３７を用いて一致検出を行
っている。チェックの結果、前方一致した後方単語があ
る場合にはステップＳＴ３０８に分岐して、以下ステッ
プＳＴ３１３まで、実施の形態７と同様に処理を進め
る。Next, the operation will be described. FIG. 23 is a flowchart outlining the operation of the kana-kanji conversion device according to Embodiment 8 configured as above. In Embodiment 8 as well, steps ST301 to ST306 are first performed exactly as in Embodiment 7. When one preceding word string candidate has been taken out of the RAM 5 in step ST306, the kanji probability calculating means 34 searches the kana n-gram 37 in step ST320 using the kana string information of that candidate and checks whether any following word prefix-matches. In Embodiment 7, the baseball-topic kana n-gram 30 and the general-topic kana n-gram 31 of the kana n-gram 33 each used both 2-grams and 1-grams; in Embodiment 8, matching is performed with the kana n-gram 37, in which the baseball-topic kana n-gram 35 and the general-topic kana n-gram 36 each contain only 2-grams. If the check finds a prefix-matching following word, the process branches to step ST308 and then proceeds through step ST313 as in Embodiment 7.

【０１１８】以上のように、この実施の形態８によれ
ば、漢字確率算出手段３４は仮名ｎグラム３７の２グラ
ムのみを用いて一致を検査しているので、１つの仮名文
字列に対する一連の形態素は同じ話題の形態素となるた
め、他の話題が交ざることをなくすことができるという
効果が得られる。As described above, according to Embodiment 8, the kanji probability calculating means 34 checks for matches using only the 2-grams of the kana n-gram 37. The series of morphemes for one kana character string therefore all belong to the same topic, which prevents other topics from being mixed in.

【０１１９】実施の形態９．なお、上記実施の形態７お
よび実施の形態８では、仮名漢字変換において、話題ご
との確率の重み調整については特に考慮していなかった
が、話題ごとに確率の重みの調整を可能に漢字確率算出
手段を構成するようにしてもよい。図２４はそのような
この発明の実施の形態９による仮名漢字変換装置の構成
を示すブロック図である。図において、５はＲＡＭ、２
８はキーボード、３２は出力手段、３５、３６は野球話
題および一般話題の仮名ｎグラムであり、これらは図２
１に同一符号を付して示した実施の形態８のそれらと同
等の部分である。３８は図２１に符号３４を付して示し
たものに相当する漢字確率算出手段であるが、話題ごと
に確率の重みを調整可能に構成されている点で異なって
いる。Embodiment 9. Although Embodiments 7 and 8 above gave no particular consideration to adjusting the probability weight of each topic in kana-kanji conversion, the kanji probability calculating means may be configured so that the probability weight can be adjusted for each topic. FIG. 24 is a block diagram showing the configuration of a kana-kanji conversion device according to such an Embodiment 9 of the present invention. In the figure, 5 is a RAM, 28 is a keyboard, 32 is an output means, and 35 and 36 are the baseball-topic and general-topic kana n-grams; these are equivalent to the parts of Embodiment 8 given the same reference numerals in FIG. 21. Reference numeral 38 denotes a kanji probability calculating means corresponding to the one denoted by reference numeral 34 in FIG. 21, but it differs in that the probability weight can be adjusted for each topic.

【０１２０】次に動作について説明する。図２５はこの
ように構成された実施の形態９による仮名漢字変換装置
の概略動作の流れを示すフローチャートである。この実
施の形態９においても、まず、ステップＳＴ３０１から
ステップＳＴ３０６、およびステップＳＴ３２０におい
て、実施の形態８の場合と全く同様の処理が行われる。
ステップＳＴ３２０における２グラムのみの仮名ｎグラ
ム３７を用いた、前方一致する後方単語があるか否のチ
ェックの結果、前方一致した後方単語がない場合にはス
テップＳＴ３０５に戻り、前方一致する後方単語がある
場合にはステップＳＴ３３０に進む。ステップＳＴ３３
０では漢字確率算出手段３８が、後方単語のそれぞれに
ついて分野別に重み付けを行って尤度を計算し、それを
ＲＡＭ５に記憶するとともに、先行単語列に後方単語を
接続してゆき、新たに先行単語列としてＲＡＭ５に記憶
する。以下ステップＳＴ３０９からステップＳＴ３１３
まで、実施の形態８と同様に処理を進める。Next, the operation will be described. FIG. 25 is a flowchart outlining the operation of the kana-kanji conversion device according to Embodiment 9 configured as above. In Embodiment 9 as well, steps ST301 to ST306 and step ST320 are first performed exactly as in Embodiment 8. Step ST320 checks, using the kana n-gram 37 that contains only 2-grams, whether any following word prefix-matches; if none does, the process returns to step ST305, and if one does, the process proceeds to step ST330. In step ST330, the kanji probability calculating means 38 calculates the likelihood of each following word with a weight applied according to its field (topic), stores it in the RAM 5, appends the following word to the preceding word string, and stores the result in the RAM 5 as a new preceding word string. Steps ST309 to ST313 then proceed as in Embodiment 8.

【０１２１】以上のように、この実施の形態９によれ
ば、２グラムの確率の重みを話題別にかけるように漢字
確率算出手段３８を構成しているので、話題別に出現確
率の調節が可能になるという効果が得られる。As described above, according to Embodiment 9, the kanji probability calculating means 38 is configured to weight the 2-gram probabilities by topic, so the occurrence probability can be adjusted for each topic.

【０１２２】[0122]

【発明の効果】以上のように、この発明によれば、対象
言語の音韻列、音韻列に対応する単語表記列、および生
起確率を記憶した音韻ｎグラム中の単語を、それぞれの
話題に対応して分類し、単語確率算出手段がその音韻ｎ
グラムを参照して算出した単語生起確率と、音韻確率算
出手段が算出した音韻生起確率とを用いて、入力された
音声に類似する単語列候補を求めて出力手段より出力す
るように構成したので、話題を分離して統計量をとるこ
とによって、ｎグラムの次数を大きくすることなく言語
制約の強いｎグラムを構成することが可能となり、精度
の高い音声認識装置が得られるという効果がある。EFFECTS OF THE INVENTION As described above, according to the present invention, the words in a phoneme n-gram, which stores phoneme strings of the target language, the word notation strings corresponding to them, and occurrence probabilities, are classified by topic. Word string candidates similar to the input speech are obtained from the word occurrence probabilities that the word probability calculating means calculates by referring to the phoneme n-gram, together with the phoneme occurrence probabilities calculated by the phoneme probability calculating means, and are output from the output means. Because statistics are collected separately for each topic, an n-gram with strong linguistic constraints can be constructed without increasing the order of the n-gram, and a highly accurate speech recognition device is obtained.

【０１２３】この発明によれば、単語列候補の算出時
に、単語確率算出手段で一連の音声に対応する音韻ｎグ
ラム中の話題をすべて一致させるように構成したので、
１つの音声に対する一連の形態素が同じ話題の形態素と
なり、発話中に他の話題が交ざることを防止することが
できる音声認識装置が得られるという効果がある。According to the present invention, the word probability calculating means is configured so that, when word string candidates are calculated, all topics in the phoneme n-gram used for one continuous utterance are the same. The series of morphemes for one utterance therefore belongs to a single topic, which has the effect of providing a speech recognition device that can prevent other topics from being mixed into an utterance.
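The topic-consistency constraint can be sketched as follows in Python. This is an illustration under stated assumptions, not the patent's implementation: the table `TOPIC_BIGRAMS`, its probability values, the floor value for unseen 2-grams, and the function names are all hypothetical. Each candidate word string is scored once per topic, and every 2-gram in a candidate is looked up in the same topic's table, so topics are never mixed within one utterance.

```python
import math

# Hypothetical per-topic 2-gram probabilities (illustrative values only).
TOPIC_BIGRAMS = {
    "baseball": {("<s>", "pitcher"): 0.2, ("pitcher", "throws"): 0.5},
    "general": {("<s>", "pitcher"): 0.05, ("pitcher", "pours"): 0.4},
}
FLOOR = 1e-6  # assumed floor probability for 2-grams missing from a table

def log_score(words, bigrams):
    """Sum of log 2-gram probabilities of a word string under one table."""
    seq = ["<s>"] + list(words)
    return sum(math.log(bigrams.get(pair, FLOOR)) for pair in zip(seq, seq[1:]))

def best_single_topic(candidates):
    """Return (log score, topic, words) for the best candidate/topic pair.

    All 2-grams of a candidate are scored with one topic's table only.
    """
    return max(
        (log_score(words, bigrams), topic, words)
        for words in candidates
        for topic, bigrams in TOPIC_BIGRAMS.items()
    )

candidates = [("pitcher", "throws"), ("pitcher", "pours")]
score, topic, words = best_single_topic(candidates)
```

With these assumed values the baseball table explains "pitcher throws" best, and no candidate can borrow one 2-gram from the baseball table and the next from the general table.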

【０１２４】この発明によれば、単語確率算出手段によ
って、話題ごとに確率の重み設定を行うように構成した
ので、話題別に出現確率を調整することが可能な音声認
識装置が得られるという効果がある。According to the present invention, the word probability calculating means is configured to set a probability weight for each topic, which has the effect of providing a speech recognition device in which the appearance probability can be adjusted for each topic.
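Per-topic weighting can be sketched in Python as below. The form of the weight is not specified in this passage, so the sketch assumes a multiplicative factor on the word occurrence probability (an added term in the log domain). The candidate words, probabilities, weights, and function names are hypothetical.

```python
import math

def weighted_log_prob(prob, topic, topic_weights):
    """Log of (topic weight * word occurrence probability); assumed weighting form."""
    return math.log(topic_weights[topic]) + math.log(prob)

def pick(candidates, topic_weights):
    """Return the word whose topic-weighted probability is highest.

    candidates: list of (word, topic, occurrence probability) tuples.
    """
    best = max(candidates, key=lambda c: weighted_log_prob(c[2], c[1], topic_weights))
    return best[0]

# Two hypothetical homophone candidates with illustrative probabilities.
candidates = [("投手", "baseball", 0.02), ("当主", "general", 0.03)]

neutral = pick(candidates, {"baseball": 1.0, "general": 1.0})
boosted = pick(candidates, {"baseball": 2.0, "general": 1.0})
```

With equal weights the general-topic word wins on raw probability; doubling the baseball weight flips the choice, which is the kind of per-topic adjustment the paragraph describes.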

【０１２５】この発明によれば、仮名漢字混じり文字
列、仮名漢字混じり文字列に対応する単語表記列、およ
び生起確率を記憶した漢字ｎグラム中の単語を、それぞ
れの話題に対応して分類し、形態素確率算出手段がその
漢字ｎグラムを参照して算出した単語生起確率を用い
て、入力された仮名漢字混じり文字列に適合する単語列
候補を求めて出力手段より出力するように構成したの
で、話題を分離して統計量をとることによって、ｎグラ
ムの次数を大きくすることなく言語制約の強いｎグラム
を構成することが可能となり、精度の高い形態素解析装
置が得られるという効果がある。According to the present invention, the words in the kanji n-gram, which stores kana-kanji mixed character strings, the word notation strings corresponding to those character strings, and occurrence probabilities, are classified by topic. Word string candidates that match the input kana-kanji mixed character string are obtained using the word occurrence probabilities that the morpheme probability calculating means calculates with reference to the kanji n-gram, and are output from the output means. Because statistics are collected separately for each topic, an n-gram with strong language constraints can be constructed without increasing the order of the n-gram, which has the effect of providing a highly accurate morphological analyzer.

【０１２６】この発明によれば、単語列候補の算出時
に、形態素確率算出手段で一連の仮名漢字混じり文字列
に対応する漢字ｎグラム中の話題をすべて一致させるよ
うに構成したので、１つの仮名漢字混じり文字列に対す
る一連の形態素は同一の話題の形態素となるため、他の
話題が交ざることを防止することができる形態素解析装
置が得られるという効果が得られる。According to the present invention, the morpheme probability calculating means is configured so that, when word string candidates are calculated, all topics in the kanji n-gram used for one kana-kanji mixed character string are the same. The series of morphemes for one kana-kanji mixed character string therefore belongs to a single topic, which has the effect of providing a morphological analyzer that can prevent other topics from being mixed in.

【０１２７】この発明によれば、形態素確率算出手段に
よって、話題ごとに確率の重み設定を行うように構成し
たので、話題別に出現確率を調整することが可能な形態
素解析装置が得られるという効果がある。According to the present invention, the morpheme probability calculating means is configured to set a probability weight for each topic, which has the effect of providing a morphological analyzer in which the appearance probability can be adjusted for each topic.

【０１２８】この発明によれば、仮名文字列、仮名文字
列に対応する単語表記列、および生起確率を記憶した仮
名ｎグラム中の単語を、それぞれの話題に対応して分類
し、漢字確率算出手段がその仮名ｎグラムを参照して算
出した単語生起確率を用いて、入力された仮名文字列に
適合する単語列候補を求めて出力手段より出力するよう
に構成したので、話題を分離して統計量をとることによ
って、ｎグラムの次数を大きくすることなく言語制約の
強いｎグラムを構成することが可能となり、精度の高い
仮名漢字変換装置が得られるという効果がある。According to the present invention, the words in the kana n-gram, which stores kana character strings, the word notation strings corresponding to those kana character strings, and occurrence probabilities, are classified by topic. Word string candidates that match the input kana character string are obtained using the word occurrence probabilities that the kanji probability calculating means calculates with reference to the kana n-gram, and are output from the output means. Because statistics are collected separately for each topic, an n-gram with strong language constraints can be constructed without increasing the order of the n-gram, which has the effect of providing a highly accurate kana-kanji conversion device.
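A topic-partitioned kana n-gram lookup can be sketched in Python as follows. For brevity the sketch stores only 1-gram entries and uses a simple dynamic program over segmentations; the table `KANA_NGRAM`, its entries and probabilities, and the function `convert` are hypothetical and are not the patent's data or procedure.

```python
import math

# Hypothetical kana n-gram entries (1-grams only, for brevity):
# topic -> kana reading -> list of (word notation, occurrence probability).
KANA_NGRAM = {
    "baseball": {"とうしゅ": [("投手", 0.6), ("党首", 0.1)], "が": [("が", 0.9)]},
    "general": {"とうしゅ": [("党首", 0.5), ("投手", 0.1)], "が": [("が", 0.9)]},
}

def convert(kana, topic):
    """Return (log probability, word list) for the best segmentation of `kana`
    using only the entries of one topic, or None if no segmentation exists."""
    table = KANA_NGRAM[topic]
    best = {0: (0.0, [])}  # prefix length -> (log probability, word list)
    for end in range(1, len(kana) + 1):
        for start in range(end):
            if start not in best:
                continue  # no way to cover the prefix up to `start`
            for surface, prob in table.get(kana[start:end], []):
                score = best[start][0] + math.log(prob)
                if end not in best or score > best[end][0]:
                    best[end] = (score, best[start][1] + [surface])
    return best.get(len(kana))

baseball_result = convert("とうしゅが", "baseball")
general_result = convert("とうしゅが", "general")
```

The same kana input yields a different word notation string depending on which topic's table is consulted, which is the behavior the topic classification is meant to enable.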

【０１２９】この発明によれば、単語列候補の算出時
に、漢字確率算出手段で一連の仮名文字列に対応する仮
名ｎグラム中の話題をすべて一致させるように構成した
ので、１つの仮名漢字混じり文字列に対する一連の形態
素は同一の話題の形態素となるため、他の話題が交ざる
ことを防止することができる仮名漢字変換装置が得られ
るという効果がある。According to the present invention, the kanji probability calculating means is configured so that, when word string candidates are calculated, all topics in the kana n-gram used for one kana character string are the same. The series of morphemes for one kana-kanji mixed character string therefore belongs to a single topic, which has the effect of providing a kana-kanji conversion device that can prevent other topics from being mixed in.

【０１３０】この発明によれば、漢字確率算出手段によ
って、話題ごとに確率の重み設定を行うように構成した
ので、話題別に出現確率を調整することが可能な仮名漢
字変換装置が得られるという効果がある。According to the present invention, the kanji probability calculating means is configured to set a probability weight for each topic, which has the effect of providing a kana-kanji conversion device in which the appearance probability can be adjusted for each topic.

【０１３１】この発明によれば、各音韻に対応して算出
した音韻生起確率と、記憶している単語をそれぞれの話
題対応に分類して、対象言語の音韻列、音韻列に対応す
る単語表記列、および生起確率を記憶した音韻ｎグラム
を参照して算出した単語生起確率を用いて、入力された
音声に類似する単語列候補を計算するように構成したの
で、話題を分離して統計量をとることによって、ｎグラ
ムの次数を大きくすることなく言語制約の強いｎグラム
を構成することができ、高精度の音声認識方法が得られ
るという効果がある。According to the present invention, word string candidates similar to the input speech are calculated using the phoneme occurrence probability calculated for each phoneme and the word occurrence probability calculated with reference to a phoneme n-gram that stores phoneme strings of the target language, the word notation strings corresponding to those phoneme strings, and occurrence probabilities, with the stored words classified by topic. Because statistics are collected separately for each topic, an n-gram with strong language constraints can be constructed without increasing the order of the n-gram, which has the effect of providing a highly accurate speech recognition method.

【０１３２】この発明によれば、記憶している単語をそ
れぞれの話題対応に分類して、仮名漢字混じり文字列、
仮名漢字混じり文字列に対応する単語表記列、および生
起確率を記憶した漢字ｎグラムを参照して算出した単語
生起確率を用いて、入力された仮名漢字混じり文字列に
適合する単語列候補を計算するように構成したので、ｎ
グラムの次数を大きくすることなく言語制約の強いｎグ
ラムを構成することができ、高精度の形態素解析方法が
得られるという効果がある。According to the present invention, word string candidates that match the input kana-kanji mixed character string are calculated using the word occurrence probability calculated with reference to a kanji n-gram that stores kana-kanji mixed character strings, the word notation strings corresponding to those character strings, and occurrence probabilities, with the stored words classified by topic. An n-gram with strong language constraints can therefore be constructed without increasing the order of the n-gram, which has the effect of providing a highly accurate morphological analysis method.

【０１３３】この発明によれば、記憶している単語をそ
れぞれの話題対応に分類して、仮名文字列、仮名文字列
に対応する単語表記列、および生起確率を記憶した仮名
ｎグラムを参照して算出した単語生起確率を用いて、入
力された仮名文字列に適合する単語列候補を計算するよ
うに構成したので、ｎグラムの次数を大きくすることな
く言語制約の強いｎグラムを構成することができ、高精
度の仮名漢字変換方法が得られるという効果がある。According to the present invention, word string candidates that match the input kana character string are calculated using the word occurrence probability calculated with reference to a kana n-gram that stores kana character strings, the word notation strings corresponding to those kana character strings, and occurrence probabilities, with the stored words classified by topic. An n-gram with strong language constraints can therefore be constructed without increasing the order of the n-gram, which has the effect of providing a highly accurate kana-kanji conversion method.

【０１３４】この発明によれば、各音韻に対応して算出
した音韻生起確率と、記憶している単語をそれぞれの話
題対応に分類して、対象言語の音韻列、音韻列に対応す
る単語表記列、および生起確率を記憶した音韻ｎグラム
を参照して算出した単語生起確率を用いて、入力された
音声に類似する単語列候補を計算するための音声認識方
法のプログラムを、コンピュータ読み取り可能に記録す
るように構成したので、音声認識方法を高精度に実現す
るためのプログラムが記録された記録媒体が得られると
いう効果がある。According to the present invention, a program for a speech recognition method is recorded in computer-readable form. The method calculates word string candidates similar to the input speech using the phoneme occurrence probability calculated for each phoneme and the word occurrence probability calculated with reference to a phoneme n-gram that stores phoneme strings of the target language, the word notation strings corresponding to those phoneme strings, and occurrence probabilities, with the stored words classified by topic. This has the effect of providing a recording medium on which a program for realizing the speech recognition method with high accuracy is recorded.

【０１３５】この発明によれば、記憶している単語をそ
れぞれの話題対応に分類して、仮名漢字混じり文字列、
仮名漢字混じり文字列に対応する単語表記列、および生
起確率を記憶した漢字ｎグラムを参照して算出した単語
生起確率を用いて、入力された仮名漢字混じり文字列に
適合する単語列候補を計算するための形態素解析方法の
プログラムを、コンピュータ読み取り可能に記録するよ
うに構成したので、形態素解析方法を高精度に実現する
ためのプログラムが記録された記録媒体が得られるとい
う効果がある。According to the present invention, a program for a morphological analysis method is recorded in computer-readable form. The method calculates word string candidates that match the input kana-kanji mixed character string using the word occurrence probability calculated with reference to a kanji n-gram that stores kana-kanji mixed character strings, the word notation strings corresponding to those character strings, and occurrence probabilities, with the stored words classified by topic. This has the effect of providing a recording medium on which a program for realizing the morphological analysis method with high accuracy is recorded.

【０１３６】この発明によれば、記憶している単語をそ
れぞれの話題対応に分類して、仮名文字列、仮名文字列
に対応する単語表記列、および生起確率を記憶した仮名
ｎグラムを参照して算出した単語生起確率を用いて、入
力された仮名文字列に適合する単語列候補を計算するた
めの仮名漢字変換方法のプログラムを、コンピュータ読
み取り可能に記録するように構成したので、仮名漢字変
換方法を高精度に実現するためのプログラムが記録され
た記録媒体が得られるという効果がある。According to the present invention, a program for a kana-kanji conversion method is recorded in computer-readable form. The method calculates word string candidates that match the input kana character string using the word occurrence probability calculated with reference to a kana n-gram that stores kana character strings, the word notation strings corresponding to those kana character strings, and occurrence probabilities, with the stored words classified by topic. This has the effect of providing a recording medium on which a program for realizing the kana-kanji conversion method with high accuracy is recorded.

【図面の簡単な説明】[Brief description of the drawings]

【図１】この発明の実施の形態１による音声認識装置
の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a speech recognition device according to a first embodiment of the present invention.

【図２】実施の形態１の音声認識装置で解析される例
文を示す説明図である。FIG. 2 is an explanatory diagram showing an example sentence analyzed by the speech recognition device of the first embodiment.

【図３】実施の形態１の音声認識装置にて解析に用い
る音韻ｎグラムの具体例を示す説明図である。FIG. 3 is an explanatory diagram showing a specific example of a phoneme n-gram used for analysis in the speech recognition device according to the first embodiment.

【図４】実施の形態１の音声認識装置における音声認
識の概略動作の流れを示すフローチャートである。FIG. 4 is a flowchart showing a schematic operation flow of speech recognition in the speech recognition device according to the first embodiment.

【図５】この発明の実施の形態２による音声認識装置
の構成を示すブロック図である。FIG. 5 is a block diagram showing a configuration of a speech recognition device according to a second embodiment of the present invention.

【図６】実施の形態２の音声認識装置にて解析に用い
る音韻ｎグラムの具体例を示す説明図である。FIG. 6 is an explanatory diagram showing a specific example of a phoneme n-gram used for analysis in the speech recognition device according to the second embodiment.

【図７】実施の形態２の音声認識装置における音声認
識の概略動作の流れを示すフローチャートである。FIG. 7 is a flowchart showing a schematic operation flow of speech recognition in the speech recognition device according to the second embodiment.

【図８】この発明の実施の形態３による音声認識装置
の構成を示すブロック図である。FIG. 8 is a block diagram showing a configuration of a speech recognition device according to a third embodiment of the present invention.

【図９】実施の形態３の音声認識装置における音声認
識の概略動作の流れを示すフローチャートである。FIG. 9 is a flowchart showing a schematic operation flow of speech recognition in the speech recognition device according to the third embodiment.

【図１０】この発明の実施の形態４による形態素解析
装置の構成を示すブロック図である。FIG. 10 is a block diagram showing a configuration of a morphological analyzer according to Embodiment 4 of the present invention.

【図１１】実施の形態４の形態素解析装置にて解析に
用いる漢字ｎグラムの具体例を示す説明図である。FIG. 11 is an explanatory diagram showing a specific example of a kanji n-gram used for analysis by the morphological analyzer of the fourth embodiment.

【図１２】実施の形態４の形態素解析装置における形
態素解析の概略動作の流れを示すフローチャートであ
る。FIG. 12 is a flowchart showing a schematic operation flow of morphological analysis in the morphological analyzer according to the fourth embodiment.

【図１３】この発明の実施の形態５による形態素解析
装置の構成を示すブロック図である。FIG. 13 is a block diagram showing a configuration of a morphological analyzer according to Embodiment 5 of the present invention.

【図１４】実施の形態５の形態素解析装置にて解析に
用いる漢字ｎグラムの具体例を示す説明図である。FIG. 14 is an explanatory diagram showing a specific example of a kanji n-gram used for analysis by the morphological analyzer of the fifth embodiment.

【図１５】実施の形態５の計値磯解析装置における形
態素解析の概略動作の流れを示すフローチャートであ
る。FIG. 15 is a flowchart showing a schematic operation flow of morphological analysis in the morphological analyzer according to the fifth embodiment.

【図１６】この発明の実施の形態６による形態素解析
装置の構成を示すブロック図である。FIG. 16 is a block diagram showing a configuration of a morphological analyzer according to Embodiment 6 of the present invention.

【図１７】実施の形態６の形態素解析装置における形
態素解析の概略動作の流れを示すフローチャートであ
る。FIG. 17 is a flowchart showing a schematic operation flow of morphological analysis in the morphological analyzer according to the sixth embodiment.

【図１８】この発明の実施の形態７による仮名漢字変
換装置の構成を示すブロック図である。FIG. 18 is a block diagram showing a configuration of a kana-kanji conversion device according to a seventh embodiment of the present invention.

【図１９】実施の形態７の仮名漢字変換装置にて解析
に用いる仮名ｎグラムの具体例を示す説明図である。FIG. 19 is an explanatory diagram showing a specific example of a kana n-gram used for analysis in the kana-kanji conversion device of the seventh embodiment.

【図２０】実施の形態７の仮名漢字変換装置における
仮名漢字変換の概略動作の流れを示すフローチャートで
ある。FIG. 20 is a flowchart showing a schematic operation flow of kana-kanji conversion in the kana-kanji conversion device of the seventh embodiment.

【図２１】この発明の実施の形態８による仮名漢字変
換装置の構成を示すブロック図である。FIG. 21 is a block diagram showing a configuration of a kana-kanji conversion device according to an eighth embodiment of the present invention.

【図２２】実施の形態８の仮名漢字変換装置にて解析
に用いる仮名ｎグラムの具体例を示す説明図である。FIG. 22 is an explanatory diagram showing a specific example of a kana n-gram used for analysis in the kana-kanji conversion device of the eighth embodiment.

【図２３】実施の形態８の仮名漢字変換装置における
仮名漢字変換の概略動作の流れを示すフローチャートで
ある。FIG. 23 is a flowchart showing a schematic operation flow of kana-kanji conversion in the kana-kanji conversion device of the eighth embodiment.

【図２４】この発明の実施の形態９による仮名漢字変
換装置の構成を示すブロック図である。FIG. 24 is a block diagram showing a configuration of a kana-kanji conversion device according to Embodiment 9 of the present invention.

【図２５】実施の形態９の仮名漢字変換析装置におけ
る仮名漢字変換の概略動作の流れを示すフローチャート
である。FIG. 25 is a flowchart showing a schematic operation flow of kana-kanji conversion in the kana-kanji conversion device of the ninth embodiment.

【図２６】従来の音声認識装置の構成を示すブロック
図である。FIG. 26 is a block diagram showing a configuration of a conventional speech recognition device.

【図２７】従来の音声認識装置における音声認識の概
略動作の流れを示すフローチャートである。FIG. 27 is a flowchart showing a schematic operation flow of speech recognition in a conventional speech recognition device.

【符号の説明】[Explanation of symbols]

１マイク（入力手段）、２音韻確率算出手段、５
ＲＡＭ、６出力手段、７野球話題の音韻ｎグラム
（音韻ｎグラム）、８一般話題の音韻ｎグラム（音韻
ｎグラム）、９単語確率算出手段、１０例文、１１
音韻ｎグラム、１２単語確率算出手段、１３野球
話題の音韻ｎグラム（音韻ｎグラム）、１４一般話題
の音韻ｎグラム（音韻ｎグラム）、１５音韻ｎグラ
ム、１６単語確率算出手段、１７ファイル入力装置
（入力手段）、１８野球話題の漢字ｎグラム（漢字ｎ
グラム）、１９一般話題の漢字ｎグラム（漢字ｎグラ
ム）、２０形態素確率算出手段、２１出力手段、２
２漢字ｎグラム、２３形態素確率算出手段、２４
野球話題の漢字ｎグラム（漢字ｎグラム）、２５一般
話題の漢字ｎグラム（漢字ｎグラム）、２６漢字ｎグ
ラム、２７形態素確率算出手段、２８キーボード
（入力手段）、２９野球話題の仮名ｎグラム（仮名ｎ
グラム）、３０一般話題の仮名ｎグラム（仮名ｎグラ
ム）、３１漢字確率算出手段、３２出力手段、３３
仮名ｎグラム、３４漢字確率算出手段、３５野球
話題の仮名ｎグラム（仮名ｎグラム）、３６一般話題
の仮名ｎグラム（仮名ｎグラム）、３７仮名ｎグラ
ム、３８漢字確率算出手段。1 microphone (input means), 2 phoneme probability calculating means, 5 RAM, 6 output means, 7 baseball-topic phoneme n-gram (phoneme n-gram), 8 general-topic phoneme n-gram (phoneme n-gram), 9 word probability calculating means, 10 example sentence, 11 phoneme n-gram, 12 word probability calculating means, 13 baseball-topic phoneme n-gram (phoneme n-gram), 14 general-topic phoneme n-gram (phoneme n-gram), 15 phoneme n-gram, 16 word probability calculating means, 17 file input device (input means), 18 baseball-topic kanji n-gram (kanji n-gram), 19 general-topic kanji n-gram (kanji n-gram), 20 morpheme probability calculating means, 21 output means, 22 kanji n-gram, 23 morpheme probability calculating means, 24 baseball-topic kanji n-gram (kanji n-gram), 25 general-topic kanji n-gram (kanji n-gram), 26 kanji n-gram, 27 morpheme probability calculating means, 28 keyboard (input means), 29 baseball-topic kana n-gram (kana n-gram), 30 general-topic kana n-gram (kana n-gram), 31 kanji probability calculating means, 32 output means, 33 kana n-gram, 34 kanji probability calculating means, 35 baseball-topic kana n-gram (kana n-gram), 36 general-topic kana n-gram (kana n-gram), 37 kana n-gram, 38 kanji probability calculating means.

フロントページの続き (72)発明者阿部芳春東京都千代田区丸の内二丁目２番３号三菱電機株式会社内Ｆターム(参考） 5B009 MA00 5B091 AA15 CA02 CB12 5D015 AA01 AA06 BB01 BB02 HH13 HH15 LL04 LL08 9A001 DD11 GG05 HH12 HH13 HH17 JJ71 KK54 Continuation of front page (72) Inventor: Yoshiharu Abe, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-terms (reference): 5B009 MA00 5B091 AA15 CA02 CB12 5D015 AA01 AA06 BB01 BB02 HH13 HH15 LL04 LL08 9A001 DD11 GG05 HH12 HH13 HH17 JJ71 KK54

Claims

【特許請求の範囲】[Claims]

【請求項１】対象言語の音韻列と、音韻列に対応する
単語表記列と、生起確率とを記憶し、記憶している単語
がそれぞれの話題に対応して分類されている音韻ｎグラ
ムと、前記対象言語の音声を入力する入力手段と、前記入力手段が出力する音声信号を音韻に変換し、各音
韻に対応する音韻生起確率を計算して、音韻列候補を出
力する音韻確率算出手段と、前記音韻ｎグラムを参照して、前記音韻確率算出手段が
出力する音韻列候補に対応する各単語候補の単語生起確
率を算出する単語確率算出手段と、前記音韻確率算出手段にて計算された音韻生起確率と、
前記単語確率算出手段にて計算された単語生起確率とを
用いて算出した、前記入力手段より入力された音声に類
似する単語列候補を出力する出力手段とを備えた音声認
識装置。1. A speech recognition device comprising: a phoneme n-gram that stores phoneme strings of a target language, word notation strings corresponding to the phoneme strings, and occurrence probabilities, the stored words being classified according to their respective topics; input means for inputting speech of the target language; phoneme probability calculating means for converting the speech signal output by the input means into phonemes, calculating a phoneme occurrence probability for each phoneme, and outputting phoneme string candidates; word probability calculating means for calculating, with reference to the phoneme n-gram, a word occurrence probability of each word candidate corresponding to the phoneme string candidates output by the phoneme probability calculating means; and output means for outputting word string candidates that are similar to the speech input from the input means and are calculated using the phoneme occurrence probabilities calculated by the phoneme probability calculating means and the word occurrence probabilities calculated by the word probability calculating means.

【請求項２】単語確率算出手段が、単語列候補の算出
時に、一連の音声に対応する音韻ｎグラム中の話題をす
べて一致させるものであることを特徴とする請求項１記
載の音声認識装置。2. The speech recognition device according to claim 1, wherein the word probability calculating means makes all topics in the phoneme n-gram corresponding to a series of speech the same when calculating word string candidates.

【請求項３】単語確率算出手段が、話題ごとに確率の
重みを設定するものであることを特徴とする請求項１ま
たは請求項２記載の音声認識装置。3. The speech recognition device according to claim 1 or claim 2, wherein the word probability calculating means sets a probability weight for each topic.

【請求項４】仮名漢字混じり文字列と、仮名漢字混じ
り文字列に対応する単語表記列と、生起確率とを記憶
し、記憶している単語がそれぞれの話題に対応して分類
されている漢字ｎグラムと、前記仮名漢字混じり文字列を入力する入力手段と、前記漢字ｎグラムを参照して、前記入力手段が出力する
仮名漢字混じり文字列に対応する各単語候補の単語生起
確率を算出する形態素確率算出手段と、前記形態素確率算出手段にて計算された単語生起確率を
用いて算出した、前記入力手段より入力された文字列に
適合する単語列候補を出力する出力手段とを備えた形態
素解析装置。4. A morphological analyzer comprising: a kanji n-gram that stores kana-kanji mixed character strings, word notation strings corresponding to the kana-kanji mixed character strings, and occurrence probabilities, the stored words being classified according to their respective topics; input means for inputting the kana-kanji mixed character string; morpheme probability calculating means for calculating, with reference to the kanji n-gram, a word occurrence probability of each word candidate corresponding to the kana-kanji mixed character string output by the input means; and output means for outputting word string candidates that match the character string input from the input means and are calculated using the word occurrence probabilities calculated by the morpheme probability calculating means.

【請求項５】形態素確率算出手段が、単語列候補の算
出時に、一連の仮名漢字混じり文字列に対応する漢字ｎ
グラム中の話題をすべて一致させるものであることを特
徴とする請求項４記載の形態素解析装置。5. The morphological analyzer according to claim 4, wherein the morpheme probability calculating means makes all topics in the kanji n-gram corresponding to a series of kana-kanji mixed character strings the same when calculating word string candidates.

【請求項６】形態素確率算出手段が、話題ごとに確率
の重みを設定するものであることを特徴とする請求項４
または請求項５記載の形態素解析装置。6. The morphological analyzer according to claim 4 or claim 5, wherein the morpheme probability calculating means sets a probability weight for each topic.

【請求項７】仮名文字列と、仮名文字列に対応する単
語表記列と、生起確率とを記憶し、記憶している単語が
それぞれの話題に対応して分類されている仮名ｎグラム
と、前記仮名文字列を入力する入力手段と、前記仮名ｎグラムを参照して、前記入力手段が出力する
仮名文字列に対応する各単語候補の単語生起確率を算出
する漢字確率算出手段と、前記漢字確率算出手段にて計算された単語生起確率を用
いて算出された、前記入力手段より入力された仮名文字
列に適合する単語列候補を出力する出力手段とを備えた
仮名漢字変換装置。7. A kana-kanji conversion device comprising: a kana n-gram that stores kana character strings, word notation strings corresponding to the kana character strings, and occurrence probabilities, the stored words being classified according to their respective topics; input means for inputting the kana character string; kanji probability calculating means for calculating, with reference to the kana n-gram, a word occurrence probability of each word candidate corresponding to the kana character string output by the input means; and output means for outputting word string candidates that match the kana character string input from the input means and are calculated using the word occurrence probabilities calculated by the kanji probability calculating means.

【請求項８】漢字確率算出手段が、単語列候補の算出
時に、一連の仮名文字列に対応する仮名ｎグラム中の話
題をすべて一致させるものであることを特徴とする請求
項７記載の仮名漢字変換装置。8. The kana-kanji conversion device according to claim 7, wherein the kanji probability calculating means makes all topics in the kana n-gram corresponding to a series of kana character strings the same when calculating word string candidates.

【請求項９】漢字確率算出手段が、話題ごとに確率の
重みの設定を行うものであることを特徴とする請求項７
または請求項８記載の仮名漢字変換装置。9. The kana-kanji conversion device according to claim 7 or claim 8, wherein the kanji probability calculating means sets a probability weight for each topic.

【請求項１０】入力される音声の取り込みを行うステ
ップと、取り込まれた前記音声を音韻に変換するステップと、前記音声より変換された各音韻に対応する音韻生起確率
を計算して、音韻列候補を出力するステップと、対象言語の音韻列と、音韻列に対応する単語表記列と、
生起確率とを記憶し、記憶している単語がそれぞれの話
題に対応して分類された音韻ｎグラムを参照して、算出
された前記音韻列候補に対応する各単語候補の単語生起
確率を算出するステップと、前記音韻生起確率と単語生起確率を用いて、入力された
前記音声に類似する単語列候補を算出するステップとを
備えた音声認識方法。10. A speech recognition method comprising the steps of: capturing input speech; converting the captured speech into phonemes; calculating a phoneme occurrence probability for each phoneme converted from the speech and outputting phoneme string candidates; calculating a word occurrence probability of each word candidate corresponding to the calculated phoneme string candidates, with reference to a phoneme n-gram that stores phoneme strings of a target language, word notation strings corresponding to the phoneme strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates similar to the input speech using the phoneme occurrence probabilities and the word occurrence probabilities.

【請求項１１】入力される仮名漢字混じり文字列の取
り込みを行うステップと、仮名漢字混じり文字列と、仮名漢字混じり文字列に対応
する単語表記列と、生起確率とを記憶し、記憶している
単語がそれぞれの話題に対応して分類された漢字ｎグラ
ムを参照して、取り込まれた前記仮名漢字混じり文字列
に対応する各単語候補の単語生起確率を算出するステッ
プと、算出された前記単語生起確率を用いて、入力された前記
仮名漢字混じり文字列に適合する単語列候補を算出する
ステップとを備えた形態素解析方法。11. A morphological analysis method comprising the steps of: capturing an input kana-kanji mixed character string; calculating a word occurrence probability of each word candidate corresponding to the captured kana-kanji mixed character string, with reference to a kanji n-gram that stores kana-kanji mixed character strings, word notation strings corresponding to the kana-kanji mixed character strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates that match the input kana-kanji mixed character string using the calculated word occurrence probabilities.

【請求項１２】入力される仮名文字列の取り込みを行
うステップと、仮名文字列と、仮名文字列に対応する単語表記列と、生
起確率とを記憶し、記憶している単語がそれぞれの話題
に対応して分類された仮名ｎグラムを参照して、取り込
まれた前記仮名文字列に対応する各単語候補の単語生起
確率を算出するステップと、算出された前記単語生起確率を用いて、入力された前記
仮名文字列に適合する単語列候補を算出するステップと
を備えた仮名漢字変換方法。12. A kana-kanji conversion method comprising the steps of: capturing an input kana character string; calculating a word occurrence probability of each word candidate corresponding to the captured kana character string, with reference to a kana n-gram that stores kana character strings, word notation strings corresponding to the kana character strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates that match the input kana character string using the calculated word occurrence probabilities.

【請求項１３】入力される音声の取り込みを行うステ
ップと、取り込まれた前記音声を音韻に変換するステップと、前記音声より変換された各音韻に対応する音韻生起確率
を計算して、音韻列候補を出力するステップと、対象言語の音韻列と、音韻列に対応する単語表記列と、
生起確率とを記憶し、記憶している単語がそれぞれの話
題に対応して分類された音韻ｎグラムを参照して、算出
された前記音韻列候補に対応する各単語候補の単語生起
確率を算出するステップと、前記音韻生起確率と単語生起確率を用いて、入力された
前記音声に類似する単語列候補を算出するステップとを
有する音声認識方法を、コンピュータに実行させるため
のプログラムを記録したコンピュータ読み取り可能な記
録媒体。13. A computer-readable recording medium on which is recorded a program for causing a computer to execute a speech recognition method comprising the steps of: capturing input speech; converting the captured speech into phonemes; calculating a phoneme occurrence probability for each phoneme converted from the speech and outputting phoneme string candidates; calculating a word occurrence probability of each word candidate corresponding to the calculated phoneme string candidates, with reference to a phoneme n-gram that stores phoneme strings of a target language, word notation strings corresponding to the phoneme strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates similar to the input speech using the phoneme occurrence probabilities and the word occurrence probabilities.

【請求項１４】入力される仮名漢字混じり文字列の取
り込みを行うステップと、仮名漢字混じり文字列と、仮名漢字混じり文字列に対応
する単語表記列と、生起確率とを記憶し、記憶している
単語がそれぞれの話題に対応して分類された漢字ｎグラ
ムを参照して、取り込まれた前記仮名漢字混じり文字列
に対応する各単語候補の単語生起確率を算出するステッ
プと、算出された前記単語生起確率を用いて、入力された前記
仮名漢字混じり文字列に適合する単語列候補を算出する
ステップとを有する形態素解析方法を、コンピュータに
実行させるためのプログラムを記録したコンピュータ読
み取り可能な記録媒体。14. A computer-readable recording medium on which is recorded a program for causing a computer to execute a morphological analysis method comprising the steps of: capturing an input kana-kanji mixed character string; calculating a word occurrence probability of each word candidate corresponding to the captured kana-kanji mixed character string, with reference to a kanji n-gram that stores kana-kanji mixed character strings, word notation strings corresponding to the kana-kanji mixed character strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates that match the input kana-kanji mixed character string using the calculated word occurrence probabilities.

【請求項１５】入力される仮名文字列の取り込みを行
うステップと、仮名文字列と、仮名文字列に対応する単語表記列と、生
起確率とを記憶し、記憶している単語がそれぞれの話題
に対応して分類された仮名ｎグラムを参照して、取り込
まれた前記仮名文字列に対応する各単語候補の単語生起
確率を算出するステップと、算出された前記単語生起確率を用いて、入力された前記
仮名文字列に適合する単語列候補を算出するステップと
を有する仮名漢字変換方法を、コンピュータに実行させ
るためのプログラムを記録したコンピュータ読み取り可
能な記録媒体。15. A computer-readable recording medium on which is recorded a program for causing a computer to execute a kana-kanji conversion method comprising the steps of: capturing an input kana character string; calculating a word occurrence probability of each word candidate corresponding to the captured kana character string, with reference to a kana n-gram that stores kana character strings, word notation strings corresponding to the kana character strings, and occurrence probabilities, the stored words being classified according to their respective topics; and calculating word string candidates that match the input kana character string using the calculated word occurrence probabilities.