JPH10171490A

JPH10171490A - Continuous speech recognition device

Info

Publication number: JPH10171490A
Application number: JP8330679A
Authority: JP
Inventors: Toshiyuki Takezawa; 寿幸竹澤; Takuma Morimoto; 逞森元
Original assignee: ATR ONSEI HONYAKU TSUSHIN KENKYUSHO KK; ATR Interpreting Telecommunications Research Laboratories
Current assignee: ATR ONSEI HONYAKU TSUSHIN KENKYUSHO KK; ATR Interpreting Telecommunications Research Laboratories
Priority date: 1996-12-11
Filing date: 1996-12-11
Publication date: 1998-06-26
Anticipated expiration: 2016-12-11
Also published as: JP3027543B2

Abstract

PROBLEM TO BE SOLVED: To reduce processing time and improve the recognition rate. SOLUTION: A phoneme matching unit 4 performs phoneme recognition on the speech signal of an input freely spoken utterance with reference to a hidden Markov model (HMM). A generalized LR parser (GLR parser) 5 parses and recognizes the utterance with reference to a first LR parsing table generated from context-free grammar (CFG) rules, a second LR parsing table generated from vocabulary rules, and a statistical language model, generated from the CFG rules, that includes bigrams of preterminal symbols. A preterminal symbol is the symbol immediately preceding a terminal symbol, and a terminal symbol denotes a terminal element obtained when the utterance is rewritten by the CFG rules. The GLR parser 5 computes a likelihood score for speech recognition from an acoustic score based on the HMM and a language score based on the first and second LR parsing tables and the statistical language model, and determines the recognition result by beam search using a threshold.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a continuous speech recognition apparatus, and more particularly to a continuous speech recognition apparatus that efficiently recognizes utterances consisting of one or more phrases or one or more words. In this specification, words and morphemes are collectively referred to as "words".

[0002]

[Description of the Related Art] As a conventional continuous speech recognition apparatus, an apparatus (hereinafter, the conventional example) that performs speech recognition by using a phoneme-context-dependent LR parser to drive a hidden Markov network (hereinafter, HM network) automatically generated by the Successive State Splitting (SSS) method is disclosed in Prior Art Document 1 (Akihito Nagai et al., "SSS-LR Continuous Speech Recognition System Integrating the Successive State Splitting (SSS) Method and a Phoneme-Context-Dependent LR Parser", IEICE Technical Report, SP92-33, pp. 69-76, 355-1992). This apparatus is characterized by using, in order to control phoneme duration, a phoneme-context-dependent phoneme duration model generated by the SSS method independently of the HM network. Prior Art Document 1 reports that the apparatus achieved a higher recognition rate and high-speed processing.

[0003] In the continuous speech recognition apparatus disclosed in Prior Art Document 1, when precise phoneme models that depend on the phoneme environment are used, recognition must proceed while checking at run time whether phoneme models can be concatenated, which makes the recognition processing extremely inefficient. Moreover, at word or phrase boundaries, the number of allophone models that are treated as connectable during phoneme matching, even though they are rejected at the later reduce operation, grows. As a result, the processing time increased and the recognition rate dropped significantly.

[0004] To solve these problems, the present inventors proposed, in Japanese Patent Application No. 07-088041 (published as Japanese Unexamined Patent Publication No. 08-286694), "a continuous speech recognition apparatus comprising speech recognition means that recognizes an uttered speech by performing phoneme recognition on the input utterance with reference to a predetermined hidden Markov model (HMM) and parsing with reference to a predetermined LR parsing table, the apparatus further comprising optimization means that, based on predetermined allophone rules indicating the connection relationships between phonemes, deletes from the LR parsing table the portions in which phonemes cannot be concatenated within or between words and outputs an optimized LR parsing table, wherein the speech recognition means recognizes the uttered speech by parsing with reference to the optimized LR parsing table" (hereinafter, the first conventional example). Specifically, the apparatus of the first conventional example solves the above problems by preparing in advance an LR parsing table (hereinafter, LR table) in which all possible connections between words have been exhaustively examined.

[0005] The present inventors have also been studying a speech parser that uses syntactic constraints in context-free grammar form and outputs a sequence of subtrees as scored hypotheses. In Prior Art Document 2 (Toshiyuki Takezawa et al., "Language Phenomena of Natural Speech and Japanese Grammar for Speech Recognition", Information Processing Society of Japan research report, 95-SLP-6-5, 1995; hereinafter, the second conventional example), they proposed that the grammar be described in units of subtrees in order to handle natural utterances. In the second conventional example, given an utterance such as "Soredewa, Suzuki Kazuko-sama" ("Well then, Ms. Kazuko Suzuki"), even if it is divided into the two phrases "soredewa" and "Suzuki Kazuko-sama", it is a fragmentary utterance and cannot necessarily be said to have the structure of a sentence. Against this background it becomes necessary to represent partial structures, which the present inventors call subtrees. The idea of this approach is first to raise the coverage of the grammar by adopting a subtree-based grammar, and then to use the structures output from the speech recognition unit in the language processing unit of a speech translation or spoken dialogue system, thereby achieving efficient integrated speech and language processing as a whole.

[0006]

[Problems to Be Solved by the Invention] However, the speech recognition apparatuses of the first and second conventional examples still have the problems that the processing time is relatively long and the recognition rate is relatively low.

[0007] An object of the present invention is to solve the above problems and to provide a continuous speech recognition apparatus that can shorten the processing time and improve the recognition rate compared with the conventional examples.

[0008]

[Means for Solving the Problems] The continuous speech recognition apparatus according to claim 1 of the present invention comprises speech recognition means that performs speech recognition on the speech signal of an input freely spoken utterance. The speech recognition means performs phoneme recognition on the speech signal with reference to a predetermined hidden Markov model, and recognizes the utterance by parsing with reference to (i) a first LR parsing table generated from predetermined context-free grammar rules, (ii) a second LR parsing table generated from predetermined vocabulary rules, and (iii) a statistical language model, generated from the context-free grammar rules, that includes bigrams of preterminal symbols. A preterminal symbol is the symbol immediately preceding a terminal symbol, and a terminal symbol denotes a terminal element obtained by rewriting with the context-free grammar rules.

[0009] The continuous speech recognition apparatus according to claim 2 is the apparatus according to claim 1, further comprising generation means that generates the second LR parsing table by adding a rule from the start symbol to the preterminal symbol to the vocabulary rules and then determining each state of the second LR parsing table and the instruction content of the elements of each state.

[0010] The continuous speech recognition apparatus according to claim 3 is the apparatus according to claim 1 or 2, wherein the speech recognition means computes a likelihood score for speech recognition from an acoustic score based on the hidden Markov model and a language score based on the first and second LR parsing tables and the statistical language model, and determines the speech recognition result by beam search using a predetermined threshold.

[0011] The continuous speech recognition apparatus according to claim 4 is the apparatus according to claim 3, wherein the speech recognition means computes, as the likelihood score, the sum of (i) the logarithm of the acoustic score and (ii) the product of the logarithm of the language score and a predetermined weighting coefficient.
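The score combination of claims 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the representation of hypotheses as (label, score) pairs, and the rule that a hypothesis survives when its score lies within a fixed margin of the best score are all assumptions, since the text above only says that a predetermined threshold is used in the beam search.

```python
import math


def likelihood_score(acoustic_score, language_score, weight):
    """Return log(acoustic) + weight * log(language), as described for claim 4."""
    return math.log(acoustic_score) + weight * math.log(language_score)


def beam_prune(hypotheses, threshold):
    """Keep hypotheses whose score is within `threshold` of the best score.

    `hypotheses` is a list of (label, score) pairs. The pruning rule itself
    is an assumption made for this sketch.
    """
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    return [(label, score) for label, score in hypotheses if score >= best - threshold]


if __name__ == "__main__":
    print(likelihood_score(0.5, 0.25, 2.0))
    print(beam_prune([("a", -1.0), ("b", -3.0), ("c", -10.0)], 5.0))
```

Working in the log domain turns the product of probabilities into a sum, and the weighting coefficient controls how strongly the language score influences the ranking relative to the acoustic score.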

[0012]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0013] FIG. 1 shows a continuous speech recognition apparatus according to an embodiment of the present invention. As shown in FIG. 1, the apparatus broadly comprises:
(a) a phoneme matching unit 4 that, based on feature parameters of the speech signal of freely spoken speech, performs phoneme matching with reference to a hidden Markov network (hereinafter, HM network) stored in a hidden Markov network memory (hereinafter, HM network memory) 10 and outputs a speech recognition score based on the acoustic model; and
(b) a generalized LR parser (hereinafter, GLR parser) 5.
The GLR parser 5 is characterized in that it executes speech recognition processing including LR parsing, and outputs speech recognition result data, with reference to:
(b-1) a CFG rule LR table (hereinafter, first LR table) that is generated by a first LR table generation unit 21 from the context-free grammar rules (hereinafter, CFG rules) stored in a context-free grammar rule memory (hereinafter, CFG rule memory) 31 and is stored in a CFG rule LR table memory (hereinafter, first LR table memory) 11;
(b-2) a vocabulary rule LR table (hereinafter, second LR table) that is generated by a second LR table generation unit 22 from the vocabulary rules stored in a vocabulary rule memory 32 and is stored in a vocabulary rule LR table memory (hereinafter, second LR table memory) 12; and
(b-3) a statistical language model including preterminal bigrams that is generated by a statistical language model generation unit 23 from the CFG rules stored in the CFG rule memory 31 and is stored in a statistical language model memory 13.
Here, a "terminal symbol" is a symbol denoting a terminal element obtained by rewriting with the CFG rules, specifically a phoneme or word at a leaf of the syntax tree.

[0014] In this embodiment, in order to evaluate preterminal bigrams predictively, the implementation of dictionary lookup in the GLR parser 5 is changed, and the pruning condition of the beam search and the score formula are improved. In a speech recognition apparatus that uses an LR table, the part of speech of the lookahead word, rather than the word itself, is used as lookahead information. To distinguish it from a nonterminal symbol, which denotes any symbol other than a terminal symbol, this symbol is called a preterminal symbol, in the sense that it is the symbol immediately preceding a terminal symbol, and is written <preterm>. A nonterminal symbol that is not a part of speech is defined as a pure nonterminal symbol.

[0015] For example, one conceivable way to obtain the language score when preterminal bigrams are used is to extract, from the grammar history of the predicted phoneme sequence, the syntax rules of the form

[Equation 1] (<preterminal symbol> → terminal symbol)

and compute the score from them. That is, the language score would be computed for the words that have already been fixed within the predicted phoneme sequence. An apparatus implemented in this way is hereinafter called the comparative example. A more efficient search can be expected, however, if preterminal bigrams are evaluated before word candidates are fixed. If the vocabulary items and the syntax rules are combined into a single LR table, it is difficult to implement a search that evaluates preterminal bigrams predictively. This embodiment is therefore characterized by separating the table into two: a first LR table consisting only of syntax rules and a second LR table consisting only of vocabulary items.

[0016] A concrete example follows. Table 1 shows a simple example of grammar rules.

[0017]

[Table 1] Example grammar description
──────────────────────────
(1) <S> → <PP> <S>
(2) <S> → <V>
(3) <PP> → <N> <P>
(4) <PP> → <S> <P>
(5) <V> → kita
(6) <V> → tutawaqta
(7) <N> → kita
(8) <N> → bungka
(9) <P> → kara
(10) <P> → ga
──────────────────────────

[0018] In Table 1, <S> is a sentence, <PP> is a postpositional phrase, <N> is a noun, and <P> is a particle. Rule (1) of Table 1 states that a sentence S consists of a postpositional phrase PP followed by a sentence S. Rule (5) of Table 1 states that the verb V is "kita" (came), rule (7) of Table 1 states that the noun N is "kita" (north), and rule (9) of Table 1 states that the particle P is "kara" (from). When an LR table is created from the grammar rules of Table 1 using, for example, the method of the first conventional example, the LR table shown in Tables 2 and 3 is obtained.

[0019]

[Table 2]

[0020]

[Table 3]

[0021] In Tables 2 and 3, the LR table consists of an action (ACTION) table on the left and a destination (GOTO) table on the right. The action table indicates which parsing action is performed when the phoneme shown at the top is input in each numbered state, while the GOTO table gives the number of the state to move to after the action is performed in each state. Here, $ denotes the end-of-sentence symbol. For example, in Tables 2 and 3, if the phoneme "b" arrives in state 0, the entry "s1" indicates a shift (transition) to state 1. If, after a reduction by some rule, the state on the stack becomes 0 and the left-hand side of that rule is the noun N, the parser goes to state 2. Likewise, if the phoneme "k" arrives in state 6, the entry "r2" indicates a reduction by rule 2. If the end-of-sentence symbol $ arrives in state 7, the input is accepted (acc). LR tables are described in detail in Prior Art Document 3 (Hozumi Tanaka, "Basics of Natural Language Analysis", pp. 78-104, Sangyo Tosho, first edition published November 27, 1989).

[0022] However, Tables 2 and 3 contain no information about preterminal symbols, so the original syntax rules must be consulted in some way. In this embodiment, therefore, the grammar rules of Table 1 are separated into grammar rules (syntax rules) down to the preterminal symbols, as in Table 4, and vocabulary rules, as in Table 5.

[0023]

[Table 4] Grammar rules down to the preterminal symbols
────────────────
(1) <S> → <PP> <S>
(2) <S> → V
(3) <PP> → N P
(4) <PP> → <S> P
────────────────

[0024]

[Table 5] Vocabulary rules
─────────────────────────
(1) <preterm> → <V>
(2) <preterm> → <N>
(3) <preterm> → <P>
(4) <V> → kita
(5) <V> → tutawaqta
(6) <N> → kita
(7) <N> → bungka
(8) <P> → kara
(9) <P> → ga
─────────────────────────
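The separation of Table 1 into Tables 4 and 5 can be sketched as follows. This is only an illustration: the rule encoding as (left-hand side, right-hand side) pairs, the function name `split_grammar`, and the test "a rule is a vocabulary rule when its right-hand side contains no nonterminal" are assumptions made for this sketch. Each word is kept as a single string here, whereas the second LR table operates on phonemes.

```python
# Table 1 encoded as (lhs, rhs) pairs; this encoding is hypothetical.
GRAMMAR = [
    ("S", ["PP", "S"]),
    ("S", ["V"]),
    ("PP", ["N", "P"]),
    ("PP", ["S", "P"]),
    ("V", ["kita"]),
    ("V", ["tutawaqta"]),
    ("N", ["kita"]),
    ("N", ["bungka"]),
    ("P", ["kara"]),
    ("P", ["ga"]),
]


def split_grammar(rules):
    """Split rules into syntax rules (Table 4) and vocabulary rules (Table 5)."""
    nonterminals = {lhs for lhs, _ in rules}
    syntax, lexical = [], []
    for lhs, rhs in rules:
        # A rule whose right-hand side mentions a nonterminal is a syntax rule.
        if any(symbol in nonterminals for symbol in rhs):
            syntax.append((lhs, rhs))
        else:
            lexical.append((lhs, rhs))
    # Preterminals are the left-hand sides of the lexical rules, in first-seen order.
    preterminals = list(dict.fromkeys(lhs for lhs, _ in lexical))
    # Prepend the <preterm> -> <category> rules, as in Table 5.
    vocabulary = [("preterm", [p]) for p in preterminals] + lexical
    return syntax, vocabulary, preterminals


if __name__ == "__main__":
    syntax, vocabulary, preterminals = split_grammar(GRAMMAR)
    for lhs, rhs in syntax + vocabulary:
        print(lhs, "->", " ".join(rhs))
```

With this split, the syntax rules treat V, N, and P as terminals, which is what makes preterminal lookahead available to the first LR table.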

[0025] In the grammar rules of Table 4, the preterminal symbols of the original grammar serve as terminal symbols. Creating the first LR table from the grammar rules of Table 4 by the first LR table generation process, executed by the first LR table generation unit 21 described in detail below, yields Table 6.

[0026]

[Table 6]

[0027] In Tables 5 and 6, <V>, <N>, <P>, and <PP> denote nonterminal symbols. Because the symbols available for lookahead are preterminal symbols, this embodiment can easily predict which preterminal symbols may follow. In other words, the evaluation of preterminal bigrams can be exploited predictively during speech recognition. Furthermore, creating an LR table from the vocabulary rules of Table 5 by the second LR table generation process, executed by the second LR table generation unit 22 described in detail below, yields Tables 7 and 8. Parts of Tables 7 and 8 are omitted and shown as "…".
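The predictive use of preterminal bigrams can be sketched as follows. This is a minimal illustration, not the patent's implementation: the probability table `BIGRAM`, the sentence-start marker `<s>`, and the function name `predicted_preterminals` are hypothetical, and the probabilities are made-up values chosen only so that the example runs. The idea shown is that the preterminals permitted next by the first LR table can be ranked by a bigram score before any word candidate is fixed.

```python
import math

# Hypothetical bigram probabilities P(next preterminal | previous preterminal).
BIGRAM = {
    ("<s>", "N"): 0.6,
    ("<s>", "V"): 0.4,
    ("N", "P"): 1.0,
    ("P", "N"): 0.5,
    ("P", "V"): 0.5,
}


def predicted_preterminals(previous, allowed):
    """Rank the allowed next preterminals by log bigram probability, best first.

    `allowed` stands for the lookahead preterminals that have an action in
    the current LR state; preterminals with zero probability are dropped.
    """
    scored = [
        (candidate, math.log(BIGRAM[(previous, candidate)]))
        for candidate in allowed
        if BIGRAM.get((previous, candidate), 0.0) > 0.0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(predicted_preterminals("<s>", ["N", "V", "P"]))
```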

[0028]

[Table 7]

[0029]

[Table 8]

[0030] In the second LR table shown in Tables 7 and 8, information on the reachable categories (the preterminal symbols of the original grammar) is embedded in the shift actions, so unnecessary phoneme matching can be eliminated. As described in detail below, this has the advantage that speech recognition can be executed faster and with a higher recognition rate than in the conventional examples. The advantages of this embodiment can be summarized as follows.
(I) Preterminal bigrams can easily be evaluated before word candidates are fixed.
(II) Registration of new words such as personal names can be implemented easily.
(III) Unregistered words can also be handled at the word level.

[0031] FIG. 2 is a flowchart of the first LR table generation process executed by the first LR table generation unit 21 of FIG. 1.

[0032] In FIG. 2, first, in step S1, the CFG rules (context-free grammar rules) down to the preterminal symbols, such as those shown in Table 4, are read from the CFG rule memory 31. Next, in step S2, the rule [<SS> → <S>] is added to the CFG rules that were read, where <SS> is the start symbol. Then, in step S3, the elements of each state of the first LR table are obtained. Specifically, the following processing is performed.
(a) Let C be the set of item sets (closures), with its initial value given by

[Equation 2] C = {Closure({[<SS> → ·<S>]})}

(b) For each item set (closure) I in C, the following computation is performed. For each nonterminal symbol A on the right-hand side of the items that make up the item set I,

[Equation 3] Goto(I, A)

is computed. If the result is not empty and is not already contained in C, it is added to C. This is repeated until no item sets remain to be added to C. Items, the closure function, and the Goto function are explained below. Each item set I_i obtained by the above processing represents the elements of state i of the LR table.

[0033] Next, in step S4, the instruction content of the elements of each state of the first LR table is determined. Specifically, the following processing is performed.
(a) If Goto(I_i, Preterm*) = I_j, the shift operation "Shift j" is written into Action[i, Preterm*].
(b) If [B → α·] ∈ I_i, then for every preterminal symbol Preterm* contained in Follow(B), the reduce operation "reduce by [B → α]" is written into Action[i, Preterm*]. The Follow function is explained in detail below.
(c) If [<SS> → <S>·] ∈ I_i, "accept (acc)" is written into Action[i, $].
(d) For a pure nonterminal symbol A, if Goto(I_i, A) = I_j, then Goto[i, A] = j is written into the first LR table.
(e) Elements that remain blank denote failure.

[0034] Then, in step S5, the generated first LR table, such as that shown in Table 6, is written to the first LR table memory 11, and the first LR table generation process ends. In the first LR table generation process, an item is a grammar rule to which a dot (·) indicating the parsing position has been added. For example, the rule [<S> → V] yields two items: (A) [<S> → ·V] and (B) [<S> → V·]. Item (A) indicates that parsing of the rule is about to begin, and item (B) indicates that it has finished.

[0035] The closure function is processed as follows. If Closure(I) contains [A → α·Bβ], then for every rule [B → γ], the item [B → ·γ] is added to Closure(I) unless it is already present. This is repeated until there are no new items to add to Closure(I).
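The closure computation can be sketched as follows for the grammar of Table 4 with the added rule [<SS> → <S>]. The encoding is an assumption made for illustration: a rule is a (left-hand side, right-hand side tuple) pair, an item is a rule plus a dot index, and the names `RULES`, `closure`, and `show` are hypothetical.

```python
# Table 4 plus the added start rule; preterminals V, N, P act as terminals here.
RULES = [
    ("SS", ("S",)),
    ("S", ("PP", "S")),
    ("S", ("V",)),
    ("PP", ("N", "P")),
    ("PP", ("S", "P")),
]


def closure(items, rules):
    """Return the closure of a set of items (lhs, rhs, dot)."""
    result = set(items)
    pending = list(items)
    while pending:
        _, rhs, dot = pending.pop()
        if dot == len(rhs):
            continue  # dot at the end: nothing left to expand
        symbol = rhs[dot]
        for lhs, body in rules:
            new_item = (lhs, body, 0)
            # Add [B -> . gamma] for every rule of the symbol after the dot.
            if lhs == symbol and new_item not in result:
                result.add(new_item)
                pending.append(new_item)
    return frozenset(result)


def show(item):
    """Format an item with a dot marking the parsing position."""
    lhs, rhs, dot = item
    return f"[{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}]"


if __name__ == "__main__":
    for item in sorted(closure({("SS", ("S",), 0)}, RULES)):
        print(show(item))
```

Starting from the single item [SS → ·S], the closure pulls in every rule for S and, through them, every rule for PP, which gives the item set of the initial state.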

[0036] The Goto function is processed as follows. Given an item set I and a nonterminal symbol X, the value of Goto(I, X) is the union of all closures obtained from the items [A → αX·β], which result from moving the dot one position to the right in every item [A → α·Xβ] in I.
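The Goto function, together with the loop of step S3 that collects the item sets, can be sketched as follows. As before, the rule and item encoding and all names are assumptions made for illustration. The loop here follows every symbol that appears after a dot, including the preterminals V, N, and P, since those transitions are needed for the shift entries of the table; the state numbering it produces is arbitrary and need not match Table 6.

```python
RULES = [
    ("SS", ("S",)),
    ("S", ("PP", "S")),
    ("S", ("V",)),
    ("PP", ("N", "P")),
    ("PP", ("S", "P")),
]


def closure(items):
    """Closure of a set of items (lhs, rhs, dot) under RULES."""
    result = set(items)
    pending = list(items)
    while pending:
        _, rhs, dot = pending.pop()
        if dot < len(rhs):
            for lhs, body in RULES:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    pending.append(item)
    return frozenset(result)


def goto(item_set, symbol):
    """Move the dot over `symbol` in every matching item, then take the closure."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in item_set
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved) if moved else frozenset()


def item_set_collection():
    """Collect all item sets reachable from Closure({[SS -> . S]})."""
    states = [closure({("SS", ("S",), 0)})]
    transitions = {}
    index = 0
    while index < len(states):
        state = states[index]
        symbols = sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)  # a new state was discovered
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions


if __name__ == "__main__":
    states, transitions = item_set_collection()
    print(len(states), "states")
    for key in sorted(transitions):
        print(key, "->", transitions[key])
```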

[0037] The Follow function is processed as follows.
(a) For the start symbol S, the end symbol $ is added to Follow(S), where $ is the symbol denoting the end of the input sentence.
(b) If there is a production rule [B → αAβ], every leftmost derivation of β, that is, every symbol that can appear leftmost in a string derived from β, is added to Follow(A).
(c) If there is a production rule [B → αA], Follow(B) is added to Follow(A).
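Rules (a) to (c) of the Follow function can be sketched as a fixed-point computation. The sketch assumes the grammar of Table 4 with the added start rule, treats the preterminals V, N, and P as terminals, and assumes the grammar has no empty productions, so that the symbols beginning a derivation of β are simply the First set of the first symbol of β. The names `first_sets` and `follow_sets` are hypothetical.

```python
RULES = [
    ("SS", ("S",)),
    ("S", ("PP", "S")),
    ("S", ("V",)),
    ("PP", ("N", "P")),
    ("PP", ("S", "P")),
]
TERMINALS = {"V", "N", "P"}


def first_sets(rules, terminals):
    """First sets for a grammar without empty productions."""
    first = {t: {t} for t in terminals}
    for lhs, _ in rules:
        first.setdefault(lhs, set())
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            size = len(first[lhs])
            first[lhs] |= first[rhs[0]]
            changed = changed or len(first[lhs]) != size
    return first


def follow_sets(rules, terminals, start):
    """Follow sets of the nonterminals, iterated until nothing changes."""
    first = first_sets(rules, terminals)
    follow = {lhs: set() for lhs, _ in rules}
    follow[start].add("$")  # rule (a): end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for pos, symbol in enumerate(rhs):
                if symbol in terminals:
                    continue
                size = len(follow[symbol])
                if pos + 1 < len(rhs):
                    follow[symbol] |= first[rhs[pos + 1]]  # rule (b)
                else:
                    follow[symbol] |= follow[lhs]  # rule (c)
                changed = changed or len(follow[symbol]) != size
    return follow


if __name__ == "__main__":
    for name, symbols in sorted(follow_sets(RULES, TERMINALS, "SS").items()):
        print(name, sorted(symbols))
```

In the table construction, these sets decide under which lookahead symbols a completed item [B → α·] produces a reduce entry.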

[0038] FIG. 3 is a flowchart of the second LR table generation process executed by the second LR table generation unit 22 of FIG. 1. In FIG. 3, first, in step S11, vocabulary rules such as those shown in Table 5 are read from the vocabulary rule memory 32. Next, in step S12, the rule [<SS> → <preterm>] is added to the vocabulary rules that were read. The rule [<SS> → <preterm>] is a rule from the start symbol to the preterminal symbol and indicates the start position of the second LR table for the vocabulary rules. Then, in step S13, the elements of each state of the second LR table are obtained. The specific processing is as follows.
(a) Let C be the set of conditional item sets (conditional closures), with its initial value given by

[Equation 4] C = {conditional Closure({[<SS> → ·<preterm>, {}]})}

(b) For each conditional item set (conditional closure) I in the set C, the following computation is performed. For each nonterminal symbol A on the right-hand side of the conditional items that make up the conditional item set I,

[Equation 5] conditional Goto(I, A)

is computed. If the result is not empty and is not already contained in the set C, it is added to the set C. This is repeated until no conditional item sets remain to be added to the set C. Conditional items, the conditional closure function, and the conditional Goto function are explained in detail below. Each conditional item set I_i obtained by the above processing represents the elements of state i of the second LR table.

[0039] Next, in step S14, the action to be stored in each element of each state of the second LR table is determined. The specific processing is as follows.
(a) If Goto(I_i, Phone*) = I_j, the shift operation "shift j, {condition of conditional item I_i}" is written to Action[i, Phone*].
(b) If [B → α·] ∈ I_i, the reduce operation "reduce by [B → α]" is written to Action[i, Phone*] for every preterminal symbol Phone* contained in Follow(B). The Follow function is described in detail later.
(c) If [<SS> → <preterm>·] ∈ I_i, "accept (acc)" is written to Action[i, $].
(d) For a pure nonterminal symbol A, if Goto(I_i, A) = I_j, then Goto[i, A] = j is written.
(e) Any element left blank denotes failure.

[0040] Further, in step S15, the second LR table thus created, as shown in Tables 7 and 8, is written to the second LR table memory 12, and the second LR table generation process ends. The conditional item mentioned above is an item to which a condition has been added. Its form is

[Equation 6]
$$[A \to \alpha \cdot \beta,\ \{X_i\}]$$

The conditional closure function is processed as follows.
(a) For each conditional item [S → ·α, {}] of I in which S is a start symbol, [S → ·α, {S}] is added to Closure(I). The other conditional items of I are added to Closure(I) unchanged.
(b) If Closure(I) contains [A → α·Bβ, {X_i}], then for every rule [B → γ], [B → ·γ, {X_i}] is added to Closure(I) unless it is already present. Further, for any C derivable from B, if [C → ·γ, {Y_j}] exists, it is replaced by [C → ·γ, {X_i} ∪ {Y_j}].

[0041] The conditional Goto function is processed as follows. Given a conditional item set I and a nonterminal symbol X, the value of Goto(I, X) is the union of all conditional closures obtained from the conditional items [A → αX·β, {X_i}], which result from moving the dot one position to the right in every conditional item [A → α·Xβ, {X_i}] in I. The Follow function is unchanged for conditional items.

[0042] FIG. 4 is a flowchart showing the statistical language model generation process executed by the statistical language model generation unit 23 of FIG. 1. The statistical language model is a part-of-speech statistical language model and includes bigrams of preterminal symbols. In this process, parse data such as that shown in Table 9 is created, the symbols <BEGIN> and <END> denoting the start and end of an utterance are provided, and the adjacency information of the preterminal symbols is extracted. For example, Table 10 is obtained from Table 9. The frequencies of occurrence of data such as that in Table 10 are computed over a large amount of data and normalized, which yields data on how readily one preterminal symbol follows another, as in Table 11, that is, a statistical language model including preterminal bigrams.

[0043]

[0044]

[Table 10] Example of syntax analysis data
────────────────────────
<BEGIN>:<pow-n-proper>
<pow-n-proper>:<aux-cop-da-renyo-de>
<aux-cop-da-renyo-de>:<auxstem-polt-masu>
<auxstem-polt-masu>:<vinfl-spe-su>
<vinfl-spe-su>:<END>
────────────────────────

[0045]

[Table 11] Example of extracted preterminal bigrams
─────────────────────────
<BEGIN>:<adv-desu> = 0.036585 (9/246)
<BEGIN>:<adv-sent> = 0.028455 (7/246)
<BEGIN>:<adv> = 0.056911 (14/246)
<BEGIN>:<conj> = 0.097561 (24/246)
<BEGIN>:<family-name-jap> = 0.016260 (4/246)
<BEGIN>:<first-name-others> = 0.012195 (3/246)
<BEGIN>:<interj-hesit> = 0.069106 (17/246)
<BEGIN>:<interj-post> = 0.077236 (19/246)
<BEGIN>:<interj-pre> = 0.219512 (54/246)
<BEGIN>:<n-adj> = 0.012195 (3/246)
<BEGIN>:<n-day> = 0.004065 (1/246)
<BEGIN>:<n-hour> = 0.012195 (3/246)
<BEGIN>:<n-hutu> = 0.097561 (24/246)
<BEGIN>:<n-month> = 0.004065 (1/246)
<BEGIN>:<n-num-kyuu> = 0.004065 (1/246)
<BEGIN>:<n-num-roku> = 0.004065 (1/246)
<BEGIN>:<n-num-san> = 0.004065 (1/246)
<BEGIN>:<n-num-yon> = 0.004065 (1/246)
<BEGIN>:<n-proper> = 0.020325 (5/246)
<BEGIN>:<n-sahen> = 0.040650 (10/246)
<BEGIN>:<n-spel> = 0.004065 (1/246)
<BEGIN>:<n-time> = 0.020325 (5/246)
<BEGIN>:<n-week> = 0.004065 (1/246)
<BEGIN>:<num-suf-hyaku> = 0.004065 (1/246)
<BEGIN>:<prefix-go> = 0.008130 (2/246)
<BEGIN>:<prefix-o> = 0.044715 (11/246)
<BEGIN>:<pro-exp> = 0.012195 (3/246)
<BEGIN>:<pro1> = 0.008130 (2/246)
<BEGIN>:<pro> = 0.004065 (1/246)
<BEGIN>:<rentai> = 0.012195 (3/246)
<BEGIN>:<vstem-1dan> = 0.016260 (4/246)
<BEGIN>:<vstem-5-r> = 0.032520 (8/246)
<BEGIN>:<wh-pro> = 0.008130 (2/246)
<adjstem>:<vinfl-adj-i> = 0.714286 (5/7)
<adjstem>:<vinfl-adj-ku> = 0.285714 (2/7)
<adv-degr>:<n-num-hito> = 1.000000 (1/1)
<adv-desu>:<auxstem-desu> = 0.888889 (16/18)
<adv-desu>:<prefix-go> = 0.055556 (1/18)
<adv-desu>:<prefix-o> = 0.055556 (1/18)
<adv-sent>:<adv> = 0.100000 (1/10)
<adv-sent>:<n-adj> = 0.100000 (1/10)
<adv-sent>:<n-day> = 0.100000 (1/10)
<adv-sent>:<n-hutu> = 0.500000 (5/10)
<adv-sent>:<rentai> = 0.200000 (2/10)
─────────────────────────

[0046] In Table 10, for example, "<BEGIN>:<pow-n-proper>" indicates that the proper noun <pow-n-proper> follows the utterance start <BEGIN>, and "<pow-n-proper>:<aux-cop-da-renyo-de>" indicates that the proper noun <pow-n-proper> is followed by <aux-cop-da-renyo-de>, which denotes "de", the continuative form of the auxiliary verb "da". In Table 11, for example, "<BEGIN>:<adv-desu> = 0.036585 (9/246)" indicates that the connection probability between the utterance start <BEGIN> and <adv-desu>, which denotes an adverb that can co-occur with "desu", is 0.036585, because this connection occurred 9 times out of the 246 occurrences of <BEGIN>. Similarly, "<adjstem>:<vinfl-adj-i> = 0.714286 (5/7)" indicates that the probability that the adjective stem <adjstem> is followed by <vinfl-adj-i>, which denotes the adjective ending "i", is 0.714286, because 5 of the 7 occurrences of <adjstem> had this connection. "<adv-degr>:<n-num-hito> = 1.000000 (1/1)" indicates that the probability that <adv-degr>, which denotes degree, is followed by <n-num-hito>, which denotes the numeral "hito" (one), is 1, because the single occurrence of <adv-degr> had this connection. Finally, "<adv-desu>:<auxstem-desu> = 0.888889 (16/18)" indicates that the probability that the adverb <adv-desu>, which can co-occur with "desu", is followed by <auxstem-desu>, which denotes the stem of the auxiliary verb "desu", is 0.888889, because this connection occurred 16 times out of the 18 occurrences of <adv-desu>.
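Read this way, every entry of Table 11 is a relative frequency conditioned on the preceding symbol. As a worked restatement, with C(·) denoting counts in the parse data (this notation is introduced here only for illustration and does not appear in the original):

```latex
P(b \mid a) = \frac{C(a, b)}{C(a)}, \qquad
P(\langle\text{adv-desu}\rangle \mid \langle\text{BEGIN}\rangle) = \frac{9}{246} \approx 0.036585, \qquad
P(\langle\text{vinfl-adj-i}\rangle \mid \langle\text{adjstem}\rangle) = \frac{5}{7} \approx 0.714286
```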

[0047] Referring to FIG. 4, which shows the statistical language model generation process, first, in step S21, the CFG rules are read from the CFG rule memory 31. Next, in step S22, parse data is created using the CFG rules that were read. In step S23, the utterance start symbol <BEGIN> and the utterance end symbol <END> are added to the parse data, and pairs of adjacent preterminal symbols are extracted. In step S24, the frequency of each preterminal pair is counted, and in step S25 the frequencies are normalized for each preceding symbol to generate a statistical language model including preterminal bigrams. Some form of smoothing may be applied in step S24. Finally, in step S26, the generated statistical language model is written to the statistical language model memory 13, and the statistical language model generation process ends.
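Steps S23 to S25 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is assumed to be already available as one list of preterminal symbols per utterance, no smoothing is applied, and the function name `preterminal_bigrams` is hypothetical.

```python
from collections import Counter


def preterminal_bigrams(sequences):
    """Return {(previous, next): probability} estimated by relative frequency."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        # Step S23: add the utterance boundary symbols, then extract adjacent pairs.
        padded = ["<BEGIN>", *seq, "<END>"]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1  # step S24: count each pair
            prev_counts[prev] += 1
    # Step S25: normalize per preceding symbol.
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


model = preterminal_bigrams([
    ["<adjstem>", "<vinfl-adj-i>"],
    ["<adjstem>", "<vinfl-adj-ku>"],
])
```

With the two toy utterances above, `<adjstem>` is followed by each ending once, so both bigrams receive probability 0.5.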

[0048] FIG. 5 is a flowchart showing the speech recognition process executed by the GLR parser 5 of FIG. 1. In FIG. 5, first, in step S31, initialization is performed. Specifically, the following processing is executed.
(a) The utterance start symbol <BEGIN> is set as the current preterminal symbol.
(b) A stack 1 memory is prepared for the first LR table shown in Table 6, and the initial state 0 is pushed onto stack 1.

[0049] Next, in step S32, it is determined whether the speech section has ended. If it has, the speech recognition candidate data is output in step S37 and the speech recognition process ends. If it has not, the process proceeds to step S33. In step S33, the next preterminal symbol is predicted and the word recognition process is started. The specific processing is as follows.
(a) The state on top of stack 1 indicates the current state of the first LR table shown in Table 6, so that state is looked up.
(b) If there is a reduce action, it is executed. The contents of stack 1 are manipulated accordingly.
(c) If there is a shift action, the bigram (the statistical ease of connection) between the next preterminal symbol predicted by that action and the current preterminal symbol is evaluated. If there are several candidates, the cell is copied so that all of them are kept. The parser then moves to the state specified by the shift action.

[0050] Next, in step S34, the word recognition process is initialized. The specific processing is as follows.
(a) For each cell, the predicted preterminal symbol is set as the current preterminal symbol.
(b) For each cell, a stack 2 memory is prepared for the second LR table shown in Tables 7 and 8, and the initial state 0 is pushed onto it.

[0051] Then, in step S35, the end condition of the word recognition process is checked, that is, it is determined whether acceptance (acc) has been reached. If acceptance has been reached, the process returns to step S32 and the end condition check is performed. If acceptance has not been reached, the process proceeds to step S36 and the word recognition process is executed. Specifically, the following processing is performed.
(a) The state on top of stack 2 indicates the current state of the second LR table shown in Tables 7 and 8, so that state is looked up.
(b) If there is a reduce action, it is executed. The contents of stack 2 are manipulated accordingly.
(c) If there are shift actions, phoneme matching is performed only for those whose condition specifies the current preterminal symbol. If there are several candidates, the cell is copied so that all of them are kept. The parser then moves to the state specified by the shift action.
(d) At every phoneme match (or at every input speech frame), pruning is performed with a predetermined threshold on an evaluation score that combines the phoneme matching score and the preterminal bigram. That is, candidates whose score is at or below the threshold are pruned in the beam search.
(e) The process then returns to the end condition check of the word recognition process in step S35.
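The threshold pruning in item (d) can be sketched as follows. The text only states that a predetermined threshold is used; this sketch assumes, as one common reading, that the threshold is placed a fixed margin below the best current score. The function name, the `beam_width` parameter, and the tuple layout of a hypothesis are all hypothetical.

```python
def prune_by_threshold(hypotheses, score, beam_width):
    """Keep hypotheses scoring strictly above (best score - beam_width).

    `score` maps a hypothesis to its combined evaluation score (log domain).
    Placing the threshold relative to the best score is an assumption.
    """
    if not hypotheses:
        return []
    threshold = max(score(h) for h in hypotheses) - beam_width
    # Hypotheses at or below the threshold are pruned.
    return [h for h in hypotheses if score(h) > threshold]


cells = [("hai", -10.0), ("e", -12.0), ("ano", -15.0), ("ga", -30.0)]
survivors = prune_by_threshold(cells, lambda cell: cell[1], beam_width=5.0)
```

Unlike a fixed-size beam, the number of survivors here varies with how close the competing scores are.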

[0052] The first and second LR tables and the statistical language model generated by the processing described above are applied to the SSS (Successive State Splitting)-LR (left-to-right, rightmost type) speaker-independent continuous speech recognition apparatus for spontaneous speech shown in FIG. 1. This apparatus uses an efficient, phoneme-context-dependent HMM representation called an HM network. In SSS, a probabilistic model represents the temporal evolution of speech parameters as probabilistic transitions between probabilistic stationary signal sources (states) assigned in the phoneme feature space, and the model is refined successively by repeatedly splitting individual states in the context direction or the time direction according to a likelihood maximization criterion.

[0053] In FIG. 1, the spontaneously uttered speech of a speaker is input to the microphone 1a, converted into a speech signal, and then input to the A/D converter 1b. The A/D converter 1b converts the input speech signal from analog to digital and outputs it to the feature extraction unit 2. The feature extraction unit 2 performs, for example, LPC analysis and extracts 34-dimensional feature parameters consisting of the log power, 16th-order cepstrum coefficients, Δ log power, and 16th-order Δ cepstrum coefficients. The time series of extracted feature parameters is input to the phoneme matching unit 4 via the buffer memory 3.

[0054] The HM network in the HM network memory 10 connected to the phoneme matching unit 4 is represented as a plurality of networks whose nodes are states, and each state holds the following information.
(a) State number
(b) Acceptable context class
(c) Lists of preceding and succeeding states
(d) Parameters of the output probability density distribution
(e) Self-transition probability and transition probabilities to succeeding states

[0055] In the present embodiment, because it must be possible to identify which speaker each distribution originates from, the HM network serving as the acoustic model is created by converting a predetermined speaker-mixture HM network. The output probability density function is a Gaussian mixture distribution with 34-dimensional diagonal covariance matrices, and each distribution has been trained on samples from one specific speaker.

[0056] The phoneme matching unit 4 executes phoneme matching in response to a phoneme matching request from the GLR parser 5. With the request, the GLR parser 5 passes the phoneme matching section and phoneme context information consisting of the phoneme to be matched and the phonemes before and after it. Based on the received phoneme context information, the phoneme matching unit 4 selects one model by connecting the states on the HM network that can accept that context, within the constraints of the preceding state lists and succeeding state lists. The likelihood of the data in the phoneme matching section is then computed with this model and returned to the GLR parser 5 as the phoneme matching score. Because the model used here is equivalent to a hidden Markov model (hereinafter, HMM), the likelihood is computed with the forward algorithm used for ordinary HMMs, without modification.
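For reference, the forward computation mentioned above can be sketched in the log domain as follows. This is a generic textbook forward pass, not the embodiment's implementation: emission log-probabilities are assumed to be precomputed per frame and state, the likelihood is summed over all final states, and the function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emit):
    """log_init[s], log_trans[s][t], log_emit[frame][s] are log-probabilities."""
    n = len(log_init)
    alpha = [log_init[s] + log_emit[0][s] for s in range(n)]
    for frame in log_emit[1:]:
        alpha = [
            logsumexp([alpha[s] + log_trans[s][t] for s in range(n)]) + frame[t]
            for t in range(n)
        ]
    return logsumexp(alpha)


def lg(p):
    """Log of a probability, mapping 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf


# Two-state left-to-right toy model observed over two frames.
toy = forward_log_likelihood(
    [lg(1.0), lg(0.0)],
    [[lg(0.5), lg(0.5)], [lg(0.0), lg(1.0)]],
    [[lg(0.5), lg(0.1)], [lg(0.2), lg(0.8)]],
)
```

In the toy model the two paths contribute 0.5·0.5·0.2 and 0.5·0.5·0.8, so the total likelihood is 0.25.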

[0057] The GLR parser 5 refers to the first and second LR tables and the statistical language model and executes the speech recognition process described above (see FIG. 5), thereby processing the input phoneme prediction data from left to right without backtracking. Where there is syntactic ambiguity, the stack is split and all candidate parses are processed in parallel. The GLR parser 5 predicts the next phoneme by referring to the first and second LR tables and the statistical language model, and outputs phoneme prediction data to the phoneme matching unit 4. In response, the phoneme matching unit 4 performs matching by referring to the information in the HM network memory 10 that corresponds to that phoneme and returns the likelihood to the GLR parser 5 as a speech recognition score. Continuous speech is recognized by concatenating phonemes one after another in this way. When several phonemes are predicted, the presence of every one of them is checked, and high-speed processing is achieved by beam-search pruning that keeps only the partial trees with a high partial recognition likelihood. After the input speech has been processed to its end, the candidate with the highest overall likelihood, or a predetermined number of top candidates, is output as recognition result data or result candidate data.

[0058] The buffer memory 3, the HM network memory 10, the first LR table memory 11, the second LR table memory 12, the statistical language model memory 13, the CFG rule memory 31, and the vocabulary rule memory 32 are implemented with a storage device such as a hard disk. The stack 1 memory and the stack 2 memory are implemented with a storage device such as RAM. The feature extraction unit 2, the phoneme matching unit 4, the GLR parser 5, the first LR table generation unit 21, the second LR table generation unit 22, and the statistical language model generation unit 23 are implemented with a computer such as a digital computer.

[0059]

EXAMPLES
To confirm the effect of the apparatus of the present embodiment, the present inventors conducted dialogue speech recognition experiments on pause units under various conditions. The experiments used dialogue speech selected from the travel conversation database that the present applicant is collecting and building (see, for example, Prior Art Document 4: T. Morimoto et al., "A Speech and Language Database for Speech Translation Research", Proceedings of ICSLP'94, pp. 1791-1794, 1994). Speech sections obtained by automatic pause detection were used as the recognition targets. Using two features, log power and zero-crossing count, and selecting sections longer than 300 milliseconds, pauses could be detected and distinguished from geminate consonants (sokuon). This holds only for the dialogue speech data used in this experiment and is not claimed as a property of the entire travel conversation database being collected. As the phoneme model, a model adapted to the speaker by the VFS method using 50 phonetically balanced sentences was used (401 states, 5 mixtures; see, for example, Prior Art Document 5: Masahiro Tonomura et al., "Effect of Smoothing Coefficient Control in the MAP-VFS Speaker Adaptation Method", Proceedings of the Acoustical Society of Japan, 2-5-6, 1995). The speech analysis frame length was 10 ms. A frame-synchronous search method was adopted for recognition. The machine used for the experiments was a Hewlett-Packard 9000/735 workstation. The specifications of the grammars are shown in Table 12.
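The pause criterion described above (two features plus a 300 ms minimum duration) could be sketched as below. The text does not give the decision rule or thresholds, so everything beyond the 300 ms limit is an assumption: a frame is treated as silent when both log power and zero-crossing count fall below hypothetical thresholds, and frames are assumed to advance in 10 ms steps.

```python
def find_pauses(log_power, zero_crossings, power_floor, zc_ceiling,
                frame_ms=10, min_ms=300):
    """Return (start, end) frame index pairs of silent runs longer than min_ms.

    power_floor and zc_ceiling are hypothetical thresholds; the silence test
    (both features low) is an assumption, not taken from the original text.
    """
    frames = list(zip(log_power, zero_crossings))
    pauses, start = [], None
    for i, (power, crossings) in enumerate(frames):
        silent = power < power_floor and crossings < zc_ceiling
        if silent and start is None:
            start = i
        elif not silent and start is not None:
            if (i - start) * frame_ms > min_ms:
                pauses.append((start, i))
            start = None
    if start is not None and (len(frames) - start) * frame_ms > min_ms:
        pauses.append((start, len(frames)))
    return pauses


# 400 ms of silence is reported; a 200 ms closure (geminate-like) is not.
power = [10.0] * 5 + [0.0] * 40 + [10.0] * 5 + [0.0] * 20 + [10.0] * 5
crossings = [30 if p > 5 else 2 for p in power]
pauses = find_pauses(power, crossings, power_floor=5.0, zc_ceiling=10)
```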

[0060]

[Table 12] Grammar specifications
───────────────────────────────────────────────────────────
Grammar   Words   Rules   Preterminal   Word perplexity
name                      symbols       Grammar    With preterminal
                                        only       bigram
───────────────────────────────────────────────────────────
2S          317    1395   184           18.6       10.4
2M          561    1567   247           39.1       22.2
2L         1010    1809   291           71.2       25.9
───────────────────────────────────────────────────────────

[0061] Each smaller grammar is a subset of the larger ones. Fifty dialogues (1959 sentences) not included in the test set were selected from the travel conversation database, preterminal bigrams were estimated from them, and the estimates were smoothed by deleted interpolation. The resulting word perplexity on the test set using the preterminal bigram alone was 29.2. As Table 12 shows, for every grammar the word perplexity obtained when the grammar and the bigram are combined is lower than both the value for the grammar alone and the value for the preterminal bigram alone.
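The text does not spell out how word perplexity is computed. Assuming the standard definition, the inverse geometric mean of the per-word probabilities assigned by the model, a minimal Python sketch is given below; the function name and the input format (one probability per test word) are hypothetical.

```python
import math


def word_perplexity(word_probabilities):
    """Perplexity = 2 ** (-(1/N) * sum(log2 p_i)) over the N test words."""
    total_log2 = sum(math.log2(p) for p in word_probabilities)
    return 2 ** (-total_log2 / len(word_probabilities))


uniform_over_four = word_perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that always chooses uniformly among four words has perplexity 4, which is why a lower value in Table 12 indicates a tighter constraint on the next word.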

[0062] Next, the evaluation measures are described. Agreement between the correct labels and the speech recognition candidates was evaluated on transcriptions converted to kana-kanji character strings. The pause unit recognition rate is the proportion of pause units that matched the correct label in their entirety. Because a candidate may be partially correct, the word recognition rate was also measured. The word recognition rate is the proportion of words in a recognition candidate that match the correct label, obtained by DP matching. The cumulative word recognition rate is the maximum of the word recognition rates measured individually for the top candidates.
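The word-level measures can be sketched as follows. The exact DP matching cost is not given in the text, so this sketch assumes a longest-common-subsequence alignment and divides the number of matched words by the length of the correct label. The function names and the romanized word segmentation in the example are hypothetical.

```python
def matched_words(reference, hypothesis):
    """Number of words aligned by a longest-common-subsequence DP."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if reference[i - 1] == hypothesis[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def word_recognition_rate(reference, hypothesis):
    return matched_words(reference, hypothesis) / len(reference)


def cumulative_word_recognition_rate(reference, candidates):
    """Maximum of the individual rates over the top candidates."""
    return max(word_recognition_rate(reference, c) for c in candidates)


label = ["ano", "kyanseru", "shitai", "n", "desu", "ga"]
top = [["kyanseru", "shitai", "desu"], ["kyanseru", "shitai", "desu", "ga"]]
best_rate = cumulative_word_recognition_rate(label, top)
```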

[0063] Next, the results of the pause-unit dialogue speech recognition experiments are described. The experiments covered 5 dialogues, 4 speakers, 2 topics (hotel reservation and hotel services), 66 utterances, 119 pause units, and 845 words. Interjections are inserted throughout, as in "ano kyanseru shitai n desu ga" ("um, I would like to cancel"), where "ano" is the interjection. A pause unit may consist of a single response word such as "hai" ("yes") or a single interjection such as "e", and there are also relatively long pause units such as "ainiku desu ga shinguru ga manshitsu to natte orimasu ga" ("unfortunately, the single rooms are fully booked"). The average duration of a pause unit was 1874 milliseconds.

[0064] FIG. 6 is a graph showing the pause unit recognition rate against CPU time for the conventional continuous speech recognition apparatus, and FIG. 7 is a graph showing the word recognition rate against CPU time for the same conventional apparatus. In FIGS. 6 and 7, only the grammar is used and the beam search is limited by the number of candidates. FIG. 8 is a graph showing the pause unit recognition rate against CPU time for the continuous speech recognition apparatus of the embodiment, and FIG. 9 is a graph showing the word recognition rate against CPU time for the apparatus of the embodiment. FIGS. 8 and 9 show the results obtained with threshold-based beam search combined with the preterminal bigram. In FIGS. 6 to 9, Top20 in FIGS. 6 and 8 denotes the cumulative pause unit recognition rate over the top 20 candidates, and Top20 in FIGS. 7 and 9 denotes the maximum of the word recognition rates measured individually for the top 20 candidates. B in FIGS. 6 and 7 is the number of candidates kept in the beam, and Beam in FIGS. 8 and 9 is the beam threshold.

[0065] An efficient search method that evaluates the preterminal bigram predictively was implemented and its effect was confirmed. Measured in CPU time, the comparative example took about twice real time or more, whereas the present embodiment nearly achieved real-time processing for small and medium vocabulary sizes. A comparison of beam search that keeps a fixed number of candidates with pruning by a threshold confirmed that threshold pruning is the more efficient. The following two formulas were tried for the likelihood score Score used in the beam search.

[0066]

[Equation 7]
$$\mathrm{Score}_1 = \log P_A + \mathrm{Weight} \times \frac{\log P_L}{N}$$

[Equation 8]
$$\mathrm{Score}_2 = \log P_A + \mathrm{Weight} \times \log P_L$$

[0067] Here, P_A is the acoustic score given by the HM network, and P_L is the language score given by the first and second LR tables and the statistical language model. N is the number of words constituting the phoneme sequence, and Weight is a weighting coefficient. Preliminary experiments were carried out after aligning the logarithm bases of the acoustic score and the language score, and Weight was set to 5.0. In a preliminary experiment based on the comparative example, in which recognition candidates were re-ranked as a post-processing step, normalizing by the number of words gave better results than not normalizing. However, when the scores were actually used together during the recognition process, both equations yielded almost the same improvement in performance, while Equation 7 increased the processing time by the amount of computation required for normalization. In summary, the preferred configuration is a search method that uses the preterminal bigram predictively with the evaluation of Equation 8, combined with a threshold-based beam search. That is, the speech recognition score is preferably set to the value obtained by adding the logarithm of the acoustic score to the logarithm of the language score P_L.
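The two scoring formulas can be illustrated with a minimal Python sketch. The function names `score1` and `score2` and the sample log-probabilities are hypothetical and chosen only for illustration; the default weight of 5.0 is the value reported above, and both log scores are assumed to share the same logarithm base.

```python
def score1(log_p_a: float, log_p_l: float, n_words: int, weight: float = 5.0) -> float:
    """Equation 7: the language log score is normalized by the word count N."""
    return log_p_a + weight * (log_p_l / n_words)


def score2(log_p_a: float, log_p_l: float, weight: float = 5.0) -> float:
    """Equation 8: the language log score is used without normalization."""
    return log_p_a + weight * log_p_l


# Hypothetical hypothesis: acoustic log score -120.0, language log score -6.0, 3 words.
s1 = score1(-120.0, -6.0, 3)
s2 = score2(-120.0, -6.0)
print(s1, s2)
```

Equation 8 avoids the division by N for every hypothesis, which is consistent with the observation above that Equation 7 costs additional processing time for normalization.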

[0068] As described above, the device is configured to execute speech recognition processing, including LR parsing, with reference to the first LR table generated from the predetermined CFG rules, the second LR table generated from the predetermined vocabulary rules, and the statistical language model including the bigram of preterminal symbols generated on the basis of the CFG rules. A continuous speech recognition device can therefore be provided that shortens the processing time and improves the recognition rate compared with the conventional example and the comparative example. That is, in a speech parser that uses syntactic constraints in context-free grammar form and outputs a sequence of partial trees as scored hypotheses, improving the implementation of dictionary lookup and the beam search technique has the advantage of achieving both higher speed and higher performance.

[0069]

[Effects of the Invention] As described in detail above, the continuous speech recognition device according to claim 1 of the present invention comprises speech recognition means for recognizing speech based on the speech signal of an input uttered sentence of free speech. The speech recognition means performs phoneme recognition on the speech signal with reference to a predetermined hidden Markov model, and recognizes the uttered sentence by parsing with reference to a first LR parsing table generated from predetermined context-free grammar rules, a second LR parsing table generated from predetermined vocabulary rules, and a statistical language model that is generated on the basis of the context-free grammar rules and includes a bigram of preterminal symbols. A preterminal symbol is the symbol immediately preceding a terminal symbol, the terminal symbol denoting a terminal element when rewriting by the context-free grammar rules. Therefore, the processing time can be shortened and the recognition rate improved compared with the conventional example and the comparative example.
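As a rough illustration of a bigram over preterminal symbols, the following Python sketch estimates bigram probabilities from example preterminal sequences and scores a new sequence. Everything here is an assumption for illustration: the preterminal names (`noun`, `particle`, `verb`), the training sequences, the `<s>` start marker, and the add-one smoothing are not taken from this specification, which does not describe its estimation procedure in this passage.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_bigram(sequences: List[List[str]]) -> Tuple[Counter, Counter, int]:
    """Count preterminal bigrams; returns (context counts, bigram counts, symbol count)."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    symbols = set()
    for seq in sequences:
        symbols.update(seq)
        padded = ["<s>"] + seq  # "<s>" is a hypothetical sentence-start marker
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(symbols)


def log_prob(seq: List[str], contexts: Counter, bigrams: Counter, n_symbols: int) -> float:
    """Log probability of a preterminal sequence with add-one smoothing (an assumption)."""
    padded = ["<s>"] + seq
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + n_symbols))
    return total


# Hypothetical preterminal sequences, e.g. obtained from parsed training sentences.
training = [["noun", "particle", "verb"], ["noun", "verb"]]
contexts, bigrams, n_symbols = train_bigram(training)
seen = log_prob(["noun", "verb"], contexts, bigrams, n_symbols)
unseen = log_prob(["verb", "noun"], contexts, bigrams, n_symbols)
print(seen, unseen)
```

Because preterminal symbols are far fewer than words, a bigram table over them stays small, which is one plausible reason such a model can be consulted cheaply during the search.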

[0070] The continuous speech recognition device according to claim 2 is the device according to claim 1, further comprising generating means that generates the second LR parsing table by adding rules from the start symbol to the preterminal symbols to the vocabulary rules and then determining each state of the second LR parsing table and the instruction content of each element of each state. Therefore, the second LR parsing table based on the vocabulary rules can be generated, the processing time can be shortened, and the recognition rate improved compared with the conventional example and the comparative example.
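The first step of this generation, adding rules from the start symbol to the preterminal symbols, can be sketched in Python as follows. The rule representation, the start symbol name `<start>`, the function name `add_start_rules`, and the sample vocabulary rules (a preterminal rewriting to a phoneme sequence) are all assumptions for illustration. The subsequent construction of the LR states and their actions is not shown.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side symbols)


def add_start_rules(vocabulary_rules: List[Rule], start: str = "<start>") -> List[Rule]:
    """Return the rules with one added rule start -> preterminal per distinct preterminal."""
    preterminals: List[str] = []
    for lhs, _ in vocabulary_rules:
        if lhs not in preterminals:  # keep first-seen order, skip duplicates
            preterminals.append(lhs)
    start_rules: List[Rule] = [(start, (p,)) for p in preterminals]
    return start_rules + vocabulary_rules


# Hypothetical vocabulary rules: each preterminal rewrites to a phoneme sequence.
vocabulary_rules: List[Rule] = [
    ("noun", ("k", "a", "i", "g", "i")),
    ("noun", ("h", "e", "y", "a")),
    ("verb", ("i", "k", "u")),
]
augmented = add_start_rules(vocabulary_rules)
for lhs, rhs in augmented:
    print(lhs, "->", " ".join(rhs))
```

With a single start symbol covering every preterminal, the vocabulary rules form one grammar from which a single LR table can be built.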

[0071] Further, in the continuous speech recognition device according to claim 3, which is the device according to claim 1 or 2, the speech recognition means calculates a likelihood score for speech recognition based on an acoustic score based on the hidden Markov model and a language score based on the first and second LR parsing tables and the statistical language model, and determines the speech recognition result by a beam search using a predetermined threshold. Therefore, the processing time can be shortened and the recognition rate improved compared with the conventional example and the comparative example.
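The difference between retaining a fixed number of hypotheses and pruning by a threshold can be sketched in Python as follows. This is only an illustrative sketch: the function names, the hypothesis labels and scores, and the choice to apply the threshold relative to the best score are assumptions, since this passage does not state exactly how the threshold is applied.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]  # (label, log-likelihood score; higher is better)


def prune_by_count(hyps: List[Hypothesis], b: int) -> List[Hypothesis]:
    """Keep the b best-scoring hypotheses (fixed-number beam)."""
    return sorted(hyps, key=lambda h: h[1], reverse=True)[:b]


def prune_by_threshold(hyps: List[Hypothesis], beam: float) -> List[Hypothesis]:
    """Keep hypotheses whose score lies within `beam` of the best score."""
    if not hyps:
        return []
    best = max(score for _, score in hyps)
    return [h for h in hyps if h[1] >= best - beam]


# Hypothetical scored hypotheses at one step of the search.
hyps: List[Hypothesis] = [("a", -130.0), ("b", -131.5), ("c", -150.0), ("d", -170.0)]
kept_by_count = prune_by_count(hyps, 3)
kept_by_threshold = prune_by_threshold(hyps, 5.0)
print(kept_by_count)
print(kept_by_threshold)
```

In this example the fixed-number beam keeps the weak hypothesis "c" only to fill its quota, while the threshold beam keeps just the two close competitors, so the amount of work adapts to how ambiguous the input is at each step.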

[0072] Furthermore, in the continuous speech recognition device according to claim 4, which is the device according to claim 3, the speech recognition means calculates, as the likelihood score, the sum of the logarithm of the acoustic score and the logarithm of the language score multiplied by a predetermined weighting coefficient. Therefore, the likelihood score can be calculated simply, the processing time can be shortened, and the recognition rate improved compared with the conventional example and the comparative example.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a continuous speech recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the first LR table generation process executed by the first LR table generation unit 21 of FIG. 1.

FIG. 3 is a flowchart showing the second LR table generation process executed by the second LR table generation unit 22 of FIG. 1.

FIG. 4 is a flowchart showing the statistical language model generation process executed by the statistical language model generation unit 23 of FIG. 1.

FIG. 5 is a flowchart showing the speech recognition process executed by the GLR parser 5 of FIG. 1.

FIG. 6 is a graph showing the pause-unit recognition rate versus CPU time in the continuous speech recognition device of the comparative example.

FIG. 7 is a graph showing the word recognition rate versus CPU time in the continuous speech recognition device of the comparative example.

FIG. 8 is a graph showing the pause-unit recognition rate versus CPU time in the continuous speech recognition device of the embodiment.

FIG. 9 is a graph showing the word recognition rate versus CPU time in the continuous speech recognition device of the embodiment.

[Explanation of Symbols]

1a: microphone; 1b: A/D converter; 2: feature extraction unit; 3: buffer memory; 4: phoneme matching unit; 5: generalized LR parsing unit (GLR parser); 10: hidden Markov network memory (HM network memory); 11: CFG rule LR table memory (first LR table memory); 12: vocabulary rule LR table memory (second LR table memory); 13: statistical language model memory; 21: first LR table generation unit; 22: second LR table generation unit; 23: statistical language model generation unit; 31: context-free grammar rule memory (CFG rule memory); 32: vocabulary rule memory; 41: stack 1 memory; 42: stack 2 memory.

Claims

[Claims]

1. A continuous speech recognition device comprising speech recognition means for recognizing speech based on a speech signal of an input uttered sentence of free speech, wherein the speech recognition means performs phoneme recognition based on the speech signal with reference to a predetermined hidden Markov model, and recognizes the uttered sentence by parsing with reference to a first LR parsing table generated based on predetermined context-free grammar rules, a second LR parsing table generated based on predetermined vocabulary rules, and a statistical language model that is generated based on the context-free grammar rules and includes a bigram of preterminal symbols, each preterminal symbol being the symbol immediately preceding a terminal symbol that denotes a terminal element when rewriting by the context-free grammar rules.

2. The continuous speech recognition device according to claim 1, further comprising generating means that generates the second LR parsing table by adding rules from a start symbol to the preterminal symbols to the vocabulary rules and then determining each state of the second LR parsing table and the instruction content of each element of each state.

3. The continuous speech recognition device according to claim 1 or 2, wherein the speech recognition means calculates a likelihood score for speech recognition based on an acoustic score based on the hidden Markov model and a language score based on the first and second LR parsing tables and the statistical language model, and determines a speech recognition result by a beam search using a predetermined threshold.

4. The continuous speech recognition device according to claim 3, wherein the speech recognition means calculates, as the likelihood score, the sum of the logarithm of the acoustic score and the logarithm of the language score multiplied by a predetermined weighting coefficient.