JPH10340096A

JPH10340096A - Voice recognition device

Info

Publication number: JPH10340096A
Application number: JP9165205A
Authority: JP
Inventors: Yasuko Kato; 靖子加藤; Kazunaga Yoshida; 和永吉田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-06-06
Filing date: 1997-06-06
Publication date: 1998-12-22
Anticipated expiration: 2017-06-06
Also published as: JP3039453B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device that recognizes arbitrarily uttered words and sentences with high accuracy. SOLUTION: First, single syllable recognition processing is selected by a recognition processing changeover part 2. A voice uttered syllable by syllable is recognized by a first single syllable recognition processing part 3, the recognition result is stored in a recognition result storage part 4, and at the same time the number of syllables is counted by a syllable counting part 5. Next, continuous syllable recognition processing is selected by the recognition processing changeover part 2, recognition is performed by a first continuous syllable recognition processing part 6 using the information obtained from the recognition result storage part 4 and the syllable counting part 5, and the resulting recognition result is output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device, and more particularly to a speech recognition device capable of recognizing an arbitrary word or sentence.

[0002]

[Description of the Related Art] When speech recognition technology is applied to various fields, it is ideal that any word or sentence can be input without restricting the recognition target. Conventional ways to achieve this include the single syllable recognition method, which recognizes speech uttered one syllable at a time, and the continuous syllable recognition method, which recognizes speech in which any number of arbitrary syllables are uttered continuously.

[0003] One single syllable recognition method is described in Sadaoki Furui, "Digital Speech Processing", Section 8.7, pp. 170-172, Tokai University Press (hereinafter "Document 1").

[0004] In this method, the input speech is first subjected to LPC (Linear Predictive Coding) analysis to extract its LPC cepstrum features, and the speech section is detected. The consonant part and the vowel part are separated from the detected speech, shift matching is performed on the transient part at the beginning of the word, recognition is also performed using global spectral pattern features of the entire syllable, and the decision is made by combining these results.

[0005] One continuous syllable recognition method is described in the paper "Segmentation-Free Syllable Recognition in Continuously Spoken Japanese", ICASSP 83, 7.9, pp. 320-323 (hereinafter "Document 2"). It performs recognition using standard patterns formed by concatenating consonant-vowel chain speech patterns ("CV patterns") and vowel-consonant-vowel chain speech patterns ("VCV patterns"), which include information on the transitions between syllables.

[0006] In this method, the input speech is converted into feature vectors, vowel part candidates are extracted, and the vowel name and section of each candidate are determined. Matching at the CV pattern and VCV pattern level is then performed using DP (dynamic programming), and from the result the optimal sequence of CV patterns and VCV patterns for the entire input is obtained as the recognition result.

[0007]

[Problems to be Solved by the Invention] However, in the single syllable recognition method described above, each syllable is uttered in isolation, so the influence of coarticulation is small and the vowel part can be recognized relatively stably. The consonant part, on the other hand, is short in duration and therefore carries little information, and many consonants resemble one another, so it is prone to recognition errors.

[0008] In the continuous syllable recognition method, by contrast, consonant recognition performance can be expected to be better than in single syllable recognition; for example, information on the presence or absence of a closure before the burst can be used when recognizing word-medial plosives. However, because there is no restriction on the number or type of consecutive syllables, syllable insertions and deletions occur easily, and the number of syllables may be misrecognized.

[0009] Furthermore, because continuously uttered speech is the target, vowel recognition errors also occur easily due to the influence of coarticulation.

[0010] The present invention has been made in view of the above problems, and its object is to provide a speech recognition device that recognizes arbitrarily uttered words and sentences with high accuracy.

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention comprises a unit speech recognition processing unit that performs recognition processing for each recognition unit on speech uttered with pauses between recognition units, and a continuous speech recognition processing unit that performs continuous recognition processing, using the recognition result for each recognition unit, on continuously uttered speech that corresponds to the speech uttered with pauses between recognition units.

[0012]

[Operation] In the present invention, when a word is to be input, the speaker first utters the word divided into syllables and then utters the same word continuously without division. Each syllable-separated utterance is recognized by the single syllable recognition processing unit. Using the recognition results obtained at this stage, the continuously uttered speech is then recognized by the continuous syllable recognition method, and that result is taken as the recognition result of the word to be input.

[0013] Among the single syllable recognition results, the number of syllables and the vowel recognition results are expected to be highly accurate, so using this information in continuous syllable recognition improves the performance of continuous syllable recognition. Conversely, the result of continuous syllable recognition can be used for consonants, for which single syllable recognition alone has low accuracy, so overall recognition performance improves.

[0014] In addition, by restricting continuous syllable recognition to only the top-ranked candidates or the highly similar syllables among the single syllable recognition results, the amount of processing in continuous syllable recognition is reduced.

[0015] The above description uses the syllable as the recognition unit, but the same effect can be expected with other units, for example letters of the alphabet. Hereinafter, only the case where the syllable is the recognition unit is described.

[0016]

[Embodiments of the Invention] Embodiments of the present invention will now be described with reference to the drawings.

[0017] [First Embodiment] FIG. 1 is a block diagram showing the configuration of a first embodiment of the present invention.

[0018] Referring to FIG. 1, the first embodiment of the present invention includes (a) a voice input unit 1 that inputs speech uttered by a speaker, (b) a recognition processing switching unit 2 that switches between single syllable recognition processing and continuous syllable recognition processing, (c) a first single syllable recognition processing unit 3 that performs single syllable recognition processing, (d) a recognition result storage unit 4 that holds the recognition results obtained by the first single syllable recognition processing unit 3, (e) a syllable number counting unit 5 that counts the number of syllables processed by the first single syllable recognition processing unit 3, and (f) a first continuous syllable recognition processing unit 6 that performs continuous syllable recognition processing.

[0019] The recognition processing switching unit 2 connects the voice input unit 1 to the first single syllable recognition processing unit 3 when its switch is set to the 11 side, and connects the voice input unit 1 to the first continuous syllable recognition processing unit 6 when its switch is set to the 12 side.

[0020] The first single syllable recognition processing unit 3 performs recognition processing on each utterance of the syllable-separated speech and outputs the vowel name of each single syllable.

[0021] The recognition result storage unit 4 holds the vowel names of the single syllables as the recognition result.

[0022] The syllable number counting unit 5 counts the number of syllables in the recognition results output from the first single syllable recognition processing unit 3.

[0023] The first continuous syllable recognition processing unit 6 performs recognition processing on speech uttered continuously without syllable separation, using the syllable count and the vowel name sequence supplied to it.

[0024] FIG. 2 is a flowchart illustrating the processing procedure of the first embodiment of the present invention. The operation of the first embodiment will be described with reference to FIG. 1 and FIG. 2.

[0025] First, the switch of the recognition processing switching unit 2 is set to the 11 side, connecting the voice input unit 1 to the first single syllable recognition processing unit 3 (step 1).

[0026] The voice input unit 1, which consists of a microphone, a filter, an A/D converter, and the like, takes in the uttered speech and outputs it to the first single syllable recognition processing unit 3 (step 2).

[0027] The first single syllable recognition processing unit 3 performs single syllable recognition processing on the input speech using the method described in Document 1 (step 3) and outputs the vowel name of the single syllable.

[0028] The vowel names A(n) (n = 1, ..., N, where N is the number of input syllables) obtained by the first single syllable recognition processing unit 3 are stored in the recognition result storage unit 4 as the recognition result (step 4). At the same time, the syllable number counting unit 5 counts the number of syllables N (step 5).

[0029] Steps 2 through 5 are repeated until the syllables for one word have all been input (step 6); when input of the syllables for one word is complete, processing proceeds to step 7.
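
The loop of steps 2 through 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the device: `vowel_of` is a hypothetical stand-in for the first single syllable recognition processing unit 3 that simply reads the vowel off a romanized syllable, and the name `first_pass` is likewise assumed.

```python
from typing import Callable, List, Tuple

VOWELS = "aiueo"


def vowel_of(syllable: str) -> str:
    """Hypothetical stand-in for the recognizer: vowel of a romanized syllable."""
    for ch in reversed(syllable):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel found in {syllable!r}")


def first_pass(
    utterances: List[str],
    recognize_vowel: Callable[[str], str] = vowel_of,
) -> Tuple[List[str], int]:
    """Collect the vowel sequence A(n) and the syllable count N for one word."""
    vowel_sequence: List[str] = []  # role of the recognition result storage unit 4
    syllable_count = 0              # role of the syllable number counting unit 5
    for utterance in utterances:    # repeat until the word is complete (step 6)
        vowel_sequence.append(recognize_vowel(utterance))  # steps 3 and 4
        syllable_count += 1                                # step 5
    return vowel_sequence, syllable_count
```

Calling `first_pass(["yo", "ko", "ha", "ma"])` yields the vowel sequence o, o, a, a together with a count of 4, which is the information handed to the continuous syllable recognition stage.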

[0030] Next, the switch of the recognition processing switching unit 2 is set to the 12 side, connecting the voice input unit 1 to the first continuous syllable recognition processing unit 6 (step 7).

[0031] The voice input unit 1 takes in the uttered speech and outputs it to the first continuous syllable recognition processing unit 6 (step 8).

[0032] The first continuous syllable recognition processing unit 6 performs recognition by the method described in Document 2, which uses standard patterns formed by concatenating CV patterns and VCV patterns (step 9). When vowel part candidates are extracted in this processing, the candidates are restricted using the vowel sequence and the syllable count obtained in steps 4 and 5.
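
The text does not specify how the restriction is applied, so the following Python sketch shows only one possible reading. It assumes that vowel part candidates arrive as time-ordered tuples of (vowel name, start frame, end frame) and keeps only those selections of N candidates whose vowel names match A(n) in order. The tuple layout and the function name `constrain_vowel_candidates` are assumptions made for this illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

VowelCandidate = Tuple[str, int, int]  # (vowel name, start frame, end frame)


def constrain_vowel_candidates(
    candidates: Sequence[VowelCandidate],
    vowel_sequence: Sequence[str],
) -> List[Tuple[VowelCandidate, ...]]:
    """Return every in-order selection of candidates that spells out A(n).

    `candidates` must be sorted by time; combinations() preserves that order,
    so each returned selection has exactly N = len(vowel_sequence) vowels.
    """
    n = len(vowel_sequence)
    selections = []
    for subset in combinations(candidates, n):
        if all(cand[0] == vowel for cand, vowel in zip(subset, vowel_sequence)):
            selections.append(subset)
    return selections
```

With the candidates o, u, o, a, a and the stored sequence o, o, a, a, only the selection that drops the spurious "u" survives, which is how the stored count and vowel names could suppress a syllable insertion.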

[0033] The optimal syllable sequence obtained by the first continuous syllable recognition processing unit 6 is output as the recognized word (step 10).

[0034] Next, the operation and effects of the first embodiment of the present invention will be described.

[0035] In the first embodiment of the present invention, the recognition results obtained by the first single syllable recognition processing unit 3 are used by the first continuous syllable recognition processing unit 6. This supplies information on the number of syllables and the vowel sequence, which continuous syllable recognition tends to get wrong, and thereby improves the performance of continuous syllable recognition, which already has high consonant recognition performance.

[0036] [Example 1] Next, the operation of one example will be described to illustrate the first embodiment of the present invention concretely.

[0037] For example, the case of recognizing the word "Yokohama" will be described with reference to FIG. 1 and FIG. 2.

[0038] First, the switch of the recognition processing switching unit 2 is set to the 11 side to select single syllable recognition processing (step S1). "Yokohama" is divided into syllables, and the first syllable "yo" is uttered and input through the voice input unit 1. The input speech is sent to the first single syllable recognition processing unit 3 (step S2).

[0039] In the first single syllable recognition processing unit 3, LPC analysis is first performed on the input utterance "yo", as described in Document 1, to extract the LPC cepstrum, and the speech section is detected.

[0040] The consonant part and the vowel part are separated from the detected speech section, and the vowel part is recognized. As a result, the vowel name "o" of the syllable "yo" becomes the result of the single syllable recognition processing (step S3).

[0041] The output vowel name A(1) = o is stored in the recognition result storage unit 4 (step S4).

[0042] The syllable number counting unit 5 increments the syllable count N from its initial value of 0 to 1 (step S5).

[0043] Processing then returns to step S2, and the next syllable "ko" is input to the voice input unit 1 in the same way. The speech signal of the uttered "ko" likewise undergoes the single syllable recognition processing in the first single syllable recognition processing unit 3 (step S3), and the vowel name "o" of the syllable with the highest similarity, "ko", is added to the recognition result storage unit 4 as A(2) = o (step S4), so that the vowel sequence A(n) = {o, o} (n = 1, 2) is registered in the recognition result storage unit 4. At this point the syllable number counting unit 5 increments the syllable count N from 1 to 2 (step S5).

[0044] Thereafter, "ha" and "ma" are processed in the same way through steps S2 to S5, leaving the vowel sequence A(n) = {o, o, a, a} (n = 1, ..., N) registered in the recognition result storage unit 4 and the number of input syllables N = 4 counted in the syllable number counting unit 5.

[0045] Next, the switch of the recognition processing switching unit 2 is set to the 12 side to select continuous syllable recognition processing (step S7).

[0046] The syllables are uttered continuously as "yokohama" and input to the voice input unit 1 (step S8).

[0047] The input speech is sent to the first continuous syllable recognition processing unit 6.

[0048] The first continuous syllable recognition processing unit 6 performs recognition processing by the method described in Document 2, which uses standard patterns formed by concatenating CV patterns and VCV patterns. At this time, when vowel part candidates are extracted, they are restricted using the vowel sequence A(n) = {o, o, a, a} (n = 1, ..., N) stored in the recognition result storage unit 4 and the number of vowels N = 4 counted by the syllable number counting unit 5 (step S9).

[0049] From the result of the above processing, "yo", "ko", "ha", "ma" is obtained as the optimal syllable sequence, and it is output as the word "Yokohama" (step S10).

[0050] [Second Embodiment] Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 3 is a block diagram showing the configuration of the second embodiment of the present invention.

[0051] In FIG. 3, the voice input unit 1, the recognition processing switching unit 2, the recognition result storage unit 4, and the syllable number counting unit 5 are the same as in the configuration of the first embodiment shown in FIG. 1, so their description is omitted.

[0052] Whereas the first single syllable recognition processing unit 3 of the first embodiment shown in FIG. 1 outputs the vowel sequence A(n) (n = 1, ..., N), the second single syllable recognition processing unit 21 of the second embodiment shown in FIG. 3 outputs, in addition to the vowel sequence A(n) (n = 1, ..., N), the syllable recognition results down to rank M (M being a value given in advance) as the syllable sequence B(n, m) (n = 1, ..., N; m = 1, ..., M).

[0053] When the second continuous syllable recognition processing unit 22 of the second embodiment performs continuous syllable recognition, it not only uses the obtained vowel sequence and syllable count when extracting vowel part candidates, as described for the first embodiment, but also performs matching only against the CV pattern and VCV pattern standard patterns that correspond to the top-M syllable sequence B(n, m) (n = 1, ..., N; m = 1, ..., M).

[0054] Next, the operation of the second embodiment of the present invention will be described with reference to FIG. 2 and FIG. 3.

[0055] The operation of the second embodiment in steps S1 and S2 is the same as that of the first embodiment, so its description is omitted.

[0056] The single syllable recognition processing of step S3 differs from the operation of the first embodiment in that it outputs not only the vowel sequence A(n) (n = 1, ..., N) but also the top-M single syllable names B(n, m) (n = 1, ..., N; m = 1, ..., M), which are obtained by recognizing syllables as described in Document 1 (separating the consonant part and the vowel part, performing shift matching on the transient part at the beginning of the word, and combining this with global spectral pattern features of the entire syllable), and in that step S4 stores both the vowel sequence and the single syllable names in the recognition result storage unit 4.

[0057] The operation of the second embodiment in steps S5 through S8 is the same as that of the first embodiment, so its description is omitted.

[0058] When continuous syllable recognition processing is performed in step S9, the second continuous syllable recognition processing unit 22 of FIG. 3 differs from the first embodiment in that it not only uses the obtained vowel sequence and syllable count when extracting vowel part candidates, as described for the first embodiment, but also carries out the processing described in Document 2 by the following procedure.

[0059] In this procedure, the input speech is converted into feature vectors, vowel part candidates are extracted, and the vowel name and section of each candidate are determined. DP matching at the CV pattern and VCV pattern level is then performed, restricted to the CV pattern and VCV pattern standard patterns that correspond to the top-M syllable sequence B(n, m) (n = 1, ..., N; m = 1, ..., M), and from the result the optimal sequence of CV patterns and VCV patterns for the entire input is obtained as the recognition result.
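
To make the restriction concrete, the Python sketch below lists which reference patterns would remain eligible at each syllable position. It is a simplified illustration under assumptions, not the method of Document 2: the first position is assumed to use CV patterns named after the candidate syllables, later positions are assumed to use VCV patterns named as the preceding vowel from A(n) followed by the candidate syllable, and `allowed_patterns` is a hypothetical name. The DP matching itself is not shown.

```python
from typing import List, Sequence, Set


def allowed_patterns(
    vowel_sequence: Sequence[str],
    top_candidates: Sequence[Sequence[str]],
) -> List[Set[str]]:
    """Per syllable position, the reference patterns still eligible for matching."""
    if len(vowel_sequence) != len(top_candidates):
        raise ValueError("A(n) and B(n, m) must cover the same number of syllables")
    allowed: List[Set[str]] = []
    for n, candidates in enumerate(top_candidates):
        if n == 0:
            # Word-initial position: CV patterns for the candidate syllables.
            allowed.append(set(candidates))
        else:
            # Later positions: VCV patterns, preceding vowel + candidate syllable.
            prev_vowel = vowel_sequence[n - 1]
            allowed.append({prev_vowel + cand for cand in candidates})
    return allowed
```

With three candidates per position and four syllables, at most twelve patterns take part in matching, regardless of how large the full pattern inventory is.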

[0060] The operation of the second embodiment in step S10 is the same as that of the first embodiment, so its description is omitted.

[0061] Next, the operation and effects of the second embodiment of the present invention will be described.

[0062] In the second embodiment of the present invention, the CV pattern and VCV pattern level matching in the second continuous syllable recognition processing unit 22 of FIG. 3 can be restricted to the syllables, obtained by the second single syllable recognition processing unit 21 of FIG. 3, that make up the word, so the amount of processing for continuous syllable recognition can be reduced.

[0063] [Example] Next, the operation of one example will be described with reference to FIG. 2, FIG. 3, and FIG. 4 to illustrate the second embodiment of the present invention concretely.

[0064] FIG. 4 is a diagram showing the contents stored in the recognition result storage unit 4 in the second embodiment of the present invention.

[0065] For example, the case of recognizing the word "Yokohama" will be described with reference to the drawings.

[0066] The operation of the example of the second embodiment in steps S1 and S2 is the same as that of the first embodiment, so its description is omitted.

[0067] The second embodiment differs from the first embodiment in that, for the input of the first syllable "yo", step S3 outputs not only the vowel A(1) = o but also the top-M candidates of the single syllable recognition result, B(1, m) = {o, yo, ..., go} (m = 1, ..., M), and these are stored in the recognition result storage unit 4 in step S4. The subsequent syllables are processed in the same way, and when input of all syllables is complete, the vowel sequence and syllable sequence stored in the recognition result storage unit 4 are, as shown in FIG. 4:

A(n) = {o, o, a, a} (n = 1, ..., N)

B(n, m) = {"o, yo, ..., go", "ko, go, ..., to", "ka, a, ..., ha", "ma, na, ..., a"} (m = 1, ..., M)
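
One way to picture the stored contents is as two parallel lists that grow by one entry per uttered syllable. The Python sketch below is only an assumed layout for illustration; the class name `RecognitionResultStore` and its fields are hypothetical and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecognitionResultStore:
    """Holds A(n) and B(n, m) as they accumulate, one entry per syllable."""

    top_m: int                                                  # M, fixed in advance
    vowels: List[str] = field(default_factory=list)             # A(n)
    candidates: List[List[str]] = field(default_factory=list)   # B(n, m)

    def add(self, vowel: str, ranked_syllables: List[str]) -> None:
        """Store the vowel name and the top-M candidates of the next syllable."""
        self.vowels.append(vowel)
        self.candidates.append(list(ranked_syllables[: self.top_m]))

    @property
    def syllable_count(self) -> int:
        """N, the number of syllables stored so far."""
        return len(self.vowels)
```

After four calls to `add`, `vowels` holds A(n), `candidates` holds one ranked list per syllable position, and `syllable_count` returns N = 4.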

[0068] The operation of the second embodiment in steps S5 through S8 is the same as that of the first embodiment, so its description is omitted.

[0069] In the second embodiment, the continuous syllable recognition processing of step S9 not only uses the vowel sequence and the syllable count when extracting vowel part candidates, as in the first embodiment, but also, when matching against the CV pattern and VCV pattern standard patterns, restricts the matching targets to the syllables stored in the recognition result storage unit 4: {o, yo, ..., go} for the first syllable, {ko, go, ..., to} for the second syllable, and so on.

[0070] The operation of the example of the second embodiment in step S10 is the same as that of the first embodiment, so its description is omitted.

[0071] Next, other modifications of the first and second embodiments of the present invention will be described.

[0072] In the first and second embodiments, the single syllable recognition processing uses the method described in Document 1 in which the consonant part and the vowel part are separated from the detected speech, shift matching is performed on the transient part at the beginning of the word, recognition is also performed using global spectral pattern features of the entire syllable, and the decision is made by combining these results. However, the method, also described in Document 1, of applying DP matching to the recognition of the consonant part and the vowel part may be used instead.

[0073] In the first and second embodiments, continuous syllable recognition uses the method described in Document 2 of recognizing syllables with standard patterns formed by concatenating CV patterns and VCV patterns. However, it is also possible to use, for example, the method described in the paper "Recognition of continuous word speech in units of VCV syllables", Institute of Electronics and Communication Engineers of Japan (IECE) Technical Report PRL75-44 (hereinafter "Document 3"), in which the input speech is segmented into VCV syllable units and recognized segment by segment.

[0074] In the first embodiment, the vowel sequence obtained by the first single syllable recognition processing unit 3 is output to the recognition result storage unit 4, and the first continuous syllable recognition processing unit 6 performs recognition using both the vowel sequence and the syllable count. It is also possible, however, not to output the vowel sequence to the recognition result storage unit 4 and to restrict the vowel part candidates in the first continuous syllable recognition processing unit 6 using only the syllable count obtained by the syllable number counting unit 5. In that case, the first single syllable recognition processing unit 3 only needs to detect the speech.

[0075] In the first and second embodiments, the syllable is used as the recognition unit, but this can also be, for example, a letter of the alphabet. When letters are used, letter recognition processing and continuous letter recognition processing can be performed in the same way as in the embodiments above.

[0076] In the second embodiment, the targets of matching in continuous syllable recognition are the top-M candidates, but they can instead be the candidates whose similarity is at or above a fixed threshold.
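
The two selection rules can be compared with a short Python sketch. The similarity scores, their scale, and the function names `select_top_m` and `select_above_threshold` are assumptions made for this illustration; the text does not define how similarity is computed.

```python
from typing import Dict, List


def select_top_m(similarities: Dict[str, float], m: int) -> List[str]:
    """Keep the M syllables with the highest similarity, best first."""
    ranked = sorted(similarities, key=lambda s: similarities[s], reverse=True)
    return ranked[:m]


def select_above_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Keep every syllable whose similarity is at or above the threshold, best first."""
    ranked = sorted(similarities, key=lambda s: similarities[s], reverse=True)
    return [s for s in ranked if similarities[s] >= threshold]
```

The top-M rule always passes a fixed number of candidates to the matching stage, whereas the threshold rule passes more candidates for ambiguous syllables and fewer for clear ones.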

【００７７】[0077]

【発明の効果】[Effects of the Invention] As described above, according to the present invention, using the recognition results of single syllable recognition in continuous syllable recognition improves the recognition performance of continuous syllable recognition.

【００７８】[0078] The reason is that the number of syllables constituting a word and the vowel sequence of its syllables are obtained more accurately by single syllable recognition than by continuous syllable recognition, so using them makes it possible to reduce both the misrecognition caused by insertion or omission of syllables and the misrecognition of vowel parts in continuous syllable recognition. In addition, the results of continuous syllable recognition can be used for recognizing consonants, for which single syllable recognition alone has low accuracy, so recognition performance can be improved.

【００７９】[0079] Furthermore, the present invention has the effect of reducing the amount of processing in continuous syllable recognition.

【００８０】[0080] The reason is that the syllables to be matched for each syllable in continuous syllable recognition can be limited to those corresponding to the top candidates, within a certain range, obtained by single syllable recognition.

【図面の簡単な説明】[Brief description of the drawings]

【図１】FIG. 1 is a diagram showing the configuration of a first embodiment of the speech recognition device of the present invention.

【図２】FIG. 2 is a flowchart showing the processing procedure of the embodiment of the present invention.

【図３】FIG. 3 is a diagram showing the configuration of a second embodiment of the speech recognition device of the present invention.

【図４】FIG. 4 is a diagram showing the contents stored in the recognition result storage unit 4 in the second embodiment of the present invention.

【符号の説明】[Explanation of symbols]

1 Speech input unit
2 Recognition processing switching unit
3 First single syllable recognition processing unit
4 Recognition result storage unit
5 Syllable number counting unit
6 First continuous syllable recognition processing unit
11 Switch connection point when single syllable recognition processing is selected
12 Switch connection point when continuous syllable recognition processing is selected
21 Second single syllable recognition processing unit
22 Second continuous syllable recognition processing unit

Claims

【特許請求の範囲】[Claims]

【請求項１】1. A speech recognition device comprising: a unit speech recognition processing unit that performs recognition processing for each recognition unit on speech uttered in separated recognition units; and a continuous speech recognition processing unit that performs continuous recognition processing, using the recognition result for each recognition unit, on continuously uttered speech associated with the speech uttered in separated recognition units.

【請求項２】2. A speech recognition device comprising: a speech input unit; a unit speech recognition processing unit that performs recognition processing for each recognition unit on speech uttered in separated recognition units; a continuous speech recognition processing unit that performs continuous recognition processing, using the recognition result for each recognition unit, on continuously uttered speech associated with the speech uttered in separated recognition units; and a switching unit that switches the output of the speech input unit to the input of either the unit speech recognition processing unit or the continuous speech recognition processing unit, wherein the switching unit first selects the unit speech recognition processing unit, and speech uttered separated into syllables is recognized by the unit speech recognition processing unit; the switching unit then selects the continuous speech recognition processing unit, and continuously uttered speech is recognized by the continuous speech recognition processing unit using the information obtained by the unit speech recognition processing unit; and the obtained recognition result is output.

【請求項３】3. The speech recognition device according to claim 2, wherein the result recognized by the unit speech recognition processing unit is stored in a storage unit while, at the same time, a syllable number counting unit counts the number of syllables, and the continuous speech recognition processing unit performs recognition on the continuously uttered speech based on the number of syllables obtained by the unit speech recognition processing unit, or on both the recognition result and the number of syllables.