JPS59127098A

JPS59127098A - Continuous word voice recognition equipment

Info

Publication number: JPS59127098A
Application number: JP310783A
Authority: JP
Inventors: 羽金　廣
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-01-12
Filing date: 1983-01-12
Publication date: 1984-07-21

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

The present invention relates to a continuous word speech recognition device in which the comparison operation is improved so as to raise the recognition rate.

In a conventional continuous word speech recognition device of this type (hereinafter abbreviated as the recognition device), the user first utters each word to be recognized once, separately (such words are hereinafter called isolated words), and the speech pattern of each word is stored in the device as a standard pattern (this operation is called standard pattern registration). Next, for an input continuous-word utterance (hereinafter called the input pattern), a comparison operation (pattern matching) is performed against each standard pattern, the degree of agreement (similarity) between the two is examined, the combination of standard patterns giving the maximum agreement is determined, and the input is judged to consist of the same words.

As a way of realizing this method efficiently and accurately, a recognition technique using dynamic programming (hereinafter abbreviated as DP) is known (Japanese Patent Application Nos. Sho 50-132003 and 132004, hereinafter referred to as the cited documents). The cited documents describe the operating principle of a recognition device based on this pattern matching method. The principle is outlined as follows. For an input pattern in which several words are connected, a pattern obtained by concatenating several standard patterns in every possible permutation is regarded as a reference for the input pattern and is matched against the entire input pattern. Recognition is performed by determining the number and permutation of standard patterns that maximize the resulting similarity. In practice, this maximization is split into word-level maximization and overall-level maximization, and each is carried out with DP, which reduces the amount of processing and achieves a practical processing speed.
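To make the search in this principle concrete, the Python sketch below performs the maximization literally. It concatenates standard patterns in every order, up to a fixed number of words, and scores each concatenation against the whole input by DP matching. It is an illustration under stated assumptions, not the cited documents' procedure. The local similarity (negative squared Euclidean distance), the DP step set, and the names `dtw_similarity` and `recognize_exhaustive` are hypothetical. With $N$ words and at most $K$ words per utterance, it runs $N + N^2 + \cdots + N^K$ full DP matches, and that cost is what the two-level decomposition avoids.

```python
from itertools import product

NEG_INF = float("-inf")


def dtw_similarity(A, B):
    """Time-normalized similarity S(A, B) by DP matching (assumed form).

    Each input frame a_i is matched to exactly one reference frame b_j;
    j starts at 1, ends at len(B), and advances by 0, 1 or 2 per frame.
    Local similarity is the negative squared Euclidean distance.
    """
    I, J = len(A), len(B)
    # g[i][j]: best cumulative similarity of a path ending at (i, j), 1-based.
    g = [[NEG_INF] * (J + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            s = -sum((x - y) ** 2 for x, y in zip(A[i - 1], B[j - 1]))
            if i == 1:
                prev = 0.0 if j == 1 else NEG_INF  # paths start at (1, 1)
            else:
                prev = max(g[i - 1][j], g[i - 1][j - 1],
                           g[i - 1][j - 2] if j >= 2 else NEG_INF)
            if prev > NEG_INF:
                g[i][j] = s + prev
    return g[I][J]


def recognize_exhaustive(A, refs, max_k):
    """Brute-force maximization over word sequences of length 1..max_k.

    refs maps a word name to its standard pattern (a list of vectors).
    Returns (best similarity, best word sequence).
    """
    best = (NEG_INF, None)
    for k in range(1, max_k + 1):
        for seq in product(refs, repeat=k):
            # Concatenate the standard patterns in this order.
            concat = [frame for name in seq for frame in refs[name]]
            score = dtw_similarity(A, concat)
            if score > best[0]:
                best = (score, seq)
    return best
```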

The method described in the cited documents is considered the most effective recognition method to date.

However, even if continuous word speech can be recognized by the above method, misrecognition still arises from various factors in practical use. In particular, when the user speaks faster than a certain rate, slurred pronunciation causes discrepancies, not only in timing but also in frequency structure, between the isolated-word standard pattern and the corresponding word pattern in the input pattern, and time-axis-normalized matching alone may be unable to cope with them.

Specifically, depending on the word sequence that makes up the input pattern, the frequency structure of a word within continuous speech can differ greatly from that of the same word uttered in isolation, especially near each word's end points. For example, consider recognizing connected digits using the digit words "0", "1", ..., "9" as standard patterns. In Japanese, when the digit "6", pronounced /roku/ (slashes denote the pronunciation), is uttered alone, the final /ku/ is usually voiced. In continuous speech, on the other hand, the /ku/ of /roku/ may be devoiced depending on the following digit. For example, when the following digit is "3" /san/, it is usually devoiced, and "63" tends strongly to be pronounced /roksan/ rather than /rokusan/. A standard pattern consisting only of the digit "6" uttered alone therefore cannot cope with such pattern changes, and the recognition rate drops as a result.

The same phenomenon occurs with English digits. For example, the word endings of "8" (eight) and "6" (six) are pronounced relatively clearly when uttered alone, but when another word follows, the final /t/ of "eight" and /ks/ of "six" are often barely pronounced.

Thus the time-varying frequency structure of a word pattern uttered alone can differ greatly from that of the same word within a continuous-word pattern, especially near the word's end points. One conceivable way to handle this difference is to force a different manner of speaking during standard pattern registration, such as devoicing the word ending or using some other unnatural articulation that differs from normal speech, and to register the result as the standard pattern. With that approach, however, it is difficult to obtain word patterns that adequately handle the within-word changes occurring in continuous speech, and it places an extra burden on the user, so it is not a practical approach.

As described above, a conventional recognition device whose standard patterns consist only of isolated words has difficulty handling the changes in frequency structure that occur in continuous speech, and its recognition rate drops.

An object of the present invention is to provide a high-performance continuous word speech recognition device that, when several words are uttered in succession, weakens the influence of the structural changes in frequency that occur near the junctions between words, and at the same time weights the characteristic frequency portion of each word when comparing with the standard patterns.

As stated above, the parts of the input pattern (continuous word speech) where the frequency structure changes greatly lie near the junctions between words, and because these parts are near junctions, their sound pressure level is lower than that of other parts. Conversely, if the vicinity of the junctions in the input pattern is avoided and only the parts where each word's sound pressure level is high are considered, those parts can be treated as each word's characteristic frequency sections. The feature of the present invention is to use this variation in sound pressure level for continuous word speech recognition. In addition to the time-varying frequency information of each isolated word, the time-varying sound pressure level information is also stored as part of the standard pattern. In the word-level maximization, the measure of similarity to the input pattern is made relatively small for the low-sound-pressure-level parts of each standard pattern. This makes the overall-level maximization reliable, so that continuous word speech is recognized with high accuracy.
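As an illustration of how a stored sound pressure level contour could become matching weights, the sketch below converts the per-frame energies of an isolated word to decibels. It then rescales them so that the loudest frame gets weight 1 and frames far below it approach 0. The energy input, the dB conversion, the 60 dB floor, and the name `level_weights` are assumptions. The specification says only that the level values are normalized, and it gives an example weighting function in FIG. 2.

```python
import math


def level_weights(frame_energies, floor_db=60.0):
    """Hypothetical W(j): normalized per-frame level of one isolated word.

    frame_energies: mean-square amplitude of each frame (assumed input).
    floor_db: frames this many dB or more below the peak get weight 0.
    """
    # Convert energy to dB, guarding against the log of zero.
    levels = [10.0 * math.log10(max(e, 1e-12)) for e in frame_energies]
    peak = max(levels)
    # Linear map: peak -> 1.0, (peak - floor_db) -> 0.0, clipped at 0.
    return [max(0.0, 1.0 - (peak - v) / floor_db) for v in levels]
```

Frames near a word's onset and release have low levels and so receive small weights, which reduces how much mismatches there count in the comparison.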

Next, the present invention will be described in detail with reference to the drawings.

First, the operating principle of the device of the present invention is expressed mathematically as follows. A speech signal input through a microphone or the like is analyzed by a frequency analysis circuit and can be represented as a time-series pattern $A$ of multidimensional feature vectors $a_i$ that represent the frequency structure and similar properties:

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \qquad (1)$$

Likewise, the pattern of each word uttered alone (isolated word) is analyzed in the same way and represented as a time-series pattern $B^n$:

$$B^n = b_1^n, b_2^n, \ldots, b_j^n, \ldots, b_{J_n}^n \qquad (2)$$

Here $n$ is an index identifying the word.

With $k$ the number of words contained in the continuous utterance, the maximization problem

$$T = \max_{k,\; n(1), \ldots, n(k)} \Big[ S\big(A,\; B^{n(1)} \oplus B^{n(2)} \oplus \cdots \oplus B^{n(k)}\big) \Big] \qquad (3)$$

is computed to obtain the optimal parameters (word names) $n(k) = \hat{n}(k)$ $(k = 1, 2, \ldots, K)$ and, at the same time, the segmentation points $l(k)$. Here $\oplus$ is an operator denoting the concatenation of patterns; for example,

$$B^n \oplus B^m = b_1^n, b_2^n, \ldots, b_{J_n}^n, b_1^m, b_2^m, \ldots, b_{J_m}^m \qquad (4)$$

Computing the maximization of Eq. (3) by exhaustive search over $k$ and $n(k)$ would require an enormous amount of computation. As in the cited documents, however, a practical processing speed becomes possible by splitting the maximization of Eq. (3) into two stages: word-level processing and overall processing. To this end, the partial pattern $A(l, m)$ is defined as the section of the input pattern $A$ of Eq. (1) from $i = l + 1$ to $i = m$:

$$A(l, m) = a_{l+1}, a_{l+2}, \ldots, a_m$$

Hereafter $l$ is called the start point and $m$ the end point. Now assume $(K - 1)$ segmentation points $l(1), l(2), \ldots, l(k), \ldots, l(K-1)$ in the input pattern $A$ such that

$$0 < l(1) < l(2) < \cdots < l(K-1) < l(K) = I,$$

and divide the input pattern $A$ into $K$ partial patterns:

$$A = A(0, l(1)) \oplus A(l(1), l(2)) \oplus \cdots \oplus A(l(k-1), l(k)) \oplus \cdots \oplus A(l(K-1), I) \qquad (5)$$

On the other hand, when a time-axis-normalized similarity between patterns is defined, the similarity $S(A, B)$ has the following property with respect to the concatenation and decomposition of patterns:

$$S(A,\; B^n \oplus B^m) = \max_l \big[ S(A(0, l), B^n) + S(A(l, I), B^m) \big] \qquad (6)$$

Substituting Eq. (5) into Eq. (3) and applying the relation of Eq. (6) repeatedly gives

$$T = \max_{k,\; n(1), \ldots, n(k),\; l(1), \ldots, l(k-1)} \Big[ S(A(0, l(1)), B^{n(1)}) + S(A(l(1), l(2)), B^{n(2)}) + \cdots + S(A(l(k-1), I), B^{n(k)}) \Big] \qquad (7)$$

and the maximization problem of Eq. (7) can be decomposed and computed as follows.

[1] The similarity

$$S(A(l, m), B^n) \qquad (8)$$

is computed for every combination of a partial section $A(l, m)$ with $l < m$ and an isolated word pattern $B^n$.

[2] The partial similarity

$$S(l, m) = \max_n \big[ S(A(l, m), B^n) \big] \qquad (9)$$

and the partial decision result

$$N(l, m) = \operatorname*{argmax}_n \big[ S(A(l, m), B^n) \big] \qquad (10)$$

are computed and stored in tables. Here the symbol $\operatorname{argmax}[\cdot]$ means computing the variable $n$ that gives the maximum of $[\cdot]$.

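[3] The maximization problem

$$T = \max_{k,\; l(1), \ldots, l(k-1)} \big[ S(0, l(1)) + S(l(1), l(2)) + \cdots + S(l(k-1), I) \big] \qquad (11)$$

is computed to obtain the optimal parameters (segmentation points) $l(k) = \hat{l}(k)$, $k = 1, 2, \ldots, K-1$. The maximization problem of Eq. (11) can be computed with the following recurrence.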

The initial values are

$$T^0(0) = 0, \qquad T^0(l) = -\infty \quad (l = 1, 2, \ldots, I)$$

and, for $k = 1, 2, \ldots, K$ and $m = 1, 2, \ldots, I$, the recurrence value, the provisional segmentation point, and the provisional decision result are computed as

$$T^k(m) = \max_{l < m} \big[ S(l, m) + T^{k-1}(l) \big] \qquad (12)$$

$$L^k(m) = \operatorname*{argmax}_{l < m} \big[ S(l, m) + T^{k-1}(l) \big] \qquad (13)$$

$$N^k(m) = N(L^k(m), m) \qquad (14)$$

Equations (12), (13), and (14) are computed in the direction of increasing $k$ and $m$. When this processing is complete, the segmentation points $l(k)$ are determined from the provisional segmentation points $L^k(m)$ as follows. Starting from $l(K-1) = L^K(I)$, the table of provisional segmentation points is consulted to trace back successively:

$$l(k-1) = L^k(l(k)), \qquad k = K-1, K-2, \ldots, 2 \qquad (15)$$

The decision results $n(k)$ are then obtained by referring to the provisional decision results of Eq. (14):

$$n(k) = N^k(l(k)), \qquad k = 1, 2, \ldots, K \qquad (16)$$

where $l(K) = I$.
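The recurrence (12) to (14) and the traceback (15) to (16) can be sketched in Python as follows. The sketch assumes that the tables of Eqs. (9) and (10) are held in dicts keyed by $(l, m)$, which is a hypothetical layout. The word count $K$ is fixed by the caller, and the name `level_dp` is hypothetical.

```python
NEG_INF = float("-inf")


def level_dp(S, N, I, K):
    """Split frames 1..I into exactly K words (Eqs. 12-16).

    S, N: dicts keyed by (l, m) holding S(l, m) and N(l, m).
    Returns (T^K(I), [l(1), ..., l(K-1)], [n(1), ..., n(K)]).
    """
    # Initial values: T^0(0) = 0 and T^0(l) = -inf for l >= 1.
    T = [[NEG_INF] * (I + 1) for _ in range(K + 1)]
    T[0][0] = 0.0
    L = [[None] * (I + 1) for _ in range(K + 1)]   # L^k(m), Eq. (13)
    Nk = [[None] * (I + 1) for _ in range(K + 1)]  # N^k(m), Eq. (14)
    for k in range(1, K + 1):
        for m in range(1, I + 1):
            for l in range(0, m):
                if (l, m) not in S or T[k - 1][l] == NEG_INF:
                    continue
                cand = S[(l, m)] + T[k - 1][l]     # Eq. (12)
                if cand > T[k][m]:
                    T[k][m] = cand
                    L[k][m] = l
                    Nk[k][m] = N[(l, m)]
    if T[K][I] == NEG_INF:
        raise ValueError("no segmentation into K words")
    # Traceback with l(K) = I, Eqs. (15)-(16).
    bounds = [0] * (K + 1)
    bounds[K] = I
    words = [None] * (K + 1)
    for k in range(K, 0, -1):
        words[k] = Nk[k][bounds[k]]
        bounds[k - 1] = L[k][bounds[k]]
    return T[K][I], bounds[1:K], words[1:]
```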

Through the above operations, the segmentation points and word names of the words making up the continuous utterance are determined as $l(k)$ $(k = 1, 2, \ldots, K-1)$ and $n(k)$ $(k = 1, 2, \ldots, K)$.

FIG. 1 shows a device that recognizes continuous words according to the present invention. A continuous-word speech signal is input through a microphone 10, analyzed by an analysis unit 11, and stored in an input pattern buffer 12 as the continuous word pattern $A$, that is, the time series of feature vectors $a_i$ of Eq. (1). Each isolated word pattern $B^n$ is stored in a standard pattern storage unit 13 as the time series of vectors $b_j^n$ of Eq. (2). At the same time, the normalized time series of sound pressure levels $W_n(j)$ $(j = 1, 2, 3, \ldots, J_n)$ of each isolated word pattern $B^n$ is stored in a sound pressure level storage unit 22. Each time an input pattern vector $a_m$ arrives, a first matching unit 14 computes the similarity of Eq. (8) between each isolated word pattern $B^n$ and the partial patterns $A(l, m)$ of $A$, using the recurrence defined below.

That is, starting from the initial condition

$$g(m, J_n) = s(a_m, b_{J_n}^n) \qquad (17)$$

where $s(\cdot, \cdot)$ is the similarity between two feature vectors, the recurrence of Eq. (18) (not legible in this copy; it contains the weighting function $W_n(j)$) is evaluated in the order $j = J_n, J_n - 1, J_n - 2, \ldots, 1$ under the constraint

$$j + m - J_n - r \le i \le j + m - J_n + r \qquad (19)$$

where $r$ is a constant limiting how far the matching path may deviate. The similarity

$$S(A(l, m), B^n) = g(l + 1, 1) \qquad (20)$$

is then computed over the range

$$m - J_n - r \le l \le m - J_n + r \qquad (21)$$

$W_n(j)$ in Eq. (18) is a weighting function. It is the time series obtained by normalizing the sound pressure level values of each isolated word pattern stored in the sound pressure level storage unit 22, and it is generated by the sound pressure level storage unit 22 and sent to the first matching unit 14.
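The following is a sketch of the first matching unit's computation for one word $B^n$ and one end point $m$. Because Eq. (18) is not legible in this copy, the recurrence is an assumption modeled on common DP matching. Each input frame is matched to exactly one reference frame, and $j$ advances by 0, 1, or 2 per input frame. The local similarity $s$ is assumed to be the negative squared Euclidean distance, and it is multiplied by $W_n(j)$. The weight is also applied to the initial term of Eq. (17), even though the printed equation does not show it. The window (19), the order of evaluation, and the start-point range (21) follow the text. The function names are hypothetical.

```python
NEG_INF = float("-inf")


def local_similarity(a, b):
    # Assumed s(a, b): negative squared Euclidean distance (higher = closer).
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def word_similarities(A, B, W, m, r):
    """Weighted DP matching of word B against A(l, m) for all allowed l.

    A: input vectors, A[0] is a_1. B: standard pattern, B[0] is b_1.
    W: weights, W[0] is W(1). m: end point (1-based). r: window constant.
    Returns {l: S(A(l, m), B)} for the start points permitted by Eq. (21).
    """
    J = len(B)
    g = {}  # g[(i, j)]: best weighted score of a path from (i, j) to (m, J)

    def in_window(i, j):
        # Constraint of Eq. (19), plus 1 <= i <= m.
        return 1 <= i <= m and j + m - J - r <= i <= j + m - J + r

    for j in range(J, 0, -1):            # j = J, J-1, ..., 1 as in the text
        for i in range(m, 0, -1):        # descending i so g(i+1, j) is ready
            if not in_window(i, j):
                continue
            ws = W[j - 1] * local_similarity(A[i - 1], B[j - 1])
            if (i, j) == (m, J):
                g[(i, j)] = ws           # initial condition (weighted here)
                continue
            # Assumed steps: the next input frame stays on j or moves to j+1, j+2.
            best = max(g.get((i + 1, j + d), NEG_INF) for d in (0, 1, 2))
            if best > NEG_INF:
                g[(i, j)] = ws + best
    # Eqs. (20)-(21): S(A(l, m), B) = g(l + 1, 1).
    return {l: g[(l + 1, 1)]
            for l in range(max(0, m - J - r), m - J + r + 1)
            if (l + 1, 1) in g}
```

With a weight that is small near the word ends, as in FIG. 2, a mismatch against the first or last reference frames lowers the score less than the same mismatch in mid-word.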

As a result of the above method, the partial similarity $S(l, m)$ of Eq. (9) and the partial decision result $N(l, m)$ of Eq. (10) are output to a partial similarity storage unit 15 and a partial decision result storage unit 16, respectively. A second matching unit 17 reads the partial similarity $S(l, m)$ from the partial similarity storage unit 15. At the same time, with $k$ held fixed, it reads from a recurrence value storage unit 18 the recurrence values $T^{k-1}(l)$ of Eq. (12) for $l < m$. From these it computes the recurrence value $T^k(m)$ and outputs it to the recurrence value storage unit 18. Similarly, it computes the provisional segmentation point $L^k(m)$ by Eq. (13) and outputs it to a provisional segmentation point storage unit 19. The provisional decision result $N^k(m)$ is computed by Eq. (14), with reference to the partial decision result $N(l, m)$ and the provisional segmentation point $L^k(m)$, and is output to a provisional decision result storage unit 20. The second matching unit 17 carries out these operations starting from $k = 1$ and increasing $k$ step by step up to $k = K$, where $K$ is the value supplied through a word-count setting terminal. In the device configured in this way, a continuous word pattern $A$ whose word sequence is known is input frame by frame from its start point $a_1$ to its end point $a_I$ while the above operations are executed. As a result, the values $L^k(m)$ for the segmentation points and $N^k(m)$ for the word names are obtained for all $m = 1, 2, \ldots, I$ and $k = 1, 2, \ldots, K$.

A decision unit 21 refers to the provisional segmentation points $L^k(m)$ in the provisional segmentation point storage unit 19 and the provisional decision results $N^k(m)$ in the provisional decision result storage unit 20. Decrementing $k$ one step at a time according to Eq. (15), it successively determines $l(K-1), l(K-2), \ldots, l(1)$. In the same way, each word name $n(k)$ is determined according to Eq. (16).

This completes the description of the method for recognizing continuous words. The similarity calculation of Eq. (18) is executed under the constraint of Eq. (19) with a weighting function $W_n(j)$ whose values resemble those shown in FIG. 2. This reduces the influence of pattern fluctuations and changes that arise near the provisional segmentation points. At the same time, pattern matching gives more weight to the vicinity of the maximum of $W_n(j)$, that is, to the section containing the frequencies that characterize the isolated word. Continuous words can therefore be recognized with high accuracy.

Although embodiments of the present invention have been described above, this description does not limit the scope of the invention. For example, this specification describes the operation in terms of similarity, but the same processing is possible with a measure whose ordering is reversed, such as a distance.

Also, although the extracted units have been described as words, phrases consisting of multiple syllables can be processed in the same way. Furthermore, although the similarity between the input speech pattern and the standard patterns has been computed by dynamic programming, the method is not limited to dynamic programming.

[Brief Description of the Drawings]

FIG. 1 is an example block diagram of a device that performs continuous word recognition according to the present invention. FIG. 2 shows an example of the weighting function W(j), created by normalizing the sound pressure level values of an arbitrary word stored in the sound pressure level storage unit 22. In the figures:
10 ... microphone
11 ... analysis unit
12 ... input pattern buffer
13 ... standard pattern storage unit
14 ... first matching unit
15 ... partial similarity storage unit
16 ... partial decision result storage unit
17 ... second matching unit
18 ... recurrence value storage unit
19 ... provisional segmentation point storage unit
20 ... provisional decision result storage unit
21 ... decision unit
22 ... sound pressure level storage unit

Claims

[Claims]

A continuous word speech recognition device that recognizes continuous word speech uttered without pauses by using speech patterns uttered word by word (isolated words). The device performs a comparison operation between the continuous word pattern and patterns formed by combining isolated word patterns in some permutation, examines the degree of agreement between them, and determines the combination of patterns that gives the maximum agreement. The device is characterized in that the weights used in said comparison operation are varied according to the time versus sound pressure level information of each of the isolated words being combined.