JPS625299A

JPS625299A - Voice recognition equipment

Info

Publication number: JPS625299A
Application number: JP14337585A
Authority: JP
Inventors: 納田　重利
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1985-06-29
Filing date: 1985-06-29
Publication date: 1987-01-12

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device applied, for example, to recognizing the speech of a specific speaker word by word.

[Summary of the Invention]

In the pattern matching determiner of a speech recognition device that uses a close-talking microphone or the like as its input section, the inter-frame distance between corresponding frames of a registered feature data block and an input feature data block is multiplied by a weighting coefficient. The coefficient is obtained from a weight corresponding to the amount of change of the input spectral data along the frequency axis and a weight corresponding to the amount of change of the registered spectral data along the frequency axis. The matching distance is thus calculated in a form that strongly brings out the features of the spectral data sequence, and the matching determination is made on that distance, so that the recognition rate does not fall even when many similar registered spectral data sequences are stored in the memory.

[Prior Art]

A speech recognition device previously proposed by the present applicant (Japanese Patent Application No. 59-106177) comprises a microphone as a speech input section, a preprocessing circuit, an acoustic analyzer, a feature data extractor, a registered pattern memory, a pattern matching determiner, and the like.

The speech signal input from the microphone is limited in the preprocessing circuit to the band required for speech recognition and is converted into a digital speech signal by an A/D converter. This digital speech signal is supplied to the acoustic analyzer.

In the acoustic analyzer, the speech signal is converted into a frequency spectrum, the spectrum levels are normalized so that they fall, for example, at equal intervals on a logarithmic axis, and discrete frequency spectrum data are generated.

This frequency spectrum data sequence is output as one frame of data per unit time (frame period). That is, the data of one frame per frame period consist of, for example, N channels of frequency spectrum data, are extracted as a parameter represented by an N-dimensional vector, and are supplied to the feature data extractor.

In the feature data extractor, the distance between adjacent frames is calculated. The absolute value of the difference between the spectral data of corresponding channels is obtained for each channel, and the sum of these values is taken as the inter-frame distance.

Furthermore, the sum of the inter-frame distances is obtained, giving the trajectory length of the N-dimensional vector from the start frame to the end frame of the speech signal. The trajectory length is then divided equally by a predetermined number of divisions, namely the number needed to extract the features of the longest utterance with the most words, and only the frame data corresponding to the division points are extracted as feature data. The output is thus normalized along the time axis so that it is not affected by variations in the speaker's speaking rate.

At registration time, this feature data is stored in the registered pattern memory as a registered feature data block (standard pattern). At recognition time, the input speech signal undergoes the processing described above to become an input feature data block, which is supplied to the pattern matching determiner, and pattern matching is performed between the input feature data block and the registered feature data blocks.

In the pattern matching determiner, the matching distance between the input feature data block and a registered feature data block is calculated. For example, I frames of data are extracted by the feature data extractor, and frames 0 to (I-1) constitute a feature data block. The distance between corresponding frames is calculated between the frame data of the registered feature data block and the frame data of the input feature data block.

For example, the i-th inter-frame distance D_i is calculated as an absolute-value distance by the following equation, where n is the channel number, S_in is the spectral data of the input feature data block, and R_in is the spectral data of the registered feature data block:

$$D_i = \sum_{n=0}^{N-1} \left| S_{in} - R_{in} \right|$$

The inter-frame distance D_i is obtained for all corresponding frames, and the sum of these inter-frame distances up to frame (I-1), that is, the matching distance, is obtained. Matching distances are obtained in the same way for the other registered feature data blocks, and the word corresponding to the registered feature data block whose matching distance is the smallest and is judged to be sufficiently close is output as the recognition result.
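The conventional matching described above can be sketched as follows. This is a minimal illustration, not the circuit of the earlier application: the function names and the dictionary of templates are hypothetical, and each block is assumed to be a list of I frames with N channel values per frame.

```python
from typing import Dict, Sequence

Frame = Sequence[float]   # one frame: N channel magnitudes
Block = Sequence[Frame]   # one feature data block: I frames


def frame_distance(s: Frame, r: Frame) -> float:
    """Absolute-value distance between two frames, summed over channels."""
    if len(s) != len(r):
        raise ValueError("frames must have the same number of channels")
    return sum(abs(a - b) for a, b in zip(s, r))


def matching_distance(input_block: Block, registered_block: Block) -> float:
    """Sum of the distances between corresponding frames 0..I-1."""
    if len(input_block) != len(registered_block):
        raise ValueError("blocks must have the same number of frames")
    return sum(frame_distance(s, r) for s, r in zip(input_block, registered_block))


def nearest_word(input_block: Block, templates: Dict[str, Block]) -> str:
    """Word whose registered block has the smallest matching distance."""
    return min(templates, key=lambda w: matching_distance(input_block, templates[w]))
```

Because time normalization gives every block the same number of frames, corresponding frames can be compared index by index without any alignment search.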

[Problems to Be Solved by the Invention]

As described above, the pattern matching determiner of the conventional speech recognition device calculates the inter-frame distance as the sum of the absolute values of the differences between the spectral data of corresponding frames and channels.

For example, suppose there are three frames, each consisting of ten spectral data values for channels 0 to 9, as shown in Fig. 5A, Fig. 5B, and Fig. 5C. Let frame A of Fig. 5A have channel magnitudes (10, 14, 12, 9, 6, 8, 10, 9, 9, 8) and frame C of Fig. 5C have channel magnitudes (9, 13, 10, 13, 8, 6, 16, 10, 9, 9). The inter-frame distance D_A between frame A and frame C is

D_A = 1 + 1 + 2 + 4 + 2 + 2 + 6 + 1 + 0 + 1 = 20.

Likewise, let frame B of Fig. 5B have channel magnitudes (10, 15, 12, 9, 6, 8, 15, 8, 7, 7). The inter-frame distance D_B between frame B and frame C is

D_B = 1 + 2 + 2 + 4 + 2 + 2 + 1 + 2 + 2 + 2 = 20.

Judged by inter-frame distance alone, frames A and B are therefore equally similar to frame C.
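The tie between the two distances can be checked directly from the channel values quoted for Fig. 5A to Fig. 5C. The snippet below is only a verification aid; the variable and function names are illustrative.

```python
frame_a = [10, 14, 12, 9, 6, 8, 10, 9, 9, 8]
frame_b = [10, 15, 12, 9, 6, 8, 15, 8, 7, 7]
frame_c = [9, 13, 10, 13, 8, 6, 16, 10, 9, 9]


def channel_differences(x, y):
    """Per-channel absolute differences between two frames."""
    return [abs(a - b) for a, b in zip(x, y)]


diff_ac = channel_differences(frame_a, frame_c)
diff_bc = channel_differences(frame_b, frame_c)
d_a = sum(diff_ac)
d_b = sum(diff_bc)

print(diff_ac, d_a)  # [1, 1, 2, 4, 2, 2, 6, 1, 0, 1] 20
print(diff_bc, d_b)  # [1, 2, 2, 4, 2, 2, 1, 2, 2, 2] 20
```

The per-channel terms differ, notably at channel 6 where frame A misses the peak that frames B and C share, yet the totals are identical, which is the weakness the text goes on to address.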

In reality, however, as is clear from Fig. 5A, Fig. 5B, and Fig. 5C, frame B resembles frame C more closely than frame A does, so the inter-frame distance D_B ought to come out smaller than the inter-frame distance D_A. The conventional inter-frame distance calculation does not give this expected result, which lowers the recognition rate.

Accordingly, an object of the present invention is to provide a speech recognition device that prevents a drop in the recognition rate by multiplying the inter-frame distance by a weighting coefficient during pattern matching, so that the matching distance is calculated in a form that strongly brings out the features of the spectral data sequence.

[Means for Solving the Problems]

The present invention is a speech recognition device comprising: acoustic analysis means 5 that performs on an input speech signal the preprocessing required for speech recognition, such as spectrum conversion; feature data extraction means 6 that is supplied with the output data of the acoustic analysis means 5 and extracts feature data from the output data; a memory 7 in which the feature data is stored as standard patterns; and pattern matching determination means 8 that is supplied with an input pattern from the feature data extraction means 6 and a standard pattern read from the memory 7, obtains a weighting coefficient W_i using a value W_S corresponding to the amount of change along the frequency axis of the spectral data S_in constituting the input pattern and a value W_R corresponding to the amount of change along the frequency axis of the spectral data R_in constituting the standard pattern, performs a distance calculation between the input pattern and the standard pattern by multiplying the inter-frame distance by the weighting coefficient W_i, and makes a matching determination based on the result of the distance calculation.

[Operation]

In the pattern matching determiner 8, for each pair of corresponding frames of the registered feature data block and the input feature data block, the absolute values of the differences between the spectral data of adjacent channels are obtained for each spectral data sequence, and these difference values are accumulated to give the weight W_S of the input spectral data and the weight W_R of the registered spectral data. The weighting coefficient W_i is obtained from the weights W_S and W_R, the inter-frame distance between the corresponding frames of the registered feature data block and the input feature data block is multiplied by W_i, the matching distance is calculated as the sum of the products, and the matching determination is made on the basis of this matching distance.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

Fig. 1 shows an embodiment of the present invention. In Fig. 1, reference numeral 1 denotes a microphone serving as the speech input section. For example, a close-talking microphone, which picks up little ambient noise, is used as microphone 1.

The analog speech signal from microphone 1 is supplied to filter 2. Filter 2 is, for example, a low-pass filter with a cutoff frequency of 7.5 kHz. The speech signal is band-limited by filter 2 and supplied through amplifier 3 to A/D converter 4.

A/D converter 4 is, for example, an 8-bit A/D converter with a sampling frequency of 12.5 kHz. The speech signal undergoes analog-to-digital conversion in A/D converter 4 to become an 8-bit digital signal, which is supplied to acoustic analyzer 5.

Acoustic analyzer 5 converts the speech signal into a frequency spectrum and generates, for example, an N-channel spectral data sequence. In acoustic analyzer 5, the speech signal is converted into a frequency spectrum by arithmetic processing, and a spectral data sequence is obtained whose representative values are, for example, N frequencies spaced at equal intervals on a logarithmic axis. The speech signal is thus represented by the magnitudes of a discrete N-channel frequency spectrum. The N-channel spectral data sequence is output as one frame of data every unit time (frame period). That is, at every frame period the speech signal is extracted as a parameter represented by an N-dimensional vector and supplied to feature data extractor 6.
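As a small illustration of "N frequencies spaced at equal intervals on a logarithmic axis", the sketch below computes such a set of channel frequencies. The band edges are hypothetical; the text gives only the 7.5 kHz filter cutoff and does not state which frequencies the analyzer actually uses.

```python
from typing import List


def log_spaced_centers(f_low: float, f_high: float, n_channels: int) -> List[float]:
    """Return n_channels frequencies equally spaced on a logarithmic axis."""
    if n_channels < 2 or not 0 < f_low < f_high:
        raise ValueError("need n_channels >= 2 and 0 < f_low < f_high")
    # Equal spacing in log(f) means a constant ratio between neighbours.
    ratio = (f_high / f_low) ** (1.0 / (n_channels - 1))
    return [f_low * ratio ** k for k in range(n_channels)]


# Hypothetical band edges: 100 Hz to 6400 Hz in 7 channels (one per octave).
centers = log_spaced_centers(100.0, 6400.0, 7)
```

With these assumed edges each channel sits one octave above the previous one, so low frequencies are sampled more densely in hertz than high frequencies.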

In feature data extractor 6, the distance between adjacent frames is calculated. For example, the absolute value of the difference in spectral data is obtained for each channel, and the sum of these values is taken as the inter-frame distance.

Furthermore, the sum of the inter-frame distances is obtained, giving the trajectory length of the N-dimensional vector from the start frame to the end frame of the speech signal. The trajectory length is then divided equally by a predetermined number of divisions, namely the number needed to extract the features of the longest utterance with the most words, and only the frame data corresponding to the division points are extracted as feature data. The output is thus normalized along the time axis so that it is not affected by variations in the speaker's speaking rate.

For example, feature data extractor 6 extracts I frames of data as shown in Fig. 2, from frame 0 to frame (I-1), each frame consisting of the data of channels 0 to (N-1).
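The trajectory-based time normalization performed by the feature data extractor can be sketched as below. Several details are assumptions made for illustration: the start and end of the trajectory are counted as division points, and each division point is mapped to the frame whose cumulative trajectory length is nearest to it. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def select_by_trajectory(frames: Sequence[Sequence[float]], n_points: int) -> List[Sequence[float]]:
    """Pick n_points frames at equal steps along the spectral trajectory."""
    if n_points < 2 or len(frames) < 2:
        raise ValueError("need at least two frames and two points")
    # Cumulative trajectory length: running sum of adjacent inter-frame distances.
    cum = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cum.append(cum[-1] + sum(abs(a - b) for a, b in zip(prev, cur)))
    total = cum[-1]
    selected = []
    for k in range(n_points):
        target = total * k / (n_points - 1)
        j = min(bisect_left(cum, target), len(cum) - 1)  # first frame at or past target
        if j > 0 and target - cum[j - 1] <= cum[j] - target:
            j -= 1  # the previous frame is at least as close to the division point
        selected.append(frames[j])
    return selected


# A held (stationary) sound adds frames but no trajectory length.
frames = [[0], [1], [1], [1], [5], [9]]
picked = select_by_trajectory(frames, 3)
```

Stationary stretches contribute nothing to the trajectory length, so holding a sound longer does not shift the division points, which is how the selection becomes insensitive to speaking rate.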

At registration time, this feature data is stored in registered pattern memory 7 as a registered feature data block. At recognition time, the input speech signal undergoes the processing described above to become an input feature data block, which is supplied to pattern matching determiner 8, and pattern matching is performed between the input feature data block and every registered feature data block.

Fig. 3 shows an example of pattern matching determiner 8. As shown in Fig. 3, pattern matching determiner 8 consists of frame distance calculation circuit 10, weighting coefficient calculation circuit 11, multiplication circuit 12, matching distance calculation circuit 13, and minimum distance determination circuit 14.

The input feature data block is supplied from feature data extractor 6 to frame distance calculation circuit 10 and weighting coefficient calculation circuit 11, and the registered feature data block to be compared is supplied from registered pattern memory 7 to the same two circuits.

In frame distance calculation circuit 10, the distance between corresponding frames of the input feature data block and the registered feature data block is calculated. The resulting inter-frame distance data for the corresponding frames is supplied to multiplication circuit 12.

In weighting coefficient calculation circuit 11, for each of the spectral data sequences of the corresponding frames of the input feature data block and the registered feature data block, the difference between the spectral data of each pair of adjacent channels is obtained, and these difference values are accumulated.

A weighting coefficient is calculated from the accumulated difference value of the input spectral data and the accumulated difference value of the registered spectral data, and the weighting coefficient data is supplied to multiplication circuit 12.

Multiplication circuit 12 multiplies the inter-frame distance data by the weighting coefficient data, and the weighted inter-frame distance data is supplied from multiplication circuit 12 to matching distance calculation circuit 13.

In matching distance calculation circuit 13, the inter-frame distance data supplied in sequence is accumulated. Once the inter-frame distance data up to the last frame (I-1) has been accumulated, the accumulated value is taken as the matching distance data and supplied to minimum distance determination circuit 14. In the same way, the matching distance between the input feature data block and every registered feature data block is calculated, and the matching distance data is supplied to minimum distance determination circuit 14.

Minimum distance determination circuit 14 outputs, as the recognition result, the word corresponding to the registered feature data block whose matching distance is the smallest and is judged to be sufficiently close.
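The decision made by minimum distance determination circuit 14 can be sketched as a minimum search followed by a closeness check. The text does not say how "sufficiently close" is judged, so the fixed rejection threshold below is an assumption, as are the function and parameter names.

```python
from typing import Dict, Optional


def decide_word(distances: Dict[str, float], reject_threshold: float) -> Optional[str]:
    """Return the word with the smallest matching distance, or None if too far.

    distances maps each registered word to its matching distance.
    reject_threshold is a hypothetical tuning parameter.
    """
    if not distances:
        return None
    best = min(distances, key=distances.get)
    return best if distances[best] <= reject_threshold else None
```

For example, `decide_word({"left": 40.2, "right": 48.4}, 45.0)` returns `"left"`, while a threshold of 30.0 rejects both candidates and returns `None`.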

The operation of frame distance calculation circuit 10, weighting coefficient calculation circuit 11, and multiplication circuit 12 of pattern matching determiner 8 in the embodiment described above is explained below with reference to the flowchart.

The registered feature data block is supplied from registered pattern memory 7, and the input feature data block from feature data extractor 6, to frame distance calculation circuit 10 and weighting coefficient calculation circuit 11. In weighting coefficient calculation circuit 11, the processing of steps 1 to 3 is performed for each frame of each feature data block.

With n as the variable indicating the channel number, the weight W_S of the input spectral data sequence of the i-th frame, which consists of N channels from channel 0 to channel N-1, is obtained by

$$W_S = \sum_{n=0}^{N-2} \left| S_{i(n+1)} - S_{in} \right|$$

that is, the sum of the absolute values of the differences in input spectral data between adjacent channels is taken as the weight W_S (step 1).

Next, the weight W_R of the registered spectral data sequence is obtained in the same way by

$$W_R = \sum_{n=0}^{N-2} \left| R_{i(n+1)} - R_{in} \right|$$

that is, the sum of the absolute values of the differences in registered spectral data between adjacent channels is taken as the weight W_R (step 2).

Then, in step 3, the weighting coefficient W_i is obtained from the weights W_S and W_R. The processing of step 4 is performed in frame distance calculation circuit 10 and multiplication circuit 12.

In step 4, the i-th inter-frame distance D_i is obtained by

$$D_i = W_i \sum_{n=0}^{N-1} \left| S_{in} - R_{in} \right|$$

That is, the absolute-value distance between corresponding channels of the two frames is calculated, and the sum over channels 0 to (N-1) is multiplied by the weighting coefficient W_i to give the inter-frame distance D_i.
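The four flowchart steps can be sketched in software as follows. One point is an assumption rather than something stated in this text: the expression that combines W_S and W_R into W_i is not reproduced here, so the sketch uses W_i = W_S/W_R + W_R/W_S, a form that agrees with the coefficient values 2.42 and 2.01 quoted in the worked example under the effect of the invention. All function names are illustrative.

```python
from typing import Sequence

Frame = Sequence[float]


def spectral_change(frame: Frame) -> float:
    """Steps 1 and 2: sum of absolute differences between adjacent channels."""
    return sum(abs(b - a) for a, b in zip(frame, frame[1:]))


def weighting_coefficient(w_s: float, w_r: float) -> float:
    """Step 3 (ASSUMED form): symmetric ratio sum of the two weights."""
    if w_s == 0 or w_r == 0:
        raise ValueError("a completely flat spectrum has no defined coefficient here")
    return w_s / w_r + w_r / w_s


def weighted_frame_distance(s: Frame, r: Frame) -> float:
    """Step 4: absolute-value distance multiplied by the weighting coefficient."""
    base = sum(abs(a - b) for a, b in zip(s, r))
    return weighting_coefficient(spectral_change(s), spectral_change(r)) * base


def weighted_matching_distance(input_block: Sequence[Frame], registered_block: Sequence[Frame]) -> float:
    """Accumulate the weighted distances of corresponding frames 0..I-1."""
    return sum(weighted_frame_distance(s, r) for s, r in zip(input_block, registered_block))
```

Under the assumed form the coefficient is 2 when both frames vary by the same total amount along the frequency axis and grows as their amounts of variation diverge, so frames with differently shaped spectra are pushed further apart.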

The present invention is not limited to a hard-wired configuration; the processing may also be performed in software using a microcomputer or a microprogram method.

[Effect of the Invention]

In the present invention, the pattern matching determiner obtains, for each frame of the registered feature data block and the input feature data block, the absolute values of the differences between the spectral data of adjacent channels in each spectral data sequence, and accumulates these difference values to give the weight W_S of the input spectral data sequence and the weight W_R of the registered spectral data sequence. The weighting coefficient W_i is obtained from W_S and W_R, the inter-frame distance between the registered feature data block and the input feature data block is multiplied by W_i, the matching distance is calculated as the sum of the products, and the matching determination is made on the basis of the matching distance so obtained.

For example, suppose there are three frames, each consisting of ten spectral data values for channels 0 to 9, as shown in Fig. 5A, Fig. 5B, and Fig. 5C. The weight W_A of frame A of Fig. 5A, whose channel magnitudes are (10, 14, 12, 9, 6, 8, 10, 9, 9, 8), is

W_A = 4 + 2 + 3 + 3 + 2 + 2 + 1 + 0 + 1 = 18.

The weight W_B of frame B of Fig. 5B, whose channel magnitudes are (10, 15, 12, 9, 6, 8, 15, 8, 7, 7), is

W_B = 5 + 3 + 3 + 3 + 2 + 7 + 7 + 1 + 0 = 31.

The weight W_C of frame C of Fig. 5C, whose channel magnitudes are (9, 13, 10, 13, 8, 6, 16, 10, 9, 9), is

W_C = 4 + 3 + 3 + 5 + 2 + 10 + 6 + 1 + 0 = 34.

The weighting coefficient for frames A and C is then 2.42, and the weighting coefficient for frames B and C is 2.01. The inter-frame distance D_A between frame A and frame C becomes

D_A = 20 × 2.42 = 48.4,

and the inter-frame distance D_B between frame B and frame C becomes

D_B = 20 × 2.01 = 40.2,

so frame B is judged to be the more similar to frame C.
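The two coefficient values can be checked by hand from the adjacent-channel difference sums of the three frames (18 for frame A, 31 for frame B, 34 for frame C). The expression for W_i is not reproduced in this text, so the first line below is an assumed form, chosen because it matches both quoted values to two decimal places.

```latex
W_i = \frac{W_S}{W_R} + \frac{W_R}{W_S} \quad \text{(assumed form)}

W_{AC} = \frac{18}{34} + \frac{34}{18} \approx 0.529 + 1.889 = 2.418 \approx 2.42,
\qquad D_A = 20 \times 2.42 = 48.4

W_{BC} = \frac{31}{34} + \frac{34}{31} \approx 0.912 + 1.097 = 2.009 \approx 2.01,
\qquad D_B = 20 \times 2.01 = 40.2
```

Under this form the coefficient can never fall below 2, and it reaches 2 only when the two frames have the same total spectral variation, so the unweighted tie of 20 against 20 is broken in favour of frame B, whose variation (31) is closer to that of frame C (34).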

As the above example shows, according to the present invention the inter-frame distance is multiplied during pattern matching by a weighting coefficient corresponding to the amount of spectral change along the frequency axis, which strongly brings out the features of the spectral data sequence. The recognition rate therefore does not fall even when many similar registered spectral data sequences are stored in the registered pattern memory.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the overall configuration of an embodiment of the present invention; Fig. 2 is a schematic diagram used to explain the data structure of a feature data block in the embodiment; Fig. 3 is a block diagram of the pattern matching determiner in the embodiment; Fig. 4 is a flowchart used to explain the operation of the frame distance calculation circuit, weighting coefficient calculation circuit, and multiplication circuit of the pattern matching determiner in the embodiment; and Fig. 5A, Fig. 5B, and Fig. 5C are schematic diagrams showing examples of spectral data.

Explanation of main reference numerals in the drawings

1: microphone, 5: acoustic analyzer, 6: feature data extractor, 7: registered pattern memory, 10: frame distance calculation circuit, 11: weighting coefficient calculation circuit, 12: multiplication circuit, 13: matching distance calculation circuit, 14: minimum distance determination circuit.

Claims

[Scope of Claims]

A speech recognition device comprising:

acoustic analysis means that performs on an input speech signal the preprocessing required for speech recognition, such as spectrum conversion;

feature data extraction means that is supplied with the output data of the acoustic analysis means and extracts feature data from the output data;

a memory in which the feature data is stored as standard patterns; and

pattern matching determination means that is supplied with an input pattern from the feature data extraction means and the standard pattern read from the memory, obtains a weighting coefficient W_i using a value W_S corresponding to the amount of change along the frequency axis of the spectral data S_in constituting the input pattern and a value W_R corresponding to the amount of change along the frequency axis of the spectral data R_in constituting the standard pattern, performs a distance calculation between the input pattern and the standard pattern by multiplying the inter-frame distance by the weighting coefficient W_i, and makes a matching determination based on the result of the distance calculation.