JPS6255699A

JPS6255699A - Voice recognition equipment

Info

Publication number: JPS6255699A
Application number: JP60194851A
Authority: JP
Inventors: 典正野村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-09-05
Filing date: 1985-09-05
Publication date: 1987-03-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Technical field of the invention]

The present invention relates to a speech recognition device.

[Technical background of the invention and its problems]

Devices that recognize human speech can now achieve a considerable level of recognition when the input is word speech. In particular, when the speaker is a specific (enrolled) speaker, recognition performance can be said to have reached a practical level, provided the vocabulary is not large.

However, even in speaker-dependent word recognition, once the vocabulary becomes large, matching the input word speech pattern against every word standard pattern registered in the word dictionary makes the matching time long. As a result, it takes a long time to obtain the recognition result.

For this reason, when the vocabulary becomes large, how to shorten the time required for recognition, and in particular the time required for matching, has been a major problem in the prior art.

[Object of the invention]

An object of the present invention is to provide a speech recognition device that shortens the time required for matching in word speech recognition and thereby improves recognition speed.

[Summary of the invention]

The present invention takes into account the relationship between the durations of the phonemes that make up a word. This narrows down the number of candidate words to be matched, shortens the matching time, and enables faster speech recognition.

A word speech section is extracted by power analysis of the input speech signal, and a silent section is extracted from within that word speech section. The device judges whether the silent section is markedly long and narrows down the number of recognition candidate words on the basis of that judgment. For the narrowed-down candidates, the word standard patterns are matched against the input pattern obtained by spectral analysis of the input word speech, the distance between the two patterns is calculated, and the correct word is determined from the result.

[Effect of the invention]

In word speech recognition with a large vocabulary, the present invention reduces the number of candidate words to be matched by taking into account the relationship between the durations of the phonemes that make up a word, and this speeds up recognition processing. The larger the vocabulary, the greater the effect of the invention.

Specifically, in Japanese words, when a geminate consonant (sokuon, written with a small "っ") occurs in the middle of a word, the sokuon portion is detected as a silent section within the word speech section. It is known experimentally that the duration of this sokuon is strongly related not to the mora (beat) immediately after the sokuon but to the type of the second mora after it. That is, when the second mora after the sokuon is a mora with little independence (a so-called moraic phoneme, for example the moraic nasal "ん" or a long vowel "ー"), the sokuon is markedly long. An example is the relationship between the sokuon "っ" and the moraic nasal "ん" in the word ippon (いっぽん, 一本).

Taking this fact into account, when a markedly long silent section is detected within the word speech section, the candidates are narrowed, from all the words to be recognized, to those that contain a sokuon and whose second mora after the sokuon is a mora with little independence, such as a moraic nasal or a long vowel. The input word speech pattern then needs to be matched against the word standard patterns of only these candidates. This greatly reduces the computation time required for matching.

The present invention presents, as a concrete example, the narrowing of candidate words based on the relationship between the sokuon "っ" and the phonemes that follow it. Sokuon occur frequently in Japanese words of the form numeral + counter (for example ippan (いっぱん, 一班) and ippatsu (いっぱつ, 一発)). In the recognition of such words, the narrowing is effective not only in increasing recognition speed as described above but also in improving recognition accuracy.

[Embodiment of the invention]

An embodiment of the present invention is shown in FIG. 1.

When the input speech signal is sent to the word section extraction unit 1, the speech power is analyzed. If the speech power stays above a threshold continuously for at least a certain time, that point is taken as the start of the word section. If it stays below the threshold continuously for at least a certain time, that point is taken as the end of the word section. The word speech section is extracted in this way.
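The start and end rule above can be sketched as follows. This is a minimal illustration, not the implementation in the patent: the function name, the frame-based power list, and the two hold lengths `min_speech_frames` and `min_silence_frames` are assumptions. The source only says "a certain time" for both conditions; the sketch uses separate values so that a pause inside the word does not end the section.

```python
def find_word_section(power, threshold, min_speech_frames, min_silence_frames):
    """Return (start, end) frame indices of the word section, end exclusive.

    power: per-frame speech power values (hypothetical input format).
    Returns None if no sufficiently long run above the threshold is found.
    """
    # Start point: first run of at least min_speech_frames frames above threshold.
    start = None
    run = 0
    for i, p in enumerate(power):
        run = run + 1 if p > threshold else 0
        if run >= min_speech_frames:
            start = i - min_speech_frames + 1
            break
    if start is None:
        return None

    # End point: beginning of the first run of at least min_silence_frames
    # frames below threshold that follows the start point.
    run = 0
    for i in range(start, len(power)):
        run = run + 1 if power[i] < threshold else 0
        if run >= min_silence_frames:
            return start, i - min_silence_frames + 1
    # No trailing silence long enough: the section runs to the end of the input.
    return start, len(power)
```

With `min_silence_frames` chosen longer than the longest expected in-word pause, a sokuon inside the word stays within the extracted section and can be examined by the next stage.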

Next, when the speech signal of the extracted word section is sent to the silent section extraction unit 2, silent sections within the word section are detected by the same kind of speech power analysis. The silent section targeted here is one that corresponds to a sokuon within the word, and its length is therefore long, on the order of the duration of one mora.
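A sketch of this in-word silence detection, under assumptions: the word section is given as a list of per-frame power values, and `min_frames` is a hypothetical lower bound that discards very short dips (the source says only that the target is roughly one mora long).

```python
def find_silent_sections(word_power, threshold, min_frames):
    """Return [(start, end), ...] for silent runs inside the word, end exclusive.

    Only runs that are bounded by speech on both sides are reported, so
    silence at the very beginning or end of the section is ignored.
    """
    sections = []
    run_start = None
    for i, p in enumerate(word_power):
        if p < threshold:
            if run_start is None:
                run_start = i  # a new below-threshold run begins here
        else:
            # Speech resumes: close the run if it is internal and long enough.
            if run_start is not None and run_start > 0 and i - run_start >= min_frames:
                sections.append((run_start, i))
            run_start = None
    return sections
```

For example, `find_silent_sections([5, 5, 5, 5, 0, 0, 5, 5], 1, 2)` reports one silent run covering frames 4 and 5.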

When a silent section is detected within the word, the silent section length judgment unit 3 evaluates its length and judges whether the silent section is markedly long. For example, in some cases the silent section is about 130 ms to 150 ms long, while in others it is around 200 ms. The latter can be said to be markedly long.
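The judgment can be illustrated with a simple duration threshold. The source gives only the two example ranges (about 130 ms to 150 ms versus around 200 ms); the 10 ms frame shift and the 175 ms boundary below are assumptions chosen to separate those ranges, not values from the patent.

```python
def is_markedly_long(section, frame_shift_ms=10, boundary_ms=175):
    """Judge whether a silent section (start, end), end exclusive, is markedly long.

    frame_shift_ms and boundary_ms are assumed values for illustration only.
    """
    start, end = section
    duration_ms = (end - start) * frame_shift_ms
    return duration_ms >= boundary_ms
```

With these defaults, a 20-frame silence (200 ms) is judged markedly long and a 14-frame silence (140 ms) is not.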

On the basis of this judgment, the candidate word reduction unit 4 narrows down the candidates. When the silent section is markedly long, it selects from the whole candidate set only the words that contain a sokuon and whose second mora after the sokuon is a phoneme with little independence, such as a moraic nasal or a long vowel. For example, from a large number of target words of the form numeral + counter, such as ippan (いっぱん, 一班), ippatsu (いっぱつ, 一発), hatten (はってん, 八点), and hatteki (はってき, 八滴), it narrows the set to words such as ippan and hatten, in which the mora one position beyond the mora following the sokuon is a moraic nasal or a long vowel. Conversely, when the silent section is not markedly long, it selects from the whole candidate set only the words that contain a sokuon and whose second mora after the sokuon is not a phoneme with little independence.
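The selection rule can be sketched over a lexicon of mora sequences. The notation is an assumption made for this illustration: each word is a list of morae, with "Q" for the sokuon, "N" for the moraic nasal, and "R" for a long vowel. The case where no silent section is detected at all is not described in the source and is not covered here.

```python
SOKUON = "Q"                  # assumed symbol for the sokuon
DEPENDENT_MORAE = {"N", "R"}  # assumed symbols: moraic nasal, long vowel


def has_long_sokuon_context(morae):
    """True if some sokuon is followed, two morae later, by a dependent mora."""
    for i, mora in enumerate(morae):
        if mora == SOKUON and i + 2 < len(morae) and morae[i + 2] in DEPENDENT_MORAE:
            return True
    return False


def narrow_candidates(lexicon, silence_is_long):
    """Select candidate words according to the silent section length judgment.

    lexicon: dict mapping a word label to its list of morae.
    """
    if silence_is_long:
        return [w for w, m in lexicon.items() if has_long_sokuon_context(m)]
    # Not markedly long: keep words that contain a sokuon but lack that context.
    return [w for w, m in lexicon.items()
            if SOKUON in m and not has_long_sokuon_context(m)]


lexicon = {
    "ippan":   ["i", "Q", "pa", "N"],
    "ippatsu": ["i", "Q", "pa", "tsu"],
    "hatten":  ["ha", "Q", "te", "N"],
    "hatteki": ["ha", "Q", "te", "ki"],
}
```

With this lexicon, a markedly long silence leaves ippan and hatten as candidates, and a silence that is not markedly long leaves ippatsu and hatteki.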

Note that "long vowel" here means a long vowel in pronunciation. In writing it is often represented by "う" or "い" (for example ippō (いっぽう, 一法)).

Next, the spectrum analysis unit 5 performs spectral analysis of the word section speech signal sent from the word section extraction unit 1 and obtains the speech pattern of the input word.

For the narrowed-down candidate words, the word matching operation unit 6 retrieves the word standard patterns from the word standard pattern dictionary 7, performs a matching operation between the input pattern and each word standard pattern, and obtains the distance between the two patterns. The recognition judgment unit 8 takes the distances calculated for the candidate words, judges the word with the smallest distance to be the correct word, and outputs the recognition result.
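The matching and judgment steps can be sketched as below. The source does not say how the distance between patterns is computed; dynamic time warping with a Euclidean frame distance is used here purely as an assumed stand-in, and the function names and the list-of-feature-vectors pattern format are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_pattern, candidates, templates):
    """Return the candidate whose standard pattern is closest to the input.

    candidates must be non-empty; templates maps each word to its pattern.
    """
    return min(candidates, key=lambda w: dtw_distance(input_pattern, templates[w]))
```

Because `recognize` iterates only over the narrowed candidate list, the number of distance computations falls in proportion to how many words the earlier stage removed.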

One embodiment of the present invention has been described above for word speech recognition. The invention is of course not limited to word speech: where words are combined into phrases or sentences, it can also be applied to the recognition of phrase speech and sentence speech.

[Brief description of the drawing]

FIG. 1 is a schematic block diagram of the present invention.

1: Word section extraction unit
2: Silent section extraction unit
3: Silent section judgment unit
4: Candidate word reduction unit
5: Spectrum analysis unit
6: Word matching operation unit
7: Word standard pattern dictionary
8: Recognition judgment unit

Agents: patent attorneys 則近憲佑 and 竹花喜久男

Claims

[Scope of claims]

A speech recognition device characterized by comprising: a device that extracts a word speech section from input speech and extracts a silent section from within that word speech section; a device that judges whether the silent section is markedly long; a device that narrows down the number of candidate words on the basis of the result of that silent section length judgment; a device that performs spectral analysis of the speech in the word section; a device that, for the narrowed-down candidate words, performs a matching operation between the word standard patterns and the input speech pattern obtained from the spectral analysis; and a device that determines the correct word on the basis of the result of the matching operation.