JPH06100919B2

JPH06100919B2 - Voice recognizer

Info

Publication number: JPH06100919B2
Application number: JP61156635A
Authority: JP
Inventors: 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-07-03
Filing date: 1986-07-03
Publication date: 1994-12-12
Anticipated expiration: 2009-12-12
Also published as: JPS6312000A

Description

[Detailed Description of the Invention]
Field of industrial application: The present invention relates to a speaker-independent speech recognition device that derives a recognition candidate.

Prior art: In a speaker-independent speech recognition device, a large number of speech patterns collected from many speakers are generally divided into groups by a clustering method, and the representative pattern of each group is registered as a standard pattern. The similarity between the input speech pattern and every standard pattern stored in the dictionary is then calculated, and the standard pattern with the highest similarity is taken as the recognition candidate. To calculate the similarity between two speech patterns, pattern matching that uses dynamic programming to stretch and compress the time axes of the two patterns nonlinearly (hereinafter, DP matching) is used. In word speech recognition devices in particular, DP matching yields a high recognition rate (see, for example, H. Sakoe and S. Chiba, "Dynamic programming optimization for spoken word recognition", IEEE Trans. Acoustics, Speech, Signal Processing, Vol. ASSP-27, pp. 336-349, 1979).

Problems to be solved by the invention: In the speech recognition device described above, however, a pattern whose word-initial or word-final portion is missing may be input because of the speaker's manner of utterance, individual differences, errors in speech interval detection, and so on. Such a pattern is matched against patterns with no missing portion, so the similarity is low and misrecognition is likely.
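
The DP matching mentioned in the prior-art description can be illustrated with a minimal sketch. The patent does not give the recurrence, the local distance, or the normalization, so the symmetric three-way step, the Euclidean frame distance, and the division by the summed lengths below are all assumptions, and the function name `dp_matching_distance` is hypothetical. The sketch returns a distance, so a lower value corresponds to a higher similarity.

```python
import math


def dp_matching_distance(a, b):
    """Accumulated distance between two feature-vector sequences under
    nonlinear time alignment (lower means more similar)."""

    def frame_dist(x, y):
        # Euclidean distance between two feature vectors (an assumption).
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

    n, m = len(a), len(b)
    inf = float("inf")
    # g[i][j] holds the best accumulated distance aligning a[:i] with b[:j].
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Either sequence may stay on a frame while the other advances.
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    # Normalize by total length so patterns of different lengths compare fairly.
    return g[n][m] / (n + m)


# Toy one-dimensional "feature vectors" (illustrative values only).
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]  # same shape, spoken slowly
other = [[2.0], [2.0], [0.0], [0.0], [2.0]]                      # a different shape

print(dp_matching_distance(template, slow))
print(dp_matching_distance(template, other))
```

The slowed-down copy aligns with the template at zero cost because the time axis is allowed to stretch, while the differently shaped sequence does not.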

For example, when "FUKUOKA" (福岡) is uttered, the word-initial FU may be voiced or devoiced depending on the manner of utterance, individual differences, and so on. When it is devoiced, the energy of the FU portion becomes very small, and a speech recognition device that detects the speech interval mainly from the energy sequence of the speech is likely to detect only the interval "KUOKA", with the FU portion missing, as the speech interval. Pattern matching against the standard pattern "FUKUOKA" then gives a low similarity, and misrecognition is likely. The problem for conventional speech recognition devices has thus been how to keep the recognition rate from falling when the speech interval is detected incorrectly.

In view of the above problem, the present invention provides a speech recognition device that can accurately recognize patterns whose word-initial or word-final portion may be missing depending on the manner of utterance, even when the speech interval is detected incorrectly.

Means for solving the problems: To achieve the above object, the speech recognition device of the present invention comprises: speech interval detection means for detecting a speech interval from the energy sequence of the input speech; standard pattern determination means for selecting, from a large number of speech patterns of many speakers, a plurality of representative patterns for each utterance to be recognized and determining them as standard patterns; standard pattern management means for managing the address at which each standard pattern is stored and its pattern length; partial pattern management means for managing a pattern whose word-initial or word-final portion may be missing, owing to the manner of utterance or individual differences, as a portion of a standard pattern with no missing portion; and pattern matching means for performing pattern matching between the input speech and each pattern managed by the standard pattern management means and the partial pattern management means, and taking the pattern with the highest similarity as the recognition candidate.

Operation: With the configuration described above, for patterns whose word-initial or word-final portion may be missing, the present invention manages in advance each pattern with a missing portion as a portion of a standard pattern with no missing portion. Pattern matching is performed between the input speech and both the complete standard patterns and the partial patterns of the representative patterns, and the recognition candidate is derived from the result. Patterns whose word-initial or word-final portion is hard to detect can therefore be recognized accurately even when the speech interval is detected incorrectly. In addition, because a pattern with a missing portion is managed as a portion of a complete standard pattern, the memory capacity needed for standard patterns does not increase.

Embodiment: A speech recognition device according to one embodiment of the present invention is described below with reference to the drawings.

Fig. 1 is a block diagram of a speech recognition device according to one embodiment of the present invention. In Fig. 1, 1 is a speech input unit, to which the speaker's voice is input through a microphone or the like. 2 is speech analysis means, which extracts a time series of feature vectors and an energy sequence from the input speech signal. 3 is speech interval detection means, which detects the speech interval from the energy sequence of the speech. 4 is standard pattern determination means, which analyzes a large number of speech patterns from many speakers and determines their representative patterns as standard patterns. 5 is standard pattern management means, which manages the memory location and pattern length of each standard pattern. 6 is partial pattern management means, which manages patterns whose word-initial or word-final portion is missing as portions of the standard patterns managed by standard pattern management means 5. 7 is pattern matching means, which performs pattern matching between the input pattern and each standard pattern and each partial pattern. 8 is a recognition result output unit, which informs the speaker of the recognition candidate derived from the result of pattern matching means 7, by speech synthesis or the like.

Fig. 2 is a circuit diagram of this embodiment, showing a configuration in which speech interval detection means 3, standard pattern management means 5, partial pattern management means 6, and pattern matching means 7 are implemented by microcomputer 23. In Fig. 2, 11 is a microphone for speech input; 12 is an analog/digital converter (hereinafter, A/D converter) that converts the speech signal input from microphone 11 from analog to digital; 13 is a speech analysis unit; 14 is a speech interval detection unit; 15 is an input pattern memory that stores the time series of feature vectors of the input speech; 17 is a partial pattern management table that, for standard patterns whose word-initial or word-final portion may be missing, manages each pattern with a missing portion as a partial pattern of the standard pattern; 18 is a standard pattern management table that manages each standard pattern determined by standard pattern determination means 4; 19 is a standard pattern memory that stores the time series of feature vectors of all standard patterns; 20 is a recognition result judgment unit; 21 is a speech synthesis unit that synthesizes the speech of the recognition candidate obtained; and 22 is a speaker that outputs the synthesized speech obtained by speech synthesis unit 21.

Fig. 3 is a flowchart of the main part of the operation of the microcomputer of this embodiment. The operation of this embodiment, configured as above, is described in detail below with reference to the flowchart of Fig. 3.

First, in step 31, speech is input from microphone 11 and the speech signal is converted from analog to digital by A/D converter 12; speech analysis unit 13 then obtains the time series of feature vectors of the speech pattern (for example, 10-dimensional linear prediction coefficients) and the energy sequence. In step 32, the energy sequence obtained by speech analysis unit 13 is examined: when an interval in which the energy exceeds threshold A₀ lasts longer than a fixed time T₀, and intervals in which the energy is below A₀ lasting at least T₁ and T₂ exist immediately before the word beginning and immediately after the word ending, respectively, the interval exceeding T₀ is detected as the speech interval. In step 33, the time series of its feature vectors is stored in input pattern memory 15.
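
The detection rule of step 32 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of speech interval detection unit 14: times are counted in frames, only the first qualifying run is returned, frames exactly at the threshold are treated as not exceeding it, and the function name `detect_speech_interval` and all numeric values are hypothetical.

```python
def detect_speech_interval(energy, a0, t0, t1, t2):
    """Return (start, end) frame indices, end exclusive, of the first run whose
    energy stays above a0 for more than t0 frames and that is preceded by at
    least t1 frames and followed by at least t2 frames not above a0.
    Return None when no such run exists."""
    n = len(energy)
    start = 0
    while start < n:
        if energy[start] <= a0:
            start += 1
            continue
        # Extend the run of frames above the threshold.
        end = start
        while end < n and energy[end] > a0:
            end += 1
        # Count the low-energy frames immediately before the run.
        lead = 0
        k = start - 1
        while k >= 0 and energy[k] <= a0:
            lead += 1
            k -= 1
        # Count the low-energy frames immediately after the run.
        trail = 0
        k = end
        while k < n and energy[k] <= a0:
            trail += 1
            k += 1
        if end - start > t0 and lead >= t1 and trail >= t2:
            return start, end
        start = end
    return None


# Toy energy sequences: 3 silent frames, 2 frames of "FU", 6 frames of "KUOKA", 3 silent frames.
voiced_fu = [0.1] * 3 + [3.0] * 2 + [5.0] * 6 + [0.1] * 3
devoiced_fu = [0.1] * 3 + [0.3] * 2 + [5.0] * 6 + [0.1] * 3

print(detect_speech_interval(voiced_fu, a0=1.0, t0=4, t1=2, t2=2))    # (3, 11)
print(detect_speech_interval(devoiced_fu, a0=1.0, t0=4, t1=2, t2=2))  # (5, 11)
```

With the devoiced "FU", the two low-energy frames are absorbed into the leading silence, so the detected interval starts two frames late. This is the word-initial loss described in the FUKUOKA example.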

Beforehand, standard pattern determination means 4 determines, for each utterance to be recognized, a plurality of representative patterns from a large number of speech patterns of many speakers, and these patterns are stored in standard pattern memory 19. Standard pattern management table 18 stores the address and pattern length used to manage each pattern in standard pattern memory 19. The standard patterns whose word-initial or word-final portion may be missing are identified in advance, and partial pattern management table 17 stores, for each such case, the address in standard pattern memory 19 and the pattern length of the portion that remains when the loss occurs, so that the pattern with the missing portion is managed as a partial pattern of the complete standard pattern. In other words, standard pattern memory 19 stores only the time series of feature vectors of the complete standard patterns, which serve as representative patterns; when a partial pattern with a missing word-initial or word-final portion is needed, only the corresponding portion of standard pattern memory 19 is read out in accordance with partial pattern management table 17.
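
The relationship between the pattern memory and the two management tables can be sketched as below. The text specifies only that each table entry holds an address and a pattern length; the `Entry` layout, the `word` field, the second word "OSAKA", and the use of single letters in place of feature vectors are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str     # utterance this pattern stands for (hypothetical field)
    address: int  # index of the pattern's first frame in the pattern memory
    length: int   # pattern length in frames


# Pattern memory: one flat sequence holding only the complete standard patterns.
# Each letter stands in for one feature vector.
pattern_memory = list("FUKUOKA") + list("OSAKA")

# Standard pattern table: one entry per complete pattern.
standard_table = [Entry("FUKUOKA", 0, 7), Entry("OSAKA", 7, 5)]

# Partial pattern table: "KUOKA" is just a window into the stored "FUKUOKA".
partial_table = [Entry("FUKUOKA", 2, 5)]


def load_pattern(entry, memory=pattern_memory):
    """Read out the frames that a table entry refers to."""
    return memory[entry.address:entry.address + entry.length]


print("".join(load_pattern(standard_table[0])))
print("".join(load_pattern(partial_table[0])))
```

Adding the partial pattern costs one table entry and no additional frames in the pattern memory, which corresponds to the memory saving the text describes.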

In step 34, the first pattern in standard pattern memory 19 is loaded into the memory of DP matching unit 16 in accordance with standard pattern management table 18, and in step 35 DP matching is performed between the input pattern stored in input pattern memory 15 and the standard pattern loaded in step 34. In step 36, standard pattern management table 18 is consulted to check whether steps 34 and 35 have been completed for all standard patterns; if not, processing returns to step 34 and continues in the same way.

Once the condition of step 36 is satisfied, in step 37 the first partial pattern is loaded from standard pattern memory 19 into the memory of DP matching unit 16 in accordance with partial pattern management table 17, and in step 38 DP matching is performed. In step 39, partial pattern management table 17 is consulted to check whether steps 37 and 38 have been completed for all partial patterns; if not, processing returns to step 37.

When DP matching against all standard patterns and partial patterns is complete, processing proceeds to step 40, where recognition result judgment unit 20 takes, as the recognition candidate, the pattern giving the highest of the similarities obtained by DP matching unit 16 for the standard patterns and partial patterns. Then, in step 41, speech synthesis unit 21 is activated to synthesize the recognition candidate obtained by recognition result judgment unit 20 and output it to speaker 22, thereby notifying the speaker of the recognition candidate.
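
Steps 34 to 40 can be sketched as a single search over both tables. This is a minimal illustration, not the patented implementation: features are single numbers, similarity is taken as the negated length-normalized DP distance, table entries are plain `(word, address, length)` tuples, and the names `dp_distance` and `recognize`, the word "OSAKA", and all feature values are hypothetical.

```python
def dp_distance(a, b):
    """Length-normalized DP matching distance between two scalar sequences."""
    inf = float("inf")
    g = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    g[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[-1][-1] / (len(a) + len(b))


def recognize(input_pattern, memory, standard_table, partial_table):
    """Match the input against every standard pattern (steps 34-36) and every
    partial pattern (steps 37-39), then return the best word and its
    similarity (step 40)."""
    best_word, best_similarity = None, float("-inf")
    for table in (standard_table, partial_table):
        for word, address, length in table:
            reference = memory[address:address + length]
            similarity = -dp_distance(input_pattern, reference)
            if similarity > best_similarity:
                best_word, best_similarity = word, similarity
    return best_word, best_similarity


# Pattern memory holding two complete patterns back to back (toy values).
memory = [9, 9, 2, 4, 6, 4, 2] + [3, 4, 5, 4, 3]
standard_table = [("FUKUOKA", 0, 7), ("OSAKA", 7, 5)]
partial_table = [("FUKUOKA", 2, 5)]  # "FUKUOKA" with its first two frames lost

truncated_input = [2, 4, 6, 4, 2]    # detected interval that missed the word head

print(recognize(truncated_input, memory, standard_table, [])[0])
print(recognize(truncated_input, memory, standard_table, partial_table)[0])
```

In this toy setup the truncated input is assigned to the wrong word when only complete patterns are searched, and to the intended word once the partial pattern is included, which mirrors the effect the embodiment claims.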

In this embodiment the standard pattern management table and the partial pattern management table are kept separately, but if the partial pattern management table is regarded as part of the standard pattern management table, the same processing can be carried out with a single management table.

As described above, this embodiment has standard pattern management means for managing the standard patterns, and partial pattern management means for managing, for patterns whose word-initial or word-final portion may be missing, the pattern that results when the loss occurs as a portion of a complete standard pattern. Even when the word-initial or word-final portion is detected incorrectly, correct recognition is possible by pattern matching against the partial patterns.

Furthermore, for standard patterns whose word-initial or word-final portion is unstable, the patterns with missing portions can be managed through a single complete representative pattern, so the number of templates need not be increased and memory is used efficiently.

Effects of the invention: As described above, the present invention has: standard pattern determination means for selecting, from a large number of speech patterns of many speakers, a plurality of representative patterns for each utterance to be recognized and determining them as standard patterns; standard pattern management means for managing the memory address and pattern length of each standard pattern; and partial pattern management means for managing, for standard patterns whose word-initial or word-final portion may be missing, the pattern that results when the loss occurs as a portion of the complete standard pattern, by its address and pattern length, using a single standard pattern. Pattern matching is performed between the input pattern and each standard pattern and each partial pattern, and the pattern with the highest similarity is taken as the recognition candidate. Even when a pattern whose word-initial or word-final portion has been lost through an error in speech interval detection is input, pattern matching against the partial patterns managed by the partial pattern management means makes it possible to provide a speech recognition device capable of accurate recognition.

In addition, because patterns with missing portions are managed through a single representative pattern, namely the complete standard pattern, it is possible to provide a speech recognition device that recognizes correctly even when the speech interval is detected incorrectly, without increasing the number of templates.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a speech recognition device according to one embodiment of the present invention, Fig. 2 is a circuit block diagram showing the configuration of the same device, and Fig. 3 is a flowchart of the main part, for explaining the operation of the same device.

2: speech analysis means, 3: speech interval detection means, 4: standard pattern determination means, 5: standard pattern management means, 6: partial pattern management means, 7: pattern matching means, 11: microphone, 15: input pattern memory, 17: partial pattern management table, 18: standard pattern management table, 19: standard pattern memory, 22: speaker, 23: microcomputer.

Claims

[Scope of Claims]

[Claim 1] A speech recognition device comprising: speech analysis means for extracting, from input speech, a time series of feature vectors including an energy sequence; speech interval detection means for detecting a speech interval from the energy sequence obtained by the speech analysis means; standard pattern determination means for selecting representative patterns from a large number of speech patterns of many speakers and determining a plurality of them as standard patterns for each utterance to be recognized; standard pattern management means for managing the memory address at which each standard pattern determined by the standard pattern determination means is stored and its pattern length; partial pattern management means for, with respect to the standard patterns of an utterance to be recognized whose word-initial or word-final portion may be missing owing to the manner of utterance or individual differences, taking the pattern with no missing portion as a representative pattern and managing each standard pattern with a missing portion as a portion of the representative pattern, by the memory address at which it is stored and its pattern length; and pattern matching means for performing pattern matching between the input speech pattern and each standard pattern managed by the standard pattern management means and each partial pattern of the standard patterns managed by the partial pattern management means, and taking the pattern with the highest similarity as the recognition candidate.