JP2992324B2

JP2992324B2 - Voice section detection method

Info

Publication number: JP2992324B2
Application number: JP2289113A
Authority: JP
Inventors: 貢松下
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-10-26
Filing date: 1990-10-26
Publication date: 1999-12-20
Anticipated expiration: 2014-12-20
Also published as: JPH04163497A

Description

Detailed Description of the Invention

Industrial Field of Application

The present invention relates to a speech section detection method for speech recognition in noisy environments, particularly inside automobiles, factories, and the like.

Prior Art

In realizing a speech recognition device, detection of the speech section is very important and strongly affects the subsequent recognition results. A common approach to speech section detection is the method using two thresholds, as described, for example, in "Speech Recognition" (Niimi, Kyoritsu Shuppan, pp. 68-69).

Problems to be Solved by the Invention

However, the vowel "i" generally has lower speech power than the other vowels, so with the conventional method it may be dropped from the detected section, which leads to misrecognition. As a countermeasure against this loss of the vowel "i", Japanese Patent Application Laid-Open No. 60-260096, for example, provides a means for detecting the vowel "i" before the section detection processing and lowers the threshold for section detection when the vowel "i" is judged to be present. That method, however, requires a dedicated means for judging the vowel "i", so problems remain in terms of cost and processing speed.

Means for Solving the Problems

In a speech section detection method in which speech is collected and converted into an electric signal, and the section in which speech is present is detected based on the converted signal, the section in which speech is present is detected using the output of a pre-emphasis that emphasizes the frequency components near the formant frequencies of a vowel having low speech power.

Operation

Because the speech section is detected using an output signal to which pre-emphasis emphasizing the vowel "i" has been applied, the vowel "i", which has low speech power, is less likely to be dropped, and erroneous detection of the speech section is reduced.

Embodiment

An embodiment of the present invention will be described with reference to the drawings. Basically, the apparatus consists of a speech input unit 1, a pre-emphasis unit 2 that is the characteristic feature of this embodiment, a threshold calculation unit 3, a section detection unit 4, and a speech recognition unit 5.

In this configuration, the speech section detection processing of this embodiment is performed according to the flowchart shown in FIG. 2.

First, the input sound is captured. The speech input unit 1 is an acoustic-to-electric signal converter such as a microphone; it collects the speech and converts it into an electric signal x(t).

Next, the pre-emphasis calculation is performed. To raise the level of speech with low speech power, such as the vowel "i", the pre-emphasis unit 2 emphasizes the frequency components near the formant frequencies of a vowel having low speech power; it is configured, for example, as shown in FIG. 3. In the pre-emphasis unit 2, a band power detection unit 6 divides the input signal x(t) into m bands using a bank of band-pass filters, an FFT, or the like, and detects the short-time spectrum X(f) over about 10 msec (where f = 1, 2, ..., m). A weighted power detection unit 7 then weights each band power with the weight coefficient w(f) stored in a weight coefficient storage unit 8, as in the following equation, and the sum of the weighted band powers is taken as the output P_Y of the pre-emphasis unit 2.
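The band split and weighted sum described above can be sketched as follows. This is a minimal illustration only: the naive DFT, the equal-width bands, and the function names `band_powers` and `pre_emphasis_output` are assumptions made for the example and are not specified in the source.

```python
import cmath
import math


def band_powers(frame, m):
    """Return the power in m equal-width bands of one short frame.

    Assumption: a naive DFT stands in for the band-pass filter bank,
    and the bands are equal-width; the source fixes neither choice.
    """
    n = len(frame)
    half = n // 2
    # Power at each non-negative frequency bin k = 0 .. half-1.
    spectrum = []
    for k in range(half):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    width = half // m  # bins per band; leftover top bins are ignored
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(m)]


def pre_emphasis_output(X, w):
    """P_Y = sum over f of Y(f), where Y(f) = w(f) * X(f)."""
    if len(X) != len(w):
        raise ValueError("X and w must have the same number of bands")
    return sum(wf * xf for wf, xf in zip(w, X))
```

For instance, an 80-sample frame (10 msec at a hypothetical 8 kHz sampling rate) could be passed to `band_powers(frame, 8)`, and the result combined with a list of eight weights in `pre_emphasis_output`.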

Y(f) = w(f)·X(f)

The weight coefficient w(f) in the above equation emphasizes the first and second formants, etc., of the vowel "i". For a male speaker it takes large values near 160-300 Hz (first formant) and 1.9-2.4 kHz (second formant). FIG. 4 shows an example of the weight coefficient w(f).
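As a rough illustration of such a weight table, the sketch below assigns a larger weight to bands whose center frequency falls inside the two formant ranges quoted above. The step shape and the values `boost=4.0` and `base=1.0` are assumptions chosen for the example; the actual curve is the one given in FIG. 4, which is not reproduced here.

```python
def formant_weights(centers_hz, boost=4.0, base=1.0,
                    bands=((160.0, 300.0), (1900.0, 2400.0))):
    """Return w(f) for each band center frequency in Hz.

    Bands centered inside one of the formant ranges get `boost`,
    all others get `base`. Both values are hypothetical.
    """
    return [boost if any(lo <= c <= hi for lo, hi in bands) else base
            for c in centers_hz]
```

A smoother curve peaking in the same ranges would serve the same purpose; the step function is used only to keep the sketch short.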

Instead of such pre-emphasis, a filter or the like having the characteristic shown in FIG. 4 may be used.

FIG. 5 shows the time series P_X(n) of the short-time power of the input signal x(t) and the time series P_Y(n) of the pre-emphasis output when, for example, "Kitami" is uttered. The power used here is the short-time average of the absolute value, or of the squared value, of the input signal.
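The short-time power defined here can be sketched as below. The non-overlapping framing and the name `short_time_power` are assumptions for illustration; the source does not state how frames are laid out.

```python
def short_time_power(x, frame_len, use_square=True):
    """Return the short-time power of x, one value per frame.

    Each value is the mean of the squared samples (use_square=True)
    or of the absolute samples (use_square=False) in one frame.
    Assumption: frames do not overlap; a trailing partial frame is dropped.
    """
    frames = [x[i:i + frame_len]
              for i in range(0, len(x) - frame_len + 1, frame_len)]
    if use_square:
        return [sum(s * s for s in fr) / frame_len for fr in frames]
    return [sum(abs(s) for s in fr) / frame_len for fr in frames]
```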

Next, threshold calculation is performed selectively by the threshold calculation unit 3. That is, the threshold Th is calculated from the output signal P_Y of the pre-emphasis unit 2 in a section where no speech is present, and is stored. For example, it is obtained as Th = α·P_Y + β.
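A minimal sketch of this threshold calculation follows. It assumes that P_Y in the formula is the mean of the pre-emphasis output over the noise-only frames, and the default values of `alpha` and `beta` are hypothetical; the source gives neither the averaging rule nor the constants.

```python
def noise_threshold(noise_p_y, alpha=2.0, beta=0.0):
    """Return Th = alpha * P_Y + beta.

    Assumption: P_Y is taken as the mean of the pre-emphasis output
    over frames known to contain no speech. alpha and beta are
    illustrative values only.
    """
    if not noise_p_y:
        raise ValueError("need at least one noise-only frame")
    mean_p_y = sum(noise_p_y) / len(noise_p_y)
    return alpha * mean_p_y + beta
```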

Meanwhile, the section detection unit 4 performs section detection using the output signal P_Y of the pre-emphasis unit 2. As the section detection method, a section in which the output signal P_Y of the pre-emphasis unit 2 exceeds the threshold Th obtained by the threshold calculation unit 3 for at least a fixed time may be taken as speech, or another method may be used.
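The duration rule described above can be sketched as follows. The function name `detect_sections`, the frame-count parameter `min_frames`, and the half-open `(start, end)` index convention are assumptions made for the example.

```python
def detect_sections(p_y, th, min_frames):
    """Return (start, end) frame index pairs, end exclusive.

    A section is a run of consecutive frames with p_y > th that lasts
    at least min_frames frames; shorter runs are discarded.
    """
    sections = []
    start = None
    for n, p in enumerate(p_y):
        if p > th:
            if start is None:
                start = n  # a candidate section begins here
        else:
            if start is not None and n - start >= min_frames:
                sections.append((start, n))
            start = None
    # Close a run that is still open at the end of the input.
    if start is not None and len(p_y) - start >= min_frames:
        sections.append((start, len(p_y)))
    return sections
```

With a frame step of about 10 msec, `min_frames` converts the "fixed time" in the text into a frame count; its value would have to be chosen for the application.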

The output signal of the section detection unit 4 and the input signal x(t) are input to the speech recognition unit 5, and the speech is recognized using only the part of the input signal x(t) that falls within the section detected by the section detection unit 4. The recognition method may be, for example, a speech recognition system such as the one described in "Development of a Word Speech Recognition System Using Binary TSP" (安田晴剛 et al., IEEJ Transactions C, Vol. 108, October 1988 issue, pp. 858-865), but other well-known speech recognition systems may also be used.

Effects of the Invention

As described above, the present invention detects the section in which speech is present using the output of a pre-emphasis that emphasizes the frequency components near the formant frequencies of a vowel having low speech power, such as the vowel "i". Therefore, even when recognizing a word that contains the low-power vowel "i" at its beginning or end, the vowel "i" is less likely to be dropped and erroneous detection of the speech section can be reduced, without any loss of processing speed.

Brief Description of the Drawings

The drawings show an embodiment of the present invention. FIG. 1 is a block diagram, FIG. 2 is a flowchart, FIG. 3 is a block diagram showing the configuration of the pre-emphasis unit, FIG. 4 is a characteristic diagram showing an example of the weight coefficients, and FIG. 5 is a characteristic diagram showing a specific example of pre-emphasis.

Continuation of front page

(58) Fields searched (Int.Cl.⁶, DB name): G10L 3/00 513, G10L 3/02 301, G10L 7/08, G10L 9/02 301, G10L 9/06

Claims

(57) [Claims]

[Claim 1] A speech section detection method in which speech is collected and converted into an electric signal, and the section in which speech is present is detected based on the converted signal, characterized in that the section in which speech is present is detected using the output of a pre-emphasis that emphasizes the frequency components near the formant frequencies of a vowel having low speech power.