JP2992324B2 - Voice section detection method - Google Patents

Voice section detection method

Info

Publication number
JP2992324B2
Authority
JP
Japan
Prior art keywords
voice
section
vowel
emphasis
section detection
Prior art date
1990-10-26
Legal status
Expired - Fee Related
Application number
JP2289113A
Other languages
Japanese (ja)
Other versions
JPH04163497A (en)
Inventor
貢 松下
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date
1990-10-26
Filing date
1990-10-26
Publication date
1999-12-20
Application filed by Ricoh Co Ltd
Priority to JP2289113A
Publication of JPH04163497A
Application granted
Publication of JP2992324B2
Anticipated expiration
Expired - Fee Related


Description

Detailed Description of the Invention

Industrial Field of Application
The present invention relates to a voice section detection method for speech recognition in noisy environments, particularly inside automobiles, factories, and the like.

Description of the Related Art
In building a speech recognition device, detecting the voice section is extremely important because it strongly affects the subsequent recognition results. A common voice section detection method uses two thresholds, as described, for example, in "Speech Recognition" (Niimi, Kyoritsu Shuppan, pp. 68 to 69).
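The cited book is named only by title, so the sketch below shows one common form of two-threshold endpoint detection as an illustration, not the method as Niimi describes it. The assumptions: the input is a per-frame power series, an utterance is confirmed only where the power exceeds `upper`, and the boundaries are then extended outward while the power stays above `lower`. The function name is hypothetical.

```python
def two_threshold_endpoints(power, upper, lower):
    """Return (start, end) frame indices of the detected utterance, or None.

    Hypothetical sketch: frames above `upper` confirm speech; the boundaries
    are then widened while the power stays above the lower threshold.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    above = [i for i, p in enumerate(power) if p > upper]
    if not above:
        return None  # nothing loud enough to count as speech
    start, end = above[0], above[-1]
    # Extend the start backward while the power remains above the lower threshold.
    while start > 0 and power[start - 1] > lower:
        start -= 1
    # Extend the end forward under the same condition.
    while end < len(power) - 1 and power[end + 1] > lower:
        end += 1
    return start, end


# Example: the loud core (frames 3 to 5) is widened to frames 2 to 7.
print(two_threshold_endpoints([1, 1, 3, 6, 9, 7, 4, 3, 1, 1], upper=5, lower=2))
```

A weak word-initial or word-final vowel whose power never clears `lower` is simply cut off by such a rule, which is the failure mode the following section describes for the vowel "i".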

Problems to Be Solved by the Invention
The vowel "i", however, generally has lower speech power than the other vowels, so the conventional method can drop it and cause misrecognition. As a countermeasure, Japanese Patent Application Laid-Open No. 60-260096 provides a means for detecting the vowel "i" before section detection and lowers the section detection threshold when "i" is detected. That approach, however, requires a dedicated means for identifying "i", which leaves problems of cost and processing speed.

Means for Solving the Problems
In a voice section detection method in which speech is picked up and converted into an electrical signal, and the section in which speech is present is detected from the converted signal, the section in which speech is present is detected using the output of a pre-emphasis stage that emphasizes frequency components near the formant frequencies of vowels with low speech power.

Operation
Because the voice section is detected from a signal that has passed through pre-emphasis emphasizing the vowel "i", the low-power vowel "i" is less likely to be dropped, and false detection of voice sections decreases.

Embodiment
An embodiment of the present invention will be described with reference to the drawings. The apparatus basically consists of a speech input unit 1, a pre-emphasis unit 2 (the characteristic feature of this embodiment), a threshold calculation unit 3, a section detection unit 4, and a speech recognition unit 5.

In this configuration, the voice section detection processing of this embodiment follows the flowchart shown in FIG. 2.

First, the input sound is captured. The speech input unit 1 is an acoustic-to-electric transducer such as a microphone; it picks up speech and converts it into an electrical signal x(t).

Next, the pre-emphasis calculation is performed. To amplify low-power sounds such as the vowel "i", the pre-emphasis unit 2 emphasizes frequency components near the formant frequencies of low-power vowels; it is configured, for example, as shown in FIG. 3. In the pre-emphasis unit 2, a band power detection unit 6 divides the input signal x(t) into m bands using a band-pass filter bank or an FFT and detects a short-time spectrum X(f) over about 10 ms (where f = 1, 2, ..., m). A weighted power detection unit 7 then weights each band power by a weighting coefficient w(f), stored in advance in a weighting coefficient storage unit 8, as in the following equation, and takes the sum of the weighted band powers as the output P_Y of the pre-emphasis unit 2.
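The band power detection step (framing x(t) into roughly 10 ms pieces and measuring the power in m bands) can be sketched as follows. This is an illustration under assumptions: non-overlapping frames, m equal-width bands from 0 Hz to the Nyquist frequency, and a plain discrete Fourier transform standing in for the filter bank or FFT. The function names are hypothetical, and bands are 0-indexed where the text counts f = 1, ..., m.

```python
import cmath
import math


def split_frames(signal, sample_rate, frame_ms=10):
    """Cut the signal into non-overlapping frames of about frame_ms milliseconds."""
    size = int(sample_rate * frame_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def band_powers(frame, sample_rate, m):
    """Return the band powers X(f) of one frame as a 0-indexed list of length m.

    The range 0 Hz to Nyquist is split into m equal-width bands. A plain DFT
    is used for readability; an FFT gives the same values faster.
    """
    n = len(frame)
    nyquist = sample_rate / 2
    bands = [0.0] * m
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freq = k * sample_rate / n
        idx = min(int(freq / nyquist * m), m - 1)  # the Nyquist bin joins the top band
        bands[idx] += abs(coeff) ** 2
    return bands


fs = 8000
tone = [math.sin(2 * math.pi * 2000 * t / fs) for t in range(800)]
frames = split_frames(tone, fs)       # ten 80-sample (10 ms) frames
X = band_powers(frames[0], fs, m=8)   # 500 Hz bands; the 2 kHz tone falls in X[4]
```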

$$Y(f) = w(f)\,X(f), \qquad P_Y = \sum_{f=1}^{m} Y(f)$$

The weighting coefficients w(f) in the above equation emphasize the first and second formants (among others) of the vowel "i". For a male speaker, they take large values around 160 to 300 Hz (first formant) and 1.9 to 2.4 kHz (second formant). FIG. 4 shows an example of the weighting coefficients w(f).
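The weighting and summation can be sketched as below. FIG. 4 is not reproduced in the text, so the weight shape here is an assumption: a flat `boost` (hypothetical value 4.0) on every band that overlaps one of the two male-speaker formant ranges given above, and 1.0 elsewhere. The function names and the equal-width band layout are also assumptions; bands are 0-indexed.

```python
def formant_weights(m, sample_rate, boost=4.0,
                    formant_ranges=((160.0, 300.0), (1900.0, 2400.0))):
    """Hypothetical weights w(f) for m equal-width bands from 0 Hz to Nyquist.

    Bands overlapping a formant range of "i" get `boost`; all others get 1.0.
    """
    width = sample_rate / 2 / m
    weights = []
    for f in range(m):
        low, high = f * width, (f + 1) * width
        overlaps = any(low < r_high and high > r_low for r_low, r_high in formant_ranges)
        weights.append(boost if overlaps else 1.0)
    return weights


def pre_emphasis_output(band_power, weights):
    """P_Y = sum of Y(f) over all bands, with Y(f) = w(f) * X(f)."""
    if len(band_power) != len(weights):
        raise ValueError("one weight per band is required")
    return sum(w * x for w, x in zip(weights, band_power))


w = formant_weights(16, 8000)  # 250 Hz bands; bands 0, 1, 7, 8, 9 are boosted
p_y = pre_emphasis_output([1.0] * 16, w)  # flat spectrum: 5 * 4.0 + 11 * 1.0
```

A real system would presumably tune w(f) per speaker group; the text only gives the male-speaker ranges.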

Instead of this pre-emphasis, a filter having the characteristic shown in FIG. 4 may also be used.
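As an illustration of the filter alternative, the sketch below approximates a two-peak emphasis curve in the time domain: the signal is passed through unchanged and two boosted band-pass resonances are added, centred in the two formant ranges of "i". The band-pass sections use the widely published RBJ "Audio EQ Cookbook" design, which is a swapped-in technique and not taken from the patent. The centre frequencies (230 Hz, 2150 Hz), bandwidths, and `gain` are hypothetical values chosen for the example.

```python
import math


def bandpass_coeffs(center_hz, q, sample_rate):
    """RBJ cookbook band-pass (0 dB peak gain); returns normalized (b, a)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Filter x with one second-order section in direct form I."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y


def formant_emphasis_filter(x, sample_rate, gain=3.0):
    """Pass-through plus boosted resonances near the formants of "i" (hypothetical values)."""
    first = bandpass_coeffs(230.0, 230.0 / 140.0, sample_rate)     # about 160 to 300 Hz
    second = bandpass_coeffs(2150.0, 2150.0 / 500.0, sample_rate)  # about 1.9 to 2.4 kHz
    y1 = biquad(x, *first)
    y2 = biquad(x, *second)
    return [s + gain * (u + v) for s, u, v in zip(x, y1, y2)]
```

At the centre of each resonance the band-pass gain is 1 with zero phase, so a tone there leaves the filter about (1 + gain) times larger, while tones away from both ranges pass nearly unchanged.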

FIG. 5 shows, for the utterance "Kitami" (北見), the time series P_X(n) of the short-time power of the input signal x(t) and the time series P_Y(n) of the pre-emphasis output. Power here means the short-time average of the absolute value or of the squared value of the input signal.
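The power definition above (short-time average of the absolute or squared value) can be written as a small helper. Non-overlapping frames of `frame_len` samples are an assumption; the text does not specify frame overlap. The function name is hypothetical.

```python
def short_time_power(x, frame_len, squared=True):
    """Time series P(n): per-frame mean of x**2 (or of |x|) over non-overlapping frames."""
    out = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        values = [s * s for s in frame] if squared else [abs(s) for s in frame]
        out.append(sum(values) / frame_len)
    return out


print(short_time_power([1, -1, 2, -2, 0, 0], 2))                 # mean of squares
print(short_time_power([1, -1, 2, -2, 0, 0], 2, squared=False))  # mean of magnitudes
```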

Next, the threshold calculation unit 3 selectively performs threshold calculation: it computes a threshold Th from the pre-emphasis output P_Y in a section where no speech is present, and stores it. For example,

$$Th = \alpha P_Y + \beta$$

where α and β are coefficients.
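A minimal sketch of the threshold calculation, assuming P_Y in the formula is the mean of the pre-emphasis output over frames known to contain no speech. The default values of α and β are hypothetical placeholders; the text gives none.

```python
def noise_threshold(noise_py, alpha=2.0, beta=0.01):
    """Th = alpha * P_Y + beta, with P_Y averaged over speech-free frames.

    alpha and beta defaults are hypothetical; the source does not specify them.
    """
    if not noise_py:
        raise ValueError("need at least one speech-free frame")
    mean_py = sum(noise_py) / len(noise_py)
    return alpha * mean_py + beta


th = noise_threshold([1.0, 3.0], alpha=2.0, beta=0.5)  # 2.0 * 2.0 + 0.5
```

Computing Th only over speech-free frames lets the threshold track the background noise level of the car or factory rather than the speaker.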

Meanwhile, the section detection unit 4 performs section detection using the pre-emphasis output P_Y. For example, a section in which P_Y exceeds the threshold Th obtained by the threshold calculation unit 3 for at least a fixed duration may be regarded as speech; other methods may also be used.
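The example detection rule (P_Y above Th for at least a fixed duration) can be sketched as a run-length scan over the frame series. Expressing the fixed duration as a minimum number of frames, `min_frames`, is an assumption, as is the function name; sections are returned as (start, end) frame indices with `end` exclusive.

```python
def detect_sections(py, th, min_frames):
    """Return (start, end) frame pairs, end exclusive, where py > th
    holds for at least min_frames consecutive frames."""
    sections = []
    start = None
    for n, p in enumerate(py):
        if p > th:
            if start is None:
                start = n  # a candidate section begins
        elif start is not None:
            if n - start >= min_frames:
                sections.append((start, n))  # long enough: accept as speech
            start = None
    # Close a section that is still open at the end of the input.
    if start is not None and len(py) - start >= min_frames:
        sections.append((start, len(py)))
    return sections


py = [0, 5, 5, 0, 5, 5, 5, 5, 0, 5]
print(detect_sections(py, th=1, min_frames=3))  # only the four-frame run survives
```

The minimum-duration condition discards short noise bursts, such as a door slam, that briefly cross Th.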

The output of the section detection unit 4 and the input signal x(t) are fed to the speech recognition unit 5, which recognizes speech using only the portion of x(t) detected as a voice section by the section detection unit 4. The recognition method may be, for example, the speech recognition system described in "Development of a Word Speech Recognition System Using Binary TSP" (安田晴剛 (Yasuda) et al., IEEJ Transactions, Part C, Vol. 108, October 1988, pp. 858 to 865), or any other known speech recognition system.
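Passing only the detected portion of x(t) to the recognizer amounts to mapping frame-level sections back to sample ranges. The sketch assumes non-overlapping frames of `frame_len` samples and (start, end) frame pairs with `end` exclusive; the function name is hypothetical.

```python
def extract_speech(x, sections, frame_len):
    """Return the samples of x covered by each detected (start, end) frame section."""
    return [x[start * frame_len:end * frame_len] for start, end in sections]


samples = list(range(10))
print(extract_speech(samples, [(1, 3)], frame_len=2))  # frames 1 and 2 cover samples 2 to 5
```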

Effects of the Invention
As described above, the present invention detects the section in which speech is present using the output of a pre-emphasis stage that emphasizes frequency components near the formant frequencies of low-power vowels such as "i". Consequently, even for words that contain the low-power vowel "i" at the beginning or end, the likelihood of dropping "i" is reduced without slowing processing, and false detection of voice sections decreases.

Brief Description of the Drawings

The drawings show an embodiment of the present invention. FIG. 1 is a block diagram; FIG. 2 is a flowchart; FIG. 3 is a block diagram showing the configuration of the pre-emphasis unit; FIG. 4 is a characteristic diagram showing an example of the weighting coefficients; and FIG. 5 is a characteristic diagram showing a specific example of pre-emphasis.

Continued from the front page
(58) Fields searched (Int.Cl.6, DB name): G10L 3/00 513; G10L 3/02 301; G10L 7/08; G10L 9/02 301; G10L 9/06

Claims (1)

(57) [Claims]
[Claim 1] A voice section detection method in which speech is picked up and converted into an electrical signal, and a section in which speech is present is detected on the basis of the converted signal, characterized in that the section in which speech is present is detected using the output of a pre-emphasis stage that emphasizes frequency components near the formant frequencies of vowels having low speech power.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2289113A JP2992324B2 (en) 1990-10-26 1990-10-26 Voice section detection method

Publications (2)

Publication Number Publication Date
JPH04163497A JPH04163497A (en) 1992-06-09
JP2992324B2 true JP2992324B2 (en) 1999-12-20

Family

ID=17738955

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2289113A Expired - Fee Related JP2992324B2 (en) 1990-10-26 1990-10-26 Voice section detection method

Country Status (1)

Country Link
JP (1) JP2992324B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5305422A (en) * 1992-02-28 1994-04-19 Panasonic Technologies, Inc. Method for determining boundaries of isolated words within a speech signal
US5796842A (en) 1996-06-07 1998-08-18 That Corporation BTSC encoder
US8908872B2 (en) 1996-06-07 2014-12-09 That Corporation BTSC encoder
JP3620787B2 (en) * 2000-02-28 2005-02-16 カナース・データー株式会社 Audio data encoding method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3393532B2 (en) 1997-03-14 2003-04-07 日本電信電話株式会社 Method for normalizing volume of recorded voice and apparatus for implementing the method
CN112399004A (en) * 2019-08-14 2021-02-23 原相科技股份有限公司 Sound output adjusting method and electronic device for executing adjusting method
CN112399004B (en) * 2019-08-14 2024-05-24 达发科技股份有限公司 Sound output adjusting method and electronic device for executing same

Also Published As

Publication number Publication date
JPH04163497A (en) 1992-06-09

Similar Documents

Publication Publication Date Title
JP2776848B2 (en) Denoising method, neural network learning method used for it
US7117148B2 (en) Method of noise reduction using correction vectors based on dynamic aspects of speech and noise normalization
JPS58130393A (en) Voice recognition equipment
JPH0990974A (en) Signal processor
KR101250668B1 (en) Method for recogning emergency speech using gmm
Couvreur et al. Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models
JP2992324B2 (en) Voice section detection method
US20060178881A1 (en) Method and apparatus for detecting voice region
JP3354252B2 (en) Voice recognition device
JP2564821B2 (en) Voice judgment detector
JP2989219B2 (en) Voice section detection method
JPH01255000A (en) Apparatus and method for selectively adding noise to template to be used in voice recognition system
Kingsbury et al. Improving ASR performance for reverberant speech
JPH04324499A (en) Speech recognition device
JP3190231B2 (en) Apparatus and method for extracting pitch period of voiced sound signal
JPH0556520B2 (en)
KR100345402B1 (en) An apparatus and method for real - time speech detection using pitch information
JP2557497B2 (en) How to identify male and female voices
JPS6039695A (en) Method and apparatus for automatically detecting voice activity
JPH0318720B2 (en)
JP2891259B2 (en) Voice section detection device
KR950002704B1 (en) Speech recognition system
JPS6217800A (en) Voice section decision system
JPS61260299A (en) Voice recognition equipment
JP3008404B2 (en) Voice recognition device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees