JP2992324B2 - Voice section detection method - Google Patents
Voice section detection method
- Publication number
- JP2992324B2 (application JP2289113A)
- Authority
- JP
- Japan
- Prior art keywords
- voice
- section
- vowel
- emphasis
- section detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Description
Detailed Description of the Invention
Industrial Field of Application
The present invention relates to a voice section detection method for speech recognition in noisy environments, particularly inside automobiles, factories, and the like.
Prior Art
In realizing a speech recognition device, detection of the voice section is very important and strongly affects the subsequent recognition results. A common approach to voice section detection is the method using two thresholds, as described for example in "Speech Recognition" (Niimi, Kyoritsu Shuppan, pp. 68-69).
Problems to Be Solved by the Invention
In general, however, the vowel "i" (い) has lower voice power than the other vowels, so with the conventional method it may be dropped from the detected section, which can cause misrecognition. As a countermeasure against this loss of the vowel "i", Japanese Patent Application Laid-Open No. 60-260096, for example, describes a method in which a means for detecting the vowel "i" is provided before the section detection processing, and the threshold for section detection is lowered when the vowel "i" is judged to be present. This approach, however, requires a dedicated means for judging the vowel "i", so problems remain in terms of cost and processing speed.
Means for Solving the Problems
In a voice section detection method in which voice is collected and converted into an electric signal and the section in which voice is present is detected based on the converted signal, the section in which voice is present is detected using the output of pre-emphasis that emphasizes frequency components near the formant frequencies of a vowel having low voice power.
Operation
Because the voice section is detected using an output signal to which pre-emphasis emphasizing the vowel "i" has been applied, the vowel "i", which has low voice power, is less likely to be dropped, and erroneous detection of the voice section is reduced.
Embodiment
An embodiment of the present invention will be described with reference to the drawings. The configuration basically consists of a voice input unit 1, a pre-emphasis unit 2 (the characteristic feature of this embodiment), a threshold calculation unit 3, a section detection unit 4, and a speech recognition unit 5.
With this configuration, the voice section detection processing of this embodiment is performed according to the flowchart shown in Fig. 2.
First, the input sound is captured. The voice input unit 1 is an acoustic-to-electric signal converter such as a microphone; it collects the voice and converts it into an electric signal x(t).
Next, the pre-emphasis calculation is performed. The pre-emphasis unit 2 emphasizes frequency components near the formant frequencies of vowels having low voice power, such as the vowel "i", in order to boost such low-power sounds, and is configured, for example, as shown in Fig. 3. In the pre-emphasis unit 2, the band power detection unit 6 divides the input signal x(t) into m bands using a bank of band-pass filters, an FFT, or the like, and detects the short-time spectrum X(f) over about 10 msec (where f = 1, 2, ..., m). The weighted power detection unit 7 then weights each band power with the weight coefficients w(f) stored in the weight coefficient storage unit 8, as in the following equation, and the sum of the weighted band powers is taken as the output P_Y of the pre-emphasis unit 2.
$$Y(f) = w(f) \cdot X(f), \qquad P_Y = \sum_{f=1}^{m} Y(f)$$
The weight coefficient w(f) in the above equation emphasizes the first and second formants, etc., of the vowel "i". For a male speaker it takes large values near 160 to 300 Hz (first formant) and 1.9 to 2.4 kHz (second formant). Fig. 4 shows an example of the weight coefficient w(f).
Instead of this pre-emphasis, a filter having the characteristic shown in Fig. 4 may also be used.
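The weighted band-power computation described above can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: a naive DFT stands in for the band-pass filter bank or FFT, and the function names, the band edges, and any weight values passed in are hypothetical choices made for the example.

```python
import cmath
import math


def band_powers(frame, sample_rate, band_edges_hz):
    """Return X(f): the power of one short frame in each band.

    band_edges_hz (hypothetical parameter) lists m + 1 ascending edges;
    band f covers the half-open range [edge[f], edge[f + 1]).
    A DFT bin that falls outside every band (e.g. exactly at the last
    edge) is ignored.
    """
    n = len(frame)
    powers = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n // 2 + 1):  # one-sided spectrum
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        for b in range(len(powers)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                powers[b] += abs(coeff) ** 2 / n
                break
    return powers


def preemphasis_output(frame, sample_rate, band_edges_hz, weights):
    """Return P_Y = sum over f of w(f) * X(f) for one frame."""
    x = band_powers(frame, sample_rate, band_edges_hz)
    return sum(w * p for w, p in zip(weights, x))
```

With, say, an 8 kHz sample rate and 80-sample (10 msec) frames, giving larger weights to the bands around 160 to 300 Hz and 1.9 to 2.4 kHz makes a frame whose energy lies in those bands produce a proportionally larger P_Y than a frame of equal energy elsewhere.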
Fig. 5 shows the time series P_X(n) of the short-time power of the input signal x(t) and the time series P_Y(n) of the pre-emphasis output when, for example, "Kitami" (北見, きたみ) is uttered. The power used here is the short-time average of the absolute value or of the squared value of the input signal.
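The short-time power defined here (short-time average of the absolute or squared value) might be computed along the following lines. The function name, the non-overlapping framing, and the `use_square` switch are assumptions for illustration; the source does not specify how frames are laid out.

```python
def short_time_power(signal, frame_len, use_square=True):
    """Return the short-time power series P(n) of a sampled signal.

    Each frame of frame_len samples yields one value: the mean of the
    squared samples, or the mean of their absolute values when
    use_square is False. Frames do not overlap (an assumption), and a
    trailing partial frame is discarded.
    """
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if use_square:
            powers.append(sum(s * s for s in frame) / frame_len)
        else:
            powers.append(sum(abs(s) for s in frame) / frame_len)
    return powers
```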
Next, the threshold calculation by the threshold calculation unit 3 is performed selectively. That is, the threshold Th is calculated from the output signal P_Y of the pre-emphasis unit 2 in a section in which no voice is present, and is stored. For example, it is obtained as
$$\mathrm{Th} = \alpha \cdot P_Y + \beta$$
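A minimal sketch of this threshold calculation is given below. The source only gives the form Th = α·P_Y + β; averaging P_Y over several non-speech frames, the function name, and the default values of `alpha` and `beta` are assumptions made for the example.

```python
def noise_threshold(noise_py, alpha=2.0, beta=0.0):
    """Return Th = alpha * P_Y + beta for a non-speech section.

    noise_py holds the pre-emphasis outputs P_Y(n) of frames known to
    contain no voice. Their mean is used as P_Y here (an assumption;
    the source does not say how the section is summarized).
    """
    if not noise_py:
        raise ValueError("need at least one non-speech frame")
    mean_py = sum(noise_py) / len(noise_py)
    return alpha * mean_py + beta
```

Larger `alpha` or `beta` values make the detector more conservative in noisy surroundings, at the cost of a higher chance of dropping weak sounds.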
Meanwhile, the section detection unit 4 performs section detection using the output signal P_Y of the pre-emphasis unit 2. As the section detection method, a section in which the output signal P_Y of the pre-emphasis unit 2 exceeds the threshold Th obtained by the threshold calculation unit 3 for at least a fixed time may be taken as voice, or another method may be used.
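The rule described above, where a run of frames counts as voice only if P_Y stays above Th for at least a fixed time, can be sketched as follows. The function name and the expression of the fixed time as a frame count (`min_frames`) are assumptions for illustration.

```python
def detect_sections(py, threshold, min_frames):
    """Return voice sections as (start, end) frame indices, end exclusive.

    A section is a maximal run of consecutive frames with
    P_Y(n) > threshold that lasts at least min_frames frames;
    shorter runs are discarded as noise bursts.
    """
    sections = []
    start = None
    for n, value in enumerate(py):
        if value > threshold:
            if start is None:
                start = n  # a new candidate run begins
        else:
            if start is not None and n - start >= min_frames:
                sections.append((start, n))
            start = None
    # Close a run that is still open at the end of the input.
    if start is not None and len(py) - start >= min_frames:
        sections.append((start, len(py)))
    return sections
```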
The output signal of the section detection unit 4 and the input signal x(t) are input to the speech recognition unit 5, and speech is recognized using only the portion of the input signal x(t) that falls within the section detected by the section detection unit 4. The recognition method may be, for example, a speech recognition system such as that described in "Development of a Word Speech Recognition System Using Binary TSP" (Yasuda (安田晴剛) et al., Transactions of the Institute of Electrical Engineers of Japan C, Vol. 108, October 1988, pp. 858-865), or any other well-known speech recognition system.
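Passing only the detected portion of x(t) to the recognizer could look like the sketch below. The function name and the frame-index representation of sections (start and end frame, end exclusive, with a fixed number of samples per frame) are assumptions; the recognizer itself is outside the scope of the example.

```python
def extract_speech(signal, sections, frame_len):
    """Return the sample slices of signal that lie in detected sections.

    sections holds (start, end) frame indices with end exclusive, and
    frame_len is the number of samples per frame. Each returned slice
    would be handed to the speech recognizer.
    """
    return [signal[start * frame_len:end * frame_len]
            for start, end in sections]
```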
Effects of the Invention
As described above, the present invention detects the section in which voice is present using the output of pre-emphasis that emphasizes frequency components near the formant frequencies of a vowel having low voice power, such as the vowel "i". Therefore, even when recognizing a word that contains the low-power vowel "i" at its beginning or end, the possibility of dropping the vowel "i" is reduced without slowing the processing, and erroneous detection of the voice section can be reduced.
The drawings show an embodiment of the present invention: Fig. 1 is a block diagram, Fig. 2 is a flowchart, Fig. 3 is a block diagram showing the configuration of the pre-emphasis unit, Fig. 4 is a characteristic diagram showing an example of the weight coefficients, and Fig. 5 is a characteristic diagram showing a specific example of pre-emphasis.
Continuation of front page
(58) Fields searched (Int. Cl.6, DB name): G10L 3/00 513; G10L 3/02 301; G10L 7/08; G10L 9/02 301; G10L 9/06
Claims (1)
1. A voice section detection method in which voice is collected and converted into an electric signal and a section in which voice is present is detected based on the converted signal, characterized in that the section in which voice is present is detected using the output of pre-emphasis that emphasizes frequency components near the formant frequencies of a vowel having low voice power.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2289113A JP2992324B2 (en) | 1990-10-26 | 1990-10-26 | Voice section detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2289113A JP2992324B2 (en) | 1990-10-26 | 1990-10-26 | Voice section detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH04163497A JPH04163497A (en) | 1992-06-09 |
JP2992324B2 true JP2992324B2 (en) | 1999-12-20 |
Family
ID=17738955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2289113A Expired - Fee Related JP2992324B2 (en) | 1990-10-26 | 1990-10-26 | Voice section detection method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2992324B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5305422A (en) * | 1992-02-28 | 1994-04-19 | Panasonic Technologies, Inc. | Method for determining boundaries of isolated words within a speech signal |
US5796842A (en) | 1996-06-07 | 1998-08-18 | That Corporation | BTSC encoder |
US8908872B2 (en) | 1996-06-07 | 2014-12-09 | That Corporation | BTSC encoder |
JP3620787B2 (en) * | 2000-02-28 | 2005-02-16 | カナース・データー株式会社 | Audio data encoding method |
- 1990-10-26: JP application JP2289113A, patent JP2992324B2, not active (Expired - Fee Related)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3393532B2 (en) | 1997-03-14 | 2003-04-07 | 日本電信電話株式会社 | Method for normalizing volume of recorded voice and apparatus for implementing the method |
CN112399004A (en) * | 2019-08-14 | 2021-02-23 | 原相科技股份有限公司 | Sound output adjusting method and electronic device for executing adjusting method |
CN112399004B (en) * | 2019-08-14 | 2024-05-24 | 达发科技股份有限公司 | Sound output adjusting method and electronic device for executing same |
Also Published As
Publication number | Publication date |
---|---|
JPH04163497A (en) | 1992-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2776848B2 (en) | Denoising method, neural network learning method used for it | |
US7117148B2 (en) | Method of noise reduction using correction vectors based on dynamic aspects of speech and noise normalization | |
JPS58130393A (en) | Voice recognition equipment | |
JPH0990974A (en) | Signal processor | |
KR101250668B1 (en) | Method for recogning emergency speech using gmm | |
Couvreur et al. | Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models | |
JP2992324B2 (en) | Voice section detection method | |
US20060178881A1 (en) | Method and apparatus for detecting voice region | |
JP3354252B2 (en) | Voice recognition device | |
JP2564821B2 (en) | Voice judgment detector | |
JP2989219B2 (en) | Voice section detection method | |
JPH01255000A (en) | Apparatus and method for selectively adding noise to template to be used in voice recognition system | |
Kingsbury et al. | Improving ASR performance for reverberant speech | |
JPH04324499A (en) | Speech recognition device | |
JP3190231B2 (en) | Apparatus and method for extracting pitch period of voiced sound signal | |
JPH0556520B2 (en) | ||
KR100345402B1 (en) | An apparatus and method for real - time speech detection using pitch information | |
JP2557497B2 (en) | How to identify male and female voices | |
JPS6039695A (en) | Method and apparatus for automatically detecting voice activity | |
JPH0318720B2 (en) | ||
JP2891259B2 (en) | Voice section detection device | |
KR950002704B1 (en) | Speech recognition system | |
JPS6217800A (en) | Voice section decision system | |
JPS61260299A (en) | Voice recognition equipment | |
JP3008404B2 (en) | Voice recognition device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
LAPS | Cancellation because of no payment of annual fees |