JP3299574B2

JP3299574B2 - Recognition device

Info

Publication number: JP3299574B2
Application number: JP32504792A
Authority: JP
Inventors: 教幸藤本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-12-04
Filing date: 1992-12-04
Publication date: 2002-07-08
Anticipated expiration: 2017-07-08
Also published as: JPH06175687A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech recognition apparatus, and more particularly to one that, before matching against registered patterns, determines whether an input sound is noise. A sound determined to be noise is not matched against the registered patterns, and only a sound determined not to be noise is matched, thereby improving recognition accuracy.

[0002]

2. Description of the Related Art In conventional speech recognition apparatuses, noise was frequently mistaken for speech and recognized as such, which was a problem. Conventional apparatuses did, of course, have a function for rejecting noise by determining that it is not speech, but, as described later, their rejection capability was insufficient.

[0003] A conventional speech recognition apparatus will be described with reference to FIG. 4. The audio signal input from the microphone 1 is converted into a digital signal by the AD conversion unit 2, and its features are extracted by the feature extraction unit 3. The feature extraction unit 3 includes a plurality of band-pass filters and extracts features at a plurality of frequencies f₁, f₂, ..., fn. The feature extraction unit 3 also extracts power. The section detection unit 4 outputs the features only for sections with high power.

[0004] For example, as shown in FIG. 5, when the intensity I at frequency f₁ is in state (A), the intensity I at f₂ is in state (B), and the intensity I at fn is in state (N), the section detection unit 4 treats the low-intensity portions as pause sections, uses them to divide the input, and outputs to the matching unit 6 the features of only the sections with high audio output.

[0005] The matching unit 6 calculates the distance between these features and the pattern of each word or the like stored in advance in the registered pattern unit 5, and outputs the pattern with the smallest calculated distance to the determination unit 8 as the recognition result. The determination unit 8 compares this distance with a distance threshold held in advance in the threshold storage unit 7, and the recognition result is output only when the distance is within the threshold.
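The matching and threshold decision described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `match_with_threshold`, the use of Euclidean distance, and the treatment of a distance exactly equal to the threshold as accepted are all assumptions, since the text specifies neither the distance measure nor the boundary case.

```python
import math


def match_with_threshold(feature, registered, threshold):
    """Return the label of the nearest registered pattern, or None if rejected.

    feature:    sequence of floats (hypothetical feature vector)
    registered: dict mapping a word label to its pattern vector
    threshold:  largest distance still accepted (assumed inclusive)
    """
    best_label, best_dist = None, math.inf
    for label, pattern in registered.items():
        # Euclidean distance is an assumption made for this sketch.
        dist = math.dist(feature, pattern)
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Accept the nearest pattern only when it lies within the threshold.
    return best_label if best_dist <= threshold else None
```

With two registered patterns, an input near one of them returns its label, while an input far from both returns `None`, which corresponds to a reject.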

[0006] However, in this scheme the feature pattern of the input sound is matched against the group of pre-registered speech patterns in the matching unit 6 to obtain the individual distances, and only when the minimum distance is equal to or greater than the threshold is the input rejected as a sound that corresponds to none of the registered patterns, that is, as noise.

[0007]

Problem to Be Solved by the Invention This threshold is determined experimentally or empirically, but it is impossible to set a threshold that completely separates speech uttered with the intention of input from noise. In practice, some noise is not rejected, and some speech uttered with the intention of input is rejected. Accordingly, an object of the present invention is to provide a speech recognition apparatus with an improved recognition rate by separately identifying, in advance, sounds that can be determined to be noise.

[0008]

Means for Solving the Problem To achieve the above object, the present invention provides a noise detection unit 10, as shown in FIG. 1, so that when the input sound is noise, this is determined without using the matching unit 6. In FIG. 1, the microphone 1, AD conversion unit 2, feature extraction unit 3, section detection unit 4, registered pattern unit 5, matching unit 6, threshold storage unit 7, and determination unit 8 are the same as in the conventional example described above.

[0009] The noise detection unit 10 detects as noise any input sound that reaches a high volume abruptly.

[0010]

Operation According to the present invention, a sound that, for example, abruptly becomes loud can be determined to be noise without matching against the registered patterns. The ability to recognize noise is therefore improved, which in turn improves speech recognition accuracy.

[0011]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A first embodiment of the present invention will be described with reference to FIGS. 2 and 3. FIG. 2 is a configuration diagram of the first embodiment, and FIG. 3 is a diagram explaining its operation. Humans characteristically cannot produce a loud sound abruptly. In contrast, certain kinds of noise reach a high volume abruptly. Impact sounds such as a door closing, a hammer blow, or a microphone being knocked against something are examples. Accordingly, in this first embodiment, the sharpness of the initial rise of the sound input to the speech recognition apparatus is measured, and a sound that rises sharply is determined to be noise rather than speech, so that no recognition result is output or the input is rejected.

[0012] In FIG. 2, the same reference numerals as in the other figures denote the same parts. Reference numeral 21 denotes a power extraction unit, 22 denotes a noise/speech determination unit that includes a sharpness measurement unit 23, and 24 denotes a threshold storage unit.

[0013] The power extraction unit 21 extracts the power of the audio signal input from the microphone 1. The noise/speech determination unit 22 determines whether the input audio signal is noise or speech. It makes this determination based on the sharpness of the rise of the input audio signal, and for that purpose includes the sharpness measurement unit 23.

[0014] As shown in FIG. 3(A), the sharpness measurement unit 23 measures, for example, the power value P a fixed time (for example, 10 ms) after the start Ts of the input sound (P₁₀ in the case of 10 ms), and determines that the sound is noise when this value is equal to or greater than the threshold stored in advance in the threshold storage unit 24. The noise/speech determination unit 22 then issues a reject to the matching unit 6, so the matching unit 6 does not output a matching result.
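The fixed-delay power test can be sketched as follows. This is an illustrative sketch only: the function name `is_noise_fixed_delay`, the representation of power as one value per millisecond frame, and the choice to treat an input too short to measure as "not noise" are assumptions that the text does not state.

```python
def is_noise_fixed_delay(power, start, threshold, delay_frames=10):
    """Return True when the power delay_frames after the onset is at or above threshold.

    power:        list of power values, assumed to be one value per 1 ms frame
    start:        index of the onset Ts reported by section detection
    delay_frames: fixed delay after the onset (10 frames stands in for 10 ms)
    """
    idx = start + delay_frames
    if idx >= len(power):
        # Assumption: an input too short to measure is not treated as noise.
        return False
    return power[idx] >= threshold
```

An input that is already at full power 10 frames after the onset is flagged, while a gradual ramp that is still weak at that point is passed on to matching.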

[0015] Next, the operation of the first embodiment will be briefly described. The audio signal input from the microphone 1 is sampled by the AD conversion unit 2, converted into a digital signal, and passed to the feature extraction unit 3 and the power extraction unit 21. The feature extraction unit 3 then extracts the features, and the power extraction unit 21 calculates the power and passes it to the sharpness measurement unit 23.

[0016] When the input audio signal is noise such as a door closing or a hammer blow, as described above, its initial rise is sharp. To detect this, the sharpness measurement unit 23 receives the signal indicating the start Ts of the sound from the section detection unit 4 and then compares the power P₁₀ after a fixed time, for example 10 ms, with the threshold held in the threshold storage unit 24. When the power P₁₀ is equal to or greater than the threshold, the audio signal is determined to be noise, and the noise/speech determination unit 22 issues a reject to the matching unit 6.

[0017] When the power P₁₀ does not reach the threshold, however, no reject is issued. The matching unit 6 then matches the feature pattern against each registered pattern held in the registered pattern unit 5, as in the conventional apparatus, and outputs the matching result to the determination unit 8. The result is compared with the threshold stored in the threshold storage unit 7, and when the distance of the matching result is within the threshold, the recognition result from the matching unit 6 is output.
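The overall flow of the first embodiment, with the noise test placed ahead of pattern matching, might be sketched as below. The function name `recognize`, the status strings, the Euclidean distance, and the one-frame-per-millisecond power sequence are assumptions made for illustration, not details taken from the text.

```python
import math


def recognize(power, start, feature, registered,
              power_threshold, distance_threshold, delay_frames=10):
    """Return (status, label); label is None whenever the input is rejected."""
    # Step 1: sharpness test. A high power shortly after the onset means noise,
    # so the input is rejected before any pattern matching takes place.
    idx = start + delay_frames
    if idx < len(power) and power[idx] >= power_threshold:
        return ("rejected_as_noise", None)

    # Step 2: conventional matching against every registered pattern.
    best_label, best_dist = None, math.inf
    for label, pattern in registered.items():
        dist = math.dist(feature, pattern)  # Euclidean distance (assumed)
        if dist < best_dist:
            best_label, best_dist = label, dist

    # Step 3: conventional distance threshold on the best match.
    if best_dist <= distance_threshold:
        return ("recognized", best_label)
    return ("rejected_by_distance", None)
```

Returning a status alongside the label keeps the two kinds of reject distinguishable, which is convenient if they are to be reported differently.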

[0018] In the above description, where a sound is regarded as noise and rejected when the sharpness of its rise is equal to or greater than a preset reference value, the sharpness was measured by taking the power value a fixed time after the start of the input sound and comparing it with the reference value. The present invention is, of course, not limited to this example.

[0019] As shown in FIG. 3(A), the sharpness measuring means may instead be configured to obtain the difference or ratio between the power value (for example, P₁₀) a fixed time (for example, 10 ms) after the start Ts of the input sound and the maximum power value Pmax in the entire input, and to regard the sound as noise when the result is equal to or greater than a preset reference value. Alternatively, as shown in FIG. 3(A), it may be configured to obtain the difference or ratio between the power value (for example, P₁₅) a fixed time (for example, 15 ms) after the start Ts of the input sound and the power value (for example, P₅) at a time closer to the start (for example, 5 ms after the start), and to regard the sound as noise when the result is equal to or greater than a preset reference value.
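The two alternative measures can be sketched as follows. The text does not say which way the difference or ratio is formed, so this sketch assumes the ratio P₁₀/Pmax (close to 1 when the sound reaches near-peak power quickly) and the difference P₁₅ − P₅ (large when the power climbs steeply between the two instants). The function names, the one-frame-per-millisecond power sequence, and the assumption that the input is long enough to index are likewise illustrative.

```python
def ratio_to_max(power, start, delay_frames=10):
    """P_delay / P_max over the input from the onset (assumed direction of the ratio)."""
    p_max = max(power[start:])
    if p_max <= 0:
        return 0.0  # silent input: nothing to measure
    return power[start + delay_frames] / p_max


def early_rise(power, start, near=5, far=15):
    """P_far - P_near: growth in power between two instants after the onset."""
    return power[start + far] - power[start + near]


def is_noise(measure, reference):
    """Compare either measure with a preset reference value."""
    return measure >= reference
```

Both functions return the raw measure so that the same comparison with a reference value can be applied to either one.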

[0020] Furthermore, as shown in FIG. 3(B), the sharpness measuring means may be configured to obtain the difference Tl − Ts between the time position Tl of the first power maximum Pl appearing after the start Ts of the input sound and the start Ts, and to regard the sound as noise when this difference is equal to or less than a reference value. Alternatively, as shown in FIG. 3(B), it may be configured to obtain the difference or ratio between the first power maximum Pl appearing after the start Ts and the power value Pt at the position a fixed time ΔT closer to the start than Tl, and to regard the sound as noise when this difference or ratio is equal to or greater than a reference value.
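The two peak-based tests of this paragraph could be sketched as below. The local-maximum rule (strictly higher than the previous frame and not lower than the next), the function names, and the clamping of Tl − ΔT to the onset are assumptions. The text defines only the quantities Tl − Ts and the difference between Pl and Pt.

```python
def first_peak_index(power, start):
    """Index Tl of the first local power maximum after start, or None if there is none."""
    for i in range(start + 1, len(power) - 1):
        if power[i - 1] < power[i] >= power[i + 1]:
            return i
    return None


def is_noise_time_to_peak(power, start, max_frames):
    """Noise when Tl - Ts is at or below the reference (the peak arrives quickly)."""
    peak = first_peak_index(power, start)
    return peak is not None and (peak - start) <= max_frames


def is_noise_rise_into_peak(power, start, delta, min_rise):
    """Noise when Pl - Pt is at or above the reference, with Pt taken delta frames before Tl."""
    peak = first_peak_index(power, start)
    if peak is None:
        return False
    before = max(peak - delta, start)  # assumption: do not look earlier than the onset
    return power[peak] - power[before] >= min_rise
```

For an impact-like input the first peak arrives within a few frames and rises steeply, so both tests fire. For a slowly swelling input neither does.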

[0021] In the present invention, a speech recognition apparatus that converts speech into an electric signal, extracts feature quantities from it at certain time intervals, matches them against registered patterns using matching means, and outputs a recognition result may be provided with noise detection means 10 for determining that input speech in a predefined state is noise, and with sharpness detecting means for measuring the sharpness of a power peak in the speech input to the noise detection means, and may be configured to regard the input as noise and reject it when the sharpness is equal to or greater than a preset reference value.

[0022] In the present invention, the sharpness noise detection means may also be configured to obtain the time position of the first power peak of the input speech and, on both sides of that peak, the time positions at which the power value is smaller than the peak power value multiplied by a fixed value a, and to regard the input as noise and reject it when the width between the time positions on the two sides is equal to or less than a preset reference value. When the value a is set to 1/2, for example, this width is called the half width.
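The peak-width measurement can be sketched as follows. The function name `peak_width`, the frame-index representation, and the behaviour when the power never drops below the level before the end of the input (the edge of the input is used instead) are assumptions made for this sketch.

```python
def peak_width(power, peak, a=0.5):
    """Width, in frames, between the first points on each side of the peak
    where the power is below a * power[peak]. With a = 0.5 this is the half width.
    """
    level = power[peak] * a
    left = peak
    while left > 0 and power[left] >= level:
        left -= 1  # walk toward the onset until the power falls below the level
    right = peak
    while right < len(power) - 1 and power[right] >= level:
        right += 1  # walk away from the onset likewise
    return right - left


def is_noise_narrow_peak(power, peak, max_width, a=0.5):
    """Noise when the peak is at most max_width frames wide."""
    return peak_width(power, peak, a) <= max_width
```

A narrow spike yields a small width and is flagged, while a broad hump of the same height is not.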

[0023] In the present invention, the sharpness noise detection means may be configured to obtain, for each power peak of the input speech, the time position of the peak and, on both sides of that peak, the time positions at which the power value is smaller than the peak power value multiplied by a fixed value a, to obtain the width between the time positions on the two sides for each power peak, and to regard the input as noise and reject it when the minimum of these widths is equal to or less than a preset reference value.

[0024] In the present invention, the sharpness noise detection means may be configured to obtain the power value a fixed time before and the power value a fixed time after the time position of the first power peak of the input speech, to calculate the peak power value minus the power values on both sides, or the peak power value minus the average of the power values on both sides, or the peak power value divided by the average of the power values on both sides, and to regard the input as noise and reject it when the result is equal to or greater than a preset reference value.
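The three peak-contrast measures named in this paragraph can be computed together, as in the sketch below. The function name `peak_contrast`, the dictionary keys, the clamping of the two side positions to the ends of the input, and the infinite ratio returned when both sides are silent are assumptions made for illustration.

```python
def peak_contrast(power, peak, offset):
    """Return the three contrast measures for a peak and its two neighbours.

    offset: fixed distance, in frames, before and after the peak.
    """
    before = power[max(peak - offset, 0)]
    after = power[min(peak + offset, len(power) - 1)]
    mean_sides = (before + after) / 2
    return {
        # peak value minus the power values on both sides
        "minus_both": power[peak] - before - after,
        # peak value minus the average of the two sides
        "minus_mean": power[peak] - mean_sides,
        # peak value divided by the average of the two sides
        "over_mean": power[peak] / mean_sides if mean_sides > 0 else float("inf"),
    }
```

Any one of the three values can then be compared with its own preset reference value, since the measures are on different scales.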

[0025] Furthermore, in the present invention, the sharpness noise detection means may be configured to obtain, for each power peak of the input speech, the power value a fixed time before and the power value a fixed time after the time position of the peak, to calculate for each power peak the peak power value minus the power values on both sides, or the peak power value minus the average of the power values on both sides, or the peak power value divided by the average of the power values on both sides, and to regard the input as noise and reject it when the maximum of these values is equal to or greater than a preset reference value.

[0026] In the present invention, when an input is rejected, the recognition device may output nothing and wait for the next input, or it may notify the user, a host computer, or the like that a reject has occurred. When it is determined that speech differing from the registered patterns has been input, the input is, of course, rejected in the conventional manner as described above, and in this case the user, host computer, or the like is notified of the reject.

[0027]

Effects of the Invention According to the present invention, sounds that are clearly noise are identified as noise in advance without using the matching unit, so they can be removed quickly and effectively, and the noise rejection capability is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a configuration diagram of the first embodiment of the present invention.

FIG. 3 is a diagram explaining the operation of the first embodiment of the present invention.

FIG. 4 is a configuration diagram of a conventional example.

FIG. 5 is a diagram explaining the operation of the conventional example.

DESCRIPTION OF REFERENCE NUMERALS

1: microphone; 2: AD conversion unit; 3: feature extraction unit; 4: section detection unit; 5: registered pattern unit; 6: matching unit; 7: threshold storage unit; 8: determination unit; 10: noise detection unit

Claims

(57) [Claims]

1. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to measure the power value a fixed time after the start of the input sound and, when that value is equal to or greater than a preset reference value, to regard the sound as noise and reject it.

2. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to obtain the difference or ratio between the power value a fixed time after the start of the input sound and the maximum power value in the entire input and, when the result is equal to or greater than a preset reference value, to regard the sound as noise and reject it.

3. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to obtain the difference or ratio between the power value a fixed time after the start of the input sound and the power value at a time closer to the start and, when the result is equal to or greater than a preset reference value, to regard the sound as noise and reject it.

4. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to regard the sound as noise and reject it when the difference between the time position of the first power maximum appearing after the start and the time position of the start is equal to or less than a reference value.

5. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to regard the sound as noise and reject it when the difference or ratio between the first power maximum appearing after the start and the power value at the position a fixed time closer to the start than the time position of that maximum is equal to or greater than a reference value.

6. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to obtain the maximum of the difference or ratio between the power value at an arbitrary time at or after a fixed time from the start of the input speech and the power value a fixed time before it and, when that maximum is equal to or greater than a preset reference value, to regard the sound as noise and reject it.

7. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of the rise of the speech input to the noise detection means, wherein the sharpness measuring means is configured to obtain, for each power peak, the difference or ratio between a power maximum in the input speech and the power value a fixed time before it and, when the maximum of these values is equal to or greater than a preset reference value, to regard the sound as noise and reject it.

8. A recognition device that extracts feature quantities of speech at certain time intervals and recognizes the speech by matching against registered patterns using matching means, the device comprising: noise detection means for determining that input speech in a predefined state is noise; and sharpness detecting means for measuring the sharpness of a power peak in the speech input to the noise detection means, wherein the device is configured to regard the input as noise and reject it when the sharpness is equal to or greater than a preset reference value.

9. The recognition device according to claim 8, wherein the sharpness noise detection means is configured to obtain the time position of the first power peak of the input speech and, on both sides of that peak, the time positions at which the power value is smaller than the peak power value multiplied by a fixed value, and to regard the input as noise and reject it when the width between the time positions on the two sides is equal to or less than a preset reference value.

10. The recognition device according to claim 8, wherein the sharpness noise detection means is configured to obtain, for each power peak of the input speech, the time position of the peak and, on both sides of that peak, the time positions at which the power value is smaller than the peak power value multiplied by a fixed value, to obtain the width between the time positions on the two sides for each power peak, and to regard the input as noise and reject it when the minimum of these widths is equal to or less than a preset reference value.

11. The recognition device according to claim 8, wherein the sharpness noise detection means is configured to obtain the power value a fixed time before and the power value a fixed time after the time position of the first power peak of the input speech, to calculate the peak power value minus the power values on both sides, or the peak power value minus the average of the power values on both sides, or the peak power value divided by the average of the power values on both sides, and to regard the input as noise and reject it when the result is equal to or greater than a preset reference value.

12. The recognition device according to claim 8, wherein the sharpness noise detection means is configured to obtain, for each power peak of the input speech, the power value a fixed time before and the power value a fixed time after the time position of the peak, to calculate for each power peak the peak power value minus the power values on both sides, or the peak power value minus the average of the power values on both sides, or the peak power value divided by the average of the power values on both sides, and to regard the input as noise and reject it when the maximum of these values is equal to or greater than a preset reference value.