JPH01277899A - In-speech-band signal detection system - Google Patents
In-speech-band signal detection system
Info
- Publication number
- JPH01277899A (application JP63105522A)
- Authority
- JP
- Japan
- Prior art keywords
- signal
- speech
- input
- audio
- spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application)
The present invention relates to a method for detecting speech signals in speech packets. As is well known, speech packetization is a technique for converting speech into packets; it is used in packet-switching systems, and to make effective use of a line by inserting speech from other channels into the silent intervals of a conversation.
(Prior Art)
Since speech packetization, as noted above, converts speech into packets, accurate identification of the speech signal is naturally required. In particular, the signals within the speech band ordinarily include both speech signals (hereinafter simply "speech") and non-speech signals, such as modem signals, PB (push-button) signals, and tone signals, so the accuracy of this identification is an important technology.

In general, speech packetization discriminates speech signals from non-speech signals and applies silence suppression to the speech. However, if non-speech signals are also silence-suppressed, the receiver of those signals may malfunction. Accurate discrimination is therefore essential both to improve transmission efficiency and to prevent receiver malfunction.
One known technique for this purpose is the band-division method, in which the number of divided frequency channels is chosen with the in-band signals of interest in mind and the difference between speech and the other components is detected in the channel outputs, as shown, for example, in "A speech detector for DSI using the power variance of the spectrum," Paper 331, p. 2-149, Proceedings of the 1986 (Showa 61) National Convention of the Institute of Electronics and Communication Engineers, Communications Division.
(Problem to Be Solved by the Invention)
With the above method, however, when there are many kinds of non-speech signals it is extremely difficult to devise a spectrum extraction that copes with them all. Moreover, since the speech signal also contains the spectra of non-speech signals, discrimination is difficult and requires a long time, from tens to hundreds of milliseconds.

The present invention eliminates the difficulty of spectrum extraction and the problem of discrimination time described above, and provides a speech signal detection method that discriminates speech from non-speech signals in speech packets quickly and with high accuracy.
(Means for Solving the Problem)
To achieve the above object, the speech signal detection method of the present invention comprises means for performing spectrum extraction on the speech signal, and detects the various signals with a neural network that can be trained in advance, from the extracted output, on the features of those signals.
(Function)
With the above arrangement, the in-band signal is first subjected to spectrum extraction (for example, the first, second, and third formants), and the result is then discriminated by a neural network with high learning ability, so that highly accurate discrimination of in-band signals is achieved in a short discrimination time.
(Embodiment)
Fig. 1 is a system diagram of the present invention. The input signal is applied to input terminal 1, each spectral component is extracted by spectrum extraction section 2, and the output is passed through neural network 3, which has been trained in advance on the desired signals, and detected at output terminal 4.
Fig. 2 is a block diagram showing, mainly, an embodiment of the spectrum extraction section of Fig. 1. Input terminal 1 is connected to an analog-to-digital converter (A/D) 5, whose output is connected to discrete Fourier transform devices (DFT) 6, 7, and 8 that perform the spectrum extraction. The outputs of DFTs 6, 7, and 8 are each connected to neural network 3, which outputs a speech signal at output terminal 40 and a non-speech signal at output terminal 41.
The discrete Fourier transform devices (DFT) 6, 7, and 8 are well-known devices, generally implemented with a digital signal processor or the like, that apply a discrete Fourier transform to the input digital signal to perform spectrum extraction.
Figs. 3 and 4 are explanatory diagrams of the neural network; Fig. 3 shows one of its units, and Fig. 4 shows an example of its configuration.
The operation is as follows. The input signal is, as mentioned above, an in-band signal arriving from a telephone line or the like, and is an analog signal. It is applied to input terminal 1 in Fig. 2 and converted to a digital signal by the analog-to-digital converter (A/D) 5.
This is done so that the subsequent processing can be carried out as digital signal processing (both the spectrum extraction and the neural network processing are performed on a digital signal processor). To extract the features of speech from the digital signal, spectrum extraction is performed by the discrete Fourier transform devices (DFT 1, 2, 3) 6, 7, and 8, whose center frequencies are set to extract, for example, the first, second, and third formants of speech. The extracted outputs are then applied to neural network 3, which has been trained in advance on the features of the various signals.
The neural network is a system that has attracted much attention in recent years, as described, for example, in Nikkei Electronics, No. 427, August 10, 1987, Nikkei McGraw-Hill, "Using neural nets for pattern recognition, signal processing, and knowledge processing," pp. 115-124.
Its configuration is a complex interconnection of a plurality of units of the kind shown in Fig. 3; in general it is implemented on a digital signal processor and trained with an appropriate algorithm. Referring to Fig. 3, a unit (l or m, say) consists of a part that receives inputs from other units, a part that transforms the input according to a fixed rule, and a part that outputs the result. Each connection to another unit carries a variable weight (W_lm in Fig. 3). Changing these values changes the structure of the network; training the network means changing these weights.
In the forward computation from unit l to unit m, the weighted sum X_m of the inputs is first formed at unit m; in this embodiment, the output characteristic of unit m was set as a function of X_m.
The neural network adopted in this embodiment is a pattern-associative network, and the training algorithm is backpropagation. As shown in Fig. 4, a plurality of the units of Fig. 3 are connected in the direction input layer, intermediate layer, output layer. There are no connections within a layer, and none from the output layer back toward the input layer; it is a so-called "feedforward" network.
For training, input data are applied from inputs y01 to y03 to the input-layer units 1.1 to 1.3; in this embodiment, the outputs of DFTs 6, 7, and 8 of Fig. 2 are applied (w111 to w333 and the like denote the weights). The input signal is transformed by units 1.1 to 1.3, passed on to the intermediate-layer units 2.1 to 2.3, and finally emerges from the output-layer units 3.1 to 3.3. The output values are compared with the desired output values, and the connection strengths are changed so as to reduce the difference.
In this embodiment, letting y_n be the output value of output unit n for a given input pattern and r_n the desired value for that unit, the network is trained backwards from the output layer toward the input layer, that is, the weights w111 to w333 are adjusted, so as to minimize the squared error E_n. This is called backpropagation. Training is repeated until this error reaches a minimum, after which the weights are, of course, fixed.
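One backpropagation step of the kind described, reducing the squared error between actual and desired outputs, can be sketched as below. This is a textbook gradient-descent formulation under an assumed sigmoid transfer function and a hypothetical learning rate; the patent does not give these details.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w1, w2, lr=0.5):
    """One backpropagation step on a one-hidden-layer network.
    Mutates w1 (input->hidden) and w2 (hidden->output) in place and
    returns the squared error before the update."""
    # Forward pass.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in w2]
    # Output-layer deltas: (y - r) times the sigmoid derivative y(1 - y).
    d_out = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, target)]
    # Hidden-layer deltas, propagated backwards through the old w2.
    d_hid = [hi * (1 - hi) * sum(d_out[j] * w2[j][i] for j in range(len(w2)))
             for i, hi in enumerate(h)]
    # Gradient-descent weight updates, output layer first.
    for j, row in enumerate(w2):
        for i in range(len(row)):
            row[i] -= lr * d_out[j] * h[i]
    for i, row in enumerate(w1):
        for k in range(len(row)):
            row[k] -= lr * d_hid[i] * x[k]
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))
```

Repeating `train_step` on the training patterns until the returned error stops shrinking corresponds to the "repeat until the error is minimized" procedure in the text.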
In this embodiment, the primary aim was to discriminate speech signals from non-speech signals. The numbers of units in the input, intermediate, and output layers were set to 16, 8, and 3, respectively (an 8-4-3 configuration was also tried), and training was carried out with speech signals, modem signals, and tone signals. As a result, a discrimination and detection time of a few milliseconds has been achieved.
That is, the spectrum-extracted outputs of DFTs 6, 7, and 8 in Fig. 2 are applied, as described above, to the inputs y01, y02, and y03 of the neural network of Fig. 4, discriminated according to the processing patterns the network has learned in advance, and output from its output layer. Although Fig. 2 shows only the speech signal output 40 and the non-speech signal output 41, it is of course possible to discriminate more classes than these.
(Effects of the Invention)
As explained above, the present invention first performs spectrum extraction (for example, the first, second, and third formants) on the input in-band signal and then has a neural network with high learning ability discriminate the signal. Discrimination of extremely high accuracy in a short time is thereby realized, and the invention can be applied, with great effect, to systems using speech packets in general.
Fig. 1 is a system diagram of the present invention; Fig. 2 is a block diagram of the spectrum extraction section of an embodiment; Fig. 3 is an explanatory diagram of a unit of the neural network; and Fig. 4 is a diagram showing an example configuration of the neural network.
1: input terminal; 2: spectrum extraction section; 3: neural network; 4, 40, 41: output terminals; 5: analog-to-digital converter.
Patent applicant: Oki Electric Industry Co., Ltd.
Claims (1)
An in-speech-band signal detection method for discriminating speech signals from various non-speech signals within the speech-band signal, comprising means for performing spectrum extraction of the speech-band signal, in which the output of the spectrum extraction is applied to a neural network that has been made to learn the features of the signals in advance, whereby the various signals within the speech-band signal are discriminated and detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63105522A JPH01277899A (en) | 1988-04-30 | 1988-04-30 | In-speech-band signal detection system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63105522A JPH01277899A (en) | 1988-04-30 | 1988-04-30 | In-speech-band signal detection system |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH01277899A true JPH01277899A (en) | 1989-11-08 |
Family
ID=14409929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP63105522A Pending JPH01277899A (en) | 1988-04-30 | 1988-04-30 | In-speech-band signal detection system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH01277899A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS58123293A (en) * | 1982-01-08 | 1983-07-22 | エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン | Multifrequency signal detecting method |
JPS6238097A (en) * | 1985-07-12 | 1987-02-19 | エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン | Receiver |
- 1988-04-30: Application JP63105522A filed; published as JPH01277899A, status Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2724029A1 (en) * | 1992-03-17 | 1996-03-01 | Thomson Csf | Neural net detection of acoustic signals from torpedo |
US5611019A (en) * | 1993-05-19 | 1997-03-11 | Matsushita Electric Industrial Co., Ltd. | Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech |
JP2006171714A (en) * | 2004-11-22 | 2006-06-29 | Institute Of Physical & Chemical Research | Self-development type voice language pattern recognition system, and method and program for structuring self-organizing neural network structure used for same system |
CN109074820A (en) * | 2016-05-10 | 2018-12-21 | 谷歌有限责任公司 | Audio processing is carried out using neural network |
CN109074820B (en) * | 2016-05-10 | 2023-09-12 | 谷歌有限责任公司 | Audio processing using neural networks |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2764277B2 (en) | Voice recognition device | |
CN107393526B (en) | Voice silence detection method, device, computer equipment and storage medium | |
US4918735A (en) | Speech recognition apparatus for recognizing the category of an input speech pattern | |
EP0763810B1 (en) | Speech signal processing apparatus for detecting a speech signal from a noisy speech signal | |
CA1172363A (en) | Continuous speech recognition method | |
US4624011A (en) | Speech recognition system | |
EP0074822B1 (en) | Recognition of speech or speech-like sounds | |
GB2107100A (en) | Continuous speech recognition | |
FR2743238A1 (en) | TELECOMMUNICATION DEVICE RESPONDING TO VOICE ORDERS AND METHOD OF USING THE SAME | |
KR0173923B1 (en) | Phoneme Segmentation Using Multi-Layer Neural Networks | |
JPH03273722A (en) | Sound/modem signal identifying circuit | |
WO1989002146A1 (en) | Improvements in or relating to apparatus and methods for voice recognition | |
US5819209A (en) | Pitch period extracting apparatus of speech signal | |
US5159637A (en) | Speech word recognizing apparatus using information indicative of the relative significance of speech features | |
JPH01277899A (en) | In-speech-band signal detection system | |
US5745874A (en) | Preprocessor for automatic speech recognition system | |
JPS58123293A (en) | Multifrequency signal detecting method | |
JPS63250932A (en) | Circuit system for double-voice and multifrequency signal detection in telephone equipment | |
KR100480506B1 (en) | Speech recognition method | |
JPH04369698A (en) | Voice recognition system | |
Pinto et al. | Using neural networks for automatic speaker recognition: a practical approach | |
Vieira et al. | Speaker verification for security systems using artificial neural networks | |
JPS63238679A (en) | Input recognizing device | |
Harb et al. | Isolated words recognition using neural networks | |
JPH0573090A (en) | Speech recognizing method |