JPH01277899A - In-speech-band signal detection system - Google Patents

In-speech-band signal detection system

Info

Publication number
JPH01277899A
JPH01277899A JP63105522A JP10552288A
Authority
JP
Japan
Prior art keywords
signal
speech
input
audio
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63105522A
Other languages
Japanese (ja)
Inventor
Yukinao Hashizume
橋爪 幸直
Kiyoshi Shimokoshi
霜越 潔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd filed Critical Oki Electric Industry Co Ltd
Priority to JP63105522A priority Critical patent/JPH01277899A/en
Publication of JPH01277899A publication Critical patent/JPH01277899A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To enable discrimination with high accuracy and in a short time by extracting the spectrum of an input in-speech-band signal and having a neural network with high learning ability discriminate the signal. CONSTITUTION: The input signal is applied to an input terminal 1, and a spectrum extraction part 2, built from discrete Fourier transform devices whose center frequencies are set to extract the spectra of the first, second, and third formants of speech, extracts the spectral features of the signal. The spectra are then supplied to a neural network 3 that has previously learned the features of various signals from the transform outputs; the network discriminates them against the learned patterns and outputs the result from its output layer. Consequently, the speech signal of a speech packet and non-speech signals are discriminated in a short time with high accuracy.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a method for detecting speech signals in speech packets.

As is well known, speech packetization is a technique used in packet-switching systems, and also to insert speech from other channels into the silent intervals of a line so as to use it effectively; it is the technique of converting speech into packets.

(Prior Art) As mentioned above, since speech packetization converts speech into packets, accurate identification of the speech signal is naturally required. In particular, the in-speech-band signals of ordinary communication contain both so-called speech signals (hereinafter simply called speech) and signals other than speech (hereinafter called non-speech signals), such as modem signals, push-button (PB) signals, and tone signals, so the accuracy of that identification becomes an important technology.

In general, a speech-packet system distinguishes speech signals from non-speech signals and applies silence suppression to the speech signal.

However, if non-speech signals are also silence-suppressed, the receiver of those signals may malfunction. Accurate identification as described above is therefore essential both to improve transmission efficiency and to prevent receiver malfunction.

One technique for this purpose is the band-division method shown, for example, in paper 331, "A speech detector for DSI using the power variance of the spectrum," Proceedings of the 1986 (Showa 61) IEICE National Conference, Communications Division, published by the Institute of Electronics and Communication Engineers, p. 2-149: the number of divided frequency channels is decided with attention to the in-speech-band signals in use, and the difference between the speech and other components is detected in the channel outputs.

(Problem to be Solved by the Invention) With the above method, however, when there are many kinds of non-speech signal it is extremely difficult to devise a spectrum extraction that copes with all of them. Moreover, since speech signals also contain spectral components of the non-speech signals, identification is difficult and requires a long time, on the order of tens to hundreds of milliseconds.

The present invention eliminates the difficulty of spectrum extraction and the identification-time problem described above, and provides a speech signal detection method that discriminates between speech and non-speech signals in speech packets in a short time and with high accuracy.

(Means for Solving the Problem) To achieve the above object, the speech signal detection method of the present invention uses means for extracting the spectrum of a speech signal together with a neural network that can be taught the features of various signals in advance from the extracted output, and has the neural network detect the various signals.

(Operation) With the method described above, the spectrum of the in-speech-band signal is first extracted at the first, second, and third formants, and the result is then identified by a neural network with high learning ability, so that in-speech-band signals can be identified with high accuracy in a short identification time.

(Embodiment) Fig. 1 is a system diagram of the present invention. The input signal is applied to input terminal 1, each spectrum is extracted by a spectrum extraction part 2, and the output passes through a neural network 3, trained in advance on the desired signals, to be detected at output terminal 4.

Fig. 2 is a block diagram showing an embodiment of, mainly, the spectrum extraction part of Fig. 1. Input terminal 1 is connected to an analog-to-digital converter (A/D) 5, whose output is connected to discrete Fourier transform devices (DFT) 6, 7, and 8 that perform the spectrum extraction. The outputs of DFT 6, 7, and 8 are each connected to the neural network 3, whose output appears as a speech signal at output terminal 40 and as a non-speech signal at output terminal 41.

Each discrete Fourier transform device (DFT) 6, 7, 8 is a well-known device, generally implemented with a digital signal processor or the like, that applies a discrete Fourier transform to the input digital signal to extract its spectrum.

Figs. 3 and 4 are explanatory diagrams of the neural network: Fig. 3 shows one of its units, and Fig. 4 shows an example of its configuration.

The operation is as follows. The input signal, as stated above, is an in-speech-band analog signal arriving from a telephone line or the like. It is applied to input terminal 1 in Fig. 2 and converted into a digital signal by the analog-to-digital converter (A/D) 5.

This is because the subsequent processing is performed by digital signal processing (both the spectrum extraction and the neural network processing are carried out on a digital signal processor). To extract the speech features of the digital signal, spectrum extraction is performed by the discrete Fourier transform devices (DFT 1, 2, 3) 6, 7, 8, whose center frequencies are set to extract the spectra of, for example, the first, second, and third formants of speech. The outputs are then fed to the neural network 3, which has been taught the features of the various signals in advance.
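The extraction described above, one DFT device per formant band, each tuned to a single center frequency, can be sketched as a single-bin discrete Fourier transform per band. The sampling rate and formant center frequencies below are assumptions for illustration (the patent gives no numeric values), and `dft_bin_power` is a hypothetical helper name:

```python
import math

def dft_bin_power(samples, fs, center_hz):
    """Power of a single DFT bin centered at center_hz.

    Evaluates one discrete Fourier coefficient, which is how a DFT
    device tuned to one center frequency behaves.
    """
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * center_hz * k / fs)
             for k, x in enumerate(samples))
    im = -sum(x * math.sin(2 * math.pi * center_hz * k / fs)
              for k, x in enumerate(samples))
    return (re * re + im * im) / n

fs = 8000                      # telephone-band sampling rate (assumed)
formants = [500, 1500, 2500]   # assumed F1/F2/F3 center frequencies in Hz

# Synthetic voice-band frame: a tone at 1500 Hz should dominate the F2 bin.
frame = [math.sin(2 * math.pi * 1500 * k / fs) for k in range(160)]
powers = [dft_bin_power(frame, fs, f) for f in formants]
```

The three resulting powers are what the embodiment would present to the neural network as the spectral feature vector.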

The neural network is a system that has attracted much attention in recent years, as described, for example, in "Using neural nets for pattern recognition, signal processing, and knowledge processing," Nikkei Electronics, No. 427, August 10, 1987, published by Nikkei McGraw-Hill, pp. 115-124.

Its configuration is a complex interconnection of many units of the kind shown in Fig. 3; it is generally implemented on a digital signal processor and trained with a suitable algorithm. Referring to Fig. 3, each unit l or m consists of a part that receives inputs from other units, a part that transforms the inputs according to a fixed rule, and a part that outputs the result. Each connection to another unit carries a variable weight (W_lm in Fig. 3). Changing these values changes the structure of the network; training the network means changing these weights.

In the forward computation from unit l to unit m, the weighted sum of the inputs, x_m, is first formed at unit m, and in this embodiment the output characteristic of unit m was set to x_m itself.
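The unit computation just described, each unit m forming the weighted sum of its inputs and, per this embodiment, outputting that sum directly, can be sketched as follows; the input and weight values are illustrative only:

```python
def unit_output(inputs, weights):
    """Output of one unit m: the weighted sum x_m of the inputs y_l
    arriving from connected units, with one weight w_lm per connection.
    In this embodiment the unit's output characteristic is x_m itself.
    """
    return sum(w_lm * y_l for w_lm, y_l in zip(weights, inputs))

# Three inputs (e.g. the three DFT outputs) with illustrative weights.
x_m = unit_output([0.2, 0.5, 0.1], [0.4, -0.3, 0.9])  # 0.08 - 0.15 + 0.09
```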

The neural network adopted in this embodiment is a pattern-associative neural network, and the training algorithm is backpropagation. As shown in Fig. 4, units of the kind shown in Fig. 3 are connected in the direction input layer, intermediate layer, output layer; there are no connections within a layer and none from the output layer back toward the input layer. It is a so-called "feedforward" network.

In training, input data are applied from input parts y01-y03 to the units 1.1-1.3 of the input layer; in this embodiment the outputs of DFT 6, 7, 8 of Fig. 2 are applied (W111-W333 and so on denote the weights). The input signal is transformed in units 1.1-1.3, passed on to units 2.1-2.3 of the intermediate layer, and finally emerges from units 3.1-3.3 of the output layer. The output values are compared with the desired output values, and the connection strengths are changed so as to reduce the difference.

In this embodiment, with y_n denoting the output value of output unit n for a given input pattern and r_n the desired value at that unit, the network is trained backward from the output layer toward the input layer, i.e. the weights W111-W333 of each unit are changed, so as to minimize the squared error E_n = (y_n - r_n)^2. This is called backpropagation. Training is repeated until this value is minimized, and the weights are of course fixed once training is complete.
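The training loop described above, a forward pass, comparison with the desired outputs, and weight changes propagated backward so the squared error shrinks, can be sketched for a small two-layer feedforward net of the embodiment's linear units. The layer sizes, initial weights, and learning rate are assumptions for illustration, not values from the patent:

```python
def train_step(x, target, W1, W2, lr=0.05):
    """One backpropagation step on a two-layer feedforward net of linear
    units: forward pass, squared error, then weight updates working
    backward from the output layer toward the input layer."""
    # Forward: hidden and output activations (each unit outputs its sum x_m).
    h = [sum(W1[j][i] * x[i] for i in range(len(x))) for j in range(len(W1))]
    y = [sum(W2[n][j] * h[j] for j in range(len(h))) for n in range(len(W2))]
    err = [yn - rn for yn, rn in zip(y, target)]
    E = sum(e * e for e in err)  # squared error to be minimized
    # Backward: hidden deltas computed with the pre-update output weights.
    dh = [sum(2 * err[n] * W2[n][j] for n in range(len(W2)))
          for j in range(len(h))]
    for n in range(len(W2)):     # output-layer weights
        for j in range(len(h)):
            W2[n][j] -= lr * 2 * err[n] * h[j]
    for j in range(len(W1)):     # input-layer weights
        for i in range(len(x)):
            W1[j][i] -= lr * dh[j] * x[i]
    return E

# Teach the net to map one input pattern to a desired output pattern.
W1 = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]]
W2 = [[0.1, 0.2], [0.2, 0.1]]
errs = [train_step([1.0, 0.0, 1.0], [1.0, 0.0], W1, W2) for _ in range(300)]
```

Repeating the step drives the squared error down, mirroring the patent's "repeat until the value is minimized, then fix the weights."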

In this embodiment the primary aim was to discriminate speech signals from non-speech signals. The numbers of units in the input, intermediate, and output layers were set to 16, 8, and 3, respectively (runs with 8, 4, and 3 were also made), and training was performed with speech signals, modem signals, and tone signals. As a result, an identification and detection time of a few milliseconds has been achieved.
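With the 16-8-3 layout reported above, a forward pass through the net reduces to two matrix-vector products followed by reading off the largest output unit. The class ordering and the weights below are illustrative assumptions (an untrained net, shown only to exercise the 16-8-3 shapes):

```python
def forward(x, W1, W2):
    """Forward pass through the embodiment's 16-8-3 layout: 16 input
    units, 8 intermediate units, 3 output units, all linear (each unit
    outputs its weighted input sum)."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W1]  # 8 hidden sums
    y = [sum(w * hj for w, hj in zip(row, h)) for row in W2]  # 3 output sums
    return y

classes = ["speech", "modem", "tone"]  # assumed ordering of the 3 output units

# Illustrative weights with the right shapes: 8x16, then 3x8.
W1 = [[0.01 * (i + j) for i in range(16)] for j in range(8)]
W2 = [[0.1 if j == n else 0.0 for j in range(8)] for n in range(3)]

y = forward([1.0] * 16, W1, W2)
label = classes[max(range(3), key=lambda n: y[n])]
```

Inference is only two small matrix products, which is consistent with the few-millisecond detection time reported for the embodiment.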

That is, the spectrum-extracted outputs of DFT 6, 7, 8 in Fig. 2 are applied to the input parts y01, y02, y03 of the neural network of Fig. 4, identified by the processing patterns the network has learned in advance, and output from its output layer. Although only the speech-signal output 40 and the non-speech-signal output 41 are shown in Fig. 2, more than two kinds of signal can of course be discriminated.

(Effects of the Invention) As described above, the present invention first extracts the spectrum of the input in-speech-band signal at the first, second, and third formants and then has a neural network with high learning ability identify the signal. Identification with extremely high accuracy and in a short time can therefore be realized, and the invention is applicable, with great effect, to systems using speech packets in general.
(Effects of the Invention) As explained above, the present invention first performs spectrum extraction such as first, second, and third formants on an input voice band signal, and then uses a neural network with high learning ability to identify the signal. Since this is a method that allows identification to be performed with extremely high accuracy and in a short time, it can be applied to all systems that use audio 14'kenoto and is highly effective.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a system diagram of the present invention; Fig. 2 is a block diagram of the spectrum extraction part of an embodiment of the present invention; Fig. 3 is an explanatory diagram of a unit of the neural network; and Fig. 4 is a diagram showing the configuration of the neural network. 1: input terminal; 2: spectrum extraction part; 3: neural network; 4, 40, 41: output terminals; 5: analog-to-digital converter. Patent applicant: Oki Electric Industry Co., Ltd.

Claims (1)

[Claim 1] An in-speech-band signal detection system for discriminating between a speech signal and various non-speech signals within an in-speech-band signal, comprising means for extracting the spectrum of the in-speech-band signal, the output of the spectrum extraction being applied to a neural network that has been taught the features of the signals in advance, whereby the various signals within the in-speech-band signal are identified and detected.
JP63105522A 1988-04-30 1988-04-30 In-speech-band signal detection system Pending JPH01277899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63105522A JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63105522A JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Publications (1)

Publication Number Publication Date
JPH01277899A true JPH01277899A (en) 1989-11-08

Family

ID=14409929

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63105522A Pending JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Country Status (1)

Country Link
JP (1) JPH01277899A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58123293A (en) * 1982-01-08 1983-07-22 エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン Multifrequency signal detecting method
JPS6238097A (en) * 1985-07-12 1987-02-19 エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン Receiver

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2724029A1 (en) * 1992-03-17 1996-03-01 Thomson Csf Neural net detection of acoustic signals from torpedo
US5611019A (en) * 1993-05-19 1997-03-11 Matsushita Electric Industrial Co., Ltd. Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech
JP2006171714A (en) * 2004-11-22 2006-06-29 Institute Of Physical & Chemical Research Self-development type voice language pattern recognition system, and method and program for structuring self-organizing neural network structure used for same system
CN109074820A (en) * 2016-05-10 2018-12-21 谷歌有限责任公司 Audio processing is carried out using neural network
CN109074820B (en) * 2016-05-10 2023-09-12 谷歌有限责任公司 Audio processing using neural networks

Similar Documents

Publication Publication Date Title
JP2764277B2 (en) Voice recognition device
CN107393526B (en) Voice silence detection method, device, computer equipment and storage medium
US4918735A (en) Speech recognition apparatus for recognizing the category of an input speech pattern
EP0763810B1 (en) Speech signal processing apparatus for detecting a speech signal from a noisy speech signal
CA1172363A (en) Continuous speech recognition method
US4624011A (en) Speech recognition system
EP0074822B1 (en) Recognition of speech or speech-like sounds
GB2107100A (en) Continuous speech recognition
FR2743238A1 (en) TELECOMMUNICATION DEVICE RESPONDING TO VOICE ORDERS AND METHOD OF USING THE SAME
KR0173923B1 (en) Phoneme Segmentation Using Multi-Layer Neural Networks
JPH03273722A (en) Sound/modem signal identifying circuit
WO1989002146A1 (en) Improvements in or relating to apparatus and methods for voice recognition
US5819209A (en) Pitch period extracting apparatus of speech signal
US5159637A (en) Speech word recognizing apparatus using information indicative of the relative significance of speech features
JPH01277899A (en) In-speech-band signal detection system
US5745874A (en) Preprocessor for automatic speech recognition system
JPS58123293A (en) Multifrequency signal detecting method
JPS63250932A (en) Circuit system for double-voice and multifrequency signal detection in telephone equipment
KR100480506B1 (en) Speech recognition method
JPH04369698A (en) Voice recognition system
Pinto et al. Using neural networks for automatic speaker recognition: a practical approach
Vieira et al. Speaker verification for security systems using artificial neural networks
JPS63238679A (en) Input recognizing device
Harb et al. Isolated words recognition using neural networks
JPH0573090A (en) Speech recognizing method