JPH01277899A - In-speech-band signal detection system - Google Patents

In-speech-band signal detection system

Info

Publication number
JPH01277899A
JPH01277899A JP63105522A JP10552288A
Authority
JP
Japan
Prior art keywords
signal
speech
input
audio
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63105522A
Other languages
Japanese (ja)
Inventor
Yukinao Hashizume
橋爪 幸直
Kiyoshi Shimokoshi
霜越 潔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd filed Critical Oki Electric Industry Co Ltd
Priority to JP63105522A priority Critical patent/JPH01277899A/en
Publication of JPH01277899A publication Critical patent/JPH01277899A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To enable discrimination with high accuracy and in a short time by extracting the spectrum of an input in-speech-band signal and having a neural network with high learning ability discriminate the signal. CONSTITUTION: The input signal is applied to an input terminal 1, and a spectrum extraction part 2, built from discrete Fourier transform devices whose center frequencies are set to extract the spectra of the first, second, and third formants of speech, extracts the spectral features of the signal. The spectra are then supplied to a neural network 3 that has previously learned the features of various signals from the transform outputs; the network discriminates them against the learned patterns and outputs the result from its output layer. Consequently, the speech signal of a speech packet and non-speech signals are discriminated in a short time with high accuracy.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a method for detecting speech signals in speech packets.

As is well known, speech packetization is a technique used in packet-switching systems, and also to insert speech from other channels into the silent intervals of a line so as to use it effectively; it is the technique of converting speech into packets.

(Prior Art) As mentioned above, since speech packetization converts speech into packets, accurate identification of the speech signal is naturally required. In particular, the in-speech-band signals of ordinary communication contain both so-called speech signals (hereinafter simply called speech) and signals other than speech (hereinafter called non-speech signals), such as modem signals, push-button (PB) signals, and tone signals, so the accuracy of that identification becomes an important technology.

In general, a speech-packet system distinguishes speech signals from non-speech signals and applies silence suppression to the speech signal.

However, if non-speech signals are also silence-suppressed, the receiver of those signals may malfunction. Accurate identification as described above is therefore essential both to improve transmission efficiency and to prevent receiver malfunction.

One technique for this purpose is the band-division method shown, for example, in paper 331, "A speech detector for DSI using the power variance of the spectrum," Proceedings of the 1986 (Showa 61) IEICE National Conference, Communications Division, published by the Institute of Electronics and Communication Engineers, p. 2-149: the number of divided frequency channels is decided with attention to the in-speech-band signals in use, and the difference between the speech and other components is detected in the channel outputs.

(Problem to be Solved by the Invention) With the above method, however, when there are many kinds of non-speech signal it is extremely difficult to devise a spectrum extraction that copes with all of them. Moreover, since speech signals also contain spectral components of the non-speech signals, identification is difficult and requires a long time, on the order of tens to hundreds of milliseconds.

The present invention eliminates the difficulty of spectrum extraction and the identification-time problem described above, and provides a speech signal detection method that discriminates between speech and non-speech signals in speech packets in a short time and with high accuracy.

(Means for Solving the Problem) To achieve the above object, the speech signal detection method of the present invention uses means for extracting the spectrum of a speech signal together with a neural network that can be taught the features of various signals in advance from the extracted output, and has the neural network detect the various signals.

(Operation) With the method described above, the spectrum of the in-speech-band signal is first extracted at the first, second, and third formants, and the result is then identified by a neural network with high learning ability, so that in-speech-band signals can be identified with high accuracy in a short identification time.

(Embodiment) Fig. 1 is a system diagram of the present invention. The input signal is applied to input terminal 1, each spectrum is extracted by a spectrum extraction part 2, and the output passes through a neural network 3, trained in advance on the desired signals, to be detected at output terminal 4.

Fig. 2 is a block diagram showing an embodiment of, mainly, the spectrum extraction part of Fig. 1. Input terminal 1 is connected to an analog-to-digital converter (A/D) 5, whose output is connected to discrete Fourier transform devices (DFT) 6, 7, and 8 that perform the spectrum extraction. The outputs of DFT 6, 7, and 8 are each connected to the neural network 3, whose output appears as a speech signal at output terminal 40 and as a non-speech signal at output terminal 41.

Each discrete Fourier transform device (DFT) 6, 7, 8 is a well-known device, generally implemented with a digital signal processor or the like, that applies a discrete Fourier transform to the input digital signal to extract its spectrum.

Figs. 3 and 4 are explanatory diagrams of the neural network: Fig. 3 shows one of its units, and Fig. 4 shows an example of its configuration.

The operation is as follows. The input signal, as stated above, is an in-speech-band analog signal arriving from a telephone line or the like. It is applied to input terminal 1 in Fig. 2 and converted into a digital signal by the analog-to-digital converter (A/D) 5.

This is because the subsequent processing is performed by digital signal processing (both the spectrum extraction and the neural network processing are carried out on a digital signal processor). To extract the speech features of the digital signal, spectrum extraction is performed by the discrete Fourier transform devices (DFT 1, 2, 3) 6, 7, 8, whose center frequencies are set to extract the spectra of, for example, the first, second, and third formants of speech. The outputs are then fed to the neural network 3, which has been taught the features of the various signals in advance.
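The extraction described above, one DFT device per formant band, each tuned to a single center frequency, can be sketched as a single-bin discrete Fourier transform per band. The sampling rate and formant center frequencies below are assumptions for illustration (the patent gives no numeric values), and `dft_bin_power` is a hypothetical helper name:

```python
import math

def dft_bin_power(samples, fs, center_hz):
    """Power of a single DFT bin centered at center_hz.

    Evaluates one discrete Fourier coefficient, which is how a DFT
    device tuned to one center frequency behaves.
    """
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * center_hz * k / fs)
             for k, x in enumerate(samples))
    im = -sum(x * math.sin(2 * math.pi * center_hz * k / fs)
              for k, x in enumerate(samples))
    return (re * re + im * im) / n

fs = 8000                      # telephone-band sampling rate (assumed)
formants = [500, 1500, 2500]   # assumed F1/F2/F3 center frequencies in Hz

# Synthetic voice-band frame: a tone at 1500 Hz should dominate the F2 bin.
frame = [math.sin(2 * math.pi * 1500 * k / fs) for k in range(160)]
powers = [dft_bin_power(frame, fs, f) for f in formants]
```

The three resulting powers are what the embodiment would present to the neural network as the spectral feature vector.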

The neural network is a system that has attracted much attention in recent years, as described, for example, in "Using neural nets for pattern recognition, signal processing, and knowledge processing," Nikkei Electronics, No. 427, August 10, 1987, published by Nikkei McGraw-Hill, pp. 115-124.

Its configuration is a complex interconnection of many units of the kind shown in Fig. 3; it is generally implemented on a digital signal processor and trained with a suitable algorithm. Referring to Fig. 3, each unit l or m consists of a part that receives inputs from other units, a part that transforms the inputs according to a fixed rule, and a part that outputs the result. Each connection to another unit carries a variable weight (W_lm in Fig. 3). Changing these values changes the structure of the network; training the network means changing these weights.

In the forward computation from unit l to unit m, the weighted sum of the inputs, x_m, is first formed at unit m, and in this embodiment the output characteristic of unit m was set to x_m itself.
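The unit computation just described, each unit m forming the weighted sum of its inputs and, per this embodiment, outputting that sum directly, can be sketched as follows; the input and weight values are illustrative only:

```python
def unit_output(inputs, weights):
    """Output of one unit m: the weighted sum x_m of the inputs y_l
    arriving from connected units, with one weight w_lm per connection.
    In this embodiment the unit's output characteristic is x_m itself.
    """
    return sum(w_lm * y_l for w_lm, y_l in zip(weights, inputs))

# Three inputs (e.g. the three DFT outputs) with illustrative weights.
x_m = unit_output([0.2, 0.5, 0.1], [0.4, -0.3, 0.9])  # 0.08 - 0.15 + 0.09
```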

The neural network adopted in this embodiment is a pattern-associative neural network, and the training algorithm is backpropagation. As shown in Fig. 4, units of the kind shown in Fig. 3 are connected in the direction input layer, intermediate layer, output layer; there are no connections within a layer and none from the output layer back toward the input layer. It is a so-called "feedforward" network.

In training, input data are applied from input parts y01-y03 to the units 1.1-1.3 of the input layer; in this embodiment the outputs of DFT 6, 7, 8 of Fig. 2 are applied (W111-W333 and so on denote the weights). The input signal is transformed in units 1.1-1.3, passed on to units 2.1-2.3 of the intermediate layer, and finally emerges from units 3.1-3.3 of the output layer. The output values are compared with the desired output values, and the connection strengths are changed so as to reduce the difference.

In this embodiment, with y_n denoting the output value of output unit n for a given input pattern and r_n the desired value at that unit, the network is trained backward from the output layer toward the input layer, i.e. the weights W111-W333 of each unit are changed, so as to minimize the squared error E_n = (y_n - r_n)^2. This is called backpropagation. Training is repeated until this value is minimized, and the weights are of course fixed once training is complete.
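The training loop described above, a forward pass, comparison with the desired outputs, and weight changes propagated backward so the squared error shrinks, can be sketched for a small two-layer feedforward net of the embodiment's linear units. The layer sizes, initial weights, and learning rate are assumptions for illustration, not values from the patent:

```python
def train_step(x, target, W1, W2, lr=0.05):
    """One backpropagation step on a two-layer feedforward net of linear
    units: forward pass, squared error, then weight updates working
    backward from the output layer toward the input layer."""
    # Forward: hidden and output activations (each unit outputs its sum x_m).
    h = [sum(W1[j][i] * x[i] for i in range(len(x))) for j in range(len(W1))]
    y = [sum(W2[n][j] * h[j] for j in range(len(h))) for n in range(len(W2))]
    err = [yn - rn for yn, rn in zip(y, target)]
    E = sum(e * e for e in err)  # squared error to be minimized
    # Backward: hidden deltas computed with the pre-update output weights.
    dh = [sum(2 * err[n] * W2[n][j] for n in range(len(W2)))
          for j in range(len(h))]
    for n in range(len(W2)):     # output-layer weights
        for j in range(len(h)):
            W2[n][j] -= lr * 2 * err[n] * h[j]
    for j in range(len(W1)):     # input-layer weights
        for i in range(len(x)):
            W1[j][i] -= lr * dh[j] * x[i]
    return E

# Teach the net to map one input pattern to a desired output pattern.
W1 = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]]
W2 = [[0.1, 0.2], [0.2, 0.1]]
errs = [train_step([1.0, 0.0, 1.0], [1.0, 0.0], W1, W2) for _ in range(300)]
```

Repeating the step drives the squared error down, mirroring the patent's "repeat until the value is minimized, then fix the weights."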

In this embodiment the primary aim was to discriminate speech signals from non-speech signals. The numbers of units in the input, intermediate, and output layers were set to 16, 8, and 3, respectively (runs with 8, 4, and 3 were also made), and training was performed with speech signals, modem signals, and tone signals. As a result, an identification and detection time of a few milliseconds has been achieved.
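With the 16-8-3 layout reported above, a forward pass through the net reduces to two matrix-vector products followed by reading off the largest output unit. The class ordering and the weights below are illustrative assumptions (an untrained net, shown only to exercise the 16-8-3 shapes):

```python
def forward(x, W1, W2):
    """Forward pass through the embodiment's 16-8-3 layout: 16 input
    units, 8 intermediate units, 3 output units, all linear (each unit
    outputs its weighted input sum)."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W1]  # 8 hidden sums
    y = [sum(w * hj for w, hj in zip(row, h)) for row in W2]  # 3 output sums
    return y

classes = ["speech", "modem", "tone"]  # assumed ordering of the 3 output units

# Illustrative weights with the right shapes: 8x16, then 3x8.
W1 = [[0.01 * (i + j) for i in range(16)] for j in range(8)]
W2 = [[0.1 if j == n else 0.0 for j in range(8)] for n in range(3)]

y = forward([1.0] * 16, W1, W2)
label = classes[max(range(3), key=lambda n: y[n])]
```

Inference is only two small matrix products, which is consistent with the few-millisecond detection time reported for the embodiment.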

That is, the spectrum-extracted outputs of DFT 6, 7, 8 in Fig. 2 are applied to the input parts y01, y02, y03 of the neural network of Fig. 4, identified by the processing patterns the network has learned in advance, and output from its output layer. Although only the speech-signal output 40 and the non-speech-signal output 41 are shown in Fig. 2, more than two kinds of signal can of course be discriminated.

(Effects of the Invention) As described above, the present invention first extracts the spectrum of the input in-speech-band signal at the first, second, and third formants and then has a neural network with high learning ability identify the signal. Identification with extremely high accuracy and in a short time can therefore be realized, and the invention is applicable, with great effect, to systems using speech packets in general.
(Effects of the Invention) As explained above, the present invention first performs spectrum extraction such as first, second, and third formants on an input voice band signal, and then uses a neural network with high learning ability to identify the signal. Since this is a method that allows identification to be performed with extremely high accuracy and in a short time, it can be applied to all systems that use audio 14'kenoto and is highly effective.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a system diagram of the present invention; Fig. 2 is a block diagram of the spectrum extraction part of an embodiment of the present invention; Fig. 3 is an explanatory diagram of a unit of the neural network; and Fig. 4 is a diagram showing the configuration of the neural network. 1: input terminal; 2: spectrum extraction part; 3: neural network; 4, 40, 41: output terminals; 5: analog-to-digital converter. Patent applicant: Oki Electric Industry Co., Ltd.

Claims (1)

[Claim 1] An in-speech-band signal detection system for discriminating between a speech signal and various non-speech signals within an in-speech-band signal, comprising means for extracting the spectrum of the in-speech-band signal, the output of the spectrum extraction being applied to a neural network that has been taught the features of the signals in advance, whereby the various signals within the in-speech-band signal are identified and detected.
JP63105522A 1988-04-30 1988-04-30 In-speech-band signal detection system Pending JPH01277899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63105522A JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63105522A JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Publications (1)

Publication Number Publication Date
JPH01277899A true JPH01277899A (en) 1989-11-08

Family

ID=14409929

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63105522A Pending JPH01277899A (en) 1988-04-30 1988-04-30 In-speech-band signal detection system

Country Status (1)

Country Link
JP (1) JPH01277899A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58123293A (en) * 1982-01-08 1983-07-22 エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン Multifrequency signal detecting method
JPS6238097A (en) * 1985-07-12 1987-02-19 エヌ・ベ−・フイリツプス・フル−イランペンフアブリケン Receiver

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2724029A1 (en) * 1992-03-17 1996-03-01 Thomson Csf Neural net detection of acoustic signals from torpedo
US5611019A (en) * 1993-05-19 1997-03-11 Matsushita Electric Industrial Co., Ltd. Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech
JP2006171714A (en) * 2004-11-22 2006-06-29 Institute Of Physical & Chemical Research Self-development type voice language pattern recognition system, and method and program for structuring self-organizing neural network structure used for same system
CN109074820A (en) * 2016-05-10 2018-12-21 谷歌有限责任公司 Audio processing is carried out using neural network
CN109074820B (en) * 2016-05-10 2023-09-12 谷歌有限责任公司 Audio processing using neural networks

Similar Documents

Publication Publication Date Title
JP2764277B2 (en) Voice recognition device
CN107393526B (en) Voice silence detection method, device, computer equipment and storage medium
US4918735A (en) Speech recognition apparatus for recognizing the category of an input speech pattern
EP0763810B1 (en) Speech signal processing apparatus for detecting a speech signal from a noisy speech signal
CA1172363A (en) Continuous speech recognition method
US4624011A (en) Speech recognition system
EP0074822B1 (en) Recognition of speech or speech-like sounds
GB2107100A (en) Continuous speech recognition
FR2743238A1 (en) TELECOMMUNICATION DEVICE RESPONDING TO VOICE ORDERS AND METHOD OF USING THE SAME
KR0173923B1 (en) Phoneme Segmentation Using Multi-Layer Neural Networks
JPH03273722A (en) Sound/modem signal identifying circuit
WO1989002146A1 (en) Improvements in or relating to apparatus and methods for voice recognition
US5819209A (en) Pitch period extracting apparatus of speech signal
US5159637A (en) Speech word recognizing apparatus using information indicative of the relative significance of speech features
JPH01277899A (en) In-speech-band signal detection system
US5745874A (en) Preprocessor for automatic speech recognition system
JPS58123293A (en) Multifrequency signal detecting method
JPS63250932A (en) Circuit system for double-voice and multifrequency signal detection in telephone equipment
KR100480506B1 (en) Speech recognition method
JPH04369698A (en) Voice recognition system
Pinto et al. Using neural networks for automatic speaker recognition: a practical approach
Vieira et al. Speaker verification for security systems using artificial neural networks
JPS63238679A (en) Input recognizing device
Harb et al. Isolated words recognition using neural networks
JPH0573090A (en) Speech recognizing method