JPH036520B2 - Google Patents

Info

Publication number
JPH036520B2
JPH036520B2
Authority
JP
Japan
Prior art keywords
detector
output
circuit
nasal
oral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP57032426A
Other languages
Japanese (ja)
Other versions
JPS58150997A (en)
Inventor
Toyozo Sugimoto
Takeo Murata
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency of Industrial Science and Technology filed Critical Agency of Industrial Science and Technology
Priority to JP3242682A priority Critical patent/JPS58150997A/en
Publication of JPS58150997A publication Critical patent/JPS58150997A/en
Publication of JPH036520B2 publication Critical patent/JPH036520B2/ja
Granted legal-status Critical Current

Description

[Detailed description of the invention]

The present invention relates to a pronunciation feature extraction device that recognizes pronunciation from information other than the speech signal itself. Speech is produced when the exhaled airstream from the lungs sets the vocal cords in the larynx vibrating, converting the airstream into voice, which is then modulated by changes in the shape of the passage leading to the lips and nasal cavity; the resulting sound is the product of the coordinated movement of these vocal organs. Conventionally, such speech has been analyzed by converting the sound wave into an electrical signal with an acoustic microphone, feeding that signal into a bank of filter circuits each covering a predetermined frequency band, and characterizing the pronunciation from the outputs of those filters. However, because speech is the result of the coordinated movement of the vocal organs, it is extremely difficult to extract the pronunciation features of every phoneme, and hence to perform speech recognition, from the speech wave alone. Non-stationary consonants in particular carry strong noise energy: apart from voiceless fricatives such as /s, ∫/, whose features can be extracted from the speech wave almost reliably, the voiceless fricative /h/, the voiceless plosives /p, t, k/, the voiced plosives /b, d, g/, and the nasals /m, n, η/ are very difficult to detect and separate.

In view of these drawbacks, the present invention provides a pronunciation feature extraction device that extracts pronunciation more accurately than the prior art by attaching or placing, near each part of the vocal organs, a detector that senses the movement of that part, and by processing the outputs of the detectors with a processing device.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the block configuration of a pronunciation extraction device according to one embodiment. In the figure, 1 is a vocal cord vibration detector attached near the vocal cords of the larynx to detect their vibration; 2 is a nasal vibration detector attached near the center of the nasal wall to detect voice vibration in the nasal cavity; 3 is an oral airflow detector placed in front of the mouth to detect oral airflow; and 4 is a palate contact detector fitted to the palate to detect contact between the tongue and the palate. Reference numeral 5 denotes a processing device that extracts pronunciation features from the outputs of the vocal cord vibration detector 1, the nasal vibration detector 2, the oral airflow detector 3, and the palate contact detector 4; its configuration is described in detail with reference to FIG. 2.

In FIG. 2, 6 is a threshold circuit that decides, against a preset value, whether vocal cord vibration is present in the output of the vocal cord vibration detector 1; 7 is a threshold circuit that likewise decides whether nasal vibration is present in the output of the nasal vibration detector 2; 8 is a differentiating circuit that obtains the rate of change (acceleration) of the oral airflow by differentiating the output of the oral airflow detector 3; 9 is a threshold circuit that decides, against a preset value, whether that rate of change is present; 10 is a threshold circuit that decides, against a preset value, whether oral airflow is present in the output of the oral airflow detector 3; 11 is a tongue closure detection circuit that, after the palate contact information from the palate contact detector 4 has been converted by a measurement circuit 12 into tongue-palate contact signals, judges among the three states described later, namely front tongue closure, back tongue closure, and no closure; and 13 is a phoneme classification circuit that classifies phonemes from the presence/absence outputs of the threshold circuits 6, 7, 9, and 10 and the three-way output of the tongue closure detection circuit 11.

A concrete method of using the pronunciation feature extraction device configured as above is now explained with reference to FIG. 3. As the vocal cord vibration detector 1, an acceleration sensor 1′ is attached with medical double-sided tape to the vocal cord region of the larynx, as shown in FIG. 3, and detects vocal cord vibration. The detected vibration is fed to the threshold circuit 6, which outputs a presence (+) signal to the phoneme classification circuit 13 when the vibration exceeds a preset value and an absence (−) signal when it does not. Likewise, as the nasal vibration detector 2, an acceleration sensor 2′ is attached with medical double-sided tape near the center of the nasal wall and detects nasal vibration. The detected vibration is fed to the threshold circuit 7, which outputs a presence (+) signal when the vibration exceeds a preset value and an absence (−) signal otherwise. As the oral airflow detector 3, a hot-wire flowmeter sensor 3′ is fixed on a desk or the like in front of the mouth and detects the oral airflow. The detected airflow is fed to the differentiating circuit 8, which computes its rate of change and passes it to the threshold circuit 9; circuit 9 outputs a presence (+) signal when the rate of change exceeds a preset value and an absence (−) signal otherwise. The airflow detected by the sensor 3′ is also fed to the threshold circuit 10, which outputs a presence (+) signal when the airflow exceeds a preset value and an absence (−) signal otherwise. As the palate contact detector 4, a contact sensor 4′ of the kind shown in FIG. 4 is used. The sensor 4′ carries a large number of electrodes 4′a on the surface that meets the tongue, is fitted to the roof of the mouth by a retainer 4′b, and detects the state of contact with the tongue through the electrodes 4′a. The detected contact state is fed through the measurement circuit 12 to the tongue closure detection circuit 11, which outputs front tongue closure information to the phoneme classification circuit 13 when the contact forms the pattern of FIG. 5(a), back tongue closure information when it forms the pattern of FIG. 5(b), and no-closure information when there is no tongue contact. Finally, the phoneme classification circuit 13 determines the speech sound from an internal lookup table, shown below, using the information supplied by the threshold circuits 6, 7, 9, and 10 and the tongue closure detection circuit 11.
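The chain of threshold circuits, with the differentiating circuit on the oral-airflow path, can be sketched in a few lines of Python. This is a minimal model of the signal path only; the concrete threshold values, the sampling step, and all function names are illustrative assumptions and do not appear in the patent.

```python
def threshold(value, limit):
    """Model of a threshold circuit: '+' at or above the limit, '-' below."""
    return '+' if value >= limit else '-'

def differentiate(samples, dt=0.001):
    """Model of the differentiating circuit 8: finite-difference rate of change."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

def extract_features(glottal, nasal, airflow, closure, dt=0.001):
    """Reduce the four detector signals to one decision tuple per frame.

    `closure` holds the tongue closure circuit's state per frame:
    'front', 'back', or 'none'.  Threshold values are hypothetical.
    """
    rate = [0.0] + differentiate(airflow, dt)   # pad so lists align
    return [
        (threshold(g, 0.5),          # circuit 6: vocal cord vibration?
         threshold(n, 0.5),          # circuit 7: nasal vibration?
         threshold(abs(r), 100.0),   # circuit 9: airflow changing rapidly?
         threshold(a, 0.2),          # circuit 10: oral airflow present?
         c)                          # circuit 11: tongue closure state
        for g, n, r, a, c in zip(glottal, nasal, rate, airflow, closure)
    ]
```

Each frame thus becomes a five-element tuple of (+/−) decisions plus the closure state, which is exactly the form the classification circuit consumes.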

[Table]

[Table]

For example, when the utterance "hana", whose phoneme waveform is shown in FIG. 6(a), is produced, the acceleration sensor 1′ outputs a waveform like that of FIG. 6(b) to the threshold circuit 6. Judging against its preset threshold, circuit 6 sends the phoneme classification circuit 13 an absence (−) signal during the "h" segment and a presence (+) signal during the "n" segment. The acceleration sensor 2′ likewise outputs a waveform like that of FIG. 6(c) to the threshold circuit 7, which sends an absence (−) signal during "h" and a presence (+) signal during "n". The hot-wire flowmeter sensor 3′ outputs a waveform like that of FIG. 6(d) to the differentiating circuit 8 and the threshold circuit 10. The threshold circuit 9, judging the differentiated value from circuit 8 against its threshold, sends an absence (−) signal during both "h" and "n", while the threshold circuit 10 sends a presence (+) signal during "h" and an absence (−) signal during "n". Meanwhile the contact sensor 4′ detects the contact state between the electrodes 4′a and the tongue and passes it through the measurement circuit 12 to the tongue closure detection circuit 11, which, from the contact pattern, outputs "no closure" information during "h" and "front tongue closure" information during "n" to the phoneme classification circuit 13. From this information the phoneme classification circuit 13 can recognize "h" and "n" using the internal lookup table shown in the table.

As described above, the movements of the vocal organs are detected by the vocal cord vibration detector 1, the nasal vibration detector 2, the oral airflow detector 3, and the palate contact detector 4, and the processing device 5 determines specific phonemes from a prestored table based on the detected information, so that speech recognition that was previously difficult can be performed accurately. Based on the vocal cord vibration information from the vocal cord vibration detector, the intranasal vibration information from the nasal vibration detector, the oral airflow information from the oral airflow detector, and the tongue-palate contact information from the palate contact detector, the invention can identify the plosive and nasal phonemes more accurately than before, and its practical effect is substantial.
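The final lookup performed by the phoneme classification circuit can be sketched as a dictionary keyed on the detector decisions. Only the two rows the "hana" walkthrough actually reports, for /h/ and /n/, are filled in here; the tuple layout (vocal cords, nasal, airflow rate, airflow, closure) is an assumption about how the stored table is organized.

```python
# Each key is (vocal cord vibration, nasal vibration, airflow rate of
# change, oral airflow, tongue closure); values are the recognized phoneme.
PHONEME_TABLE = {
    ('-', '-', '-', '+', 'none'):  'h',   # row reported for the "h" segment
    ('+', '+', '-', '-', 'front'): 'n',   # row reported for the "n" segment
}

def classify(features):
    """Model of the phoneme classification circuit 13: a table lookup.

    Returns None when the pattern has no stored row.
    """
    return PHONEME_TABLE.get(features)
```

Feeding in the two patterns from the walkthrough, `classify` returns 'h' and 'n' respectively; any pattern without a stored row yields no classification.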

[Brief description of the drawings]

FIG. 1 is a block diagram of a pronunciation feature extraction device according to an embodiment of the present invention; FIG. 2 is a block diagram of the processing device in that extraction device; FIG. 3 shows an example of use of the device; FIG. 4 is a plan view of the contact sensor; FIG. 5 shows contact patterns between the tongue and the palate; and FIG. 6 shows the waveform of each detector. 1: vocal cord vibration detector; 2: nasal vibration detector; 3: oral airflow detector; 4: palate contact detector; 5: processing device.

Claims (1)

[Claims]
1. A pronunciation feature extraction device comprising: a vocal cord vibration detector attached to the larynx; a nasal vibration detector attached to the nose; an oral airflow detector placed in front of the mouth; a palate contact detector that detects contact between the tongue and the palate; and a processing device that extracts the group of sounds p, t, k, b, d, g, and h based on the output of the oral airflow detector, extracts the nasals m and n based on the output of the nasal vibration detector, separates p, t, k, and h from b, d, and g based on the output of the vocal cord vibration detector, separates p, h, t, k, b, d, g, m, and n from one another based on the output of the palate contact detector, and further separates p from h by the rate of change of the oral airflow based on the output of the oral airflow detector.
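The separation sequence recited in the claim can be sketched as a decision cascade over the detector outputs. The claim names which detector output drives each split; the concrete mapping from closure states to particular consonants below is an illustrative assumption added for the sketch.

```python
def separate(airflow, nasal_vib, vocal_vib, closure, airflow_rate):
    """Cascade of decisions from the claim.

    `airflow`, `nasal_vib`, `vocal_vib`, and `airflow_rate` are boolean
    detector decisions; `closure` is 'front', 'back', or 'none'.
    Returns the identified consonant, or None if no group matches.
    """
    if nasal_vib:                        # nasal vibration isolates m, n
        return 'n' if closure == 'front' else 'm'
    if not airflow:                      # oral airflow isolates the p,t,k,b,d,g,h group
        return None
    if vocal_vib:                        # vocal cord vibration: voiced plosives
        return {'none': 'b', 'front': 'd', 'back': 'g'}[closure]
    if closure == 'front':               # palate contact narrows the voiceless group
        return 't'
    if closure == 'back':
        return 'k'
    return 'p' if airflow_rate else 'h'  # airflow rate of change splits p from h
```

For instance, no nasal or vocal cord vibration, oral airflow present, no closure, and a slowly changing airflow yields 'h', while the same pattern with a rapidly changing airflow yields 'p'.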
JP3242682A 1982-03-03 1982-03-03 Speech feature extractor Granted JPS58150997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3242682A JPS58150997A (en) 1982-03-03 1982-03-03 Speech feature extractor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3242682A JPS58150997A (en) 1982-03-03 1982-03-03 Speech feature extractor

Publications (2)

Publication Number Publication Date
JPS58150997A JPS58150997A (en) 1983-09-07
JPH036520B2 true JPH036520B2 (en) 1991-01-30

Family

ID=12358621

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3242682A Granted JPS58150997A (en) 1982-03-03 1982-03-03 Speech feature extractor

Country Status (1)

Country Link
JP (1) JPS58150997A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6111021A (en) * 1984-06-26 1986-01-18 Director-General of the Agency of Industrial Science and Technology Speaking exercise apparatus
EP1320850A2 (en) 2000-09-19 2003-06-25 Logometrix Corporation Palatometer and nasometer apparatus
AU2002236483A1 (en) 2000-11-15 2002-05-27 Logometrix Corporation Method for utilizing oral movement and related events
TWI576826B (en) * 2014-07-28 2017-04-01 jing-feng Liu Discourse Recognition System and Unit

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS501846A (en) * 1973-05-14 1975-01-09

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS501846A (en) * 1973-05-14 1975-01-09

Also Published As

Publication number Publication date
JPS58150997A (en) 1983-09-07

Similar Documents

Publication Publication Date Title
US7529670B1 (en) Automatic speech recognition system for people with speech-affecting disabilities
Stevens et al. A miniature accelerometer for detecting glottal waveforms and nasalization
Philips et al. Acoustic–phonetic descriptions of speech production in speakers with cleft palate and other velopharyngeal disorders
Abdul-Kadir et al. Difficulties of standard arabic phonemes spoken by non-arab primary school children based on formant frequencies
JPS6129000B2 (en)
JPH036520B2 (en)
JPH036519B2 (en)
Demolin et al. Whispery voiced nasal stops in Rwanda
JP2000276191A (en) Voice recognizing method
Garnier et al. Efforts and coordination in the production of bilabial consonants
JPH0475520B2 (en)
JPH0139600B2 (en)
Wang et al. Research on Children’s Mandarin Chinese Voiceless Consonant Airflow
JPH025099A (en) Voiced, voiceless, and soundless state display device
JPH034919B2 (en)
Wang et al. Aerodynamic Measurements: Normative Data for Children Ages 15-17 Years
Hu et al. Acoustic Study of Mongolian Unaspirated and Aspirated Consonants
JPS60238899A (en) Breathing flow detector
JPS6331795B2 (en)
JPS63175897A (en) Breathing flow detector
JPS6258519B2 (en)
JPS60238898A (en) Short syllable recognition
JPS616696A (en) Fracturing tendency detector
JPS6329759B2 (en)
Demolin et al. Double articulations in some Mangbutu-Efe languages