JPS607492A - Monosyllable voice recognition system - Google Patents

Monosyllable voice recognition system

Info

Publication number
JPS607492A
JPS607492A JP58115573A JP11557383A JPS607492A JP S607492 A JPS607492 A JP S607492A JP 58115573 A JP58115573 A JP 58115573A JP 11557383 A JP11557383 A JP 11557383A JP S607492 A JPS607492 A JP S607492A
Authority
JP
Japan
Prior art keywords
vowel
standard
input
monosyllabic
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58115573A
Other languages
Japanese (ja)
Inventor
寺尾 修
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Technical Field of the Invention
The present invention relates to a speaker-dependent speech recognition system of the registration type whose recognition targets are monosyllables uttered in pseudo-continuous speech.

(b) Background of the Technology
In recent years, with the development and spread of data processing technology, speech recognition has come into use as one means of data input and output in data processing systems. Speech recognition and synthesis technology was at first limited to applications such as voice-controlled sorting and guidance services over telephone lines. Supported by advances in semiconductor technology, and in integration technology in particular, the processing circuits and high-speed, large-capacity memories needed to process large amounts of speech information at high speed, previously considered difficult to build, have been realized as LSIs and have become available at low cost. As a result, data processing devices have become widespread that exploit the conversion of input speech to digital data. Because voice input in Japanese is conversational in form, such devices are easy to operate and do not demand of the operator the special skill required by other input/output devices.

(c) Prior Art and Problems
FIG. 1 is a block diagram of a conventional monosyllable recognition system. In the figure, 1 is a control unit composed of, for example, a microprocessor (MPU); 2 is a storage unit using high-speed semiconductor memory; 3 is a speech processing unit; 4 is an input pattern buffer; and 5 is a comparison unit.

A monosyllable recognition system is normally speaker-dependent, so the set of monosyllables to be recognized in the input speech is defined in advance. The control unit 1 applies each monosyllable of the speech input to the speech processing unit 3 to a bank of band-pass filters (not shown), normalizes each resulting spectrum output at a frame period of about 5 ms, and stores the steady-state portion of the resulting parameters for each phoneme in the storage unit 2 as the standard pattern dictionary 23. With linear time normalization, a pattern of, for example, 512 bits can be created and stored per monosyllable.

Thus, in the preceding learning mode, average standard patterns 23a-n are created from about 5 to 10 training samples and registered in the standard pattern dictionary 23. In the recognition mode, the comparison unit 5 computes the similarity between the input pattern derived from the input speech and the standard patterns 23a-n at the same frame period and judges it against a preset threshold level. Recognition is based on a total of 101 Japanese monosyllables: 45 vowel and consonant sounds (ア through ン), 18 voiced sounds, 5 semi-voiced sounds, and 33 contracted sounds (yōon).
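The registration and matching steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 1: frames are plain lists of filter-bank values, the distance is a mean absolute difference, and the function names (`time_normalize`, `average_template`, `recognize`) and the threshold value are hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

Frame = List[float]      # one spectral frame (band-pass filter outputs)
Pattern = List[Frame]    # time series of frames


def time_normalize(pattern: Pattern, n_frames: int) -> Pattern:
    """Linearly resample a pattern to a fixed number of frames."""
    if not pattern:
        raise ValueError("empty pattern")
    if n_frames == 1:
        return [list(pattern[0])]
    last = len(pattern) - 1
    out = []
    for i in range(n_frames):
        pos = i * last / (n_frames - 1)   # fractional source position
        lo = int(pos)
        hi = min(lo + 1, last)
        w = pos - lo
        out.append([(1 - w) * a + w * b
                    for a, b in zip(pattern[lo], pattern[hi])])
    return out


def average_template(samples: List[Pattern], n_frames: int) -> Pattern:
    """Average several training utterances into one standard pattern."""
    normalized = [time_normalize(s, n_frames) for s in samples]
    n = len(normalized)
    width = len(normalized[0][0])
    return [[sum(p[t][k] for p in normalized) / n for k in range(width)]
            for t in range(n_frames)]


def distance(a: Pattern, b: Pattern) -> float:
    """Mean absolute difference between two equally shaped patterns."""
    total = sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    count = sum(len(fa) for fa in a)
    return total / count


def recognize(input_pattern: Pattern, dictionary: Dict[str, Pattern],
              n_frames: int, threshold: float) -> Optional[str]:
    """Return the best-matching label, or None if no match passes the threshold."""
    x = time_normalize(input_pattern, n_frames)
    best_label, best_d = None, float("inf")
    for label, template in dictionary.items():
        d = distance(x, template)
        if d < best_d:
            best_label, best_d = label, d
    return best_label if best_d <= threshold else None


# Toy one-band data, invented for illustration only.
dictionary = {
    "a": average_template([[[1.0], [1.0]], [[1.0], [1.0], [1.0]]], 4),
    "u": average_template([[[0.0], [0.0]], [[0.0], [0.0], [0.0]]], 4),
}
```

Returning `None` when no template passes the threshold mirrors the case in which primary matching fails and some further decision is needed.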

Recognition normally separates each monosyllable into a consonant part and a vowel part, determines the vowel using the vowel standard patterns, and then recognizes the consonant. This method has the advantage that the number of monosyllable candidates is reduced, which makes consonant recognition easier and lowers the amount of processing. However, in speech input by ordinary pseudo-continuous utterance the vowel is often insufficiently articulated. For contracted sounds (yōon) containing semivowels in particular, the vowel is easily misrecognized, and the recognition rate of the monosyllable as a whole, including the consonant part, falls.

(d) Object of the Invention
The object of the present invention is to remove the above drawback. In addition to the conventional vowel standard patterns derived from single monosyllables, the dictionary is provided with transition patterns for the glide portion (watari) of the phoneme in consecutive combinations of vowels and/or semivowels, which serve as vowel models. The difference between the input pattern and these transition patterns is computed, and the vowel is determined from the direction in which the difference moves, thereby improving the recognition rate of vowel extraction.
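The conventional vowel-first procedure described above can be illustrated with a small sketch. The syllable inventory and the function name `candidates_after_vowel` are hypothetical and only serve to show how fixing the vowel first shrinks the candidate set, and why a misrecognized vowel is costly.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: syllable label -> (consonant part, vowel part).
SYLLABLES: Dict[str, Tuple[str, str]] = {
    "ka": ("k", "a"),
    "ki": ("k", "i"),
    "sa": ("s", "a"),
    "su": ("s", "u"),
    "kyu": ("ky", "u"),
    "kyo": ("ky", "o"),
}


def candidates_after_vowel(vowel: str,
                           syllables: Dict[str, Tuple[str, str]]) -> List[str]:
    """Keep only the syllables whose vowel part matches the decided vowel."""
    return sorted(name for name, (_, v) in syllables.items() if v == vowel)


# Deciding the vowel first leaves fewer consonants to discriminate.
remaining = candidates_after_vowel("u", SYLLABLES)
```

If the vowel of an utterance of "kyo" is wrongly decided as "u", the correct syllable is no longer among the remaining candidates, so no amount of consonant matching can recover it. This is the failure mode the invention targets.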

(e) Constitution of the Invention
This object is achieved by providing a monosyllabic speech recognition system for a speech recognition device that recognizes unknown input speech by matching it against monosyllable standard patterns registered in advance in a dictionary, characterized as follows. When the speaker's monosyllable standard patterns are registered, the speech processing unit creates, along with the standard patterns based on spectrum time-series analysis, transition patterns to the vowel for specific monosyllables such as contracted sounds and semivowels, and registers them in the storage unit as a vowel transition model dictionary. For pseudo-continuously uttered input speech, the control unit first performs a primary matching of the input pattern against the standard patterns via the speech processing unit and the matching unit. When no similarity satisfying the threshold is obtained, or when the distances to a plurality of vowels are close, the control unit computes the difference between the input pattern and each vowel according to the transition patterns of the model dictionary, obtains the trend of the difference in distance from each vowel, matches it against the vowel transition model dictionary, and thereby recognizes the vowel.
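The two conditions that trigger the secondary matching can be sketched as a small predicate. This is an assumption-laden illustration: the text does not say how "close" is measured, so the `margin` parameter, the use of distances rather than similarities, and the function name `needs_transition_matching` are all hypothetical.

```python
from typing import Dict


def needs_transition_matching(distances: Dict[str, float],
                              threshold: float,
                              margin: float) -> bool:
    """Decide whether primary matching is inconclusive.

    distances: vowel label -> distance from the input pattern to that
               vowel's standard pattern (smaller means more similar).
    threshold: largest distance still accepted as a match.
    margin:    hypothetical minimum gap between the best and second-best
               vowel for the primary result to count as unambiguous.
    """
    if not distances:
        raise ValueError("no vowel distances given")
    ranked = sorted(distances.values())
    if ranked[0] > threshold:
        return True   # no vowel satisfies the threshold
    if len(ranked) > 1 and ranked[1] - ranked[0] < margin:
        return True   # two or more vowels are nearly equally close
    return False
```

For example, with `threshold=0.5` and `margin=0.05`, the distances `{"u": 0.30, "o": 0.32}` are treated as ambiguous, whereas `{"a": 0.1, "u": 0.6}` are not.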

(f) Embodiment of the Invention
An embodiment of the present invention is described below with reference to the drawings. FIG. 2 is a block diagram of a monosyllabic speech recognition system according to an embodiment of the present invention, FIG. 3(a) shows the power transition pattern of an input pattern, and FIG. 3(b) shows an example of the difference trend against each vowel model. In the figures, 1a is a control unit, 2a is a storage unit, 3a is a speech processing unit, 4a is an input pattern buffer, and 5a is a comparison unit. Within the storage area of the storage unit 2a, 23 is a standard pattern dictionary, 23a-n are standard patterns, 24 is a vowel model dictionary based on transition patterns, and 24p-s are vowel models. Components bearing the same reference numerals as in the prior art have the same functions and characteristics as before, and components with a suffix have additional functions on top of the conventional ones.

Accordingly, in this embodiment as well, the control unit 1a controls each component according to the control program 21a and the control data 22a in the storage unit 2a. In the learning mode, the speech processing unit 3a analyzes the speaker's pseudo-continuously uttered input speech into monosyllable standard patterns, and the standard patterns 23a-n are registered in the standard pattern dictionary 23. In the recognition mode, the input speech is analyzed by the speech processing unit 3a, the input pattern is passed through the input pattern buffer 4a to the comparison unit 5a, and recognition is performed by matching against the standard patterns 23a-n. Up to this point the operation is the same as in the prior art.

In this embodiment, however, for contracted sounds and the like that contain consecutive combinations of vowels and/or semivowels with insufficient vowel articulation, no match satisfying the threshold is obtained. Therefore, following the matching against the ordinary standard patterns, the difference between the input pattern, for example the one shown in FIG. 3(a), and each of the vowel models 24p-s is computed frame by frame at the same frame period, and its trend is obtained as shown in FIG. 3(b). In this example the vowel U is closest on average, but at the tail of the phoneme the difference from the vowel O tends toward zero while the difference from the vowel U tends to move away, so the input is recognized as the vowel O. Of course, if the difference from the vowel U follows the extension shown by the dotted line, the input pattern is recognized as the vowel U. In this way, even when the tail of the input speech is lost through insufficient articulation and the input would otherwise be misrecognized as the vowel U, the desired result, the vowel O, is output, giving a recognition system with a higher recognition rate than before.

(g) Effect of the Invention
As described above, even for consecutive combinations of vowels and/or semivowels from which the vowel is difficult to extract, a method of capturing and matching more latent features is added, yielding a monosyllabic speech recognition system with an improved recognition rate. The invention is therefore useful.
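The trend-based decision of FIG. 3(b) can be sketched as follows. This is a minimal illustration, assuming one-dimensional frames, a mean-absolute per-frame difference, and a least-squares slope over the last few frames as the measure of "trend"; the text does not specify any of these, and the names `frame_differences`, `tail_slope`, and `decide_vowel` are hypothetical.

```python
from typing import Dict, List

Frame = List[float]
Pattern = List[Frame]


def frame_differences(input_pattern: Pattern, model: Pattern) -> List[float]:
    """Per-frame mean absolute difference between input and a vowel model."""
    return [sum(abs(x - y) for x, y in zip(fi, fm)) / len(fi)
            for fi, fm in zip(input_pattern, model)]


def tail_slope(values: List[float], tail: int) -> float:
    """Least-squares slope of the last `tail` values (tail >= 2)."""
    ys = values[-tail:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two values to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def closest_on_average(input_pattern: Pattern,
                       models: Dict[str, Pattern]) -> str:
    """Conventional choice: the vowel with the smallest mean difference."""
    def mean_diff(v: str) -> float:
        d = frame_differences(input_pattern, models[v])
        return sum(d) / len(d)
    return min(models, key=mean_diff)


def decide_vowel(input_pattern: Pattern, models: Dict[str, Pattern],
                 tail: int) -> str:
    """Assumed rule: pick the vowel whose difference falls fastest at the tail,
    breaking ties by the smaller final difference."""
    trends = {}
    for vowel, model in models.items():
        diffs = frame_differences(input_pattern, model)
        trends[vowel] = (tail_slope(diffs, tail), diffs[-1])
    return min(trends, key=lambda v: trends[v])


# Toy data, invented for illustration: the input glides from near U toward O
# but is cut off halfway, as when the tail of the utterance is lost.
models = {"U": [[0.0]] * 5, "O": [[1.0]] * 5}
truncated_input = [[0.1], [0.2], [0.3], [0.4], [0.5]]
```

On this toy input the average-distance rule favors U, while the tail trend favors O, which is the behavior the embodiment describes for an utterance whose ending is missing.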

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional monosyllable recognition system, FIG. 2 is a block diagram of a monosyllable recognition system according to an embodiment of the present invention, FIG. 3(a) shows the power transition pattern of an input pattern, and FIG. 3(b) shows an example of the difference trend against each vowel model. In the figures, 1 and 1a are control units, 2a is a storage unit, 3a is a speech processing unit, 4a is an input pattern buffer, 5a is a comparison unit, 23 is a standard pattern dictionary, and 24 is a vowel model dictionary.

Agent: Patent Attorney Matsuoka

Claims (1)

[Claims]
A monosyllabic speech recognition system for a speech recognition device that recognizes unknown input speech by matching it against monosyllable standard patterns registered in advance in a dictionary, characterized in that: the speech processing unit has a function of, when the speaker's monosyllable standard patterns are registered, creating, along with the standard patterns based on spectrum time-series analysis, transition patterns to the vowel for specific monosyllables such as contracted sounds and semivowels as a vowel transition model dictionary and registering it in the storage unit; and the control unit, when in the primary matching of the input pattern against the standard patterns, performed via the speech processing unit and the matching unit on pseudo-continuously uttered input speech, no similarity satisfying the threshold is obtained, or when the distances to a plurality of vowels are close, computes the difference between the input pattern and each vowel according to the transition patterns of the model dictionary, obtains the trend of the difference in distance from each vowel, matches it against the vowel transition model dictionary, and thereby recognizes the vowel.
JP58115573A 1983-06-27 1983-06-27 Monosyllable voice recognition system Pending JPS607492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58115573A JPS607492A (en) 1983-06-27 1983-06-27 Monosyllable voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58115573A JPS607492A (en) 1983-06-27 1983-06-27 Monosyllable voice recognition system

Publications (1)

Publication Number Publication Date
JPS607492A true JPS607492A (en) 1985-01-16

Family

ID=14665908

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58115573A Pending JPS607492A (en) 1983-06-27 1983-06-27 Monosyllable voice recognition system

Country Status (1)

Country Link
JP (1) JPS607492A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160058765A (en) 2013-09-24 2016-05-25 엔티엔 가부시키가이샤 Sintered metal bearing and fluid-dynamic bearing device provided with said bearing
KR20160058766A (en) 2013-09-24 2016-05-25 엔티엔 가부시키가이샤 Sintered metal bearing and method for producing same
US10415573B2 (en) 2013-09-24 2019-09-17 Ntn Corporation Fluid-dynamic bearing device provided with a sintered metal bearing and a fan motor provided with the fluid-dynamic bearing device

Similar Documents

Publication Publication Date Title
US9812122B2 (en) Speech recognition model construction method, speech recognition method, computer system, speech recognition apparatus, program, and recording medium
US6553342B1 (en) Tone based speech recognition
US7319959B1 (en) Multi-source phoneme classification for noise-robust automatic speech recognition
JPS6336676B2 (en)
JP2001166789A (en) Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end
JPS6138479B2 (en)
JPS607492A (en) Monosyllable voice recognition system
JP2010072446A (en) Coarticulation feature extraction device, coarticulation feature extraction method and coarticulation feature extraction program
JP3231365B2 (en) Voice recognition device
JPS63161499A (en) Voice recognition equipment
Wang et al. Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
JP3291073B2 (en) Voice recognition method
JPH0455518B2 (en)
Lubensky Continuous digit recognition using coarse phonetic segmentation
JPS607493A (en) Monosyllable voice recognition system
TW202129628A (en) Speech recognition system with fine-grained decoding
JPS6312000A (en) Voice recognition equipment
JPS6180298A (en) Voice recognition equipment
JPS62245295A (en) Specified speaker's voice recognition equipment
JPS5953900A (en) Speaker recognition system
JP2000242292A (en) Voice recognizing method, device for executing the method, and storage medium storing program for executing the method
JPH0695684A (en) Sound recognizing system
JPS60170900A (en) Syllabic voice standard pattern registration system
JPS6033599A (en) Voice recognition equipment
JPH05241592A (en) Continuous word recognition device