JPH02289004A

JPH02289004A - Character input device using voice

Info

Publication number: JPH02289004A
Application number: JP1075525A
Authority: JP
Inventors: Yuichi Murakami; 裕一村上
Original assignee: Individual
Current assignee: Individual
Priority date: 1989-03-27
Filing date: 1989-03-27
Publication date: 1990-11-29

Abstract

PURPOSE: To allow a simple voice input device to be applied to a compact, inexpensive character input device by having the voice identifying means identify mora boundaries from a trigger signal received from a delimiter input member.

CONSTITUTION: The device comprises an input means 1 that converts voice into an electric signal, a voice identifying means 2 that identifies the voice signal received from the input means 1 and outputs the identified voice as a digital signal, and a delimiter input member 3 through which a mora delimiter signal is entered by a manual operation such as pushing a button. The trigger signal from the member 3 is fed to the means 2, which uses it to identify the mora boundaries. No program execution time is therefore needed to judge the mora boundaries. As a result, a comfortable operating environment is obtained, processing time is shortened with an inexpensive device, and the mora boundaries are judged accurately.

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a voice-operated character input device that identifies voice input signals in order to enter text, operating instructions, and the like into computers, word processors, and similar equipment.

[Prior art]

Conventionally, the keyboard has been the main means of giving text and operating instructions to computers and the like. If voice input could replace the keyboard, the following advantages would be obtained.

■ Less practice is needed than with a keyboard, so even inexperienced users can get started easily, which promotes the spread of computers among young and old alike.
■ When entering figures and the like, the user must operate both a pointing device such as a mouse or digitizer and the keyboard at the same time, which is cumbersome. If characters and commands are entered by voice while the hand operates the pointing device, operation becomes much easier.
■ Keyboard work is a closed, solitary activity and tends to cause occupational ailments such as stiff shoulders. Voice input, by contrast, provides a moderate release of stress.
■ Voice input needs fewer parts than a keyboard, so small information processing equipment such as personal computers can be made smaller and cheaper.

Despite these advantages, voice input has in practice hardly spread beyond special applications. The reason is that the processing is complex and a high recognition rate cannot be achieved with inexpensive circuits. There is a strong desire to apply a simple voice input device to small, inexpensive equipment.

Conventional, relatively inexpensive voice input devices fall into two types: those in which the user registers his or her own voice in advance and recognition is performed by comparison with that voice, and those described in Japanese Patent Application No. 199533 of 1988 (Showa 63), which display formant information and rely on the speaker adjusting the utterance.

Many registration-type devices recognize whole words. Those usable for general document input require the words to be pronounced with clear breaks between them. Input speed therefore does not improve, and at best the experience is comparable to a keyboard. Moreover, the circuit that identifies the voice input needs high-speed comparison capability, which makes the device markedly more expensive than a keyboard.

The formant display type requires, in principle, no word registration, so it should allow input without pausing between words. In practice, however, when the voice signal is entered continuously without breaks, the identification means, such as a computer, must find the break points itself. That takes program steps and makes the response sluggish. In addition, if the segmentation is inaccurate, a word may be completely misrecognized. For example, a speaker who intends to say the name "Ayako" as the three characters 「あやこ」 is quite often taken to have pronounced the four characters 「あえあこ」 (a-e-a-ko).

[Problems to be solved by the invention]

Japanese is spoken in units called moras, each taking roughly the same amount of time. In most cases a mora corresponds to one kana character. The exceptions are words in which a kana is followed by a small 「ゃ」, 「ゅ」 or 「ょ」, words containing the long-vowel mark 「ー」, and words containing the small 「っ」 that represents a stop in vocal cord vibration. In such words a combination such as 「きゃ」 (kya) forms a single mora. A special case is the notation for loanwords, such as 「ファ」 (fa), written with a small 「ァ」. In other words, a mora is precisely the unit counted in the 5-7-5 pattern of haiku.

With a large storage device, the utterance information of every word could be recorded and recognition performed without considering moras at all. That approach, however, makes the overall cost very high. To process naturally spoken Japanese without a large storage device, the speech must be broken down into moras.

Breaking speech into moras is relatively easy when the speech contains consonants. In particular, the plosive consonants of the pa, ta and ka rows contain a momentary silence and can therefore be separated clearly.

However, a phrase such as 「大尾を追う」 written in hiragana becomes 「おおおをおう」, and when it is spoken indistinctly the six moras are extremely difficult to tell apart.

That is, when sounds of the a, ya and wa rows occur in succession, placing the mora boundaries becomes very difficult. For example, in the sound "aya", the "a" is followed briefly by an "e" sound (or, for some speakers, an "i" sound) before returning to "a". Spoken slowly this causes no problem, but in rapid speech the sounds are hard to separate in time, and the two characters "a-ya" can also be heard as the three characters "a-e-a". This makes it impossible to distinguish 「はえある」 (haearu) from 「はやる」 (hayaru). Program algorithms that resolve these cases are hard to develop, inherently complicated, and slow to execute, which spoils the user experience.

[Means for solving the problems]

The voice-operated character input device of this invention comprises an input means 1 that converts voice into an electric signal, and a voice identifying means 2 that identifies the voice signal output from the input means 1 and outputs the identified voice as a digital signal.

The device further comprises a delimiter input member 3 for entering a mora delimiter signal. The trigger signal entered from the delimiter input member 3 is fed to the voice identifying means 2, which is configured to identify the mora boundaries from that trigger signal.

In other words, in the character input device of this invention the mora segmentation is performed not by a program algorithm but by the deliberate action of the person entering the characters. For example, a mora delimiter can be entered by an action such as pressing a button by hand. Besides pressing a switch by hand, the delimiter can also be entered by nodding the head in time with the moras or by stepping on a foot switch.

A device that accepts mora delimiters without the use of the hands is well suited to work sites where the hands are needed for other tasks.
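The role of the trigger signal can be illustrated with a minimal sketch. It assumes the voice signal is already held as a list of samples and that each trigger is stored as the sample index at which the switch was pressed; the function name `split_into_moras` and this data layout are illustrative assumptions, not something the specification prescribes.

```python
from typing import List, Sequence


def split_into_moras(samples: Sequence[int],
                     trigger_indices: Sequence[int]) -> List[List[int]]:
    """Cut a sample buffer into one segment per trigger press.

    Segment k runs from trigger k up to (not including) trigger k+1;
    the last segment runs to the end of the buffer.
    """
    # Keep only triggers that fall inside the buffer, in time order.
    bounds = sorted(i for i in set(trigger_indices) if 0 <= i < len(samples))
    ends = bounds[1:] + [len(samples)]
    return [list(samples[start:end]) for start, end in zip(bounds, ends)]


# Three presses for "a-ka-i" over a toy 10-sample buffer.
segments = split_into_moras(list(range(10)), [0, 4, 7])
print([len(s) for s in segments])
```

No search for boundaries takes place: the segmentation cost is a single pass over the trigger list, which is the point the text makes about avoiding a segmentation algorithm.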

[Operation and effects]

The voice-operated character input device of the present invention has a delimiter input member for entering mora boundaries. Separately from the voice signal entered through the microphone, the delimiter input member receives the mora boundaries by means of a push-button switch or the like. No program execution time is therefore needed to judge the mora boundaries. This gives a more comfortable operating environment, allows fast processing with inexpensive equipment, and yields accurate judgments.

The method of operating the character input device is as follows. To enter the word 「赤い」 ("red"), the user simply says "akai" in a normal voice and presses the push-button switch of the delimiter input member three times, pon-pon-pon, in time with "a-ka-i".

In a particularly preferred mode of use, the final press, which marks the end of the utterance, is a long press in which the switch is pushed in and held ("gyū"). A three-character (three-mora) word is thus entered by pressing pon, pon, gyū in time with the utterance. A two-character (two-mora) word such as 「いろ」 (iro) is entered with pon, gyū, and a single character with gyū alone.

The long press tells the device that the user has finished speaking one word. Feeding this word boundary to the voice identifying means 2 shortens the processing time further.

Comparing the operation of a conventional device, which receives no mora boundaries from a delimiter input member 3, with that of the device of the present invention gives the following.

A device without the delimiter input member 3 must determine the boundaries of each mora and each word with a software algorithm. When a simple, inexpensive character input device detects the end of a word, the only practical algorithm is to detect a fixed period of silence. The problem is that a vocal cord stop, such as the small 「っ」 in 「札幌」 (Sapporo), may be mistaken for the end of the word. To prevent this, the end-of-word silence period must be set longer than any vocal cord stop that occurs in the middle of a word.

Such a device therefore cannot respond immediately after the user stops speaking; a certain waiting time is required. Hands-free voice input is certainly convenient for people with impaired hands, but as a replacement for a keyboard it ends up being an inconvenient device whose response feels sluggish to the user.

In contrast, the device of the present invention allows the end of a word to be entered by pushing in and holding the switch of the delimiter input member 3 while the last mora is being spoken. The device can thus detect the end of a word from a definite input signal. The end of the word can be recognized even while a vowel is still being spoken, so the timing at which recognition and display begin can be designed optimally. This makes the character input device fully usable as a replacement for a keyboard.

If the only purpose were to signal the end of a word, there would be no need to press the switch in time with every mora; it would suffice to press the button at the end of the word. The end of a word could then be entered in either of the following ways.

■ Hold the button down while the word is being spoken.
■ Press the button at each word boundary.

Both methods seem simple, but in practice they are not. Holding a switch down while speaking and releasing it when the utterance ends is a difficult action for anyone not used to it. When beginners are told to "press the button, speak, then release", they take several seconds to start speaking after pressing the button and then forget to release it when they finish. This increases the amount of unnecessary audio recorded and requires extra time to process it.

Pressing a button at word boundaries is an action for which it is hard to find a rhythm. Pressing once per mora, by contrast, has a natural rhythm that even elderly users can follow comfortably, because moras are the units of Japanese that are spoken at equal intervals, and pressing a button on every mora naturally means pressing at equal intervals.

Pressing on every mora is thus easy for beginners and can keep up with the fast input of experts, which makes it the most suitable method for Japanese input.

Teaching beginners is also simple. Told to "press the switch pon, pon, gyū as you say a-ka-i", an ordinary person learns the operation in one try. This is because Japanese speakers are already well practiced at breaking speech into moras through haiku and the like.

Hunting for and pressing one of the many buttons on a keyboard is painful and off-putting for beginners, whereas pressing a single button, pon, pon, in time with speech is extremely easy.

In particular, because moras are more distinct in Japanese than in other languages, entering the mora boundaries markedly improves the efficiency of voice identification in a Japanese character input device.

Note that the device of the present invention is defined not merely by the presence of a button switch but by the technical idea of a voice identification algorithm driven by the delimiter input member 3. Whatever the shape or type of the button, a device in which the user says "akai" while pressing pon, pon, gyū to enter the mora boundaries, so that 「赤い」 is entered, embodies the present invention.

The configuration of the present invention is extremely simple: the device is characterized by providing the delimiter input member 3 in place of keyboard input. A keyboard consists of switches, and the delimiter input member 3 can also consist of a switch. The two are used very differently, however. The delimiter input member 3 is operated by pressing a single switch along with the utterance, whereas a keyboard requires selecting and pressing a specific key for each character. The character input device of this invention is realized by the new technical idea of entering mora boundaries.

Furthermore, the device needs no special electronic circuit to identify mora boundaries. Using only a general-purpose CPU (microprocessor) and the voice identifying means 2, it achieves an input speed comparable to a keyboard with greater ease of use than a keyboard, which is a noteworthy feature.
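The pon, pon, gyū convention amounts to a very small protocol: every press marks one mora, and a long press additionally closes the word. A minimal sketch follows, assuming the presses have already been classified as the strings `"pon"` and `"gyu"`; those labels and the function name `group_presses` are illustrative assumptions.

```python
from typing import List, Tuple


def group_presses(presses: List[str]) -> Tuple[List[int], int]:
    """Return (mora count of each finished word, moras still pending)."""
    words: List[int] = []
    pending = 0
    for press in presses:
        if press not in ("pon", "gyu"):
            raise ValueError(f"unknown press type: {press!r}")
        pending += 1            # every press marks one mora
        if press == "gyu":      # a long press also closes the word
            words.append(pending)
            pending = 0
    return words, pending


# "akai" (3 moras), "iro" (2 moras), then a one-mora word.
print(group_presses(["pon", "pon", "gyu", "pon", "gyu", "gyu"]))
```

Because the word boundary arrives as an explicit event, recognition can start as soon as the long press is seen, with no silence timeout to wait out.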

[Preferred embodiment]

以下、この発明の実施例を図面に基づいて説明する。但
し、以下に示す実施例は、この発明の技術思想を具体化
する為の文字入力装置を例示すものであって、この発明
の装置は、回路構成を下記のもの特定しない。この発明
の装置は、特許請求の範囲に記載の範囲に於て、種々の
変更が加えられる。更に、この明細書は、特許請求の範囲が理解し易いよう
に、実施例に示される部材に対応する番号を、特許請求
の範囲に示される部材に付記している。ただ、特許請求
の範囲に記述される部材を、実施例に示す部材に特定す
るものでは決してない。本発明の実施例として第１図の回路構成を示す。第１図に示す音声による文字入力装置は、音声を電気信
号に変換する入力手段ｌと、この入力手段ｌから出力さ
れる音声信号を識別し、識別した音声をデジタル信号で
出力する音声識別手段２と、モーラの区切り信号を入力
する区切入力部材３を備えている。この文字入力装置は、コンピュータの入力手段１として
、キーボードに代わって使用される。音声の入力手段１は、音声信号を電気信号に変換するマ
イクと、マイクからの信号を増幅するマイクアンプとを
備えている。音声識別手段２には、現在市販されているパーソナルコ
ンピュータをそのまま利用することができる。すなわち
、入力手段ｌと音声識別手段２とは、パーソナルコンピ
ュータに、マイクとマイクアンプを追加しもので構成で
きる。音声識別手段２は、マイクアンプから入力されるアナロ
グ信号をデジタル信号に変換して、入力された音声を識
別する。音声識別手段２は、区切入力部材３からのトリ
ガー信号で、音声信号のモーラの区切りを識別する。音声識別手段２に利用されるパーソナルコンピューター
は汎用のもので充分である。必ずしも、音声識別専用の
文字入力用のものを使用する必要はない。このため、この発明の文字入力装置は、パーソナルコン
ピュータ上で動くアプリケーションソフトに、文字入力
や指示を入力するのに利用して、キーボードに代わって
音声で入力できる。音声識別手段２は、入力された音声信号を、区切入力部
材３からのトリガー信号をモーラの句切りとして識別で
きる全てものを利用できる。区切入力部材３でモーラの
句切りが特定された音声入力信号は、音声識別手段２で
正確に認識される。この発明は、音声識別手段２の音声信号識別方式を特定
しない。音声識別手段２には、現在使用され、あるいは
、これから開発される、区切りが明確にされたモーラを
識別できる全ての方式を採用できる。区切入力部材３は、音声の発声に合わせて、モーラの区
切りを入力するスイッチを備えている。このスイッチには、コンピュータに接続されたマウスに
付いているものが最も便利に利用できる。マウスとはボインテングデバイスのひとつであり、安価
な普及品として多くのコンピュータに接続されている。区切入力部材３は、音声信号と共に、モーラの区切りを
示すトリガー信号を音声識別手段２に入力する。従って
、区切入力部材３には、マウスボタンに限らず、モーラ
に合わせて、トリガー信号を音声識別手段２に入力でき
る全てのものを使用できる。例えば、キーボードの一部
の特定のキー（スペースバー等）を区切入力部材３のス
イッチとして使用することも可能である。キーボードに代わフて、音声による文字入力装置を使用
してコンピュータに入力する場合、下記の状態で使用す
ることが可能である。ＣＲ７表示画面の一部に、「窓」と呼ばれる小領域を常
時表示しておく。マウスのカーソルが窓を指した時、キ
ーボードからの入力に代わって、音声による文字入カプ
ログラムを動かすようにする。もちろんアプリケーショ
ンソフトが画面の窓領域に表示要求を出した時、不都合
が生じないよう窓を消去し、アプリケーションの表示が
終了してから再度窓表示するようにすることもできる。音声による文字入力がなされた後、それにより作られた
データー列がアプリケーションに渡され、アプリケーシ
ョンソフトが継続して実行される。音声による文字入力装置の使用者は、以下のように操作
して、キーボードに代わって文字を入力する。 ■　まず、音声による文字入力画面をマウスの操作によ
り選択する。 ■　使用者は発音しながら、発声するモーラに合わせて
、ボン・ボン・ギューとマウスボタンを押す。 ■　最後のギューとおし込んだ時に、画面には、単語単
位で複数の文字候補が表示される。 ■　使用者はマウスボタンを押し続けながらマウスを動
かし、単語候補を選択する。マウスボタンを離して、選
択した単語をコンピュータに入力する。文章の場合これを繰り返し続けながら入力してゆく。上記のような操作を実現するプログラムは多くある。そ
の内のひとつを具体的に説明する。また本発明は、本発明者が先に出願した「特願昭６２−
１９９５３３号公報」に記載された発明と組み合わせる
ことにより、より効果を発揮する。この実施例においても、前記の公報に開示されている方
式に基づいて説明する。特に周期の検出とフォルマント
検出に関してはこれを引用する。コンピュータに音声信号入力とボタン入力が接続されて
いる状態を考える。（音声のサンプル周波数）音声入力信号はサンプリング周波数１万３千ヘルツ以上
から２万ヘルツ程度の範囲内で選択するのが最もよい。実施例においては１５６００ヘルツを使用している。（音声の入力ビツト数）音声信号は音声識別手段２のＡＤコンバータ（以下ＡＤ
Ｃ）で、アナログ信号からデジタル信号に変換される。要求されるのは、小さな音量の子音が十分な精度で取れ
、大きな音量の母音でオーバーフローを生じないことで
ある。オーバーフローが生じると波形は矩型となり、不
必要な高周波成分を生じさせる。よって音声信号そのま
まを入力する場合、１４ビット程度のＡＤコンバータを
使用する。しかし、マイク入力からＤＡＣの間に自動増幅率調整を
するアナログ回路を挿入することで小さな音量の時に増
幅率をあげ、大きな音量の時に増幅率を下げることによ
りＤＡＣに必要な精度を８ビット程度まで下げることが
出来る。この工夫はマイクの位置による音量の変化を吸
収し、使いかつてを向上させる。（音声信号の入力）デジタル量に変換された音声入力は、メモリー上に連続
して入力させる。つまり、配列型と呼ばれるデータ型に
格納する。連続して大きなメモリーが取れない場合や、
逆に巨大なメモリがあって、音声入力を常時し続けつつ
文字変換処理し、古いデータを捨てつつメモリーを利用
したい場合は、アレイチェーンテーブルと呼ばれるテク
ニック、つまり確保したメモリーの最初と大きさを格納
しておき、それを参照しながらメモリーを使うといった
こともするが、配列型の変形と考えれはよい。入力の実際は、ＤＭＡと呼ばれるＣＰＵと別のコントロ
ーラによりサンプリング時間待にメモリーバスを横取り
して入力するか、サンプリング周期毎に割り込みと呼ば
れろ強制分岐によりＣＰＵのプログラム実行時間を定期
的に割当て、プログラムにより入力される。ＣＰＵの処
理能力が太きい時は割り込みによる入力でもよいがクロ
ック周波数がｌＯメガヘルツ程度の汎用１６ビツ）ＣＰ
ＵクラスではＤＭＡを使用しなければ処理時間不足にな
る場合がある。連続して入力出来るメモリーの大きさは１．５秒分以上
あれば十分である。普通の単語はこの時間内に充分発声
出来る。よって音声の入力用メモリーは６４キロバイト
程度あればよい。（ボタン入力）ボタンはボンボンと押された時と、ギューと押された時
の区別をつけなければならない。その区別の為にタイマ
ーを用意し、ボタンが離されている時はクリアし、押さ
れている時に増え、一定量に達した時にギューで、次に
離される迄に達しない時ポンと判断する。これは電子回
路でも可能であるしプログラムアルゴリズムによる実現
も容易である。たとえば、ＣＰＵはボタンのボートを常時監視し、■　ボタンが離
された状態から初めて押された時（前回がオフで今回オ
ン）その時の音声入力メモリーアドレスを記録し、時間の記
録の代替えとする。 ■　連続して押された時（前回オンで今回オン）現在の
音声入力メモリーアドレスと記録したアドレスとの差よ
り時間を判断する。という処理をすればよい。ボンがギューかの区別の時間は、１５０ミリ秒から４０
０ミリ秒の間におけばよい。短すぎるとポンと押したつ
もりがギューになり易いし、長すぎると早口の人をイラ
イラさせる。この時間は使用者により選択させることも
可能であるし、ボンボンの時間間隔から早口かどうかを
判断し自動的に可変することも可能である。つまり、ボ
ンボンの時間間隔程度から、その半分程度にすればよい
のであるから、例えば、設定時間の倍よりボンボンの間
隔が長ければ設定時間を長く、設定時間より短かければ
設定時間を短くすればよい。こうすれば使用者が代わっ
ても自動的に対応出来る。ボタンのボートの監視は割り込みを利用するかソフトウ
ェア的に２０ミリ秒に１回程度の頻度で行う。人間のボ
タン押し精度より、５ミリ秒に１回以」−の頻度は無意
味であるし、これ以下の例えは１００ミリに１回ではよ
い結果は得られない。練習すればボタン押し精度を数ミリ単位に持ってゆくこ
とが出来、その方がプログラム処理上都合がよい。しか
し、初心者や老人にそれを期待出来ない以上、２０ミリ
に１回程度で十分である。訓練していない使用者にとってモーラの始まりとボタン
押しとのタイミングのバラツキは５０ミリ秒程度であり
、１秒に５モ一ラ程度の早さであれば十分に入力出来る
ことが分かる。ボタン入力は、押された時の音声入力データの位置を示
す形、つまりポインタ型配列と呼はれる型に記録してお
くのが最も効率がよい。（音声信号処理の最初）処理の最初は必ずボタンを押してから話すことにすれば
、音声信号のサンプリングは最初のボタン押しがあって
から行えばよい。しかし、それは使用者に苦痛を強いる
し、最初の音がす行のような摩擦子音を持つものの場合
に、夕行のような破裂音と誤入力してしまう場合がある
。そこで、音声信号は常時入力しておき、最初のボタン押
しがあってから、それより一定時間手前よりを必要デー
タとする。一定時間とはボタン押しのバラツキを考えて
５０ミリ秒程度以上を取ればよい。サンプリングデータ
にして１０００個以上である。これを実現するには、最初のボタン押しを待っている状
態では、空いているメモリーをリングバッファと呼ばれ
る技術で使用し、ボタンが入力された時にその時点から
１０００個以上遡って必要な場所に転送すればよい。空
いてるメモリーには音声信号保存用のメモリーの下位を
利用すればよく、そこに、入力された音声信号を、一番
古いデータを消すように書き込んでゆくのである。（音声信号処理）得られた音声信号データは加工しなければならない。そ
の方法は無数にあり、音声信号データの加工方法と本発
明の主旨は無間係である。しかし、本発明は汎用の安価
なマイクロプロセッサのみで音声入力を実現するのが目
的である以上、マイクロプロセッサのみで可能な音声信
号加工方法のひとつを示す必要がある。よって実際の使
用例を簡単に説明する。但し、このことは本発明がマイ
クロプロセッサを使用したもののみに利用されることを
意味しない。（周期検出）入力された音声信号より、まず周期検出を行う。普通に喋られる音声の基本ピッチは８０ヘルツから３５
０ヘルツまで広い範囲を取りうる。同一の単語中でもｌ
オクターブ近い周波数シフトを行う場合がある。関西の
方言では「家」のイから工に移る時、音の高さはｌオク
ターブ近く下がる。また、母音のフォルマントの存在が周期検出を非常に難
しくしている。フォルマントとはピッチによりあまり変
動しない共震周波数のことで、基本ピッチの周波数成分
よりフォルマント周波数成分が非常に強く、一番低いフ
ォルマント周波数を基本ピッチと誤り易い。この傾向は
よく訓練された話者になる程激しい。さらに、女性のような高い声では基本ピッチ成分と一番
低いフォルマント周波数が重なることがあり、問題を難
しくしている。これらのことと、その対策は昭和６３年特許願第１９９
５３３号の明細書に記載された周期検出方法に詳しい。対策を、簡単に説明すると、まず初めに候補になりそう
な箇所をデーター列として得た後、それから適当でない
ものを除くという方法を取っている。つまり、最初に音声信号波形の頂点のアドレスを求め、
頂点の値が周囲より小さいものを捨て、そのアドレス間
の時間差から周期として適当でないものを除いてゆくと
いう方法である。周期検出の結果は、音声入力データの位置を示す形、つ
まりポインタ配列型に記録しておく。（フォルマント算出）周期が検出される期間は、アイウェオの５母音の期間の
外、「やゆより」の重母音、「ン、な行ま行」の鼻音、
「ら行」等の子音が周期性が高く、外に「が行ざ行だ行
ば行」の濁音にも雑音性信号と周期性信号が混在してい
る。この内、母音と重母音についてはフォルマント周波数を
求めればよい。フォルマント周波数は、昭和６３年特許
願第１９９５３３号の明細書に詳しく記載されている通
り、１次のＨＰＦ、ＬＰＦの比として容易に算出出来る
。昭和６３年特許願第１９９５３３号を実施し、画面に
フォルマント情報及び音強度を表示することにより、こ
れらについて曖昧性の無い確定入力のレベルで入力出来
ることも利用出来る。馴れた利用者の為にフォルマント
表示を停止可能にすることも容易な工夫である。（母音解析）母音は、ボタン入力された区切りの間にある。十分に強度が大きく、周期性がある期間が連続すればそ
れが母音である。母音の区別はフォルマント周波数だけ
で容易に出来る。周期性があり強度が小さい場合は「ん
」である。ボタン入力された間が全く無音であれば小文字のｒつ」
にする。ボタン入力された間に周期性のある信号がなく、またそ
の次の子音が「は行さ行」の場合もｒつ」にする。なお、　「ち、つ、し」は特別な場合がある。例えば「
シ」は「さしすせそ」の中でひとつだけ歯に舌が触れず
出す音であり、「ち、つ」は「たちつてと」の中で、他
の子音が破裂音であるのにさ行系の摩擦音が短くなった
ものであり、子音だけで母音が判断出来てしまう。だか
ら人間は不精なものだからこれらの語は子音だけですま
せ、母音を省略してしまう。省略までいかなくとも母音
を小さく発音することが多い。例えば数字の１の「チ」
とか椅子の「ス」によく見られる。これがさらに、１寸
のことを「いつすん」のように子音迄省略し、小文字の
「つ」だけですむようになれば逆に簡単である。母音だ
けの省略は人によって省略したりしなかったりがあるの
で面倒を増す。これらの母音省略を利用者に許さないことにすれば問題
はない。もし許す場合には、例えば「写せ」という言葉
を「うつせ」のように発音されても判断出来るように工
夫する必要があり、面倒な割に使い勝手はそれほど向上
しない。（子音解析と文字変換）子音はボタン入力された付近にある。そこで、その前後
一定時間の範囲を調査し、付近に比べ十分に弱い信号か
、非周期性信号期間があればそこに子音があるとする。子音の期間が発見されたら、母音の始まる直前から２５
６個程度のデータを取り出し、ＦＦＴ演算を行い絶対値
を求め、１２８個の周波数情報に変換する。子音については事前に周波数情報のテーブルを用意し、
それと比較し、差の２乗和とか絶対値和により各テーブ
ルとの類似度を数字化する。類似度の高いものから順に
推定するのである。コンピュータには読み漢字の辞書データを用意し、推定
された順に検索し、合致したものから順に画面表示する
。その表示から利用者が自分の希望する語を選び確定す
る。周波数情報のテーブルを、確定した時の子音の場所に今
回発声した周波数情報を書き込むことで話者が換わって
も自動的に学習されるようになる。この時、音の高さも辞書に書き込むようにし、次回検索
する時は音の高さについても比較するようにすれば、よ
り希望する語がすぐに出るようになる。いわゆるヒツト
率が高くなる訳である。ただし、辞書には最初から音の
高さを書き込まないことが必要で、そうしなければ方言
による発音の差を吸収出来ない。また、音の高さは、周
波数を対数表示つまり音階表示し、高さの差情報でもっ
て比較するのが話者交代に対応出来、有利である。なお、音の高さの他に音の強さも情報として考えられる
が、音の強さは不安定すぎ、利用してもそれほどヒツト
率をあげられず、逆に体調や話者交代による差が大きす
ぎることになる。単純に周波数情報で検索するだけでなく、少しの工夫で
より検索の範囲を狭められ、検索が高速になり、かつ確
度が高まる。母音と母音の間に完全な無音期間が存在すればそれは破
裂音であり、「ば行た行か行」のどれかである。その場
合は、子音の始まりより母音の始まり迄の時間を測定し
、その時間の長さも検索の要素に加える。母音と母音の間に無音期間に代わり非周期性の−様な弱
い信号があれば「さ行は行」の摩擦音である。母音と母音の間に無音間間に代わり非常に弱い周ｙ月性
が検出されたら「が行だ行ば行」のいずれかの濁音であ
る。濁音は声帯の振動が止まらない状態で、口も鼻へも
音が抜けない期間があるのでそうなる。ただし、濁音の
中て「ざ母音」の摩擦濁音では、口が摩擦音が生じる程
度に開けられており、音の強度が大きい、また同じガ行
でも鼻濁音の、力°行と書かれる音も鼻への通路が問い
ている分音の強度が大きく、大きさで判断出来る。つまり、子音期間に比較的長く弱い周期性が検出された
時、その強さが十分に弱ければ「が行だ行は行」より検
索し、それ以外であれば「ざ打力。行な行ま行」より検索する。短く弱い周期性が検出されたなら「ヤ行ワ行う行」より
検索する。Embodiments of the present invention will be described below based on the drawings. However, the embodiment shown below is an example of a character input device for embodying the technical idea of the present invention, and the circuit configuration of the device of the present invention is not specified as described below. Various modifications may be made to the device of the present invention within the scope of the claims. Further, in this specification, numbers corresponding to the members shown in the embodiments are added to the members shown in the claims so that the claims are easy to understand. However, the members described in the claims are by no means limited to the members shown in the examples. The circuit configuration of FIG. 1 is shown as an embodiment of the present invention. The voice character input device shown in FIG. 1 includes an input means 1 that converts voice into an electrical signal, and a voice identification means that identifies the voice signal output from the input means 1 and outputs the identified voice as a digital signal. 2, and a delimiter input member 3 for inputting a mora delimiter signal. This character input device is used as input means 1 of a computer in place of a keyboard. The audio input means 1 includes a microphone that converts an audio signal into an electrical signal, and a microphone amplifier that amplifies the signal from the microphone. As the voice recognition means 2, a personal computer currently available on the market can be used as is. That is, the input means 1 and the voice recognition means 2 can be constructed by adding a microphone and a microphone amplifier to a personal computer. The voice identification means 2 converts the analog signal input from the microphone amplifier into a digital signal and identifies the input voice. The audio identification means 2 identifies the mora division of the audio signal using the trigger signal from the division input member 3. 
A general-purpose personal computer is sufficient for the voice recognition means 2. It is not necessarily necessary to use a character input device dedicated to voice recognition. Therefore, the character input device of the present invention can be used to input characters and instructions to application software running on a personal computer, and can be input by voice instead of a keyboard. The voice identifying means 2 can use any input voice signal that can identify the trigger signal from the dividing input member 3 as a mora punctuation. The voice input signal whose mora punctuation is specified by the delimiter input member 3 is accurately recognized by the voice recognition means 2. This invention does not specify the audio signal identification method of the audio identification means 2. The voice identification means 2 may employ any method currently used or to be developed in the future that can identify mora with clearly defined boundaries. The delimiter input member 3 includes a switch for inputting a mora delimiter in accordance with the utterance of the voice. This switch is most conveniently attached to a mouse connected to your computer. A mouse is one of the pointing devices, and is connected to many computers as an inexpensive and popular item. The delimiter input member 3 inputs a trigger signal indicating a mora delimiter to the audio identification means 2 together with the audio signal. Therefore, the delimiter input member 3 is not limited to a mouse button, but any device that can input a trigger signal to the voice recognition means 2 in accordance with the mora can be used. For example, it is also possible to use some specific keys (space bar, etc.) on the keyboard as a switch for the delimiter input member 3. When inputting into a computer using a voice character input device instead of a keyboard, it can be used in the following conditions. A small area called a "window" is always displayed on a part of the CR7 display screen. 
When the mouse cursor points into the window, the voice input program runs in place of keyboard input. Naturally, when the application software issues a display request for the window's area of the screen, the window can be erased so as not to interfere, and displayed again once the application has finished drawing. After characters have been input by voice, the resulting data string is passed to the application, and the application software continues to run.

The user of the voice character input device operates it as follows to input characters instead of using the keyboard.
■ First, select the voice character input screen by operating the mouse.
■ While speaking, press the mouse button in time with each mora being uttered.
■ On the final button press, multiple candidates are displayed on the screen for each word.
■ Move the mouse while holding down the mouse button to select a word candidate, then release the mouse button to enter the selected word into the computer. For running text, repeat this while inputting.

Many programs could implement the operations described above. One of them is explained in detail here. The present invention achieves greater effect when combined with the invention described in Patent Application No. 199533 of 1988, filed earlier by the inventor. This embodiment is likewise explained on the basis of the system disclosed in that specification, which is cited in particular for period detection and formant detection.

Consider a configuration in which an audio signal input and a button input are connected to a computer.

(Audio sampling frequency)
The sampling frequency for the audio input signal is best chosen in the range from 13,000 hertz to about 20,000 hertz. The embodiment uses 15,600 hertz.

(Number of audio input bits)
The audio signal is converted from an analog signal into a digital signal by the AD converter (hereinafter, ADC) of the voice identification means 2. The requirements are that quiet consonants be captured with sufficient precision and that loud vowels not overflow. When overflow occurs, the waveform is clipped to a rectangular shape, producing unwanted high-frequency components. When the audio signal is input as it is, therefore, an ADC of about 14 bits is used. However, if an analog circuit that automatically adjusts the gain is inserted between the microphone input and the ADC, raising the gain when the volume is low and lowering it when the volume is high, the precision required of the ADC can be reduced to about 8 bits. Such a circuit also absorbs changes in volume caused by the position of the microphone, improving usability.

(Input of audio signal)
The digitized audio input is written continuously into memory; that is, it is stored in the data type called an array. If a large contiguous block of memory cannot be obtained, or, conversely, if memory is to be reused while a very large amount of voice input is taken in continuously, converted to characters, and old data is discarded, a technique called an array chain table can be used, in which the start and size of each allocated memory block are stored and the memory is used by referring to them. This is best regarded as a variant of the array type.

In practice, the input is performed either by a controller called DMA, separate from the CPU, which waits for each sampling time and takes over the memory bus, or by interrupts, forced branches that claim CPU execution time at every sampling period. If the CPU has ample processing capacity, interrupt-driven input may be acceptable, but with a general-purpose 16-bit CPU with a clock frequency of about 10 megahertz, processing time may be insufficient unless DMA is used.

A memory size sufficient for 1.5 seconds or more of continuous input is enough, since an ordinary word can comfortably be uttered within that time. At 15,600 samples per second, 1.5 seconds is 23,400 samples, so about 64 kilobytes of memory suffices for voice input.

(Button input)
A quick tap of the button ("pon") must be distinguished from a sustained press ("gyu"). To distinguish them, a timer is provided that is cleared while the button is released and counts up while it is pressed. When the count reaches a set value, the press is judged to be "gyu"; when the button is released before the count reaches that value, the press is judged to be "pon". This can be done with an electronic circuit, and it is also easily realized as a program. For example, the CPU constantly monitors the button port and performs the following processing.
■ When the button is first pressed after being released (off at the previous check, on now), the voice input memory address at that moment is recorded in place of the time.
■ While the button remains pressed (on at the previous check, on now), the elapsed time is obtained from the difference between the current voice input memory address and the recorded address.

A threshold of 150 milliseconds to 400 milliseconds is adequate for distinguishing "pon" from "gyu". If it is too short, taps tend to be judged as "gyu"; if it is too long, it irritates users who speak quickly. The threshold may be selected by the user, or changed automatically by judging from the interval between successive taps ("pon, pon") whether the user speaks quickly. Specifically, the threshold should lie between about half the tap interval and the tap interval itself. For example, if the tap interval is longer than twice the set time, the set time is lengthened, and if it is shorter than the set time, the set time is shortened. In this way a change of user is accommodated automatically.

The button port is monitored, by interrupts or in software, about once every 20 milliseconds. Given the precision with which a person can press a button, monitoring as often as once every 5 milliseconds is pointless, while good results cannot be obtained at rates below once every 100 milliseconds. With practice, button-press precision can be improved to within a few milliseconds, which is convenient for program processing. Since this cannot be expected of beginners or the elderly, however, once every 20 milliseconds is sufficient. For an untrained user, the timing of the button press varies by about 50 milliseconds from the start of the mora, which shows that this is adequate for input at a rate of about 5 morae per second. Button input is most efficiently recorded as the position in the audio input data at which the button was pressed, that is, in the form called a pointer-type array.

(Start of audio signal processing)
If the button were always pressed before speaking at the start of processing, the audio signal could be sampled from the first button press onward.
However, this is a burden on the user, and when the first sound has a fricative consonant, as in the sa row, it may be mistakenly input as a plosive. The audio signal is therefore input at all times, and the required data is taken from a fixed time before the first button press. Allowing for variation in button presses, this fixed time should be about 50 milliseconds or more, which amounts to more than 1,000 samples. To achieve this, while the device waits for the first button press, free memory is used as a ring buffer; when the button is pressed, the device goes back more than 1,000 samples and transfers the data to the required location. The tail end of the memory reserved for storing audio signals can serve as the free memory: incoming audio is written there, overwriting the oldest data.

(Audio signal processing)
The audio signal data obtained must then be processed. There are countless ways to do this, and the processing method has no bearing on the gist of the present invention. However, since the purpose of the present invention is to realize voice input using only a general-purpose, inexpensive microprocessor, one audio signal processing method that can be carried out with a microprocessor alone needs to be shown. An actual example is therefore briefly explained. This does not mean that the present invention applies only to devices using a microprocessor.

(Period detection)
First, the period is detected from the input audio signal. The fundamental pitch of ordinary speech ranges widely, from 80 hertz to 350 hertz. Even within a single word, the frequency may shift by nearly an octave. In the Kansai dialect, in going from "i" to "kaku", the pitch drops by nearly an octave. The presence of vowel formants makes period detection more difficult still. A formant is a resonant frequency that varies little with pitch, and because the formant component is much stronger than the fundamental pitch component, the lowest formant frequency is easily mistaken for the fundamental pitch. This tendency is more pronounced in highly trained speakers. In high-pitched voices such as women's, moreover, the fundamental pitch component and the lowest formant frequency may overlap, which complicates the problem. These matters and their countermeasures are described in detail in the period detection method of the specification of Patent Application No. 199533 of 1988.

Put simply, the countermeasure is first to obtain a sequence of candidate locations and then to remove the unsuitable ones. That is, the addresses of the peaks of the audio signal waveform are found first; peaks whose values are smaller than those of their neighbours are discarded; and peaks whose period, computed from the time difference between addresses, is not appropriate are removed. The results of period detection are recorded as positions in the audio input data, that is, as a pointer-type array.

(Formant calculation)
Intervals in which a period is detected are not limited to the five vowels "a, i, u, e, o". The diphthongs "ya, yu, yo", the nasals "n" and the na and ma rows, and consonants such as the ra row are also strongly periodic, and even the voiced sounds of the ga, za, da, and ba rows are a mixture of noise and periodic signals. Among these, formant frequencies can be found for the vowels and diphthongs. The formant frequency is easily calculated from the ratio of a first-order HPF and a first-order LPF, as described in detail in the specification of Patent Application No. 199533 of 1988. If the invention of Patent Application No. 199533 of 1988 is implemented and the formant information and sound intensity are displayed on the screen, the user can exploit the display to make these inputs definite and unambiguous. It is also a simple matter to let experienced users turn the formant display off.

(Vowel analysis)
Vowels are located between the delimiters entered with the button. If the intensity is sufficiently high and the periodicity is continuous, the sound is a vowel. Vowels can easily be told apart from the formant frequencies alone. If there is periodicity but the intensity is low, the sound is "n". If there is no sound between button inputs, it is taken to be a small "tsu" (っ). If there is no periodic signal between button inputs and the next consonant belongs to the ha row or sa row, it is likewise taken to be a small "tsu".

Note that "chi", "tsu", and "shi" may need special handling. In "sa, shi, su, se, so", "shi" is the only sound produced without the tongue touching the teeth. In "ta, chi, tsu, te, to", "chi" and "tsu" are the only sounds that are short fricatives, whereas the other consonants are plosives. For these sounds the vowel can be determined from the consonant alone. Speakers, being lazy, therefore tend to pronounce only the consonant and omit the vowel, and even when the vowel is not omitted it is often pronounced very weakly. This is commonly observed, for example, in the "chi" of the numeral 1 ("ichi") and the "su" of "isu" (chair). Matters would be simpler if the consonant were dropped as well, leaving only a small "tsu" as in "issun" (one sun). Omitting only the vowel is more troublesome, because some speakers omit it and others do not. There is no problem if users are not permitted to omit these vowels. To permit it, a way would have to be devised of recognizing "utsuse" (copy) even when it is pronounced with the vowel dropped, which would be laborious without greatly improving usability.

(Consonant analysis and character conversion)
Consonants are located near the button inputs. A fixed time before and after each button input is therefore examined, and if the signal there is sufficiently weak compared with its surroundings, or there is an interval of aperiodic signal, the consonant is assumed to lie there. Once the consonant interval has been found, about 256 samples are taken from immediately before the start of the vowel, an FFT is computed and the absolute values are taken, converting the data into 128 items of frequency information. A table of frequency information is prepared in advance for the consonants. The result is compared against this table, and the similarity to each table entry is quantified as the sum of squared differences or the sum of absolute differences. Candidates are estimated in order of similarity. A dictionary of kanji readings is prepared on the computer, searched in the estimated order, and the matches are displayed on the screen in the order found. The user selects and confirms the desired word from the display.

If, at confirmation, the frequency information of the current utterance is written into the frequency information table at the entry for that consonant, the device learns automatically even when the speaker changes. If the pitch of the sound is also written into the dictionary at this time and compared at the next search, the desired word is found sooner; that is, the so-called hit rate rises. The pitch must not, however, be written in the dictionary from the outset, or differences in pronunciation between dialects cannot be absorbed. It is also advantageous to express frequency logarithmically, that is, as a musical scale, and to compare pitch using pitch-difference information, since this accommodates a change of speaker. Besides pitch, sound intensity could also be used as information, but it is too unstable: using it would do little for the hit rate, and the differences due to physical condition or a change of speaker would be too large.

Beyond a plain search on frequency information, a little ingenuity narrows the search range, making the search faster and more accurate. If there is a completely silent interval between vowels, the consonant is a plosive of the pa, ta, or ka row. In that case the time from the start of the consonant to the start of the vowel is measured, and its length is added as a search element.
If, instead of a silent interval, there is a weak aperiodic signal between the vowels, the consonant is a fricative of the sa row or ha row. If, instead of a silent interval, a very weak periodicity is detected between the vowels, the consonant is one of the voiced sounds of the ga, da, or ba row. Voicing arises because the vocal cords do not stop vibrating during the interval in which the sound passes through neither the mouth nor the nose. Among the voiced sounds, however, the za row is a fricative: the mouth is open far enough to produce friction, so the sound is stronger. A voiced sound whose passage to the outside is open is strong, and it can be distinguished by its intensity. That is, when a relatively long, weak periodicity is detected in the consonant interval, the search is made in the ga, da, and ba rows if the intensity is sufficiently weak, and otherwise in the za, na, and ma rows. If a short, weak periodicity is detected, the search is made in the ya row and wa row.
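As an illustration only, the table comparison described under consonant analysis might be sketched in Python as follows. The sketch assumes a 256-sample frame reduced to 128 magnitude values, a table keyed by consonant label, and learning by overwriting the table entry on confirmation. The function names, the labels, and the optional `candidates` argument (standing in for the narrowed search range discussed above) are hypothetical and do not appear in the specification. A plain DFT is used in place of an FFT to keep the sketch free of external libraries; it yields the same magnitudes.

```python
import cmath
import math

FRAME = 256  # samples taken just before the vowel starts (value from the text)
BINS = 128   # items of frequency information kept per frame (value from the text)


def magnitude_spectrum(frame):
    """Return |X[k]| for k = 0..BINS-1 using a plain DFT.

    An FFT would give the same values faster; the DFT keeps this
    sketch dependency-free.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(BINS)
    ]


def distance(a, b, squared=True):
    """Sum of squared differences, or sum of absolute differences."""
    if squared:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_consonants(spectrum, table, candidates=None):
    """Return consonant labels ordered from most to least similar.

    `table` maps a label to its stored 128-value template.
    `candidates` (hypothetical) restricts the search to a narrowed
    set of labels; None searches the whole table.
    """
    labels = list(candidates) if candidates is not None else list(table)
    return sorted(labels, key=lambda label: distance(spectrum, table[label]))


def learn(table, label, spectrum):
    """On confirmation, overwrite the template with the current utterance."""
    table[label] = list(spectrum)
```

In use, the ranked labels would drive the order in which the dictionary of readings is searched, and `learn` would be called with the consonant of the word the user confirms, so that the table follows a change of speaker.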

【図面の簡単な説明】[Brief explanation of the drawing]

第１図はこの発明の一実施例を示す音声による文字入力装置のブロック線図である。
１・・・・・・入力手段、２・・・・・・音声識別手段、３・・・・・・区切入力部材。
FIG. 1 is a block diagram of a voice character input device showing an embodiment of the present invention.
1: input means; 2: voice identification means; 3: delimiter input member.

Claims

【特許請求の範囲】音声を電気信号に変換する入力手段１と、この入力手段１から出力される音声信号を識別し、識別した音声をデジタル信号で出力する音声識別手段２とを備える文字入力装置において、モーラの区切り信号を入力する区切入力部材３を備えており、この区切入力部材３から入力されるトリガー信号が音声識別手段２に入力され、音声識別手段２が区切入力部材３からのトリガー信号で、モーラの区切りを識別するように構成されたことを特徴とする音声による文字入力装置。
[Claims] A voice character input device comprising an input means 1 for converting voice into an electrical signal, and a voice identification means 2 for identifying the voice signal output from the input means 1 and outputting the identified voice as a digital signal, characterized in that the device further comprises a delimiter input member 3 for inputting a mora delimiter signal, a trigger signal input from the delimiter input member 3 is input to the voice identification means 2, and the voice identification means 2 is configured to identify mora delimiters by means of the trigger signal from the delimiter input member 3.