JP2008040197A

JP2008040197A - Utterance training device

Info

Publication number: JP2008040197A
Application number: JP2006215275A
Authority: JP
Inventors: Shingo Yuasa; 信吾湯浅
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2006-08-08
Filing date: 2006-08-08
Publication date: 2008-02-21

Abstract

PROBLEM TO BE SOLVED: To provide an utterance training device that, with a simple configuration, lets an utterance trainee intuitively perceive whether his or her uttered speech is correct.

SOLUTION: The utterance training device comprises a storage means 17 for storing utterance training data 18; a display means 13 for reading a character string to be used as a training task from the utterance training data 18 and displaying it; a speech input means 14 for inputting the speech that the trainee utters according to the displayed character string; a speech recognizing means 15 for recognizing the input speech and generating a recognition result; a comparative determining means 11 for comparing the character string with the recognition result to determine whether the recognition result is correct; and an output means 16 for outputting the result of the comparative determination in a prescribed form by changing colors.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an utterance training device, and more particularly to an utterance training device with which a person with a hearing impairment can learn to vocalize correctly.

People with hearing impairments cannot perceive their own voice through hearing, so it is difficult for them to judge whether they are pronouncing correctly, and therefore difficult to practice vocalization on their own.
Previously proposed vocalization training devices collect the trainee's speech information by having the trainee vocalize with various detection sensors attached inside the mouth, on the nose, on the throat, and so on. Using such sensors raises the cost of the device itself, and having to wear sensors inside the mouth and elsewhere is uncomfortable for the trainee.

To solve these problems, Patent Document 1 proposes, as shown in FIG. 5, a vocalization practice trainer 1 that evaluates the degree of match between model speech waveform data and speech waveform data input by the trainee through a microphone 2. The trainer comprises a display 3 that shows the two sets of waveform data in association with each other, and a handy actuator 4 and a Goodyear 5 that are held in contact with the trainee and vibrate in response to the speech waveform data so that they can be felt by touch.

In this vocalization practice trainer 1, the handy actuator 4 and the Goodyear 5 vibrate in response to the model speech waveform data, so the trainee can perceive the model sound by touch. The trainee then vocalizes with reference to the model speech waveform data and the hints on how to open the mouth shown on the display 3. The determination result is shown on the display 3 as speech waveform data, and the handy actuator 4 and the Goodyear 5 vibrate in response to that waveform data so that the trainee can perceive his or her own vocalization by touch.
Patent Document 1: Japanese Patent Laid-Open No. H10-161518

However, the vocalization practice trainer 1 proposed in Patent Document 1 outputs the determination result as speech waveform data, as text such as "Please speak more loudly", and as vibration. This does not let the trainee intuitively perceive whether the uttered speech is correct, and because it requires a handy actuator and similar parts, the device becomes more complex and more costly.

The present invention has been proposed to solve these problems. Its object is to provide an utterance training device that, with a simple configuration, lets the utterance trainee intuitively perceive, among other things, whether the uttered speech is correct.

To achieve this object, the utterance training device according to claim 1 comprises: storage means for storing utterance training data; display means for reading a character string to be used as a training task from the utterance training data and displaying the character string; speech input means for inputting the speech that the utterance trainee utters according to the displayed character string; speech recognition means for recognizing the input speech and generating a recognition result; comparison determination means for comparing the character string with the recognition result to determine whether the recognition result is correct; and output means for outputting the result of the comparison determination by changing a color in a predetermined manner.
Here, a character string includes a single character, a word, and a fixed phrase such as a greeting.
Changing a color covers not only changing from one color to another at output, but also, for example, lighting an LED that was not emitting light.

In claim 2, the output means of claim 1 outputs the result by changing the color of the character string.

In claim 3, the speech recognition means of claim 1 generates one or more recognition results based on a predetermined probability value calculated from the input speech, and the output means outputs the one or more recognition results together with the probability values by changing colors.

In claim 4, the output means of any one of claims 1 to 3 further outputs the loudness of the speech input from the speech input means by changing a color.

With the utterance training device according to claims 1 to 4, the result of the comparison determination is output by changing a color, so it appeals to the utterance trainee's sense of sight and can be perceived easily and intuitively.

In claim 2, the result of the comparison determination is output by changing the color of the displayed character string, so the utterance trainee can perceive the result more easily, without having to shift his or her gaze.

In claim 3, the probability values are output by changing colors together with the one or more recognition results generated from the input speech, so the utterance trainee can see which characters the uttered speech was recognized as.
That is, when the input speech matches the displayed character string 100%, a single recognition result is output with its probability value by changing a color; when multiple recognition results are generated from the input speech, they are all output with their probability values by changing colors. The utterance trainee can therefore perceive visually and intuitively how close the uttered speech is to the intended character string, and which character strings it resembles.

In claim 4, the loudness of the speech is also output by changing a color, so loudness too appeals to the sense of sight and can be perceived easily.

Embodiments of the present invention are described below with reference to the drawings.
FIGS. 1 to 4 show an utterance training device according to the present embodiment. FIG. 1 is a block diagram showing the schematic configuration of the utterance training device, FIG. 2 is a schematic diagram showing examples of its display and output, FIG. 3 is a schematic diagram showing other examples of its display and output, and FIG. 4 is a flowchart showing the basic operation of the utterance training executed in the present embodiment.

The utterance training device 10 shown in FIG. 1 comprises: a CPU 11 that controls each part of the device and constitutes the comparison determination means; operation means 12 consisting of a mouse, keyboard, operation keys, touch panel, or the like; display means 13 consisting of a liquid crystal display (LCD), CRT, or the like; speech input means 14 consisting of a microphone or the like; speech recognition means 15 that recognizes the input speech and generates a recognition result; storage means 17 that stores the control programs for each part, the utterance training data 18, the utterance training program 19, and so on; and output means 16 that outputs the result of the comparison determination in a predetermined manner.

Specifically, the storage means 17 stores utterance training data 18 containing character strings, such as single characters, words, and fixed phrases such as greetings, that serve as training tasks for the utterance trainee's vocalization practice.
The utterance training data 18 is organized, for example, as vowels, the "a" row, the "i" row, and so on, two-character words, three-character words, and so on, and greetings and so on, and the utterance trainee can select the desired training task by operation. The device may also provide a mode selection function, for example a mode that raises the difficulty gradually, either according to the trainee's progress or under automatic control triggered by the operation that starts utterance training (described later).
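The organization of the utterance training data 18 can be illustrated with a small data structure. The following is only a sketch under stated assumptions: the names `TrainingTask`, `TRAINING_DATA`, `tasks_in_category`, and `tasks_by_difficulty`, the numeric `level` field, and the sample words are all hypothetical and do not appear in the description; only the category names follow the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingTask:
    category: str  # e.g. "vowel", "two-character word", "greeting"
    level: int     # hypothetical difficulty rank; lower means easier
    text: str      # character string shown to the trainee


# Hypothetical sample entries; the description does not list concrete words.
TRAINING_DATA = [
    TrainingTask("vowel", 1, "あ"),
    TrainingTask("vowel", 1, "い"),
    TrainingTask("two-character word", 2, "あさ"),
    TrainingTask("three-character word", 3, "さくら"),
    TrainingTask("greeting", 4, "おはよう"),
]


def tasks_in_category(data, category):
    """Manual selection: return the tasks of the category the trainee chose."""
    return [task for task in data if task.category == category]


def tasks_by_difficulty(data):
    """Automatic mode: order the tasks from easiest to hardest."""
    return sorted(data, key=lambda task: task.level)
```

Calling `tasks_in_category(TRAINING_DATA, "vowel")` corresponds to the trainee choosing a task group by operation, while `tasks_by_difficulty(TRAINING_DATA)` corresponds to the optional mode in which the difficulty rises gradually.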

Any known speech recognition technique can be applied to the speech recognition means 15. For example, the input speech is digitized into speech data by an A/D converter or the like, features are extracted from the speech data by speech analysis or the like, and the features are converted into acoustic feature vectors suited to the recognition calculation, such as LPC (Linear Predictive Coefficient) cepstrum or MFCC (mel-frequency cepstral coefficients).
For the converted acoustic feature vectors, a likelihood calculation program may refer to phoneme models stored in the acoustic model, to which an HMM (hidden Markov model) or the like is applied, and calculate the likelihood, that is, the probability value. Based on the calculated likelihoods, the most likely entry in a dictionary unit storing the recognition rules is then generated as the recognition result.
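The final selection step described above can be sketched as follows. This is a minimal illustration that assumes the per-candidate log-likelihoods have already been computed by an acoustic model; feature extraction and HMM scoring are not shown. The function names, the softmax-style normalization, and the score values are assumptions made for illustration and are not taken from the description.

```python
import math


def normalize(log_likelihoods):
    """Convert log-likelihoods into values that sum to 1 (assumed normalization)."""
    peak = max(log_likelihoods.values())
    # Subtract the peak before exponentiating to avoid numeric underflow.
    weights = {key: math.exp(value - peak) for key, value in log_likelihoods.items()}
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


def most_likely(log_likelihoods):
    """Return the dictionary entry with the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


# Made-up scores for three dictionary entries.
scores = {"あ": -12.5, "い": -30.1, "う": -18.7}
recognition_result = most_likely(scores)
probabilities = normalize(scores)
```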

Known noise suppression processing or the like may be performed during speech recognition.
When applying an HMM, it is preferable to take the phoneme context into account. For example, when the displayed character string has two or more characters, speech recognition may also consider the preceding and following phonemes.

The CPU 11 compares the recognition result generated as described above with the character string displayed on the display means 13, and the output means 16 outputs the result of the comparison determination by changing a color in a predetermined manner.
The result of the comparison determination is preferably output in real time after the utterance trainee's speech is input. The utterance trainee can then perceive the result for the speech just input in real time. If the device also accepts new input immediately afterwards, the trainee can practice step by step toward the correct pronunciation based on the results.
Although FIG. 1 shows the utterance training device 10 as a combination of functional blocks, it may be realized by adding dedicated speech input means 14 and output means 16 to a personal computer, or, without any such additions, by installing application software on a personal computer.

Next, examples of the character string display on the display means 13 and of the comparison determination result output by the output means 16 are described with reference to FIG. 2.
As shown in FIG. 2, the present embodiment illustrates six patterns of display and output.

In FIG. 2(a), the character string "あ" ("a") is displayed on the display means 13, which consists of a liquid crystal monitor or the like, and output means 16 consisting of a light source such as an LED is provided separately from the display means 13.
In this example, when the recognition result generated from the input speech matches the character string, here "あ", the output means (LED) 16 lights in red, for example; when the recognition result does not match the character string, the output means 16 lights in another color.
The result of the comparison determination thus appeals to the utterance trainee's sense of sight and can be perceived easily and intuitively. The utterance training device 10 can also be provided with a simple configuration and at low cost.
Before the comparison determination result is output, the output means (LED) 16 may be kept off, or it may be lit in a color different from those that indicate a correct or incorrect result, for example to prompt speech input.
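The color decision of FIG. 2(a) can be sketched as a small function. This is an illustrative sketch only: the function name `led_color` and the constant names are hypothetical, red for a match follows the description, and the "blue" used for a mismatch and the "off" idle state are assumptions, since the description only says "another color" and allows either an unlit LED or a prompt color before output.

```python
from typing import Optional

IDLE_COLOR = "off"         # assumption: LED stays unlit until a result exists
CORRECT_COLOR = "red"      # from the description of FIG. 2(a)
INCORRECT_COLOR = "blue"   # assumption: the description says only "another color"


def led_color(target: str, recognized: Optional[str]) -> str:
    """Choose the LED color from the displayed string and the recognition result."""
    if recognized is None:
        # No speech has been recognized yet.
        return IDLE_COLOR
    return CORRECT_COLOR if recognized == target else INCORRECT_COLOR
```

For the displayed string "あ", `led_color("あ", "あ")` gives the match color and `led_color("あ", "う")` gives the mismatch color.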

In FIG. 2(b), the display means 13 and the output means 16 consist of a liquid crystal monitor or the like, and the displayed character string doubles as the output means 16.
That is, the result of the comparison determination is output by changing the color of the displayed character string. For example, the character string "あ" is white when first displayed; after the utterance trainee inputs speech, it turns red if the recognition result is "あ", and another color if the recognition result does not match the character string.
Because the color of the character string itself changes, the result of the comparison determination reaches the utterance trainee's sense of sight without any shift of gaze and can be perceived more easily and intuitively.

In FIG. 2(c), in addition to the example of FIG. 2(a), the output means 16 also outputs the loudness of the input speech by changing a color.
That is, the output means 16 in this example outputs the loudness of the speech on a level meter 16a, together with whether the speech is correct, so the loudness of the uttered speech appeals to the utterance trainee's sense of sight and can be perceived easily.
For example, when the speech uttered by the utterance trainee is quiet, the level meter 16a, which consists of LEDs or the like, lights only the position corresponding to "small", and as the speech gets louder it lights more positions in sequence, as illustrated.
In this case, whether the speech is correct is also output by changing the color of the level meter 16a, as described for the example of FIG. 2(a). Alternatively, an LED like the one described for FIG. 2(a) may be provided separately as the output means for indicating whether the speech is correct.
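The behavior of the level meter 16a can be sketched as a mapping from signal level to the number of lit positions. This is only an illustration under assumptions: the description does not say how loudness is measured, so the RMS measure, the five-segment meter, the `full_scale` parameter, and the function names are all hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of samples (assumed loudness measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def lit_segments(samples, segments=5, full_scale=1.0):
    """Number of meter positions to light, from 0 (silence) to `segments` (loudest)."""
    level = min(rms(samples) / full_scale, 1.0)  # clamp to the top of the meter
    return math.ceil(level * segments)
```

With these assumptions, quiet input lights only the first position and louder input lights more positions in sequence, as in the figure.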

In FIG. 2(d), in addition to the example of FIG. 2(b), the output means 16 also outputs the loudness of the input speech by changing a color, as in the example of FIG. 2(c). Whether the speech is correct is output by changing the color of the character string displayed on the liquid crystal monitor, and the loudness is output on a separately provided level meter 16a.
The loudness output is not limited to this configuration. For example, loudness may be output through the intensity of LED light or through blinking on the liquid crystal monitor. In FIG. 2(d), for instance, instead of providing separate LEDs, the character string displayed on the liquid crystal monitor may blink, or the loudness may be shown on the liquid crystal monitor itself, apart from the character string, by a level meter or the like that changes color.

In FIG. 2(e), in addition to the example of FIG. 2(a), the color change information for the vowels, that is, the color assigned to each character string, is shown near the display means 13. Furthermore, when the recognition result generated from the utterance trainee's speech differs from the displayed character string and corresponds to another character string, the output changes to the color assigned to that other character string.
For example, when the displayed character string is "あ" ("a") but the recognition result generated from the input speech is "う" ("u"), the output means 16 lights in yellow rather than red.

In this case, a table or the like may be created that holds the color change information assigned to each character string. As in the example of FIG. 2(e), "あ" may be red, "い" blue, "う" yellow, "え" green, and "お" purple, so that each character has its own color, combination of colors, or mode such as blinking (for a consonant, for example, the color of its vowel may blink), and this information is stored in the storage means 17 or the like as color change information.
Although this example illustrates color change information for vowels only, if the recognition result is taken to be the vowel with the highest likelihood, as calculated above, the trainee can easily perceive not only whether the uttered speech is correct but also which character (vowel) it is closest to.
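The table of color change information can be sketched as a lookup. The vowel colors below are the ones given for FIG. 2(e); everything else is an assumption for illustration: the names `VOWEL_COLORS`, `VOWEL_OF_KANA`, `DEFAULT_COLOR`, and `output_style`, the small kana subset, and the white fallback are hypothetical. The blinking of the vowel color for consonant sounds follows the parenthetical example in the text.

```python
# Colors from the example of FIG. 2(e).
VOWEL_COLORS = {"あ": "red", "い": "blue", "う": "yellow", "え": "green", "お": "purple"}

# Hypothetical subset mapping consonant-initial kana to their vowel.
VOWEL_OF_KANA = {"か": "あ", "き": "い", "く": "う", "け": "え", "こ": "お"}

DEFAULT_COLOR = "white"  # assumption: used when no color is assigned


def output_style(recognized):
    """Return (color, blink) for a recognized character."""
    if recognized in VOWEL_COLORS:
        return VOWEL_COLORS[recognized], False
    vowel = VOWEL_OF_KANA.get(recognized)
    if vowel is not None:
        # Consonant sound: show its vowel's color, blinking.
        return VOWEL_COLORS[vowel], True
    return DEFAULT_COLOR, False
```

If "あ" is displayed and "う" is recognized, `output_style("う")` returns yellow without blinking, which matches the yellow output described for FIG. 2(e).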

The way the color change information is presented is not limited to this example. It may be displayed on the display means 13 together with the character string, and it is not limited to vowels: color change information for consonants, words, and so on may be displayed according to the selected training task.
Nor do colors need to be assigned to character strings one to one; a minimal set of colors may be assigned according to the selected training task.

In FIG. 2(f), in addition to the example of FIG. 2(b), the color change information for the vowels is shown near the display means 13, as in FIG. 2(e). As in FIG. 2(e), when the recognition result generated from the utterance trainee's speech differs from the displayed character string and corresponds to another character string, the output changes to the color assigned to that other character string.

The examples described with reference to FIGS. 2(a) to 2(f) may be combined to display the character string and output the comparison determination result.
In FIGS. 2(b), 2(d), and 2(f), the character string is shown in white on a black liquid crystal monitor for ease of understanding, but this is not a limitation; it suffices that the color of the character string when first displayed differs from the color that indicates the comparison determination result.

Next, other examples of the character string display on the display means 13 and of the comparison determination result output by the output means 16 are described with reference to FIG. 3.
As shown in FIG. 3, the present embodiment illustrates two further patterns of display and output.
They differ from the examples described with reference to FIG. 2 in the display mode of the display means 13 and the output mode of the output means 16. The rest of the basic configuration is the same as in FIG. 2, so the same reference numerals are used and the description is omitted.

In FIGS. 3(a) and 3(b), the display means 13 and the output means 16 consist of a liquid crystal monitor or the like.
Specifically, when the utterance trainee selects the desired training task from the utterance training data 18, a character string is displayed on the liquid crystal monitor; in FIG. 3(a), "か" ("ka") is displayed.
When the utterance trainee inputs speech through the speech input means 14 according to the displayed character string, the speech recognition means 15 immediately generates a recognition result from it as described above. In this example, one or more recognition results are generated based on the likelihood calculated by the likelihood calculation program that the speech recognition means 15 executes, that is, on a predetermined probability value.

That is, in this example the recognition result is not limited to the single most likely candidate. When the likelihood calculation program yields multiple likelihoods (probability values), multiple recognition results are generated based on those likelihoods.
The number of generated recognition results is given as one or more to allow for the case, however improbable, in which the displayed character string and the recognition result generated from the input speech match 100%, that is, in which the calculated probability value is 100%.

In this example, the output means 16 outputs, as the result of the comparison determination, the one or more recognition results generated as described above together with their probability values by changing colors.
Specifically, for the displayed character string "か" ("ka"), the probability values of the recognition results generated from the input speech are, in this example, 40% for "が" ("ga"), 20% for "か" ("ka"), 10% for "あ" ("a"), and 5% for "は" ("ha"). These are output as recognition results 16b to 16e and as level meters 16f to 16i that represent the corresponding probability values.
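The output of FIG. 3(a) can be sketched as a ranked list of candidates with their probability values and colors. The probability values are the ones given in the example; the function names `n_best` and `display_rows`, the default of four rows, and the single "gray" color for non-target candidates are assumptions for illustration (the description allows the other candidates to use different colors or the same color).

```python
TARGET_COLOR = "red"   # from the example: the displayed string is shown in red
OTHER_COLOR = "gray"   # assumption: one shared color for all other candidates


def n_best(probabilities, n=4):
    """Return the n candidates with the highest probability, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def display_rows(target, probabilities, n=4):
    """Build (text, percent, color) rows for the recognition result display."""
    rows = []
    for text, probability in n_best(probabilities, n):
        color = TARGET_COLOR if text == target else OTHER_COLOR
        rows.append((text, round(probability * 100), color))
    return rows


# Probability values from the example of FIG. 3(a).
example = {"が": 0.40, "か": 0.20, "あ": 0.10, "は": 0.05}
rows = display_rows("か", example)
```

Each row corresponds to one recognition result (16b to 16e) and its level meter (16f to 16i); changing `n` covers the variants with three or fewer, or five or more, results.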

The recognition results 16b to 16e and the level meters 16f to 16i that represent the corresponding probability values are output in different colors. In this example, the items 16c and 16g, which correspond to the character string "か" displayed as the training task, are output in red, and the others are each output in some other color as appropriate.
Alternatively, they may all be output in the same color rather than in different colors.

With the configuration described above, the utterance trainee can see which characters the uttered speech was recognized as.
That is, when the input speech matches the displayed character string 100%, a single recognition result is output with its probability value by changing a color; when multiple recognition results are generated from the input speech, they are all output with their probability values by changing colors. The utterance trainee can therefore perceive visually and intuitively how close the uttered speech is to the intended character string, and which character strings it resembles.

For example, in the comparison determination result output of this example, "が" ("ga", recognition result 16b) has a higher probability value than "か" ("ka", recognition result 16c), the character string displayed as the training task. Both "か" and "が" share the vowel "a" and are produced by pressing the back of the tongue against the soft palate and releasing it as a plosive. The other candidates, "あ" ("a", recognition result 16d) and "は" ("ha", recognition result 16e), are output with low probability values; unlike "が" and "か", they are not plosives but open sounds. By looking at these results together, the utterance trainee can perceive visually and intuitively that the tongue placement needs to be corrected.
Furthermore, by repeatedly inputting speech for the same task, the utterance trainee can practice alone, through trial and error, until able to pronounce the character string displayed as the training task.

In this example, four recognition results and their corresponding probability values are output in changed colors as the comparison determination output, but the configuration is not limited to this. For example, three or fewer, or five or more, recognition results and their corresponding probability values may be output in changed colors.
Also, although a training task consisting of a single character is illustrated, the configuration is not limited to this. Words and fixed phrases such as greetings may be displayed as training tasks, and the comparison determination result output for them.
Furthermore, the output of recognition results 16b to 16e and of level meters 16f to 16i, which represent the corresponding probability values, is not limited to the configuration described above. Any form that appeals to the utterance trainee's vision and allows intuitive perception, such as a pie chart, a line graph, or a bar graph, may be used.

Next, yet another example is described with reference to FIG. 3(b).
In this example, "su" is displayed as the training task. The difference from the example described with reference to FIG. 3(a) is that the output means 16 additionally outputs the loudness of the input speech in changed colors. The rest of the basic configuration is the same as in the example of FIG. 3(a), so its description is omitted.

Specifically, in addition to the output of FIG. 3(a), the loudness of the input speech is output by changing the color of level meter 16a. The utterance trainee can thus visually perceive which character the uttered speech was recognized as, and can also easily perceive the loudness of the speech through a visual stimulus.
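One way a loudness-to-color mapping for a level meter could work is sketched below. The patent does not say how loudness is measured or which colors are used, so the RMS measure, the `quiet` and `loud` thresholds, the color names, and the function names are all assumptions made for illustration.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of samples assumed to lie in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudness_color(samples: Sequence[float], quiet: float = 0.05, loud: float = 0.5) -> str:
    """Pick a level-meter color from the loudness of the input speech.

    The thresholds and colors are hypothetical: "blue" for too quiet,
    "red" for too loud, "green" for anything in between.
    """
    level = rms_level(samples)
    if level < quiet:
        return "blue"
    if level > loud:
        return "red"
    return "green"


# Usage with synthetic sample buffers.
print(loudness_color([0.0] * 100))        # silence
print(loudness_color([0.2, -0.2] * 50))   # moderate level
print(loudness_color([0.9, -0.9] * 50))   # high level
```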

Next, the basic operation of the utterance training performed by the utterance training device 10 configured as described above is described with reference to FIG. 4.

First, the utterance training device 10 is started by operating the operation means 12, a desired training task is selected from the utterance training data 18, which stores various training tasks, and a character string is displayed on the display means 16 based on the selected training task (steps 100 to 101). The device then waits for speech input. When the utterance trainee inputs speech through the speech input means 14 according to the displayed character string, the speech recognition means 15 recognizes the speech as described above and generates a recognition result. If no speech is input, the device waits for a predetermined time, and if there is still no speech input after that time has elapsed, the utterance training ends (steps 102 to 103, 106). The CPU 11 compares the generated recognition result with the displayed character string, and the output means 16 outputs the result in changed colors as described above. If the utterance trainee then performs an end operation, the utterance training ends (steps 104 to 105). If no end operation is performed, character strings are displayed in sequence according to the training task and the above operation is repeated.
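The flow of FIG. 4 can be summarized as a small control loop. The sketch below is an assumption-laden illustration rather than the patented implementation: `run_training` and all of its callback parameters are hypothetical names, the recognizer is reduced to returning a single string, and the behavior when the task list runs out (returning "completed") is not stated in the text.

```python
from typing import Callable, Iterable, Optional


def run_training(
    tasks: Iterable[str],
    listen: Callable[[float], Optional[bytes]],
    recognize: Callable[[bytes], str],
    show: Callable[[str], None],
    report: Callable[[str, str, bool], None],
    end_requested: Callable[[], bool],
    timeout_s: float = 10.0,
) -> str:
    """Run training tasks in sequence and return the reason for stopping."""
    for target in tasks:
        # Steps 100 to 101: display the character string of the task.
        show(target)
        # Steps 102 to 103, 106: wait for speech; stop if none arrives in time.
        audio = listen(timeout_s)
        if audio is None:
            return "timeout"
        result = recognize(audio)
        # Steps 104 to 105: compare, output the result, check for an end request.
        report(target, result, result == target)
        if end_requested():
            return "ended"
    return "completed"  # assumption: stop when no tasks remain


# Usage with stub callbacks: the second task receives no speech input.
log = []
audio_queue = [b"fake-audio", None]
outcome = run_training(
    ["か", "す"],
    listen=lambda timeout: audio_queue.pop(0),
    recognize=lambda audio: "か",
    show=log.append,
    report=lambda target, result, ok: log.append((target, result, ok)),
    end_requested=lambda: False,
)
print(outcome, log)
```

Passing the input, recognition, and output steps in as callbacks keeps the loop independent of any particular microphone, recognizer, or display.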

As noted above, a personal computer may be configured to provide the storage means 17 for storing the utterance training data 18, the display means 13 for reading a character string to be a training task out of the utterance training data 18 and displaying it, the speech input means 14 for inputting the speech uttered by the utterance trainee according to the displayed character string, the speech recognition means 15 for recognizing the input speech and generating a recognition result, the comparison determination means 11 for comparing the character string with the recognition result to determine whether the recognition result is correct, and the output means 16 for outputting the result of the comparison determination in a prescribed form with changed colors, with the utterance training program 19 installed on it. In that case, the utterance training program 19 may be stored on a recording medium such as a CD-ROM and read from a CD drive (CDD) of the personal computer into the storage means 17, or downloaded over a communication line and stored, and then executed to carry out the utterance training. The PC then functions as the utterance training device.

A block diagram showing the schematic configuration of the utterance training device according to the present invention.
A schematic diagram showing an example of display and output in the utterance training device of the present invention.
A schematic diagram showing another example of display and output in the utterance training device of the present invention.
A flowchart showing the basic operation of the utterance training performed in the same embodiment of the present invention.
A diagram showing the schematic configuration of a conventional utterance training device.

Explanation of symbols

10, PC  Utterance training device
11  CPU (comparison determination means)
13  Display means
14  Speech input means
15  Speech recognition means
16  Output means
17  Storage means
18  Utterance training data

Claims

An utterance training device comprising: storage means for storing utterance training data; display means for reading a character string to be a training task out of the utterance training data and displaying the character string; speech input means for inputting speech uttered by an utterance trainee according to the displayed character string; speech recognition means for recognizing the input speech and generating a recognition result; comparison determination means for comparing the character string with the recognition result to determine whether the recognition result is correct; and output means for outputting the result of the comparison determination in a prescribed form with changed colors.

The utterance training device according to claim 1, wherein the output means outputs the character string in a changed color.

The utterance training device according to claim 1, wherein the speech recognition means is configured to generate one or a plurality of recognition results based on a predetermined probability value calculated from the input speech, and the output means outputs the one or plurality of recognition results together with the probability values in changed colors.

The utterance training device according to any one of claims 1 to 3, wherein the output means additionally outputs the loudness of the speech input from the speech input means in changed colors.