JP2006145791A

JP2006145791A - Speech recognition device and method, and mobile information terminal using speech recognition method

Info

Publication number: JP2006145791A
Application number: JP2004335175A
Authority: JP
Inventors: Katsumi Shiono; 勝美塩野
Original assignee: NEC Saitama Ltd
Current assignee: NEC Saitama Ltd
Priority date: 2004-11-18
Filing date: 2004-11-18
Publication date: 2006-06-08
Anticipated expiration: 2024-11-18
Also published as: JP4299768B2

Abstract

PROBLEM TO BE SOLVED: To prevent degradation of the recognition rate by amplifying the input speech level, which depends on how a mobile information terminal is used, to an appropriate speech level.

SOLUTION: The speech recognition device, which recognizes speech input to a microphone, includes: an amplifier 108 that amplifies the speech signal output from the microphone; a speech level detector 103A that detects the amplified speech level; a transmission gain information storage section 106A that stores a transmission gain, an appropriate speech level, and a time constant for updating the transmission gain; a transmission gain setting controller 103B that reads out the transmission gain, the appropriate speech level, and the time constant, updates the transmission gain by adding to it the gain needed to bring the detected speech level to the appropriate speech level multiplied by the time constant, and stores the updated transmission gain in the transmission gain information storage section; and a speech recognition section 111 that receives the speech signal amplified with the updated transmission gain and performs speech recognition.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a speech recognition device used while mobile. More particularly, it relates to a speech recognition device and method that prevent the drop in recognition rate caused by failing to amplify the input speech, whose level depends on how the device is held while speaking, to an appropriate speech level, and to a portable information terminal device using the speech recognition method.

In recent years, mobile phones have been equipped with a speech recognition function. Speech is input from the microphone of the transmitter (mouthpiece), amplified to an appropriate speech level, and recognized; the recognition result is shown on the display and played back as audio from the speaker.
Recognition performance is highest when the input speech signal is at the appropriate speech level, and the recognition rate drops whether the level is below or above it.

In a mobile phone with such a speech recognition function, a standard transmission gain is stored and set in the amplifier of the transmitter's microphone as follows.
FIG. 16 illustrates examples of how speech recognition is used on a mobile phone, as a premise of the present invention.
As shown in FIG. 16(a), speech recognition is performed while the phone is held as in a call, with the speaker of the receiver (earpiece) pressed against the user's ear.

In this case, a standard transmission gain Ga is set in the transmitter's microphone amplifier under assumed conditions such as a standard distance d1 between the microphone and the user's mouth (based on the length and shape of the phone and a standard head size) and a standard speaking volume.
In practice, however, users hold the phone differently, have different head sizes, and speak at different volumes, so the standard distance d1 and the standard volume are not maintained. Because of this variation in usage conditions, the speech level entering the microphone fluctuates, and the level after amplification by the standard gain Ga does not reach the appropriate speech level. The recognition rate therefore drops and recognition errors occur.

As shown in FIG. 16(b), speech recognition is also performed while the receiver is held away from the user's ear and the user looks at the display of the phone.
In this case, a standard transmission gain Gb is set in the transmitter's microphone amplifier so that a standard speaking volume produces the appropriate speech level at a standard distance d2 between the microphone and the user's mouth.

In practice, however, the standard distance d2 cannot be maintained and varies, and the user does not speak at the standard volume. Furthermore, because the distance is larger than in FIG. 16(a), the input is more susceptible to noise. The speech input to the microphone therefore fluctuates, the level amplified by the standard gain Gb does not reach the appropriate speech level, and the recognition rate drops.

Furthermore, the mouth-to-microphone distances d1 and d2 differ greatly between the usage of FIG. 16(a), where the phone is pressed against the ear, and that of FIG. 16(b), where it is held away from the ear, so the standard transmission gains Ga and Gb set in the microphone amplifier also differ. When the usage changes from FIG. 16(a) to FIG. 16(b), the gain set in the amplifier must therefore be changed from Ga to Gb. Without this change, the recognition rate drops significantly.

The reverse also holds: when the usage changes from FIG. 16(b) to FIG. 16(a), the gain set in the amplifier must be changed from Gb to Ga. Without this change, the recognition rate drops significantly.

Consequently, every switch between the usages of FIG. 16(a) and FIG. 16(b) requires changing the standard transmission gain of the microphone amplifier between Ga and Gb, which makes operation cumbersome.
In other words, the speech recognition function underlying the present invention suffers a reduced recognition rate because the user's speaking volume and way of using the phone do not necessarily match the assumed conditions.

Conventionally, a device has been proposed that automatically adjusts the received volume during a call according to the transmitted volume (see, for example, Patent Document 1). A level detector detects and calculates the power value A1 of the DC component of the transmitted speech signal, and a subtractor subtracts the reference power value A0 written in memory from A1 to calculate an amplification amount A2, which is temporarily stored in a memory unit. At the next timing, when the received speech signal passes through the speech signal processing unit into the amplifier in the signal comparison unit, the amplification amount A2 is read from the memory unit and used to vary the amplifier gain, so that the received speech signal is amplified by a factor of A2.

However, although Patent Document 1 calculates an amplification amount, varies the amplifier gain, and amplifies the received speech signal by that amount in order to adjust the received volume, it cannot solve the problems described above. When the mobile phone is pressed against the user's ear, variations in head size and in the user's speech level keep the speech level amplified by the transmitter's microphone amplifier from reaching the appropriate level; when the phone is held away from the ear, variations in the mouth-to-microphone distance and noise have the same effect, so the recognition rate drops. Moreover, because the standard transmission gain to be set in the microphone amplifier differs between the two usages, it must be changed whenever the user switches from holding the phone against the ear to holding it away, or vice versa, which makes operation cumbersome.

Conventionally, a conference telephone device has also been proposed that performs a hands-free remote conference using multiple microphones and speakers and, to improve listening quality, outputs the voice of a particular participant who speaks quietly or away from the microphone at about the same level as the other participants (see, for example, Patent Document 2). It comprises: a speech recognition circuit that decomposes the speech signal input from one or more microphones into speech elements; a memory circuit; means for storing in advance, in the memory circuit, the output of the speech recognition circuit for a specific talker input from the microphones; a matching circuit that compares the output of the speech recognition circuit during the conference with the contents of the memory circuit; one or more gain setting circuits, one for each microphone; and means for raising the gain of the gain setting circuit for the microphone receiving the specific talker's voice when the matching circuit recognizes that voice.

However, although Patent Document 2 outputs the voice of a particular participant who speaks quietly or away from the microphone at about the same level as the other participants, it cannot solve the problems described above. When the mobile phone is pressed against the user's ear, variations in head size and in the user's speech level keep the speech level amplified by the transmitter's microphone amplifier from reaching the appropriate level; when the phone is held away from the ear, variations in the mouth-to-microphone distance and noise have the same effect, so the recognition rate drops. Moreover, because the standard transmission gain to be set in the microphone amplifier differs between the two usages, it must be changed whenever the user switches from holding the phone against the ear to holding it away, or vice versa, which makes operation cumbersome.

Conventionally, a hands-free car telephone device has also been proposed that automatically adjusts the volume level during hands-free calls even when external noise changes, enabling clear conversation (see, for example, Patent Document 3). It comprises: a radio unit; a hands-free call circuit that supplies conversation speech to the radio unit; a microphone that inputs conversation speech to the hands-free call circuit; a speaker that outputs conversation speech from the hands-free call circuit; and a speech recognition device that, when the called party's name is spoken, performs speech recognition and instructs the radio unit to place a call according to the result. During hands-free calls, at least one of the input speech level from the microphone and the volume of the conversation speech output from the speaker is adjusted automatically according to noise data detected by the speech recognition device.

However, although Patent Document 3 automatically adjusts the volume level during hands-free car telephone calls even when external noise changes, enabling clear conversation, it cannot solve the problems described above. When the mobile phone is pressed against the user's ear, variations in head size and in the user's speech level keep the speech level amplified by the transmitter's microphone amplifier from reaching the appropriate level; when the phone is held away from the ear, variations in the mouth-to-microphone distance and noise have the same effect, so the recognition rate drops. Moreover, because the standard transmission gain to be set in the microphone amplifier differs between the two usages, it must be changed whenever the user switches from holding the phone against the ear to holding it away, or vice versa, which makes operation cumbersome.

Conventionally, a speech input device has also been proposed that, regardless of changes in the power level of input telephone speech, correctly detects its speech interval without clipping the beginnings or ends of words or causing saturation, thereby improving recognition of telephone speech (see, for example, Patent Document 4). It comprises: a preamplifier that applies a predetermined amplification gain to telephone speech input via a telephone line; a speech recognition unit that detects the speech interval of the telephone speech input through the preamplifier, detects the features of the speech in that interval, and recognizes it; a speech response unit that sends a predetermined response speech to the telephone line according to the recognition result; means for detecting the power level of the first telephone speech input via the telephone line during the recognition and response process; and means for setting the amplification gain of the amplifier according to the detected power level.

However, although Patent Document 4 detects the power level of telephone speech and sets the amplifier gain accordingly, so that speech intervals are detected correctly and recognition of telephone speech improves, it cannot solve the problems described above. When the mobile phone is pressed against the user's ear, variations in head size and in the user's speech level keep the speech level amplified by the transmitter's microphone amplifier from reaching the appropriate level; when the phone is held away from the ear, variations in the mouth-to-microphone distance and noise have the same effect, so the recognition rate drops. Moreover, because the standard transmission gain to be set in the microphone amplifier differs between the two usages, it must be changed whenever the user switches from holding the phone against the ear to holding it away, or vice versa, which makes operation cumbersome.

Conventionally, an automobile telephone device with an emergency call function has also been proposed that reliably conveys the user's speech to the emergency call center (see, for example, Patent Document 5). When an emergency such as a traffic accident occurs and a microcomputer determines that the speech level from the occupant is below a predetermined value, it raises the transmission gain above the normal setting so that the microphone output signal is power-amplified. The gain control amplifier thus automatically outputs the signal at a higher power level than normal, the uplink signal reaches the base station at a higher power level than normal, and the occupant's speech is reliably conveyed to the service center operator.

However, although Patent Document 5 raises the transmission gain above normal in an emergency such as a traffic accident so that the occupant's speech reliably reaches the service center operator, it cannot solve the problems described above. When the mobile phone is pressed against the user's ear, variations in head size and in the user's speech level keep the speech level amplified by the transmitter's microphone amplifier from reaching the appropriate level; when the phone is held away from the ear, variations in the mouth-to-microphone distance and noise have the same effect, so the recognition rate drops. Moreover, because the standard transmission gain to be set in the microphone amplifier differs between the two usages, it must be changed whenever the user switches from holding the phone against the ear to holding it away, or vice versa, which makes operation cumbersome.

Patent Document 1: Japanese Patent Laid-Open No. 11-239093
Patent Document 2: Japanese Patent Laid-Open No. 61-161863
Patent Document 3: Japanese Patent Laid-Open No. 4-261254
Patent Document 4: Japanese Patent Laid-Open No. 1-142799
Patent Document 5: Japanese Patent Laid-Open No. 2004-80697

Therefore, in view of the above problems, an object of the present invention is to provide a speech recognition device and method, and a portable information terminal device using the speech recognition method, that amplify the input speech level, which depends on how the portable information terminal device is used, to an appropriate speech level and thereby prevent a drop in recognition rate.

To solve the above problems, the present invention provides a speech recognition device that recognizes speech input to the microphone of a transmitter, comprising: an amplifier that amplifies the speech signal output from the microphone of the transmitter with a transmission gain; a speech level detection unit that detects the speech level amplified by the amplifier; a transmission gain information storage unit that stores an initial value of the transmission gain, the transmission gain, an appropriate speech level, and a time constant for updating the transmission gain; a transmission gain setting control unit that reads the transmission gain, the appropriate speech level, and the time constant from the transmission gain information storage unit, sets the transmission gain in the amplifier, updates the transmission gain by adding to it the gain needed to bring the speech level detected by the speech level detection unit to the appropriate speech level multiplied by the time constant, and stores the updated transmission gain in the transmission gain information storage unit; and a speech recognition unit that receives the speech signal amplified by the amplifier and performs speech recognition.
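The units listed above can be sketched as a few Python functions. This is a minimal illustration under stated assumptions, not the patented implementation: the names `GainStore`, `amplify`, `detect_level`, and `process_utterance` are hypothetical, gains are expressed in dB, and the speech level is taken to be the RMS amplitude of the amplified samples, since the text does not fix a level measure.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GainStore:
    """Hypothetical stand-in for the transmission gain information storage unit."""
    initial_gain_db: float    # initial value of the transmission gain
    gain_db: float            # current (possibly updated) transmission gain
    appropriate_level: float  # C, taken here as a linear RMS amplitude (assumption)
    time_constant: float      # K, with 0.0 < K <= 1.0


def amplify(samples: Sequence[float], gain_db: float) -> List[float]:
    """Amplifier: scale the microphone samples by the transmission gain in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def detect_level(samples: Sequence[float]) -> float:
    """Speech level detector: RMS amplitude (an assumption; the source fixes no measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def process_utterance(store: GainStore, samples: Sequence[float]) -> List[float]:
    """Amplify one utterance, update and store the gain, return the signal to recognize."""
    amplified = amplify(samples, store.gain_db)
    level = detect_level(amplified)
    if level > 0.0:  # silence gives no usable level, so skip the update (assumption)
        # Gain (dB) that would bring the detected level exactly to C.
        needed_db = 20.0 * math.log10(store.appropriate_level / level)
        # Add K times that gain to the current gain and keep it in the store.
        store.gain_db += store.time_constant * needed_db
    return amplified  # handed to the speech recognition unit
```

For example, with `GainStore(0.0, 0.0, 1.0, 0.5)` and an utterance whose RMS level is 0.5, the gain needed to reach C is about +6.02 dB, and the stored gain moves by half of that, to about +3.01 dB. The time constant K thus decides how far a single utterance is allowed to move the gain.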

Further, the transmission gain setting control unit reads the initial value of the transmission gain from the transmission gain information storage unit and sets it in the amplifier for the first utterance of speech recognition.
Further, when an updated transmission gain is stored in the transmission gain information storage unit, the transmission gain setting control unit reads the updated transmission gain from that storage unit and sets it in the amplifier for the first utterance when speech recognition is resumed.
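The choice of gain for the first utterance, as described in the two paragraphs above, reduces to a small decision. The sketch below is hypothetical: the function name and the `resuming` flag are illustrative, and `None` is used to mean that no updated gain has been stored yet.

```python
from typing import Optional


def gain_for_first_utterance(initial_gain_db: float,
                             stored_updated_gain_db: Optional[float],
                             resuming: bool) -> float:
    """Pick the gain to set in the amplifier before the first utterance of a run.

    stored_updated_gain_db is None when no updated gain has been saved yet.
    """
    if resuming and stored_updated_gain_db is not None:
        # Resumed recognition: reuse the gain learned in the earlier run.
        return stored_updated_gain_db
    # Fresh start, or nothing stored yet: use the stored initial value.
    return initial_gain_db
```

Reusing the learned gain on resumption means a user who keeps the same grip and speaking style does not have to pay the convergence cost again.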

Further, for the first utterance of speech recognition, the transmission gain setting control unit reads the initial value of the transmission gain from the transmission gain information storage unit, sets it in the amplifier, and has the speech recognition unit perform recognition. If speech recognition is started again within a predetermined time after the recognition result is confirmed, the control unit either reads the initial value of the transmission gain from the storage unit, sets it in the amplifier, updates the initial value based on the acquired speech level information, and saves it in the storage unit; or reads the not-yet-updated transmission gain from the storage unit, sets it in the amplifier, updates it based on the acquired speech level information, and saves it in the storage unit. At the next utterance, the updated transmission gain is read from the storage unit and set in the amplifier. If speech recognition is not started again within the predetermined time, the control unit ends the speech recognition process.
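The session behaviour described above, where a restart within a predetermined time continues adapting the gain and a timeout ends recognition, can be sketched as a loop. Assumptions, all hypothetical: each utterance is given as a tuple of start time, confirmation time, and raw (pre-amplification) level; the second alternative in the text, which carries the updated gain forward, is the one modelled; and the update uses the dB rule given in this document.

```python
import math
from typing import List, Optional, Tuple

# (start_time_s, confirm_time_s, raw_level): when recognition starts, when its
# result is confirmed, and the microphone level before amplification.
Utterance = Tuple[float, float, float]


def run_session(initial_gain_db: float, appropriate_level: float, k: float,
                utterances: List[Utterance], timeout_s: float) -> List[float]:
    """Return the gain set in the amplifier for each utterance handled in one session."""
    gain_db = initial_gain_db          # first utterance: initial value
    gains_used: List[float] = []
    last_confirm: Optional[float] = None
    for start, confirm, raw_level in utterances:
        if last_confirm is not None and start - last_confirm > timeout_s:
            break                      # not restarted within the predetermined time: end
        gains_used.append(gain_db)     # set this gain in the amplifier
        detected = raw_level * 10.0 ** (gain_db / 20.0)
        # Update from the acquired level and keep it for the next utterance.
        gain_db -= k * 20.0 * math.log10(detected / appropriate_level)
        last_confirm = confirm
    return gains_used
```

With an initial gain of 0 dB, C = 1.0, K = 0.5, a raw level of 0.5, and a 5 s timeout, utterances starting at 0 s and 2 s (each confirmed 1 s later) use 0 dB and about 3.01 dB; an utterance starting at 10 s arrives more than 5 s after the last confirmation, so the session has already ended.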

Further, the transmission gain setting control unit determines whether the speech level detected by the speech level detection unit lies inside or outside a fixed range centered on the appropriate speech level, and makes the time constant used inside the range smaller than the time constant used outside it.
Further, the transmission gain G_n updated by the transmission gain setting control unit is given by

$$G_n = G_{n-1} - K \times 20\log_{10}\left(\frac{X_n}{C}\right)\ \text{[dB]}$$

where
K: time constant for updating the transmission gain (0.0 < K ≤ 1.0),
n: speech recognition count (n = 1, 2, 3, ...),
C: appropriate speech level,
X_n: speech level detected by the speech level detection unit.
Since $-20\log_{10}(X_n/C) = 20\log_{10}(C/X_n)$ is the gain that would bring X_n exactly to C, this formula adds that gain, multiplied by K, to the previous transmission gain.
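The update formula and the range-dependent time constant can be combined in a short simulation. The formula follows the text directly; the ±3 dB band and the values K = 0.2 inside and K = 0.8 outside are illustrative assumptions, as is the 20 dB level drop used to mimic switching from the ear-held usage of FIG. 16(a) to the held-away usage of FIG. 16(b).

```python
import math


def update_gain(prev_gain_db: float, detected_level: float,
                appropriate_level: float, k: float) -> float:
    """G_n = G_{n-1} - K * 20 * log10(X_n / C), all gains in dB."""
    if not 0.0 < k <= 1.0:
        raise ValueError("time constant K must satisfy 0.0 < K <= 1.0")
    return prev_gain_db - k * 20.0 * math.log10(detected_level / appropriate_level)


def choose_time_constant(detected_level: float, appropriate_level: float,
                         band_db: float = 3.0, k_inside: float = 0.2,
                         k_outside: float = 0.8) -> float:
    """Smaller K inside a band around C, larger K outside (band and K values assumed)."""
    deviation_db = abs(20.0 * math.log10(detected_level / appropriate_level))
    return k_inside if deviation_db <= band_db else k_outside


# Switching from ear-held use (a) to held-away use (b): the raw level drops 20 dB.
C = 1.0
gain_db = 0.0            # Ga, assumed to bring ear-held speech exactly to C
raw_level_b = 0.1        # hypothetical raw level in use (b)
history = []
for _ in range(4):
    x_n = raw_level_b * 10.0 ** (gain_db / 20.0)   # level after amplification
    k = choose_time_constant(x_n, C)
    gain_db = update_gain(gain_db, x_n, C, k)
    history.append(round(gain_db, 2))
# history -> [16.0, 19.2, 19.36, 19.49]
```

The gain climbs toward the +20 dB that use (b) needs without any manual change from Ga to Gb: large steps while the level is far from C, then small, noise-tolerant steps once it falls inside the band.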

Further, a test unit is provided. The test unit lets the time constant be adjusted by key operation. It has the speech recognition unit process a plurality of test utterances, and has the transmission gain setting control unit determine an optimum transmission gain in advance. That value is stored in the transmission gain information storage unit as the initial value of the transmission gain.
Furthermore, the present invention provides a speech recognition method for recognizing speech input to the microphone of a transmitter. The method comprises the following steps:
- storing an appropriate speech level and a transmission gain;
- amplifying the speech signal output from the microphone of the transmitter with the transmission gain;
- detecting the amplified speech level;
- updating the transmission gain by adding to it a value obtained by multiplying a time constant by the gain required to bring the detected speech level to the appropriate speech level, and storing the updated transmission gain;
- performing speech recognition on the amplified speech signal.
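The test-unit calibration described above might be sketched as follows. Several test utterances are fed through the gain update, and the resulting gain is kept as the initial value. The function name, the dictionary standing in for the storage unit, and the choice of ten utterances at a constant level are all hypothetical. The source does not say how the test unit derives the "optimum" value.

```python
import math
from typing import Dict, Iterable


def calibrate_initial_gain(test_mic_levels: Iterable[float], k: float,
                           target: float, g0: float = 0.0) -> float:
    """Apply the gain update to each test utterance and return the final gain."""
    if not 0.0 < k <= 1.0:
        raise ValueError("time constant K must satisfy 0 < K <= 1")
    gain = g0
    for mic_level in test_mic_levels:
        detected = mic_level * 10 ** (gain / 20.0)        # level after the amplifier
        gain -= k * 20.0 * math.log10(detected / target)  # move toward the target
    return gain


if __name__ == "__main__":
    # Hypothetical stand-in for the transmission gain information storage unit.
    store: Dict[str, float] = {}
    # Ten test utterances at a constant level of 700, K = 0.5, C = 1000.
    store["initial_gain_db"] = calibrate_initial_gain([700.0] * 10, k=0.5, target=1000.0)
    print(f"initial gain: {store['initial_gain_db']:.2f} dB")
```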

Furthermore, the present invention provides a portable information terminal device that uses the speech recognition method. In addition to its portable information terminal functions, the device has a speech recognition function. This function amplifies the input speech level to an appropriate level according to how the device is being used, and then performs speech recognition.
Furthermore, the recognition result is played through the earpiece speaker of the portable information terminal device and shown on its display unit.

As described above, according to the present invention:
- speech recognition is performed on the speech signal output from the transmitter microphone after it has been amplified by the transmission gain;
- the speech level of the amplified signal is detected;
- the transmission gain is updated by adding to it the gain required to bring the detected level to the appropriate speech level, multiplied by a time constant;
- the updated transmission gain is stored.

The input speech level can therefore be amplified to an appropriate level according to how the transmitter of the portable information terminal device is being used. This prevents a drop in the recognition rate.

When the portable information terminal device is held against the user's ear, the speech level after the transmitter microphone amplifier reaches the appropriate level despite variations in head size and in the user's speaking level. When the device is held away from the ear, the amplified level likewise reaches the appropriate level despite variations in the distance between the transmitter microphone and the mouth. Both effects improve the recognition rate. In addition, when the user switches from holding the device against the ear to holding it away, or vice versa, the transmission gain setting is changed automatically. The user no longer has to change it manually as in the prior art, which simplifies operation.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing the schematic configuration of a portable information terminal device according to the present invention. As shown in the figure, the portable information terminal device 100, a mobile station, has an antenna 101 that communicates wirelessly with a base station (not shown).
A radio unit 102 is connected to the antenna 101. It modulates signals to be transmitted from the antenna 101 and demodulates signals received by it.

A control unit 103 is connected to the radio unit 102 and controls the entire portable information terminal device 100, including the radio unit 102.
An operation unit 104 is connected to the control unit 103. The operation unit 104 includes a CPU (central processing unit) and is used to operate the mobile phone, start speech recognition, and so on. When the speech recognition start key on the operation unit 104 is pressed, the control unit 103 sends a speech recognition start command to the speech recognition unit 111, described later.

Further, a display unit 105 is connected to the control unit 103. It displays numbers, characters, images, speech recognition results, and the like.
Further, a rewritable memory 106 is connected to the control unit 103. It stores various information for controlling the portable information terminal device 100, as well as transmitted and received data.
Further, an A/D-D/A converter 107 is connected to the control unit 103. It converts the transmitted speech signal going to the control unit 103 from analog to digital. It also converts the received speech signal coming from the control unit 103 from digital to analog.

An amplifier 108 is connected to the A/D-D/A converter 107. When the transmission gain setting control unit 103B (described later) sets a transmission gain, the amplifier 108 changes its amplification factor to adjust sensitivity. It then amplifies the analog speech signal from the microphone 109, described later.
The transmitter microphone 109 is connected to the amplifier 108. It picks up the user's speech, converts it into an electrical signal, and outputs it to the amplifier 108 as an analog speech signal.

The receiver speaker 110 is connected to the A/D-D/A converter 107. It receives the analog electrical signal of the received sound from the A/D-D/A converter 107, converts it into sound, and plays it. In particular, it plays the speech recognition result.
A speech recognition unit 111 is connected to the control unit 103. The speech recognition unit 111 is a DSP (digital signal processor) LSI (large-scale integrated circuit). Its input is speech data produced in three stages: the amplifier 108 amplifies the speech signal from the microphone 109, the A/D-D/A converter 107 digitizes it, and the control unit 103 passes it on. The speech recognition unit 111 performs speech recognition on this data. Through the control unit 103, it then shows the recognition result on the display unit 105. It also has the speaker 110 play the speech recognition start tone and speak the recognition result.

The memory 106 contains a transmission gain information storage unit 106A. This unit holds information for optimizing the transmission gain applied to the amplifier 108 during speech recognition. That information includes the initial transmission gain (dB), the updated transmission gain, the appropriate speech level for speech recognition, and the time constant for updating the transmission gain.
The control unit 103 contains a speech level detection unit 103A. During every speech recognition, the speech signal from the transmitter microphone 109 is amplified by the amplifier 108, digitized into speech data by the A/D-D/A converter 107, and input to the control unit 103. The speech level detection unit 103A then detects a speech section in that data and measures the speech level of the transmitted speech signal.
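The description does not say how the speech section or the speech level is measured. One common approach is assumed here purely for illustration:
1. Split the digitized samples into fixed-length frames.
2. Treat frames whose RMS exceeds a noise threshold as the speech section.
3. Report the mean RMS of those frames as the speech level.

The function name, frame length, and threshold are hypothetical.

```python
import math
from typing import Optional, Sequence


def detect_speech_level(samples: Sequence[float], frame_len: int = 160,
                        threshold: float = 100.0) -> Optional[float]:
    """Return the mean frame RMS over the detected speech section, or None.

    A frame belongs to the speech section if its RMS exceeds `threshold`
    (an assumed energy-based voice activity rule, not from the source).
    """
    speech_rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > threshold:
            speech_rms.append(rms)
    if not speech_rms:
        return None  # no speech section found
    return sum(speech_rms) / len(speech_rms)


if __name__ == "__main__":
    silence = [0.0] * 160
    tone = [1000.0, -1000.0] * 160  # 320 samples at constant amplitude
    print(detect_speech_level(silence + tone))
```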

The control unit 103 also contains a transmission gain setting control unit 103B. During speech recognition, this unit uses the information held in the transmission gain information storage unit 106A together with the speech level detected by the speech level detection unit 103A. To prepare for the next speech recognition, it takes the gain obtained from the detected speech level and the appropriate speech level and multiplies it by the time constant. It adds the result to the transmission gain obtained at the previous speech recognition and stores the sum as the new transmission gain.

Further, at the next speech recognition, the transmission gain setting control unit 103B sets the amplifier 108 to the transmission gain obtained at the previous speech recognition.
The transmission gain update formula used by the transmission gain setting control unit 103B is described below.
Let $A_n$ be the speech level input to the microphone 109, $X_n$ the speech level detected by the speech level detection unit 103A, and $C$ the appropriate speech level. The transmission gain $G_n$ is then updated as follows.

$$G_n = G_{n-1} - K \times 20\log_{10}\!\left(\frac{X_n}{C}\right)\ \text{dB} \qquad (1)$$
$$X_n = A_n \times 10^{G_{n-1}/20} \qquad (2)$$
where $K$ is the time constant for updating the transmission gain ($0.0 < K \le 1.0$) and $n$ is the speech recognition count ($n = 1, 2, 3, \ldots$).
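Equations (1) and (2) translate directly into code. The following minimal sketch reproduces the first update of the worked example that follows ($A_1 = 700$, $C = 1000$, $K = 1.0$, $G_0 = 0$ dB). The function names are illustrative and are not taken from the source.

```python
import math


def detected_level(mic_level: float, prev_gain_db: float) -> float:
    """Equation (2): X_n = A_n * 10**(G_{n-1} / 20)."""
    return mic_level * 10 ** (prev_gain_db / 20.0)


def updated_gain(prev_gain_db: float, level: float, k: float,
                 target: float) -> float:
    """Equation (1): G_n = G_{n-1} - K * 20 * log10(X_n / C), in dB."""
    if not 0.0 < k <= 1.0:
        raise ValueError("K must satisfy 0.0 < K <= 1.0")
    return prev_gain_db - k * 20.0 * math.log10(level / target)


if __name__ == "__main__":
    g0 = 0.0
    x1 = detected_level(700.0, g0)  # 700.0, since 10**0 == 1
    g1 = updated_gain(g0, x1, k=1.0, target=1000.0)
    print(f"X1 = {x1:.0f}, G1 = {g1:.2f} dB")
```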

Thus, the next time speech recognition starts, the amplifier 108 is set to a transmission gain that reflects the speech level of the previous recognition. The resulting speech level suits speech recognition because it is matched to the user's voice volume and to the way the device is actually being used.

FIG. 2 illustrates examples of the information held in the transmission gain information storage unit 106A of FIG. 1 for optimizing the transmission gain during speech recognition.

As shown in the figure, the transmission gain information storage unit 106A holds an initial transmission gain $G_0$ of "0.00 dB". This value is referenced to one of two conditions, whichever is selected: a distance d1 of 3 cm between the user's mouth and the transmitter microphone 109 of the portable information terminal device 100 (see FIG. 16(a)), or a distance d2 of 7 cm (see FIG. 16(b)). The storage unit also holds the updated transmission gain (dB). The amplifier 108 is set to this updated value at the next speech recognition.

The transmission gain information storage unit 106A also stores the appropriate speech level $C$ for speech recognition, for example "1000". It further stores the time constant $K$ for updating the transmission gain, holding several selectable values such as "1.0", "0.5", ....
FIG. 3 is a flowchart illustrating an example sequence of the transmission gain update process performed by the transmission gain setting control unit 103B of FIG. 1.
As shown in the figure, in step 201, the control unit 103 detects that the speech recognition start key on the operation unit 104 has been pressed, that is, that speech recognition has been activated.

In step 202, after the control unit 103 detects the activation, the transmission gain setting control unit 103B reads the pre-update transmission gain $G_{n-1}$ from the transmission gain information storage unit 106A and sets it in the amplifier 108. If no pre-update transmission gain is held, the amplifier 108 is set to the initial transmission gain instead.
In step 203, after the transmission gain setting control unit 103B has set $G_{n-1}$ in the amplifier 108, the control unit 103 activates the speech recognition unit 111. The amplifier 108 has adjusted the speech input from the microphone 109 to a level suitable for speech recognition (see equation (2)), and the speech recognition unit 111 recognizes that adjusted speech.

In step 204, the transmission gain setting control unit 103B waits for the speech recognition unit 111 to confirm the recognition result.
In step 205, after the recognition result is confirmed, the transmission gain setting control unit 103B acquires the recognition result and the speech level information detected by the speech level detection unit 103A. The recognition result is shown on the display unit 105 and played as speech through the speaker 110.

In step 206, the transmission gain is updated (see equation (1)).
In step 207, the transmission gain setting control unit 103B saves the updated transmission gain in the transmission gain information storage unit 106A, and the process ends.
In this way, the transmission gain is kept in the transmission gain information storage unit 106A. Each time speech recognition is performed, the stored gain is read out and updated, so an optimum transmission gain can be obtained. As a result, the speech level during recognition reaches the appropriate speech level, and a level suitable for speech recognition is maintained from then on. In particular, a user may activate speech recognition while holding the device differently from the recommended way. Even then, after a few recognitions the transmission gain has been updated to match that user's usage and voice volume, so the optimum speech level is used for recognition.
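Steps 201 to 207 can be summarized as a single recognition cycle. In this sketch, the dictionary standing in for the transmission gain information storage unit 106A, its key names, and the `recognize` callback are hypothetical placeholders. The speech recognition itself is not modeled.

```python
import math
from typing import Callable, Dict


def recognition_cycle(store: Dict[str, float], mic_level: float,
                      recognize: Callable[[float], str] = lambda level: "<result>") -> float:
    """One pass through steps 201-207; returns the detected speech level."""
    # Step 202: use the stored (pre-update) gain, or the initial gain if none exists.
    gain_db = store.get("gain_db", store["initial_gain_db"])
    # Step 203: the amplifier applies the gain (equation (2)) and recognition runs.
    level = mic_level * 10 ** (gain_db / 20.0)
    # Steps 204-205: obtain the confirmed result and the detected speech level.
    result = recognize(level)
    print(f"result={result!r}, level={level:.0f}")
    # Step 206: update the gain with equation (1).
    k, target = store["time_constant"], store["target_level"]
    new_gain = gain_db - k * 20.0 * math.log10(level / target)
    # Step 207: save the updated gain for the next recognition.
    store["gain_db"] = new_gain
    return level


if __name__ == "__main__":
    store = {"initial_gain_db": 0.0, "target_level": 1000.0, "time_constant": 1.0}
    for _ in range(3):
        recognition_cycle(store, mic_level=700.0)
    print(f"stored gain: {store['gain_db']:.2f} dB")
```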

When a portable information terminal device such as a mobile phone is held against the user's ear, the level after the transmitter microphone amplifier reaches the appropriate level despite variations in head size and in the user's speaking level. When the device is held away from the ear, the amplified level likewise reaches the appropriate level despite noise and variations in the distance between the transmitter microphone and the mouth. Both effects improve the recognition rate. Furthermore, when the user switches from holding the device against the ear to holding it away, or vice versa, the transmission gain setting is changed automatically. The user no longer has to change it manually as in the prior art, which simplifies operation.
A specific example is described below.

FIG. 4 illustrates an example of the transmission gain calculated by the transmission gain setting control unit 103B of FIG. 1 under the following conditions:
- the distance from the microphone 109 to the user's mouth is small (d1 = 3 cm);
- the speech level input to the microphone 109 is low and does not vary;
- the time constant is K = 1.0.

In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, ... utterances are $A_1 = 700$, $A_2 = 700$, $A_3 = 700$, .... The appropriate speech level is $C = 1000$. The transmission gain is calculated as follows.

At the first utterance, the amplifier 108 is set to the initial transmission gain $G_0 = 0.00$ dB before speaking. From equation (2), the speech level $X_1$ detected by the speech level detection unit 103A then satisfies $A_1/X_1 = 1$, so
$$X_1 = A_1 = 700.$$

From equation (1), the updated transmission gain $G_1$ is
$$G_1 = G_0 - 1.0 \times 20\log_{10}\!\left(\frac{X_1}{1000}\right) = 0.00 - 1.0 \times 20\log_{10}\!\left(\frac{700}{1000}\right) \approx 3.10\ \text{dB}.$$

In the above example, as shown in FIG. 2, the initial transmission gain ($G_0 = 0.00$ dB) is read from the transmission gain information storage unit 106A. The calculated gain $G_1$ is stored there as the updated transmission gain and is used as the pre-update gain at the next speech recognition. The same applies to later recognitions.

At the second utterance, the amplifier 108 is set to the pre-update transmission gain $G_1 = 3.10$ dB before speaking. From equation (2), the speech level $X_2$ detected by the speech level detection unit 103A is
$$X_2 = A_2 \times 10^{G_1/20} = 700 \times 10^{3.10/20} \approx 1000.$$

From equation (1), the updated transmission gain $G_2$ is
$$G_2 = G_1 - 1.0 \times 20\log_{10}\!\left(\frac{X_2}{1000}\right) = 3.10 - 1.0 \times 20\log_{10}\!\left(\frac{1000}{1000}\right) = 3.10\ \text{dB}.$$

At the third and later utterances, as at the second, the speech level detection unit 103A detects $X_3 = 1000$, and $G_3 = 3.10$ dB is obtained.
That is, at the first speech recognition, the transmission gain setting control unit 103B sets the amplifier 108 to $G_0 = 0.00$ dB before speaking. The speech level detection unit 103A detects $X_1 = 700$, and $G_0 = 0.00$ dB is updated to $G_1 = 3.10$ dB.

In this case, the time constant is K = 1.0 and the speech levels input to the microphone 109 do not vary ($A_1 = A_2 = A_3 = \cdots = 700$). At the second speech recognition, the detected level is therefore $X_2 = 1000$, which matches the appropriate speech level $C = 1000$. The updated transmission gain $G_2 = 3.10$ dB is the optimum value.
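Substituting equation (2) into equation (1) shows why K = 1.0 reaches the optimum in a single step. This derivation is added for illustration. Let the input level be a constant $A$, and write $G^{*} = -20\log_{10}(A/C)$ for the gain that brings $A$ exactly to $C$. Each update then shrinks the remaining gain error by the factor $(1-K)$:

$$
\begin{aligned}
G_n &= G_{n-1} - K\left[20\log_{10}\frac{A}{C} + G_{n-1}\right] \\
    &= G_{n-1} - K\,(G_{n-1} - G^{*}), \\
G_n - G^{*} &= (1-K)\,(G_{n-1} - G^{*}).
\end{aligned}
$$

With $K = 1.0$, the error vanishes after the first update; here $G^{*} = -20\log_{10}(700/1000) \approx 3.10$ dB. With $0 < K < 1$, the error decays geometrically by a factor of $(1-K)$ per utterance.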

In other words, a large time constant is preferable for fast tracking when three conditions hold: the distance between the microphone 109 and the user's mouth is constant, the input speech level is constant, and there is no noise in the environment.
The above example assumed no variation in the speech level input to the microphone 109. In practice, the microphone 109 may be held away from the user's mouth at a varying distance, the input speech level may fluctuate, or the device may be used in a noisy environment. In these cases, a large time constant makes the transmission gain setting change by large amounts, which can end up setting a non-optimal gain. When there is such variation, a smaller time constant is therefore used when setting the transmission gain, as described below.

FIG. 5 illustrates an example of the transmission gain calculated by the transmission gain setting control unit 103B of FIG. 1 under the following conditions:
- the distance from the microphone 109 to the user's mouth is small (d1 = 3 cm);
- the speech level input to the microphone 109 is low and varies;
- the time constant is K = 1.0.

In this figure, the speech levels input to the microphone 109 at the first, second, third, fourth, ... utterances vary as $A_1 = 700$, $A_2 = 750$, $A_3 = 700$, $A_4 = 750$, .... The other conditions are the same as in the previous example, and the transmission gain is calculated as follows.

At the first utterance, the amplifier 108 is set to the initial transmission gain $G_0 = 0.00$ dB before speaking. From equation (2), the speech level $X_1$ detected by the speech level detection unit 103A satisfies $A_1/X_1 = 1$, so
$$X_1 = A_1 = 700,$$
and, as in the previous example, equation (1) gives $G_1 \approx 3.10$ dB.

At the second utterance, the amplifier 108 is set to the pre-update transmission gain $G_1 = 3.10$ dB before speaking. From equation (2), the speech level $X_2$ detected by the speech level detection unit 103A is
$$X_2 = A_2 \times 10^{G_1/20} = 750 \times 10^{3.10/20} \approx 1072.$$

From equation (1), the updated transmission gain $G_2$ is
$$G_2 = G_1 - 1.0 \times 20\log_{10}\!\left(\frac{X_2}{1000}\right) = 3.10 - 1.0 \times 20\log_{10}\!\left(\frac{1072}{1000}\right) \approx 2.50\ \text{dB}.$$

At the third utterance, the amplifier 108 is set to the pre-update transmission gain $G_2 = 2.50$ dB before speaking. From equation (2), the speech level $X_3$ detected by the speech level detection unit 103A is
$$X_3 = A_3 \times 10^{G_2/20} = 700 \times 10^{2.50/20} \approx 933.$$

From equation (1), the updated transmission gain $G_3$ is
$$G_3 = G_2 - 1.0 \times 20\log_{10}\!\left(\frac{X_3}{1000}\right) = 2.50 - 1.0 \times 20\log_{10}\!\left(\frac{933}{1000}\right) \approx 3.10\ \text{dB}.$$

At the fourth and later utterances, the level detected by the speech level detection unit 103A keeps alternating between 1072 (as for $X_2$) and 933 (as for $X_3$), just as at the second and third recognitions.
That is, at the first speech recognition, the transmission gain setting control unit 103B sets the amplifier 108 to $G_0 = 0.00$ dB. The speech level detection unit 103A detects $X_1 = 700$, and $G_0 = 0.00$ dB is updated to $G_1 = 3.10$ dB.

In this case, the time constant is K = 1.0 and the input speech levels vary ($A_1 = 700$, $A_2 = 750$, $A_3 = 700$, ...). The levels detected at the second, third, ... recognitions are therefore $X_2 = 1072$, $X_3 = 933$, ..., and none of them matches the appropriate speech level $C = 1000$. Likewise, the updated transmission gain alternates between 2.50 dB and 3.10 dB and never settles.

As described above, without variation the appropriate speech level is reached at the second speech recognition. With variation, however, the level detected by the speech level detection unit 103A never reaches the appropriate speech level.
Consequently, a user may not always speak under the same conditions and at the same volume. In that case, the speech levels $A_1, A_2, A_3, \ldots$ input to the microphone 109 vary, and the level detected by the speech level detection unit 103A does not match the appropriate speech level. This makes a high recognition rate difficult to achieve.
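The oscillation in this K = 1.0 example can be reproduced with a short simulation of equations (1) and (2). The sketch is illustrative only. It keeps full floating-point precision instead of rounding each gain to two decimals, so some values differ from FIG. 5 in the last digit (about 1071 instead of 1072 for $X_2$).

```python
import math


def simulate(mic_levels, k, target=1000.0, g0=0.0):
    """Return (detected level, updated gain) for each utterance."""
    gain, history = g0, []
    for a in mic_levels:
        x = a * 10 ** (gain / 20.0)                      # equation (2)
        gain = gain - k * 20.0 * math.log10(x / target)  # equation (1)
        history.append((x, gain))
    return history


if __name__ == "__main__":
    for n, (x, g) in enumerate(simulate([700.0, 750.0] * 3, k=1.0), start=1):
        print(f"n={n}: X={x:.0f}, G={g:.2f} dB")
```

With K = 1.0, each gain is fully tuned to the previous utterance's level. It is therefore always one utterance "behind" an input that alternates between 700 and 750.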

Next, therefore, the time constant K is set to a value smaller than "1.0", for example "0.5", and the speech level detected by the speech level detection unit 103A is brought closer to the appropriate speech level $C$. The case without variation is described below.
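A sketch of the same update with K = 0.5 and a constant input of 700 shows the gradual approach toward about 3.10 dB. This is illustrative only. Because it does not round intermediate gains, its fourth gain comes out at about 2.90 dB rather than the 2.91 dB obtained by rounding at each step. The function name is hypothetical.

```python
import math


def gain_sequence(mic_level, k, steps, target=1000.0, g0=0.0):
    """Gains G_1..G_steps for a constant microphone level A_n = mic_level."""
    gains, gain = [], g0
    for _ in range(steps):
        x = mic_level * 10 ** (gain / 20.0)          # equation (2)
        gain -= k * 20.0 * math.log10(x / target)    # equation (1)
        gains.append(gain)
    return gains


if __name__ == "__main__":
    optimum = -20.0 * math.log10(700.0 / 1000.0)     # gain that maps 700 to 1000
    for n, g in enumerate(gain_sequence(700.0, k=0.5, steps=6), start=1):
        print(f"G{n} = {g:.2f} dB (remaining error {optimum - g:.3f} dB)")
```

Each step halves the remaining error, so the gain approaches the optimum smoothly instead of jumping.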

FIG. 6 illustrates an example of the transmission gain calculated by the transmission gain setting control unit 103B of FIG. 1 under the following conditions:
- the distance from the microphone 109 to the user's mouth is small (d1 = 3 cm);
- the speech level input to the microphone 109 is low and does not vary;
- the time constant is K = 0.5.

In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, ... utterances are $A_1 = A_2 = A_3 = A_4 = A_5 = A_6 = \cdots = 700$. With a time constant of 0.5, the transmission gain is calculated as follows.

At the first utterance, the amplifier 108 is set to the initial transmission gain $G_0 = 0$ dB before speaking. From equation (2), the speech level $X_1$ detected by the speech level detection unit 103A satisfies $A_1/X_1 = 1$, so
$$X_1 = A_1 = 700.$$

In this case, from equation (1), the updated transmission gain $G_1$ is
$$G_1 = G_0 - 0.5 \times 20\log_{10}\!\left(\frac{X_1}{1000}\right) = 0.0 - 0.5 \times 20\log_{10}\!\left(\frac{700}{1000}\right) \approx 1.55\ \text{dB}.$$

At the second utterance, the amplifier 108 is set to the pre-update transmission gain $G_1 = 1.55$ dB before speaking. From equation (2), the detected speech level $X_2$ is
$$X_2 = A_2 \times 10^{G_1/20} = 700 \times 10^{1.55/20} \approx 837.$$

In this case, from equation (1), the updated transmission gain $G_2$ is
$$G_2 = G_1 - 0.5 \times 20\log_{10}\!\left(\frac{X_2}{1000}\right) = 1.55 - 0.5 \times 20\log_{10}\!\left(\frac{837}{1000}\right) \approx 2.32\ \text{dB}.$$

In the third utterance during speech recognition, the pre-update transmission gain G2 = 2.32 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 700 × 10^(2.32/20)
= 914.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 2.32 − 0.5 × 20 × log(914/1000) dB
= 2.71 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = 2.71 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 700 × 10^(2.71/20)
= 956.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 2.71 − 0.5 × 20 × log(956/1000) dB
= 2.91 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = 2.91 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 700 × 10^(2.91/20)
= 979.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 2.91 − 0.5 × 20 × log(979/1000) dB
= 3.00 dB.

The final transmission gain in this case is
−20 × log(700/1000) dB
= 3.10 dB.
Thus, as in FIG. 4 described above, the appropriate speech level is not reached at the second utterance, but it is nearly reached at the fifth utterance. In other words, updating the transmission gain brings it close to the optimal transmission gain.

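The gain settles at −20 × log(A/C) because that value is the fixed point of equation (1). At that gain, equation (2) yields X = C, so the correction term vanishes. Substituting (2) into (1) gives G_n − G* = (1 − K)(G_(n−1) − G*). With K = 0.5, the remaining gap therefore halves with each utterance.

The sketch below checks this numerically for FIG. 6 without intermediate rounding, so the last digit can differ from the hand calculation. It assumes base-10 logarithms and C = 1000. The helper names are hypothetical.

```python
import math

C = 1000.0  # appropriate speech level used in the examples (assumed scale)


def update_gain(gain_db, detected, k, target=C):
    """Equation (1): new gain from the old gain and the detected level."""
    return gain_db - k * 20 * math.log10(detected / target)


def steady_state_gain(input_level, target=C):
    """Gain at which equation (2) yields exactly the target level: -20*log10(A/C)."""
    return -20 * math.log10(input_level / target)


g_star = steady_state_gain(700)
print(f"G* = {g_star:.2f} dB")  # G* = 3.10 dB

# Replay FIG. 6 from G0 = 0 dB: the gap to G* shrinks by a factor (1 - K) per utterance.
g = 0.0
for n in range(1, 6):
    x = 700 * 10 ** (g / 20)        # equation (2)
    g = update_gain(g, x, k=0.5)    # equation (1)
    print(f"G{n} = {g:.3f} dB, gap to G* = {g_star - g:.3f} dB")
```

Once the gain equals G*, equation (1) leaves it unchanged for any K, since log(X/C) = 0.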
FIG. 7 shows an example of how the transmission gain setting control unit 103B in FIG. 1 calculates the transmission gain. In this example, the distance from the microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to the microphone 109 is large, there is no variation, and the time constant is K = 0.5.
In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, sixth, ... utterances during speech recognition are A1 = 1300, A2 = 1300, A3 = 1300, A4 = 1300, A5 = 1300, .... The time constant is 0.5, and the transmission gain is calculated as follows.

In the first utterance during speech recognition, the initial transmission gain G0 = 0 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X1 detected by the speech level detection unit 103A is
X1 = A1 × 10^(G0/20)
= A1
= 1300.

In this case, from equation (1), the updated transmission gain G1 is
G1 = G0 − 0.5 × 20 × log(X1/1000) dB
= 0.0 − 0.5 × 20 × log(1300/1000) dB
= −1.14 dB.

In the second utterance during speech recognition, the pre-update transmission gain G1 = −1.14 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the detected speech level X2 is
X2 = A2 × 10^(G1/20)
= 1300 × 10^(−1.14/20)
= 1140.

In this case, from equation (1), the updated transmission gain G2 is
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= −1.14 − 0.5 × 20 × log(1140/1000) dB
= −1.71 dB.

In the third utterance during speech recognition, the pre-update transmission gain G2 = −1.71 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 1300 × 10^(−1.71/20)
= 1068.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= −1.71 − 0.5 × 20 × log(1068/1000) dB
= −2.00 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = −2.00 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 1300 × 10^(−2.00/20)
= 1033.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= −2.00 − 0.5 × 20 × log(1033/1000) dB
= −2.14 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = −2.14 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 1300 × 10^(−2.14/20)
= 1016.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= −2.14 − 0.5 × 20 × log(1016/1000) dB
= −2.21 dB.

The final transmission gain in this case is
−20 × log(1300/1000) dB
= −2.28 dB.
Thus, as in FIG. 4 described above, the appropriate speech level is not reached at the second utterance, but it is nearly reached at the fifth utterance. In other words, updating the transmission gain brings it close to the optimal transmission gain.

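A worked example like the one above can be replayed mechanically to check the hand calculation. The sketch assumes the figures were produced by rounding X to an integer and G to 0.01 dB at every step. That is an inference from the printed numbers, not something the specification states. The name simulate_rounded is hypothetical. Run on FIG. 7's input, it prints the per-utterance detected levels and gains, including the attenuation (negative gain) that a loud input requires.

```python
import math


def simulate_rounded(levels, k, g0=0.0, target=1000.0):
    """Replay equations (2) and (1) the way the worked examples appear to.

    The detected level is rounded to an integer and the gain to 0.01 dB at
    every step (an assumption about how the figures were produced).
    Returns (utterance number, input level, detected level, updated gain).
    """
    gain = g0
    rows = []
    for n, a in enumerate(levels, start=1):
        x = round(a * 10 ** (gain / 20))                          # equation (2)
        gain = round(gain - k * 20 * math.log10(x / target), 2)    # equation (1)
        rows.append((n, a, x, gain))
    return rows


for n, a, x, g in simulate_rounded([1300] * 5, k=0.5):
    print(f"utterance {n}: A={a} X={x} G={g:+.2f} dB")
```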
FIG. 8 shows an example of how the transmission gain setting control unit 103B in FIG. 1 calculates the transmission gain. In this example, the distance from the microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to the microphone 109 is small, there is no variation, and the time constant is K = 0.5.
In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, sixth, ... utterances during speech recognition are A1 = 300, A2 = 300, A3 = 300, A4 = 300, A5 = 300, .... The time constant is 0.5, and the transmission gain is calculated as follows.

In the first utterance during speech recognition, the initial transmission gain G0 = 0 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X1 detected by the speech level detection unit 103A is
X1 = A1 × 10^(G0/20)
= A1
= 300.

In this case, from equation (1), the updated transmission gain G1 is
G1 = G0 − 0.5 × 20 × log(X1/1000) dB
= 0.0 − 0.5 × 20 × log(300/1000) dB
= 5.23 dB.

In the second utterance during speech recognition, the pre-update transmission gain G1 = 5.23 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the detected speech level X2 is
X2 = A2 × 10^(G1/20)
= 300 × 10^(5.23/20)
= 548.

In this case, from equation (1), the updated transmission gain G2 is
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= 5.23 − 0.5 × 20 × log(548/1000) dB
= 7.84 dB.

In the third utterance during speech recognition, the pre-update transmission gain G2 = 7.84 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 300 × 10^(7.84/20)
= 740.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 7.84 − 0.5 × 20 × log(740/1000) dB
= 9.15 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = 9.15 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 300 × 10^(9.15/20)
= 860.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 9.15 − 0.5 × 20 × log(860/1000) dB
= 9.81 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = 9.81 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 300 × 10^(9.81/20)
= 928.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 9.81 − 0.5 × 20 × log(928/1000) dB
= 10.13 dB.

The final transmission gain in this case is
−20 × log(300/1000) dB
= 10.46 dB.
Thus, as in FIG. 4 described above, the appropriate speech level is not reached at the second utterance, but it is nearly reached at the fifth utterance. In other words, updating the transmission gain brings it close to the optimal transmission gain.

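How many utterances the update needs depends on K and on how far the starting gain is from the steady state. The sketch below counts utterances until the detected level first falls within a tolerance of C. The 1 dB band is a hypothetical acceptance criterion chosen for illustration, not a value from the specification, and the helper name is likewise hypothetical. Base-10 logarithms and C = 1000 are assumed, and no intermediate rounding is applied.

```python
import math


def utterances_to_settle(input_level, k, g0=0.0, target=1000.0, tol_db=1.0,
                         max_utterances=100):
    """Return the first utterance whose detected level is within tol_db of target.

    Applies equations (2) and (1) without intermediate rounding. tol_db is a
    hypothetical acceptance band. Returns None if it is never reached.
    """
    gain = g0
    for n in range(1, max_utterances + 1):
        x = input_level * 10 ** (gain / 20)       # equation (2)
        error_db = 20 * math.log10(x / target)
        if abs(error_db) <= tol_db:
            return n
        gain -= k * error_db                      # equation (1): G - K*20*log(X/C)
    return None


for k in (1.0, 0.5):
    print(f"K={k}: within 1 dB at utterance {utterances_to_settle(300, k)}")
```

For FIG. 8's input (A = 300, starting at 0 dB), K = 1.0 reaches the band at the second utterance and K = 0.5 at the fifth. This is consistent with the observation above that the fifth utterance is nearly at the appropriate level.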
Next, keeping the time constant at K = 0.5, the input speech level is allowed to vary. As follows, the speech level detected by the speech level detection unit 103A is brought closer to the appropriate speech level C.

FIG. 9 shows an example of how the transmission gain setting control unit 103B in FIG. 1 calculates the transmission gain. In this example, the distance from the microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to the microphone 109 is small, there is variation, and the time constant is K = 0.5.
In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, ... utterances during speech recognition are A1 = 700, A2 = 750, A3 = 700, A4 = 750, A5 = 700, A6 = 750, .... The time constant is 0.5, and the transmission gain is calculated as follows.

In the first utterance during speech recognition, the initial transmission gain G0 = 0 dB is set in the amplifier 108. As in FIG. 6, the detected speech level is therefore X1 = A1 = 700. In this case, from equation (1), the updated transmission gain G1 is
G1 = G0 − 0.5 × 20 × log(X1/1000) dB
= 0.0 − 0.5 × 20 × log(700/1000) dB
= 1.55 dB.

In the second utterance during speech recognition, the pre-update transmission gain G1 = 1.55 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the detected speech level X2 is
X2 = A2 × 10^(G1/20)
= 750 × 10^(1.55/20)
= 897.

In this case, from equation (1), the updated transmission gain G2 is
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= 1.55 − 0.5 × 20 × log(897/1000) dB
= 2.02 dB.

In the third utterance during speech recognition, the pre-update transmission gain G2 = 2.02 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 700 × 10^(2.02/20)
= 883.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 2.02 − 0.5 × 20 × log(883/1000) dB
= 2.56 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = 2.56 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 750 × 10^(2.56/20)
= 1007.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 2.56 − 0.5 × 20 × log(1007/1000) dB
= 2.53 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = 2.53 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 700 × 10^(2.53/20)
= 937.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 2.53 − 0.5 × 20 × log(937/1000) dB
= 2.81 dB.

The intermediate steps from the sixth to the eighth utterance shown in this figure are omitted. At the ninth and tenth utterances, the speech level detection unit 103A detects speech levels of 954 and 1046, and the updated transmission gains are 2.89 dB and 2.69 dB. These values then repeat in subsequent utterances.
Compare this with FIG. 5, where the time constant is set to K = 1.0. With K = 0.5, the speech level detected by the speech level detection unit 103A needs more utterances to approach the appropriate speech level C (= 1000), but it ends up closer to the appropriate speech level, which improves the speech recognition rate. In other words, updating the transmission gain brings it close to the optimal transmission gain.

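The trade-off described above can be quantified for the alternating input of FIG. 9 under both time constants. Suppose the two input levels differ by Δ dB. Working equation (1) through such an alternation suggests that the detected level settles into a ripple of about ±Δ/(2 − K) dB around C. With Δ = 20 × log(750/700) ≈ 0.60 dB, that is ±0.60 dB for K = 1.0 and ±0.40 dB for K = 0.5. The K = 0.5 figure is consistent with the repeating levels 954 and 1046 quoted above, which correspond to about −0.41 dB and +0.39 dB.

This bound is a derivation layered on the specification, not part of it. The sketch below checks it numerically, assuming base-10 logarithms, C = 1000 and a hypothetical helper name.

```python
import math


def steady_ripple_db(levels, k, g0=0.0, target=1000.0, cycles=200):
    """Largest |detected level error| in dB over the final period of a repeating input."""
    gain = g0
    errors = []
    for n in range(cycles * len(levels)):
        a = levels[n % len(levels)]
        error_db = 20 * math.log10(a * 10 ** (gain / 20) / target)  # equation (2), in dB
        gain -= k * error_db                                          # equation (1)
        errors.append(error_db)
    return max(abs(e) for e in errors[-len(levels):])


pattern = [700, 750]  # alternating input levels of FIG. 9
for k in (1.0, 0.5):
    print(f"K={k}: ripple about ±{steady_ripple_db(pattern, k):.2f} dB")
```

A smaller K trades slower convergence for a tighter ripple, which matches the comparison with FIG. 5 in the text.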
FIG. 10 shows an example of how the transmission gain setting control unit 103B in FIG. 1 calculates the transmission gain. In this example, the distance from the microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to the microphone 109 is small, there is no variation, and the time constant is K = 0.5. Noise then suddenly enters while the speech level detected by the speech level detection unit 103A is close to the appropriate speech level.
In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, sixth, ... utterances after speech recognition is resumed are A1 = 300, A2 = 500, A3 = 300, A4 = 300, A5 = 300, A6 = 300, A7 = 300, .... The time constant is 0.5, and the transmission gain is calculated as follows.

That is, at the first utterance after speech recognition is resumed, the speech level input to the microphone 109 is A1 = 300. The speech level detected by the speech level detection unit 103A is approximately the appropriate speech level C = 1000. Noise is assumed to be input only at the second utterance.
In the first utterance during speech recognition, the pre-update transmission gain G0 = 10.46 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X1 detected by the speech level detection unit 103A is
X1 = A1 × 10^(G0/20)
= 300 × 10^(10.46/20)
= 1000.

In this case, from equation (1), the updated transmission gain G1 is
G1 = G0 − 0.5 × 20 × log(X1/1000) dB
= 10.46 − 0.5 × 20 × log(1000/1000) dB
= 10.46 dB.

In the second utterance during speech recognition, the pre-update transmission gain G1 = 10.46 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the detected speech level X2 is
X2 = A2 × 10^(G1/20)
= 500 × 10^(10.46/20)
= 1667.

In this case, from equation (1), the updated transmission gain G2 is
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= 10.46 − 0.5 × 20 × log(1667/1000) dB
= 8.24 dB.

In the third utterance during speech recognition, the pre-update transmission gain G2 = 8.24 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 300 × 10^(8.24/20)
= 775.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 8.24 − 0.5 × 20 × log(775/1000) dB
= 9.35 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = 9.35 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 300 × 10^(9.35/20)
= 880.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 9.35 − 0.5 × 20 × log(880/1000) dB
= 9.91 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = 9.91 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 300 × 10^(9.91/20)
= 939.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 9.91 − 0.5 × 20 × log(939/1000) dB
= 10.18 dB.

In the sixth utterance during speech recognition, the pre-update transmission gain G5 = 10.18 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X6 detected by the speech level detection unit 103A is
X6 = A6 × 10^(G5/20)
= 300 × 10^(10.18/20)
= 969.

In this case, from equation (1), the updated transmission gain G6 is
G6 = G5 − 0.5 × 20 × log(X6/1000) dB
= 10.18 − 0.5 × 20 × log(969/1000) dB
= 10.32 dB.
Thus, even when noise disturbs the transmission gain, continued updating brings it back close to the original optimal transmission gain.

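The recovery after the noisy second utterance follows from the same recurrence. Once the input returns to A = 300, the steady-state gain is −20 × log(300/1000) ≈ 10.46 dB. The offset from it is multiplied by (1 − K) at every utterance, so with K = 0.5 it halves each time. This is a minimal sketch assuming base-10 logarithms, C = 1000 and a hypothetical helper run. It applies no intermediate rounding, so the last digit can differ slightly from the hand calculation.

```python
import math


def run(levels, k, g0, target=1000.0):
    """Apply equations (2) and (1) to a sequence of input levels; return gains after each utterance."""
    gain, gains = g0, []
    for a in levels:
        x = a * 10 ** (gain / 20)                      # equation (2)
        gain -= k * 20 * math.log10(x / target)        # equation (1)
        gains.append(gain)
    return gains


g_star = -20 * math.log10(300 / 1000)   # steady-state gain for A = 300 (about 10.46 dB)
gains = run([300, 500, 300, 300, 300, 300], k=0.5, g0=10.46)   # FIG. 10 input levels
for n, g in enumerate(gains, start=1):
    print(f"G{n} = {g:.2f} dB, offset from steady state = {g - g_star:+.3f} dB")
```

The noisy utterance pushes the gain about 2.2 dB below the steady state, and each later utterance halves that offset.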
FIG. 11 shows an example of how the transmission gain setting control unit 103B in FIG. 1 calculates the transmission gain. In this example, the distance from the microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to the microphone 109 is small, there is no variation, and the time constant is K = 0.5. Then, while the speech level detected by the speech level detection unit 103A is close to the appropriate speech level, the distance from the microphone 109 to the user's mouth suddenly becomes small (d1 = 3 cm) and the speech level input to the microphone 109 becomes large.

In this figure, as an example, the speech levels input to the microphone 109 at the first, second, third, fourth, fifth, sixth, seventh, ... utterances after speech recognition is resumed are A1 = 300, A2 = 700, A3 = 700, A4 = 700, A5 = 700, A6 = 700, A7 = 700, .... The time constant is 0.5, and the transmission gain is calculated as follows.
That is, at the first utterance after speech recognition is resumed, the speech level input to the microphone 109 is A1 = 300. The speech level detected by the speech level detection unit 103A is approximately the appropriate speech level C = 1000. From the second utterance onward, the input speech level changes to 700.

In the first utterance during speech recognition, the pre-update transmission gain G0 = 10.46 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X1 detected by the speech level detection unit 103A is
X1 = A1 × 10^(G0/20)
= 300 × 10^(10.46/20)
= 1000.
Because X1 equals the appropriate speech level, equation (1) leaves the gain unchanged: G1 = G0 = 10.46 dB.

In the second utterance during speech recognition, the pre-update transmission gain G1 = 10.46 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the detected speech level X2 is
X2 = A2 × 10^(G1/20)
= 700 × 10^(10.46/20)
= 2334.

In this case, from equation (1), the updated transmission gain G2 is
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= 10.46 − 0.5 × 20 × log(2334/1000) dB
= 6.78 dB.

In the third utterance during speech recognition, the pre-update transmission gain G2 = 6.78 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X3 detected by the speech level detection unit 103A is
X3 = A3 × 10^(G2/20)
= 700 × 10^(6.78/20)
= 1528.

In this case, from equation (1), the updated transmission gain G3 is
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 6.78 − 0.5 × 20 × log(1528/1000) dB
= 4.94 dB.

In the fourth utterance during speech recognition, the pre-update transmission gain G3 = 4.94 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X4 detected by the speech level detection unit 103A is
X4 = A4 × 10^(G3/20)
= 700 × 10^(4.94/20)
= 1236.

In this case, from equation (1), the updated transmission gain G4 is
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 4.94 − 0.5 × 20 × log(1236/1000) dB
= 4.02 dB.

In the fifth utterance during speech recognition, the pre-update transmission gain G4 = 4.02 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X5 detected by the speech level detection unit 103A is
X5 = A5 × 10^(G4/20)
= 700 × 10^(4.02/20)
= 1112.

In this case, from equation (1), the updated transmission gain G5 is
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 4.02 − 0.5 × 20 × log(1112/1000) dB
= 3.56 dB.

In the sixth utterance during speech recognition, the pre-update transmission gain G5 = 3.56 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X6 detected by the speech level detection unit 103A is
X6 = A6 × 10^(G5/20)
= 700 × 10^(3.56/20)
= 1055.

In this case, from equation (1), the updated transmission gain G6 is
G6 = G5 − 0.5 × 20 × log(X6/1000) dB
= 3.56 − 0.5 × 20 × log(1055/1000) dB
= 3.33 dB.

In the seventh utterance during speech recognition, the pre-update transmission gain G6 = 3.33 dB is set in the amplifier 108 before the utterance. In this case, from equation (2), the speech level X7 detected by the speech level detection unit 103A is
X7 = A7 × 10^(G6/20)
= 700 × 10^(3.33/20)
= 1027.

In this case, from equation (1), the updated transmission gain G7 is
G7 = G6 − 0.5 × 20 × log(X7/1000) dB
= 3.33 − 0.5 × 20 × log(1027/1000) dB
= 3.21 dB.

Consider the case where the distance from the microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to the microphone 109 is small, and the speech level detected by the speech level detection unit 103A is close to the appropriate speech level. If the distance then suddenly becomes small (d1 = 3 cm) and the speech level input to the microphone 109 becomes large, updating the transmission gain brings it to a nearly optimal transmission gain by the seventh utterance.

FIG. 12 illustrates an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 for the following case: the distance from microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to microphone 109 is small with no variation, and the time constant is K = 0.5; then, while the level detected by speech level detection unit 103A is close to the appropriate speech level, the distance suddenly becomes small (d1 = 3 cm) and the speech level input to microphone 109 becomes large.

In this figure, as an example, the speech levels input to microphone 109 at the first, second, third, fourth, fifth, sixth, seventh, ... utterances after speech recognition resumes are A1 = 700, A2 = 300, A3 = 300, A4 = 300, A5 = 300, A6 = 300, A7 = 300, ..., and the transmission gain is calculated as follows with the time constant K = 0.5.
That is, at the first utterance after speech recognition resumes, the speech level input to microphone 109 is A1 = 700 and the level detected by speech level detection unit 103A is approximately the appropriate speech level C = 1000; from the second utterance onward, the input speech level changes to 300.

In the first speech-recognition utterance, the pre-update transmission gain G0 = 3.00 dB is set in amplifier 108 before the utterance. The speech level X1 detected by speech level detection unit 103A is then, from equation (2):
X1 = A1 × 10^(G0/20)
= 700 × 10^(3.00/20)
≈ 1000

The updated transmission gain G1 is then, from equation (1):
G1 = G0 − 0.5 × 20 × log(X1/1000) dB
= 3.00 − 0.5 × 20 × log(1000/1000) dB
= 3.00 dB

In the second speech-recognition utterance, the pre-update transmission gain G1 = 3.00 dB is set in amplifier 108 before the utterance. The detected speech level X2 is then, from equation (2):
X2 = A2 × 10^(G1/20)
= 300 × 10^(3.00/20)
≈ 424

The updated transmission gain G2 is then, from equation (1):
G2 = G1 − 0.5 × 20 × log(X2/1000) dB
= 3.00 − 0.5 × 20 × log(424/1000) dB
≈ 6.73 dB

In the third speech-recognition utterance, the pre-update transmission gain G2 = 6.73 dB is set in amplifier 108 before the utterance. The speech level X3 detected by speech level detection unit 103A is then, from equation (2):
X3 = A3 × 10^(G2/20)
= 300 × 10^(6.73/20)
≈ 651

The updated transmission gain G3 is then, from equation (1):
G3 = G2 − 0.5 × 20 × log(X3/1000) dB
= 6.73 − 0.5 × 20 × log(651/1000) dB
≈ 8.60 dB

In the fourth speech-recognition utterance, the pre-update transmission gain G3 = 8.60 dB is set in amplifier 108 before the utterance. The speech level X4 detected by speech level detection unit 103A is then, from equation (2):
X4 = A4 × 10^(G3/20)
= 300 × 10^(8.60/20)
≈ 807

The updated transmission gain G4 is then, from equation (1):
G4 = G3 − 0.5 × 20 × log(X4/1000) dB
= 8.60 − 0.5 × 20 × log(807/1000) dB
≈ 9.53 dB

In the fifth speech-recognition utterance, the pre-update transmission gain G4 = 9.53 dB is set in amplifier 108 before the utterance. The speech level X5 detected by speech level detection unit 103A is then, from equation (2):
X5 = A5 × 10^(G4/20)
= 300 × 10^(9.53/20)
≈ 899

The updated transmission gain G5 is then, from equation (1):
G5 = G4 − 0.5 × 20 × log(X5/1000) dB
= 9.53 − 0.5 × 20 × log(899/1000) dB
≈ 9.99 dB

In the sixth speech-recognition utterance, the pre-update transmission gain G5 = 9.99 dB is set in amplifier 108 before the utterance. The speech level X6 detected by speech level detection unit 103A is then, from equation (2):
X6 = A6 × 10^(G5/20)
= 300 × 10^(9.99/20)
≈ 948

The updated transmission gain G6 is then, from equation (1):
G6 = G5 − 0.5 × 20 × log(X6/1000) dB
= 9.99 − 0.5 × 20 × log(948/1000) dB
≈ 10.22 dB

In the seventh speech-recognition utterance, the pre-update transmission gain G6 = 10.22 dB is set in amplifier 108 before the utterance. The speech level X7 detected by speech level detection unit 103A is then, from equation (2):
X7 = A7 × 10^(G6/20)
= 300 × 10^(10.22/20)
≈ 973

The updated transmission gain G7 is then, from equation (1):
G7 = G6 − 0.5 × 20 × log(X7/1000) dB
= 10.22 − 0.5 × 20 × log(973/1000) dB
≈ 10.34 dB

Thus, suppose the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), so that the speech level input to microphone 109 is large and the level detected by speech level detection unit 103A is close to the appropriate speech level, and the distance then suddenly becomes large (d2 = 7 cm), so that the input speech level becomes small. Even in this case, updating the transmission gain brings it to a nearly optimal value by the seventh utterance.
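The whole sequence above can be replayed with a short loop. This sketch defines its own hypothetical helpers for equations (1) and (2) and computes without intermediate rounding, so individual values differ from the hand-rounded figures in the text by a few hundredths of a dB (for example, X1 comes out near 989 rather than exactly 1000).

```python
import math

def detected_level(input_level, gain_db):
    # Equation (2)
    return input_level * 10 ** (gain_db / 20)

def update_gain(prev_gain_db, detected, target=1000.0, k=0.5):
    # Equation (1), with log taken as base 10
    return prev_gain_db - k * 20 * math.log10(detected / target)

def run_sequence(input_levels, g0, target=1000.0, k=0.5):
    """Return [(Xn, Gn), ...] for successive utterances starting from gain g0."""
    gain = g0
    history = []
    for a in input_levels:
        x = detected_level(a, gain)              # level measured with the current gain
        gain = update_gain(gain, x, target, k)   # gain stored for the next utterance
        history.append((x, gain))
    return history

# Example from the text: A1 = 700, then A2..A7 = 300, G0 = 3.00 dB, K = 0.5.
for n, (x, g) in enumerate(run_sequence([700] + [300] * 6, g0=3.00), start=1):
    print(f"n={n}: X={x:.0f}  G={g:.2f} dB")
```

The gains climb toward 20 × log(1000/300) ≈ 10.46 dB, the value at which an input level of 300 is amplified to exactly C = 1000.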

FIG. 13 is a flowchart illustrating another example of the sequence of transmission gain update processing performed by transmission gain setting control unit 103B in FIG. 1.
As shown in the figure, in step 211, control unit 103 detects that the speech recognition start key on operation unit 104 has been pressed, that is, that speech recognition has been activated.
In step 212, after control unit 103 detects the activation, transmission gain setting control unit 103B reads the initial transmission gain (G0) from transmission gain information storage unit 106A and sets it in amplifier 108.

In step 213, after transmission gain setting control unit 103B has set the transmission gain in amplifier 108, control unit 103 activates speech recognition unit 111 to recognize the input speech, that is, the speech signal from microphone 109 adjusted by amplifier 108 to a level suitable for speech recognition.
In step 214, transmission gain setting control unit 103B waits for speech recognition unit 111 to confirm the recognition result.

In step 215, once the recognition result is confirmed, transmission gain setting control unit 103B acquires the recognition result and the speech level information detected by speech level detection unit 103A. The recognition result is displayed on display unit 105 and output as speech from speaker 110.
In step 216, transmission gain setting control unit 103B starts a timer and measures the time (TIME) until speech recognition is next activated.

In step 217, transmission gain setting control unit 103B compares the measured time TIME with a predetermined time Th and ends the processing if
TIME > Th
holds. That is, if speech recognition is not activated again within the predetermined time Th, speech recognition is regarded as complete and the processing ends.

In step 218, control unit 103 detects whether the speech recognition start key on operation unit 104 has been pressed, that is, whether speech recognition has been activated. If no activation is detected, the process returns to step 217.
In step 219, after control unit 103 detects the activation, transmission gain setting control unit 103B does one of the following: it reads the initial transmission gain (G0) from transmission gain information storage unit 106A, sets it in amplifier 108, updates this initial value based on the acquired speech level information, and stores the result in storage unit 106A; or it reads the pre-update transmission gain (G(n−1)) from storage unit 106A, sets it in amplifier 108, updates the gain based on the acquired speech level information, and stores the updated gain (Gn) in storage unit 106A. At the next utterance, the updated transmission gain is read from storage unit 106A and set in amplifier 108.

In step 220, after transmission gain setting control unit 103B has set the transmission gain in amplifier 108, control unit 103 activates speech recognition unit 111 to recognize the input speech, that is, the speech signal from microphone 109 adjusted by amplifier 108 to a level suitable for speech recognition.
In step 221, transmission gain setting control unit 103B waits until speech recognition unit 111 confirms the recognition result.

In step 222, once the recognition result is confirmed, transmission gain setting control unit 103B acquires the recognition result and the speech level information detected by speech level detection unit 103A. The recognition result is displayed on display unit 105 and output as speech from speaker 110, and the process returns to step 216.
In this way, when speech recognition is not activated consecutively, the value held in transmission gain information storage unit 106A is always used as the initial transmission gain, whereas when speech recognition is activated consecutively, the transmission gain is updated and thereby optimized. That is, if the previous recognition produced a misrecognition, speech recognition is restarted, and the transmission gain is updated at that restart.
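The FIG. 13 policy can be sketched as follows: start from the stored initial gain unless recognition is re-activated within Th, in which case the gain is updated from the last detected level. The event model is an assumption made for illustration. Each activation is a (time, input level) pair, the recognition result is treated as confirmed at the activation time, and `gains_for_activations` is a hypothetical name.

```python
import math

def update_gain(prev_gain_db, detected, target=1000.0, k=0.5):
    # Equation (1), with log taken as base 10
    return prev_gain_db - k * 20 * math.log10(detected / target)

def gains_for_activations(activations, g0, th, target=1000.0, k=0.5):
    """Return the gain set in the amplifier for each (time, input_level) activation.

    Assumption: the recognition result is confirmed at the activation time,
    so the step-216 timer runs from one activation to the next.
    """
    used = []
    gain = g0
    last_time = None
    last_level = None
    for t, a in activations:
        if last_time is not None and t - last_time <= th:
            # Step 219: consecutive activation, so update from the last detected level.
            gain = update_gain(gain, last_level, target, k)
        else:
            # Steps 211-212, or a new session after a step-217 timeout: use stored G0.
            gain = g0
        used.append(gain)
        last_level = a * 10 ** (gain / 20)  # level detected by unit 103A, equation (2)
        last_time = t
    return used

gains = gains_for_activations([(0, 300), (5, 300), (6, 300), (60, 300)], g0=3.00, th=10)
print([round(g, 2) for g in gains])  # [3.0, 6.73, 8.59, 3.0]
```

The fourth activation arrives 54 time units after the third, which exceeds Th = 10, so the sketch falls back to G0 as the text describes for non-consecutive use.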

FIG. 14 is a flowchart illustrating yet another example of the sequence of transmission gain update processing performed by transmission gain setting control unit 103B in FIG. 1.
As shown in the figure, in step 231, control unit 103 detects that the speech recognition start key on operation unit 104 has been pressed, that is, that speech recognition has been activated.
In step 232, after control unit 103 detects the activation, transmission gain setting control unit 103B reads the pre-update transmission gain (G(n−1)) from transmission gain information storage unit 106A and sets it in amplifier 108. If no pre-update transmission gain (G(n−1)) is held in storage unit 106A, the initial transmission gain (G0) is set in amplifier 108 instead.

In step 233, speech level detection unit 103A detects the speech level Xn, and it is determined whether Xn satisfies the inequality
900 ≤ Xn ≤ 1100
(appropriate speech level C = 1000).
In step 234, if the inequality is satisfied, the time constant is set to K = 0.5 and the process proceeds to step 236.

In step 235, if the inequality is not satisfied, the time constant is set to K = 1.0.
In step 236, after transmission gain setting control unit 103B has set the transmission gain in amplifier 108, control unit 103 activates speech recognition unit 111 to recognize the input speech, that is, the speech signal from microphone 109 adjusted by amplifier 108 to a level suitable for speech recognition.

In step 237, transmission gain setting control unit 103B waits for speech recognition unit 111 to confirm the recognition result.
In step 238, once the recognition result is confirmed, transmission gain setting control unit 103B acquires the recognition result and the speech level information detected by speech level detection unit 103A. The recognition result is displayed on display unit 105 and output as speech from speaker 110.

In step 239, the transmission gain is updated (see equation (1)) using the time constant K determined in step 234 or step 235.
In step 240, transmission gain setting control unit 103B stores the updated transmission gain (Gn) in transmission gain information storage unit 106A, and the processing ends.
In this way, the time constant K is made large when the speech level Xn detected by speech level detection unit 103A is far from the appropriate speech level C, and small when Xn is close to C. This reduces the number of utterances needed to reach the appropriate speech level and makes it possible to obtain the optimal transmission gain.
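Steps 233 to 239 can be sketched as a single update pass with an adaptive time constant. The band of ±100 around C = 1000 and the values K = 0.5 and K = 1.0 are the example values from the flowchart; the function names are hypothetical.

```python
import math

def choose_time_constant(x, target=1000.0, band=100.0, k_near=0.5, k_far=1.0):
    """Steps 233-235: small K inside target +/- band, large K outside it."""
    return k_near if target - band <= x <= target + band else k_far

def adaptive_update(prev_gain_db, input_level, target=1000.0):
    """One FIG. 14 pass: measure Xn with the stored gain, pick K, apply equation (1)."""
    x = input_level * 10 ** (prev_gain_db / 20)                  # equation (2)
    k = choose_time_constant(x, target)
    new_gain = prev_gain_db - k * 20 * math.log10(x / target)    # equation (1)
    return x, k, new_gain

x1, k1, g1 = adaptive_update(3.00, 300)  # X is about 424, outside 900..1100, so K = 1.0
x2, k2, g2 = adaptive_update(g1, 300)    # X is about 1000, inside the band, so K = 0.5
```

With K = 1.0 a far-off level is corrected in one step, and the smaller K inside the band keeps a single noisy utterance from moving an already good gain too far. Claim 5 only requires the in-band K to be smaller than the out-of-band K; the specific numbers are illustrative.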

FIG. 15 is a block diagram showing the schematic configuration of a portable information terminal device according to a modification of FIG. 1. As shown in the figure, compared with FIG. 1, a test unit 103C is added to control unit 103. Test unit 103C makes the time constant adjustable through a test-mode key operation on operation unit 104. For example, a test word such as "test" may be uttered several times; test unit 103C has speech recognition unit 111 process these utterances, has transmission gain setting control unit 103B determine the optimal transmission gain in advance, and has that value stored in transmission gain information storage unit 106A as the initial transmission gain.
This makes it easy to set the optimal transmission gain for speech recognition.
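The test mode can be pictured as running the same update over several test utterances and keeping the result as the new G0. The text does not say how many test utterances are used or how convergence is judged, so the fixed utterance list, the 0 dB starting gain, and the function name below are all assumptions.

```python
import math

def calibrate_initial_gain(test_levels, g_start=0.0, target=1000.0, k=0.5):
    """Hypothetical test-mode calibration: apply equation (1) over repeated
    test utterances and return the final gain to store as the initial value G0."""
    gain = g_start
    for a in test_levels:
        x = a * 10 ** (gain / 20)                   # equation (2)
        gain -= k * 20 * math.log10(x / target)     # equation (1)
    return gain

# Ten test utterances at input level 500; the result approaches 20*log10(1000/500), about 6.02 dB.
g0_new = calibrate_initial_gain([500] * 10)
```

Raising K through the test-mode key operation shortens calibration (K = 1.0 converges after one utterance) at the cost of following any single unusual utterance more closely.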

Although the above description concerns a portable information terminal device, the present invention can be applied to any mobile device, including mobile phones, PHS (Personal Handy-phone System) handsets, pagers, electronic organizers, and personal computers.

Brief Description of Drawings

FIG. 1 is a block diagram showing the schematic configuration of the portable information terminal device according to the present invention.
FIG. 2 is a diagram illustrating examples of the various information held in transmission gain information storage unit 106A in FIG. 1 for optimizing the transmission gain during speech recognition.
FIG. 3 is a flowchart illustrating an example of the sequence of transmission gain update processing performed by transmission gain setting control unit 103B in FIG. 1.
FIG. 4 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is small with no variation, and the time constant K = 1.0.
FIG. 5 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is small with variation, and the time constant K = 1.0.
FIG. 6 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is small with no variation, and the time constant K = 0.5.
FIG. 7 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is large with no variation, and the time constant K = 0.5.
FIG. 8 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to microphone 109 is small with no variation, and the time constant K = 0.5.
FIG. 9 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is small with variation, and the time constant K = 0.5.
FIG. 10 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to microphone 109 is small with no variation, the time constant K = 0.5, and noise suddenly enters while the level detected by speech level detection unit 103A is close to the appropriate speech level.
FIG. 11 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is small (d1 = 3 cm), the speech level input to microphone 109 is large with no variation, the time constant K = 0.5, and, while the level detected by speech level detection unit 103A is close to the appropriate speech level, the distance suddenly becomes large (d2 = 7 cm) and the input speech level becomes small.
FIG. 12 is a diagram illustrating an example of transmission gain calculation by transmission gain setting control unit 103B in FIG. 1 where the distance from microphone 109 to the user's mouth is large (d2 = 7 cm), the speech level input to microphone 109 is small with no variation, the time constant K = 0.5, and, while the level detected by speech level detection unit 103A is close to the appropriate speech level, the distance suddenly becomes small (d1 = 3 cm) and the input speech level becomes large.
FIG. 13 is a flowchart illustrating another example of the sequence of transmission gain update processing performed by transmission gain setting control unit 103B in FIG. 1.
FIG. 14 is a flowchart illustrating yet another example of the sequence of transmission gain update processing performed by transmission gain setting control unit 103B in FIG. 1.
FIG. 15 is a block diagram showing the schematic configuration of a portable information terminal device according to a modification of FIG. 1.
FIG. 16 is a diagram illustrating an example of the use of speech recognition on a mobile phone, which forms the premise of the present invention.

Explanation of Symbols

100 ... Portable information terminal device
101 ... Antenna
102 ... Radio unit
103 ... Control unit
103A ... Speech level detection unit
103B ... Transmission gain setting control unit
103C ... Test unit
104 ... Operation unit
105 ... Display unit
106 ... Memory
106A ... Transmission gain information storage unit
107 ... A/D and D/A converter
108 ... Amplifier
109 ... Microphone
110 ... Speaker
111 ... Speech recognition unit

Claims

1. A speech recognition device that recognizes speech input to a microphone of a transmitter unit, comprising:
an amplifier that amplifies the speech signal output from the microphone of the transmitter unit by a transmission gain;
a speech level detection unit that detects the speech level amplified by the amplifier;
a transmission gain information storage unit that stores an initial value of the transmission gain, the transmission gain, an appropriate speech level, and a time constant for updating the transmission gain;
a transmission gain setting control unit that reads the transmission gain, the appropriate speech level, and the time constant from the transmission gain information storage unit, sets the transmission gain in the amplifier, updates the transmission gain by adding to it the value obtained by multiplying, by the time constant, the gain required to bring the speech level detected by the speech level detection unit to the appropriate speech level, and stores the updated transmission gain in the transmission gain information storage unit; and
a speech recognition unit that receives the speech signal amplified by the amplifier and performs speech recognition.

2. The speech recognition device according to claim 1, wherein the transmission gain setting control unit reads the initial value of the transmission gain from the transmission gain information storage unit and sets it in the amplifier at the first utterance during speech recognition.

3. The speech recognition device according to claim 1, wherein, when an updated transmission gain is stored in the transmission gain information storage unit, the transmission gain setting control unit reads the updated transmission gain from the transmission gain information storage unit and sets it in the amplifier at the first utterance when speech recognition is resumed.

4. The speech recognition device according to claim 1, wherein the transmission gain setting control unit: at the first utterance during speech recognition, reads the initial value of the transmission gain from the transmission gain information storage unit, sets it in the amplifier, and causes the speech recognition unit to perform speech recognition; when activation of speech recognition is detected within a predetermined time after the recognition result is confirmed, either reads the initial value of the transmission gain from the transmission gain information storage unit, sets it in the amplifier, updates the initial value based on the acquired speech level information, and stores it in the transmission gain information storage unit, or reads the pre-update transmission gain from the transmission gain information storage unit, sets it in the amplifier, updates the transmission gain based on the acquired speech level information, and stores it in the transmission gain information storage unit, and at the next utterance reads the updated transmission gain from the transmission gain information storage unit and sets it in the amplifier; and, when activation of speech recognition is not detected within the predetermined time, terminates the speech recognition processing.

5. The speech recognition device according to claim 1, wherein the transmission gain setting control unit determines whether the speech level detected by the speech level detection unit lies within or outside a fixed range centered on the appropriate speech level, and makes the time constant used when the level is within the range smaller than the time constant used when it is outside the range.

6. The speech recognition device according to claim 1, wherein the transmission gain Gn updated by the transmission gain setting control unit is given by
Gn = G(n−1) − K × 20 × log(Xn/C) dB
where
K: time constant for updating the transmission gain (0.0 < K ≤ 1.0),
n: number of speech recognitions (n = 1, 2, 3, ...),
C: appropriate speech level,
Xn: speech level detected by the speech level detection unit.

7. The speech recognition device according to claim 1, further comprising a test unit, wherein the test unit makes the time constant adjustable by key operation, causes the speech recognition unit to perform speech recognition on a plurality of test utterances, causes the transmission gain setting control unit to determine an optimal value of the transmission gain in advance, and causes that value to be stored in the transmission gain information storage unit as the initial value of the transmission gain.

8. A speech recognition method for recognizing speech input to a microphone of a transmitter unit, comprising the steps of:
storing an appropriate speech level and a transmission gain;
amplifying the speech signal output from the microphone of the transmitter unit by the transmission gain;
detecting the amplified speech level;
updating the transmission gain by adding to it the value obtained by multiplying, by a time constant, the gain required to bring the detected speech level to the appropriate speech level, and storing the updated transmission gain; and
performing speech recognition on the amplified speech signal.

9. A portable information terminal device using a speech recognition method, comprising, in addition to the portable information terminal functions of the device, a speech recognition function based on the method according to claim 8.

10. The portable information terminal device using a speech recognition method according to claim 9, wherein the speech recognition result is output as sound through the earpiece speaker of the portable information terminal device and displayed on the display unit of the portable information terminal device.