JP2007194833A

JP2007194833A - Mobile phone with hands-free function

Info

Publication number: JP2007194833A
Application number: JP2006009985A
Authority: JP
Inventors: Yasuaki Ohashi (大橋 靖明)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2006-01-18
Filing date: 2006-01-18
Publication date: 2007-08-02

Abstract

PROBLEM TO BE SOLVED: To provide a mobile phone that allows hands-free calls without wearing a headset microphone and that also serves as a voice input interface for a car navigation system.

SOLUTION: A plurality of microphones 2 are mounted on the mobile phone 1. Echo canceller processing 4 is applied to the input to suppress sound that feeds back from a speaker 3. Before starting a voice operation or a call, the user speaks a predetermined word, on which speech detection 5 is performed. When the word is recognized, the mean gain value of the speech signal obtained from that input and the gain value of the noise signal obtained in the several seconds after the utterance are calculated. Gain correction processing 6 is performed based on both gain values, and noise suppression 7 is then applied to obtain a clear speech signal. For a call, the speech signal is transmitted; for operating the car navigation system, a signal that has undergone voice recognition feature conversion 8 is transmitted.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a mobile phone that enables hands-free calls and also functions as a voice input interface to a car navigation system.

At present, using a hand-held mobile phone in a car is prohibited because it increases vehicle accidents. Nevertheless, there is strong demand for using mobile phones in cars, and various hands-free calling means have been proposed. Conventionally proposed means for hands-free calling in a car, such as headset microphones, impose the constraint of having to be worn on the body.

In addition, car navigation systems and similar equipment include a voice operation function as an interface, but because its recognition performance is very low, its practicality is limited and it is not widely used.

Using a mobile phone as a microphone for the interface of a voice operation system such as a car navigation system is disclosed in, for example, Patent Document 1. The user speaks to the mobile phone, and the resulting voice signal is sent to the voice operation system by wire or wirelessly. More specifically, when the remote operation mode is selected and a key is operated on the mobile telephone, the key operation signal is transmitted to the car navigation device, which performs an operation according to that signal. When the voice recognition mode is selected and the user speaks, the voice is input to the microphone of the mobile telephone and a voice signal is transmitted from it. The car navigation device recognizes the voice signal with its voice recognition unit and performs an operation according to the recognition result.

Patent Document 1: JP 2003-152884 A (特開2003-152884号公報)

However, in the prior art, including Patent Document 1, although the mobile phone has a hands-free function, it picks up not only the caller's voice but also the other party's voice (replies), music and voices in the car, and background noise. Because the caller's voice cannot be distinguished from these other voices and noise, they may be mistaken for the caller's voice and cause misrecognition.

An object of the present invention is to provide a mobile phone that enables hands-free calls, has a voice input interface function for a car navigation system or the like, and identifies the caller's sound source with improved identification accuracy.

To solve the above problems, the present invention adopts the following configuration.
A mobile phone with a hands-free function usable in a car, in which a plurality of microphones are installed in the mobile phone and, based on the signals input from the plurality of microphones, echo canceller processing that suppresses sound feeding back from the speaker, gain correction processing for the uttered speech and for voices and noise other than the uttered speech, and noise suppression processing are performed, and the uttered speech is transmitted as a call signal.

In this mobile phone, the mobile phone functions as a hands-free microphone and is used as an interface for voice operation of a car navigation system or AV system installed in the car, or of a web server. Further, in this mobile phone, voice recognition feature conversion is applied to the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system or server.

In another configuration, a mobile phone has a hands-free function usable in a car, enables calls, and enables voice operation of a car navigation system or AV system. A plurality of microphones are installed in the mobile phone. In the initial state of the call or the voice operation, gain adjustment processing is performed on the initial utterance of a specific word group using the output of a single microphone designated in advance, and the direction of the initial utterance is estimated. After the gain adjustment processing is complete, echo canceller processing, gain correction processing, and noise suppression processing are performed based on the signals output from the plurality of microphones for speech uttered for the call or the voice operation, or based on the direction estimation signal, and the uttered speech is transmitted as a call signal. Voice recognition feature conversion is applied to the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system.

According to the present invention, a mobile phone installed in a car enables hands-free calls and voice operation of a car navigation system or the like. Mounting a plurality of microphones on the mobile phone and applying signal processing improves the recognition accuracy of the user's uttered speech.

A mobile phone according to an embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the configuration of the mobile phone according to the embodiment. FIG. 2 is a diagram showing an example of microphone placement on the mobile phone. FIG. 3 is a diagram showing the arrangement of the mobile phone, the car navigation system, and the speaker (for example, the driver) in the car. FIG. 4 is a diagram showing a configuration example that carries out the series of processes from speech detection processing to noise suppression processing in the mobile phone.

In the drawings, 1 denotes a mobile phone, 2 a microphone mounted on the mobile phone, 3 the speaker of the mobile phone, 4 echo canceller processing, 5 speech detection processing, 6 gain correction processing, 7 noise suppression processing, 8 voice recognition feature conversion processing, 9 a car navigation system, 10 the installed mobile phone, 11 an initial state, 12 a specific word recognition state, 13 noise section detection processing, 14 gain adjustment processing, 15 recognition environment switching processing, 16 voice direction estimation processing, 17 an update state, 18 gain correction processing, and 19 noise suppression processing.

In FIG. 1, in the embodiment of the present invention, a plurality of microphones 2 are mounted on the mobile phone 1. The microphones 2 pick up not only the uttered speech but also noise and sound feeding back from the speaker 3. Echo canceller processing 4 is applied to these voice and sound signals so that howling does not occur, and speech detection processing 5 is then performed. The other party's voice, music played in the car, and the response sounds of the car navigation system are transmitted from the in-vehicle AV equipment or car navigation system to the mobile phone and suppressed by the echo canceller 4 (echo cancellation uses known technology). In addition, using a plurality of microphones allows noise to be suppressed based on the phase differences and amplitude differences between the microphone signals (using known microphone array signal processing) or on their independence (the known ICA technique, that is, independent component analysis of the output signals from the microphones).
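The description only refers to known echo-cancelling technology. As an illustration, the sketch below uses a normalized LMS (NLMS) adaptive filter, which is one common choice and an assumption here, not something the document specifies. The function name `nlms_echo_cancel`, the tap count, and the step size are hypothetical. The reference signal `ref` stands for the audio that the car navigation or AV system sends to the phone.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: samples picked up by a microphone (speech plus speaker echo)
    ref: samples played through the speaker (far-end voice, music, prompts)
    Returns the residual signal after the estimated echo is removed.
    NLMS is an assumed algorithm; the source only says "known technology".
    """
    w = [0.0] * taps      # adaptive filter coefficients (echo path estimate)
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo
        e = d - y                                    # residual sample
        norm = sum(b * b for b in buf) + eps         # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out
```

When the microphone input contains only echo, the residual shrinks toward zero as the filter converges. When the user is also speaking, the residual is approximately the user's speech.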

As shown in the figure, when uttered speech is detected (5) after echo cancellation (4), gain correction processing 6 is performed, followed by noise suppression processing 7. When the mobile phone is used for a call, the call signal is transmitted as is. For voice operation, voice recognition feature conversion processing 8 is applied and the resulting voice operation signal is transmitted to a voice recognition system (for example, a system installed in the car navigation system or AV equipment). As described above, background noise can be suppressed by mounting a plurality of microphones 2 and using the phase differences, amplitude differences, independence, and so on of the signals input to the microphones, which avoids the trouble of buying and wearing a headset as in the prior art. Because the signal varies greatly with microphone characteristics and input volume, performing gain correction processing 6 (applying known AGC technology) makes calls more comfortable. Adding noise suppression processing and the like improves the SN ratio and the recognition accuracy. The processes described above are performed on each output signal line from the plurality of microphones.
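Gain correction processing 6 is described only as an application of known AGC technology. A minimal sketch of one possible frame-based AGC follows. The frame length, target level, gain limit, and smoothing factor are illustrative assumptions, and `agc` is a hypothetical name.

```python
import math

def agc(samples, frame=160, target_rms=0.1, max_gain=20.0, smooth=0.5):
    """Frame-wise automatic gain control (assumed scheme, not from the source).

    Each frame's RMS level is measured, the gain needed to reach
    `target_rms` is computed, and the applied gain moves toward it
    gradually so that the level does not jump between frames.
    """
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        level = math.sqrt(sum(v * v for v in block) / len(block))
        # Keep the previous gain for silent frames; cap the gain otherwise.
        desired = min(max_gain, target_rms / level) if level > 0 else gain
        gain = smooth * gain + (1.0 - smooth) * desired
        out.extend(v * gain for v in block)
    return out
```

With a steady quiet input, the output level settles at `target_rms` after a few frames, which is the behavior that compensates for differences in microphone sensitivity and speaking volume.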

The dotted frames in FIG. 2 show examples of mounting the microphones 2 shown in FIG. 1. A microphone array forms directivity based on the input phase differences and amplitude differences between the arrayed microphones. The microphone arrangement shown in FIG. 2 therefore forms directivity for input signals in the horizontal direction as seen from the mobile phone. The left side of FIG. 2 shows an example microphone arrangement for using the mobile phone opened, and the right side shows an example for using it closed and installed in the car.
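One simple way to see how phase (arrival-time) differences between microphones create directivity is a delay-and-sum beamformer. The sketch below assumes integer sample delays and equally weighted channels; `delay_and_sum` is a hypothetical name and the document does not state which array processing method is used.

```python
def delay_and_sum(channels, delays):
    """Steer a microphone array by aligning and averaging its channels.

    channels: list of equal-length sample lists, one per microphone
    delays:   arrival delay of the target wavefront at each microphone,
              in whole samples (an assumed simplification)
    Sound from the steered direction adds coherently; sound from other
    directions is partly cancelled by the averaging.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d              # sample carrying the same wavefront
            if 0 <= idx < n:
                acc += ch[idx]
        out.append(acc / len(channels))
    return out
```

Steering with the true delays reproduces the source signal, while steering elsewhere attenuates it. For a tone whose period is four times the inter-microphone delay, four unsteered microphones cancel it completely.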

FIG. 3 shows an example of using the mobile phone 10 in a car. If the mobile phone 10 is installed near the front of the user (for example, the driver) and directivity is formed toward the front from the initial state, signals from that direction can be picked up. The input voice signal is transmitted to the car navigation system 9 by wire or wirelessly, for example via Bluetooth.

When the user makes a call, the other party's voice received by the mobile phone 10 is transmitted to the car navigation system 9 and output from a speaker connected to it. If the mobile phone is fixed roughly in front of the user (driver) in this way, the direction of the target voice is known to some extent, so it suffices to pick up sound from that direction and transmit the signal after gain correction and noise suppression. Specifically, if the center microphone of the plurality of microphones has the directivity that best matches the uttered speech, the gain correction and noise suppression on the signal lines from the other microphones can be adjusted as appropriate by comparison with the center microphone's signal line.

FIG. 4 shows a configuration example for the processing from speech detection processing 5 to noise suppression processing 7 shown in FIG. 1. The initial state 11 in FIG. 4 is a state that uses the input of a single microphone designated in advance and holds an acoustic model and a language model for a specific word group. A model consisting of a small amount of data can be stored in the mobile phone's memory, but because phonemes other than those of the word group must also be stored as a database, the processing may instead be done in the car navigation system 9 (a system that exchanges signals with the mobile phone 1 wirelessly or by wire). The specific word group can be created by the user, for example a name given to the car or a call phrase.

When the initial utterance is matched in the specific word recognition state 12, noise section detection processing 13 acquires a noise signal over a few seconds. Gain adjustment processing 14 is then performed from the speech signal of the initial utterance and the noise signal, which makes it possible to hold down the gain of the noise signal to some extent. Recognition environment switching processing 15 then switches from the acoustic and language models for the specific word group to acoustic and language models for multiple words. Although the initial utterance is processed with the designated single microphone as described above, it is also input to the plurality of microphones, so those signals can be used to perform voice direction estimation 16 (estimating the direction of the uttered speech allows the gain of each microphone's input signal line to be corrected appropriately).
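The document gives no formulas for gain adjustment 14 or direction estimation 16. The sketch below shows one plausible reading: levels are measured as RMS values of the wake-word speech and of the following noise section, and the direction is derived from the cross-correlation lag between two microphones. All names, the 16 kHz sample rate, the 5 cm microphone spacing, and the far-field conversion to an angle are assumptions for illustration.

```python
import math

def rms(x):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def gain_to_target(speech, target_rms=0.1):
    """Gain that brings the wake-word speech level to target_rms (assumed rule)."""
    return target_rms / max(rms(speech), 1e-12)

def snr_db(speech, noise):
    """Speech-to-noise level ratio in dB from the two measured sections."""
    return 20.0 * math.log10(max(rms(speech), 1e-12) / max(rms(noise), 1e-12))

def estimate_lag(a, b, max_lag):
    """Lag in samples by which channel b trails channel a.

    Found by maximizing the cross-correlation over -max_lag..max_lag.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[t] * b[t + lag]
                  for t in range(len(a)) if 0 <= t + lag < len(b))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def lag_to_angle_deg(lag, fs=16000, spacing_m=0.05, c=343.0):
    """Far-field arrival angle for a two-microphone pair (assumed geometry)."""
    s = max(-1.0, min(1.0, lag * c / (fs * spacing_m)))
    return math.degrees(math.asin(s))
```

A lag of zero corresponds to a talker directly in front of the microphone pair, which matches the installation in front of the driver described for FIG. 3.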

In this way, an initial state based on an acoustic model and a language model for a specific word group is set in advance, and utterance detection is performed by recognizing only those words. Gain adjustment is then performed from the recognized speech and the noise signal obtained in the following few seconds.

When the above processing is complete, the mobile phone or the car navigation system signals to the user that the recognition environment is ready. The user then speaks for the purpose of a call or voice operation (speaking after the state changes from the initial state 11 to the update state 17). The signals input to the plurality of microphones undergo gain correction processing 18 and noise suppression processing 19, the latter using phase difference, amplitude difference, or independence information.
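The transition from the initial state 11 to the update state 17 can be pictured as a small state machine. The sketch below is a simplified stand-in: it matches wake words as text rather than with acoustic and language models, and the class name, method name, return strings, and gain rule are all hypothetical.

```python
from enum import Enum

class State(Enum):
    INITIAL = 11   # values follow the reference signs in the drawings
    UPDATED = 17

class HandsFreeFrontEnd:
    """Toy model of the FIG. 4 flow (illustrative assumptions throughout)."""

    def __init__(self, wake_words):
        self.wake_words = {w.lower() for w in wake_words}
        self.state = State.INITIAL
        self.vocabulary = "wake-words-only"
        self.gain = 1.0
        self.noise_floor = None

    def on_utterance(self, text, speech_level, noise_level):
        """Handle one utterance and return what the phone does with it."""
        if self.state is State.INITIAL:
            if text.lower() not in self.wake_words:
                return "ignored"                        # not a registered word
            self.noise_floor = noise_level              # noise section detection 13
            self.gain = 1.0 / max(speech_level, 1e-12)  # gain adjustment 14 (assumed rule)
            self.vocabulary = "full"                    # recognition environment switching 15
            self.state = State.UPDATED
            return "ready"                              # cue that recognition is ready
        return "processed"                              # gain correction 18, noise suppression 19
```

For example, `HandsFreeFrontEnd(["Hello car"])` ignores "turn left" until "hello car" has been spoken once, after which every utterance is passed on for processing.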

For a call, the signal that has undergone gain correction processing 18 and noise suppression processing 19 shown in FIG. 4 is transmitted (see FIG. 1). For voice operation, as shown in FIG. 1, the noise-suppressed signal is converted into voice recognition features (8) and the result can be transmitted to the car navigation system. Because the feature signal carries less information than the voice signal, transmission from the mobile phone to the car navigation system is faster. The feature conversion may also be performed in the mobile phone and the result sent to a web server. The server holds an acoustic model and a language model in advance and performs voice recognition by matching them against the transmitted signal.
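The claim that features carry less information than the voice signal can be made concrete with a rough calculation. The document does not name a feature type or any rates, so the figures below (16 kHz 16-bit audio, 13 values per 10 ms frame at 32 bits each) are assumptions, and frame log-energy is used only as the simplest possible stand-in for a real feature vector.

```python
import math

def frame_log_energy(samples, frame=160):
    """Reduce each full frame of samples to one log-energy value (toy feature)."""
    feats = []
    for start in range(0, len(samples) - frame + 1, frame):
        block = samples[start:start + frame]
        energy = sum(v * v for v in block) / frame
        feats.append(math.log(energy + 1e-12))
    return feats

def pcm_bits_per_second(sample_rate_hz=16000, bits_per_sample=16):
    """Bit rate of uncompressed audio under the assumed format."""
    return sample_rate_hz * bits_per_sample

def feature_bits_per_second(values_per_frame=13, frames_per_second=100,
                            bits_per_value=32):
    """Bit rate of a feature stream under the assumed frame layout."""
    return values_per_frame * frames_per_second * bits_per_value
```

Under these assumptions the audio stream needs 256,000 bit/s and the feature stream 41,600 bit/s, roughly a sixfold reduction in the data sent to the car navigation system or server.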

As described above, the embodiment of the present invention is characterized in that a mobile phone fitted with a microphone array or with paired microphones is installed at a location away from the user, and noise picked up during a call is suppressed by signal processing. The noise-suppressing mobile phone is also used as a microphone for hands-free voice operation of a car navigation system. For this purpose, the mobile phone also includes a function for converting the input signal into voice recognition features.

As a specific configuration example of this embodiment, a plurality of microphones 2 are mounted on the mobile phone 1, and echo canceller processing 4 is applied to the input to suppress sound feeding back from the speaker 3. Before starting a call or voice operation, the user speaks a specific word, on which speech detection 5 is performed. When the word is recognized, the average gain value of the speech signal obtained from that input and the gain value of the noise signal obtained in the few seconds after the utterance are calculated. Gain correction processing 6 is performed based on the speech and noise gain values, and noise suppression 7 is then applied to obtain a clear speech signal. For a call, the speech signal is transmitted; for operating a car navigation system or the like, a signal that has undergone voice recognition feature conversion 8 is transmitted.

FIG. 1 is a block diagram showing the configuration of a mobile phone according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of microphone placement on the mobile phone according to the embodiment.
FIG. 3 is a diagram showing the arrangement of the mobile phone, the car navigation system, and the speaker (for example, the driver) in the car according to the embodiment.
FIG. 4 is a diagram showing a configuration example that carries out the series of processes from speech detection processing to noise suppression processing in the mobile phone according to the embodiment.

Explanation of symbols

1 Mobile phone
2 Microphone mounted on the mobile phone
3 Speaker of the mobile phone
4 Echo canceller processing
5 Speech detection processing
6 Gain correction processing
7 Noise suppression processing
8 Voice recognition feature conversion processing
9 Car navigation system
10 Installed mobile phone
11 Initial state
12 Specific word recognition state
13 Noise section detection processing
14 Gain adjustment processing
15 Recognition environment switching processing
16 Voice direction estimation processing
17 Update state
18 Gain correction processing
19 Noise suppression processing

Claims

A mobile phone having a hands-free function usable in a car, characterized in that
a plurality of microphones are installed in the mobile phone, and
based on the signals input from the plurality of microphones, echo canceller processing that suppresses sound feeding back from a speaker, gain correction processing for the uttered speech and for voices and noise other than the uttered speech, and noise suppression processing are performed, and the uttered speech is transmitted as a call signal.

The mobile phone according to claim 1, characterized in that
the mobile phone functions as a hands-free microphone, and
is used as an interface for voice operation of a car navigation system or AV system installed in the car, or of a web server.

The mobile phone according to claim 2, characterized in that
voice recognition feature conversion is applied to the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system or server.

A mobile phone that has a hands-free function usable in a car, enables calls, and enables voice operation of a car navigation system or AV system, characterized in that
a plurality of microphones are installed in the mobile phone,
in an initial state of the call or the voice operation, gain adjustment processing is performed on the initial utterance of a specific word group using the output of a single microphone designated in advance, and the direction of the initial utterance is estimated,
after the gain adjustment processing is complete, echo canceller processing, gain correction processing, and noise suppression processing are performed based on the signals output from the plurality of microphones for speech uttered for the call or the voice operation, or based on the direction estimation signal, and the uttered speech is transmitted as a call signal, and
voice recognition feature conversion is applied to the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system.