JP2007194833A - Mobile phone with hands-free function - Google Patents

Mobile phone with hands-free function

Info

Publication number
JP2007194833A
Authority
JP
Japan
Prior art keywords
voice
mobile phone
signal
speech
microphones
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2006009985A
Other languages
Japanese (ja)
Inventor
Yasuaki Ohashi
靖明 大橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Priority to JP2006009985A
Publication of JP2007194833A
Legal status: Pending

Landscapes

  • Telephone Function (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a mobile phone that allows hands-free calls without wearing a headset microphone and that serves as a voice input interface for a car navigation system.

SOLUTION: A plurality of microphones 2 are mounted on the mobile phone 1. Echo canceller processing 4 is applied to the input to suppress sound wrapping around from the speaker 3. Before starting a voice operation or a call, the user speaks a predetermined word, on which speech detection 5 is performed. When the word is recognized, the mean gain value of the speech signal obtained from that input and the gain value of the noise signal obtained during the few seconds after the utterance are calculated. Gain correction processing 6 is performed based on both gain values, and noise suppression 7 is then applied to obtain a clear speech signal. For a call, the voice signal is transmitted; for operating the car navigation system, the signal after voice recognition feature conversion 8 is transmitted.

COPYRIGHT: (C) 2007, JPO & INPIT

Description

The present invention relates to a mobile phone that enables hands-free calls and also serves as a voice input interface to a car navigation system.

At present, handheld use of a mobile phone in a vehicle is prohibited because it increases the incidence of traffic accidents. Nevertheless, there is strong demand for using mobile phones in vehicles, and various hands-free calling means have been proposed in response. Conventional hands-free means for in-vehicle calls, such as headset microphones, impose the constraint of having to be worn on the body.

Voice operation functions are also provided as an interface to car navigation systems and similar equipment, but because their recognition performance is markedly low, they have practical problems and have not become widespread.

Using a mobile phone as a microphone for the interface of a voice operation system such as a car navigation system is disclosed in, for example, Patent Document 1: the user speaks to the mobile phone, and the resulting voice signal is transmitted to the voice operation system by wire or wirelessly. More specifically, when the remote operation mode is selected and a key is operated on the mobile telephone, the key operation signal is transmitted to the car navigation device, which performs an operation according to that signal. When the voice recognition mode is selected and the user speaks, the voice is input to the microphone of the mobile telephone and a voice signal is transmitted from the mobile telephone; the car navigation device recognizes the voice signal with its voice recognition unit and performs an operation according to the recognition result.
Patent Document 1: JP 2003-152884 A

However, in the prior art including Patent Document 1, although the mobile phone has a hands-free function, it picks up not only the voice of the calling party but also the voice (reply) of the called party, music and voices in the vehicle, background noise, and so on. Because the calling party's voice cannot be distinguished from these other voices and noises, they may be mistaken for the calling party's voice, resulting in erroneous recognition.

An object of the present invention is to provide a mobile phone that enables hands-free calls, has a voice input interface function to a car navigation system or the like, and identifies the sound source of the calling party with improved identification accuracy.

In order to solve the above problems, the present invention adopts the following configuration.
A mobile phone with a hands-free function usable in a vehicle, wherein
a plurality of microphones are installed in the mobile phone, and, based on the signals input from the plurality of microphones, echo canceller processing that suppresses sound wrapping around from the speaker, and gain correction processing and noise suppression processing for the speech and for voices and noise other than the speech, are performed, and the speech is transmitted as a call signal.

In this mobile phone, the mobile phone is made to function as a hands-free microphone and is used as an interface for performing voice operations on a car navigation system or AV system installed in the vehicle, or on a web server. Further, in this mobile phone, voice recognition feature conversion processing is performed on the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system or server.

Also provided is a mobile phone that has a hands-free function usable in a vehicle, enables calls, and enables voice operation of a car navigation system or AV system, wherein a plurality of microphones are installed in the mobile phone; in the initial state of the call or the voice operation, gain adjustment processing is performed on the initial utterance of a specific word group using the output of a single microphone designated in advance, and the direction of the initial utterance is estimated; after the gain adjustment processing is complete, echo canceller processing, gain correction processing, and noise suppression processing are performed based on the signals output from the plurality of microphones for the speech of the call or the voice operation, or on the direction estimation signal, and the speech is transmitted as a call signal; and voice recognition feature conversion processing is performed on the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system.

According to the present invention, a mobile phone installed in a vehicle enables hands-free calls and voice operation of a car navigation system or the like. Mounting a plurality of microphones on the mobile phone and applying signal processing improves the recognition accuracy of the user's speech.

A mobile phone according to an embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the configuration of the mobile phone according to the embodiment. FIG. 2 is a diagram showing an installation example of the microphones of the mobile phone. FIG. 3 is a diagram showing the arrangement of the mobile phone, the car navigation system, and the speaker (for example, the driver) inside the vehicle. FIG. 4 is a diagram showing a configuration example that carries out the series of processes from speech detection processing to noise suppression processing in the mobile phone.

In the drawings, 1 denotes a mobile phone, 2 a microphone mounted on the mobile phone, 3 the speaker of the mobile phone, 4 echo canceller processing, 5 speech detection processing, 6 gain correction processing, 7 noise suppression processing, 8 voice recognition feature conversion processing, 9 a car navigation system, 10 the installed mobile phone, 11 the initial state, 12 the specific word recognition state, 13 noise section detection processing, 14 gain adjustment processing, 15 recognition environment switching processing, 16 voice direction estimation processing, 17 the updated state, 18 gain correction processing, and 19 noise suppression processing.

In FIG. 1, in the embodiment of the present invention, a plurality of microphones 2 are mounted on the mobile phone 1. Not only speech but also noise and the wraparound sound output from the speaker 3 are input to the microphones 2. Echo canceller processing 4 is applied to these signals so that howling does not occur, and then speech detection processing 5 is performed. The voice of the other party, music played in the vehicle, and the response sounds of the car navigation system are transmitted from the in-vehicle AV equipment or car navigation system to the mobile phone and suppressed by the echo canceller 4 (using a known echo cancellation technique). In addition, by using a plurality of microphones, noise is suppressed based on the phase differences and amplitude differences between the microphone signals (known microphone array signal processing) or on their independence (known ICA, that is, independent component analysis of the output signals of the microphones).
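The description only says that a known echo cancellation technique is used. As an illustration, the following minimal sketch uses a normalized LMS (NLMS) adaptive filter, which is one common choice and an assumption here, not the method stated in the publication. The function name, tap count, step size, and the synthetic test signal are all hypothetical.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal."""
    w = [0.0] * taps          # adaptive estimate of the echo path
    history = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_est      # residual: mic sample minus estimated echo
        norm = sum(h * h for h in history) + eps
        step = mu * e / norm  # normalized step size
        w = [wi + step * hi for wi, hi in zip(w, history)]
        out.append(e)
    return out

def power(sig):
    return sum(s * s for s in sig) / len(sig)

# Hypothetical scenario: the microphone hears only a delayed, attenuated copy
# of the signal sent to the speaker (the reference).
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * r for r in ref[:-2]]

residual = nlms_echo_cancel(mic, ref)
echo_power = power(mic[-200:])
residual_power = power(residual[-200:])
print(f"echo power {echo_power:.4f}, residual power {residual_power:.2e}")
```

In this sketch the reference plays the role of the far-end voice, music, or navigation prompts that the in-vehicle equipment passes to the phone; once the filter has adapted, what remains in the residual is whatever the reference cannot explain, ideally the user's own speech.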

As illustrated, when speech is detected (5) after echo cancellation (4), gain correction processing 6 is performed, followed by noise suppression processing 7. When the mobile phone is used for a call, the call signal is transmitted as is; for voice operation, voice recognition feature conversion processing 8 is applied and the resulting voice operation signal is transmitted to the voice recognition system (for example, a system installed in the car navigation system or AV equipment). As described above, background noise can be suppressed by mounting a plurality of microphones 2 and using the phase differences, amplitude differences, independence, and the like of the signals input to the microphones, which avoids the inconvenience of buying and wearing a headset as in the prior art. Because the signal varies greatly with microphone characteristics and input volume, performing gain correction processing 6 (applying a known AGC technique) enables a more comfortable call. Adding noise suppression processing and the like improves the S/N ratio and the recognition accuracy. The processing described above is performed on each output signal line from the plurality of microphones.
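The gain correction step is described only as an application of known AGC (automatic gain control). The sketch below shows one simple frame-based AGC under assumed parameters: the frame length, target level, gain ceiling, and smoothing factor are hypothetical and do not come from the publication.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def agc(samples, frame=160, target_rms=0.1, max_gain=20.0, smooth=0.5):
    """Scale each frame so that the output level drifts toward target_rms."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        level = rms(block)
        if level > 1e-6:  # leave the gain untouched on near-silent frames
            desired = min(max_gain, target_rms / level)
            gain = smooth * gain + (1.0 - smooth) * desired
        out.extend(gain * s for s in block)
    return out

# A quiet and a loud version of the same tone, standing in for inputs that
# differ because of microphone sensitivity or speaking volume.
tone = [math.sin(2 * math.pi * n / 16) for n in range(3200)]
quiet = [0.01 * s for s in tone]
loud = [0.8 * s for s in tone]

quiet_out = rms(agc(quiet)[-160:])
loud_out = rms(agc(loud)[-160:])
print(f"quiet -> {quiet_out:.3f}, loud -> {loud_out:.3f}")
```

Both inputs settle near the same output level, which is the effect the description attributes to gain correction: the transmitted signal no longer depends strongly on microphone characteristics or input volume.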

The dotted frames in FIG. 2 show mounting examples of the microphones 2 shown in FIG. 1. A microphone array forms directivity based on the input phase differences and amplitude differences between the arrayed microphones. The microphone arrangement shown in FIG. 2 therefore forms directivity for input signals in the horizontal direction as seen from the mobile phone. The left side of FIG. 2 shows an example microphone arrangement for use with the mobile phone open, and the right side shows an example for use with the mobile phone closed and installed in the vehicle.
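How an array forms directivity from phase differences can be illustrated with a delay-and-sum beamformer. This is a standard textbook technique chosen here as an assumption; the publication does not name a specific beamforming method. The three-microphone layout, the whole-sample delays, and all function names are hypothetical.

```python
import math

def delay_and_sum(channels, steer_delays):
    """Average the channels after advancing each one by its steering delay (in samples)."""
    n = len(channels[0]) - max(steer_delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, steer_delays)) / len(channels)
        for t in range(n)
    ]

def simulate_array(source, arrival_delays):
    """Each microphone hears the source delayed by a whole number of samples."""
    return [[0.0] * d + source[: len(source) - d] for d in arrival_delays]

def rms(sig):
    return math.sqrt(sum(s * s for s in sig) / len(sig))

tone = [math.sin(2 * math.pi * n / 8) for n in range(400)]  # period of 8 samples

front = simulate_array(tone, [0, 0, 0])  # frontal source: same arrival time at every mic
side = simulate_array(tone, [0, 2, 4])   # oblique source: 2 extra samples per mic

# Steering toward the front keeps the frontal source and attenuates the oblique one.
front_level = rms(delay_and_sum(front, [0, 0, 0]))
side_level = rms(delay_and_sum(side, [0, 0, 0]))

# Steering toward the oblique direction restores that source instead.
steered_side_level = rms(delay_and_sum(side, [0, 2, 4]))
print(front_level, side_level, steered_side_level)
```

With the array steered to the front, the frontal tone passes at full level while the oblique tone is reduced to roughly a third, because its copies arrive out of phase and partly cancel when averaged.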

FIG. 3 shows an example of using the mobile phone 10 in a vehicle. If the mobile phone 10 is installed near the front of the user (for example, the driver) and directivity is formed toward the front from the initial state, signals from that direction can be picked up. The input voice signal is transmitted to the car navigation system 9 by wire or wirelessly, for example via Bluetooth.

When the user makes a call, the other party's voice received by the mobile phone 10 is transmitted to the car navigation system 9 and output from a speaker connected to it. If the mobile phone is fixed roughly in front of the user (driver) in this way, the direction of the target voice is known to some extent, so it suffices to pick up sound from that direction and transmit the signal after gain correction and noise suppression. Specifically, if the central microphone of the plurality of microphones has the directivity best matched to the speech, the gain correction and noise suppression for the signal lines of the other microphones may be adjusted as appropriate by comparison with the central microphone's signal line.

FIG. 4 shows a configuration example for the processing from speech detection processing 5 to noise suppression processing 7 shown in FIG. 1. The initial state 11 in FIG. 4 is a state that uses the input of a single microphone designated in advance and holds an acoustic model and a language model for a specific word group. A model built from a small amount of data can be stored in the memory of the mobile phone, but because phonemes other than those of the word group must also be stored as a database, this processing may instead be performed by the car navigation system 9 (a system that exchanges signals with the mobile phone 1 wirelessly or by wire). The specific word group can be defined by the user, for example a name given to the car or a call-out phrase.

When the initial utterance is matched in the specific word recognition state 12, the noise section detection processing 13 acquires a noise signal over the following few seconds. Gain adjustment processing 14 is then performed using the initial utterance and the noise signal, which makes it possible to hold down the gain of the noise signal to some extent. The recognition environment switching processing 15 switches from the acoustic model and language model for the specific word group to those for multiple words. Although the initial utterance is processed with the designated single microphone as described above, it is also input to the other microphones, so those signals can be used for voice direction estimation 16 (estimating the direction of the speech allows the gain of each microphone's input signal line to be corrected appropriately).
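The publication does not say how the voice direction estimation 16 is computed. One common approach, assumed here purely for illustration, is to find the time delay between two microphones by cross-correlation and convert it to an arrival angle. The sampling rate, the microphone spacing of 0.1 m, and the function names are hypothetical.

```python
import math
import random

def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which b best matches a delayed copy of a."""
    def corr(lag):
        if lag >= 0:
            pairs = zip(a[: len(a) - lag], b[lag:])
        else:
            pairs = zip(a[-lag:], b[: len(b) + lag])
        return sum(x * y for x, y in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr)

def lag_to_angle(lag, fs, spacing_m, c=343.0):
    """Convert an inter-microphone delay to an arrival angle in degrees (far-field model)."""
    s = max(-1.0, min(1.0, lag * c / (fs * spacing_m)))
    return math.degrees(math.asin(s))

# Synthetic initial utterance: microphone B hears the same signal 3 samples later.
rng = random.Random(1)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
mic_b = [0.0] * 3 + mic_a[:-3]

lag = best_lag(mic_a, mic_b, max_lag=8)
angle = lag_to_angle(lag, fs=16_000, spacing_m=0.1)
print(f"lag = {lag} samples, angle = {angle:.1f} degrees")
```

An estimate of this kind, taken once from the initial utterance, could then be used to decide how much weight each microphone's signal line receives, in the spirit of the per-line gain correction mentioned above.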

In this way, an initial state based on an acoustic model and a language model for a specific word group is set in advance, and speech is detected by recognizing only those words. Gain adjustment is then performed using the recognized speech and the noise signal obtained during the few seconds that follow.
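The publication does not give a formula for this gain adjustment. As a rough sketch under stated assumptions, the calibration could measure the level of the recognized word and of the noise that follows, then derive a gain that brings speech to a target level along with an S/N estimate. The target level, the use of RMS as the level measure, and the function names are hypothetical.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate(speech, noise, target_rms=0.1):
    """Derive a speech gain and an S/N estimate from two calibration segments."""
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    gain = target_rms / speech_rms                    # brings speech to the target level
    snr_db = 20 * math.log10(speech_rms / noise_rms)  # speech level relative to noise
    return gain, snr_db

# Hypothetical calibration data: the keyword segment and the noise-only
# segment recorded in the few seconds after it.
keyword_segment = [0.05, -0.05] * 100
noise_segment = [0.005, -0.005] * 100

gain, snr_db = calibrate(keyword_segment, noise_segment)
print(f"gain = {gain:.2f}, S/N = {snr_db:.1f} dB")
```

A real implementation would also have to guard against an empty or silent segment, which this sketch leaves out.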

When the above processing is complete, the mobile phone or the car navigation system signals to the user that the recognition environment is ready. The user then speaks for the purpose of a call or voice operation (the state changes from the initial state 11 to the updated state 17). Gain correction processing 18 and noise suppression processing 19, the latter using phase difference, amplitude difference, or independence information, are applied to the signals input to the plurality of microphones.
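The transition from the initial state 11 to the updated state 17 can be read as a small state machine. The sketch below is one possible reading, not the publication's implementation: utterances are represented as already-recognized text for simplicity, and the class, method, and step names are hypothetical.

```python
from enum import Enum

class State(Enum):
    INITIAL = 11  # single microphone, keyword-only models
    UPDATED = 17  # calibrated, all microphones, general models

class Frontend:
    def __init__(self, wake_words):
        self.wake_words = {w.lower() for w in wake_words}
        self.state = State.INITIAL
        self.steps = []

    def on_utterance(self, text):
        """Return the utterance to forward, or None while still waiting for the keyword."""
        if self.state is State.INITIAL:
            if text.lower() in self.wake_words:
                # Calibration steps run once, in the order given in the description.
                self.steps += [
                    "detect_noise_section",    # 13
                    "adjust_gain",             # 14
                    "switch_recognition_env",  # 15
                    "estimate_direction",      # 16
                    "notify_user",
                ]
                self.state = State.UPDATED
            return None
        return text

frontend = Frontend(["hello car"])
ignored = frontend.on_utterance("navigate home")    # before the keyword: dropped
trigger = frontend.on_utterance("Hello Car")        # keyword: calibrate and switch state
forwarded = frontend.on_utterance("navigate home")  # afterwards: passed on
print(frontend.state, forwarded)
```

Ignoring everything except the keyword in the initial state is what keeps music, passengers, or the far-end voice from triggering calibration or commands.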

For a call, the signal subjected to gain correction processing 18 and noise suppression processing 19 shown in FIG. 4 is transmitted (see FIG. 1). For voice operation, as shown in FIG. 1, the noise-suppressed signal is converted (8) into voice recognition features and the converted signal can be transmitted to the car navigation system. Because the feature signal carries less information than the voice signal, transmission from the mobile phone to the car navigation system is faster. The feature conversion may also be performed in the mobile phone and the result transmitted to a web server. The server holds an acoustic model and a language model in advance and performs voice recognition by matching them against the transmitted signal.
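The claim that features carry less information than the voice signal can be made concrete with a back-of-the-envelope calculation. The publication gives no formats or figures, so every number below is an assumption: 16 kHz, 16-bit mono audio, and 13 feature coefficients per 10 ms frame stored as 16-bit values.

```python
def bits_per_second(values_per_second, bits_per_value):
    return values_per_second * bits_per_value

# Assumed raw audio format: 16 kHz sampling, 16-bit samples, one channel.
pcm_rate = bits_per_second(16_000, 16)

# Assumed feature format: 13 coefficients every 10 ms, 16 bits each.
frames_per_second = 100
feature_rate = bits_per_second(frames_per_second * 13, 16)

reduction = pcm_rate / feature_rate
print(f"PCM: {pcm_rate} bit/s, features: {feature_rate} bit/s, ratio: {reduction:.1f}x")
```

Under these assumptions the feature stream needs roughly a twelfth of the bandwidth of the raw audio, which is consistent with the faster transmission described above; the actual saving depends on the feature type and encoding used.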

As described above, the embodiment of the present invention is characterized in that a mobile phone fitted with a microphone array or with microphones arranged in pairs is installed at a location away from the user, and noise mixed in during a call is suppressed by signal processing. In addition, the noise-suppressing mobile phone is used as a microphone for the car navigation system to perform hands-free voice operation, in which case the mobile phone also includes a function for converting the input signal into voice recognition features.

As a specific configuration example of this embodiment, a plurality of microphones 2 are mounted on the mobile phone 1, and echo canceller processing 4 is applied to the input to suppress the sound wrapping around from the speaker 3. Before starting a call or voice operation, the user speaks a specific word, on which speech detection 5 is performed. When the word is recognized, the average gain value of the voice signal obtained from that input and the gain value of the noise signal obtained during the few seconds after the utterance are calculated. Gain correction processing 6 is performed based on the voice and noise gain values, and noise suppression 7 is then applied to obtain a clear voice signal. For a call, the voice signal is transmitted; for operating a car navigation system or the like, the signal after voice recognition feature conversion 8 is transmitted.

FIG. 1 is a block diagram showing the configuration of a mobile phone according to an embodiment of the present invention.
FIG. 2 is a diagram showing an installation example of the microphones of the mobile phone according to this embodiment.
FIG. 3 is a diagram showing the arrangement of the mobile phone, the car navigation system, and the speaker (for example, the driver) inside the vehicle in this embodiment.
FIG. 4 is a diagram showing a configuration example that carries out the series of processes from speech detection processing to noise suppression processing in the mobile phone according to this embodiment.

Explanation of symbols

1 Mobile phone
2 Microphone mounted on the mobile phone
3 Speaker of the mobile phone
4 Echo canceller processing
5 Speech detection processing
6 Gain correction processing
7 Noise suppression processing
8 Voice recognition feature conversion processing
9 Car navigation system
10 Installed mobile phone
11 Initial state
12 Specific word recognition state
13 Noise section detection processing
14 Gain adjustment processing
15 Recognition environment switching processing
16 Voice direction estimation processing
17 Updated state
18 Gain correction processing
19 Noise suppression processing

Claims (4)

A mobile phone with a hands-free function usable in a vehicle, characterized in that
a plurality of microphones are installed in the mobile phone, and
based on the signals input from the plurality of microphones, echo canceller processing that suppresses sound wrapping around from a speaker, and gain correction processing and noise suppression processing for speech and for voices and noise other than the speech, are performed, and the speech is transmitted as a call signal.
The mobile phone according to claim 1, characterized in that
the mobile phone is made to function as a hands-free microphone, and
is used as an interface for performing voice operations on a car navigation system or AV system installed in the vehicle, or on a web server.
The mobile phone according to claim 2, characterized in that
voice recognition feature conversion processing is performed on the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system or server.
A mobile phone that has a hands-free function usable in a vehicle, enables calls, and enables voice operation of a car navigation system or AV system, characterized in that
a plurality of microphones are installed in the mobile phone,
in the initial state of the call or the voice operation, gain adjustment processing is performed on the initial utterance of a specific word group using the output of a single microphone designated in advance, and the direction of the initial utterance is estimated,
after the gain adjustment processing is complete, echo canceller processing, gain correction processing, and noise suppression processing are performed based on the signals output from the plurality of microphones for the speech of the call or the voice operation, or on the direction estimation signal, and the speech is transmitted as a call signal, and
voice recognition feature conversion processing is performed on the noise-suppressed speech signal to reduce the amount of information, and the result is transmitted as a voice operation signal to the voice-operated system.
JP2006009985A 2006-01-18 2006-01-18 Mobile phone with hands-free function Pending JP2007194833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006009985A JP2007194833A (en) 2006-01-18 2006-01-18 Mobile phone with hands-free function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006009985A JP2007194833A (en) 2006-01-18 2006-01-18 Mobile phone with hands-free function

Publications (1)

Publication Number Publication Date
JP2007194833A true JP2007194833A (en) 2007-08-02

Family

ID=38450182

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006009985A Pending JP2007194833A (en) 2006-01-18 2006-01-18 Mobile phone with hands-free function

Country Status (1)

Country Link
JP (1) JP2007194833A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011227199A (en) * 2010-04-16 2011-11-10 Nec Casio Mobile Communications Ltd Noise suppression device, noise suppression method and program
KR20160031847A (en) * 2014-09-15 2016-03-23 현대모비스 주식회사 Method and apparatus for controlling hands-free device of vehicle
KR102276538B1 (en) * 2014-09-15 2021-07-13 현대모비스 주식회사 Method and apparatus for controlling hands-free device of vehicle

Similar Documents

Publication Publication Date Title
EP1953735B1 (en) Voice control system and method for voice control
US10535362B2 (en) Speech enhancement for an electronic device
US10269369B2 (en) System and method of noise reduction for a mobile device
CN106782589B (en) Mobile terminal and voice input method and device thereof
JP4779748B2 (en) Voice input / output device for vehicle and program for voice input / output device
US5353376A (en) System and method for improved speech acquisition for hands-free voice telecommunication in a noisy environment
US8054990B2 (en) Method of recognizing speech from a plurality of speaking locations within a vehicle
US8738368B2 (en) Speech processing responsive to a determined active communication zone in a vehicle
KR100984528B1 (en) System and method for voice recognition in a distributed voice recognition system
US20030061036A1 (en) System and method for transmitting speech activity in a distributed voice recognition system
JP6635394B1 (en) Audio processing device and audio processing method
JP2004511823A (en) Dynamically reconfigurable speech recognition system and method
JP6545419B2 (en) Acoustic signal processing device, acoustic signal processing method, and hands-free communication device
US20170365249A1 (en) System and method of performing automatic speech recognition using end-pointing markers generated using accelerometer-based voice activity detector
WO2014143432A1 (en) Method and apparatus including parallel processes for voice recognition
JP2003511924A (en) Speech recognition technology based on local interrupt detection
KR20020071850A (en) Method and apparatus for processing an input speech signal during presentation of an output audio signal
EP1494208A1 (en) Method for controlling a speech dialog system and speech dialog system
KR20080107376A (en) Communication device having speaker independent speech recognition
US20140365212A1 (en) Receiver Intelligibility Enhancement System
EP1493993A1 (en) Method and device for controlling a speech dialog system
EP1357543A2 (en) Beamformer delay compensation during handsfree speech recognition
US20120197643A1 (en) Mapping obstruent speech energy to lower frequencies
JP2002524777A (en) Voice dialing method and system
JP2007194833A (en) Mobile phone with hands-free function

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20080220

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20090417

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20090421

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20090818