JP2004219728A - Speech recognition device - Google Patents

Publication number
JP2004219728A
Authority
JP
Japan
Prior art keywords
signal
voice
dictionary
speech
speech recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP2003007158A
Other languages
Japanese (ja)
Inventor
Mikio Oda
幹夫 小田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2003007158A
Publication of JP2004219728A

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that improves the speech recognition rate and shortens the processing time by reliably detecting the speech segmentation section.

SOLUTION: The speech recognition device comprises: a microphone that converts an input speech signal into an electric signal; an amplifier that amplifies the microphone output; an A/D converter that converts the amplified speech signal from an analog signal to a digital signal; a speech recognition main processing part that receives the digitized speech signal; and a speech dictionary memory data part in which reference data for speech recognition are recorded in advance. A plurality of dictionary selection talk switches are connected to the speech recognition main processing part as control signals for detecting the speech segmentation section and for switching the speech dictionary memory data part, and only the section in which a dictionary selection talk switch is pressed is treated as the speech segmentation section. Speech recognition refers only to the speech dictionary memory data corresponding to the pressed dictionary selection talk switch, which improves the speech recognition rate and shortens the processing time.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a speech recognition device that improves the speech recognition rate and shortens the processing time.
[0002]
[Prior art]
In recent years, speech recognition technology has been widely introduced into consumer devices thanks to improvements in digital signal processing and to higher-performance, lower-cost processing LSIs, and it helps improve the operability of those devices.
[0003]
However, consumer devices take voice input inside homes where there is a great deal of ambient conversation and household noise. Keeping the device in the speech recognition input state at all times allows hands-free operation, which is certainly convenient, but there are cases in which ordinary conversation puts the device into speech recognition mode and triggers unwanted control. Current technology cannot distinguish conversation from speech recognition input. A speech recognition device that addresses this problem has been proposed (see, for example, Patent Document 1).
[0004]
An example of such a speech recognition device will be described below with reference to FIG. 6.
In FIG. 6, reference numeral 38 denotes a telephone call microphone, 30 a transmitting unit that converts a voice signal into an electric signal, 32 a line control unit that selects and controls the public line, 37 a public line, 31 a receiving unit that converts the other party's voice, obtained as an electric signal, into a voice signal, 39 a telephone call speaker, 40 a speech recognition microphone, 34 a speech recognition unit, 33 a microphone switch, 35 a control unit that controls voice synthesis, line selection, and the like based on the speech recognition result, 36 a voice synthesis unit that notifies the user of the recognition result and of the state of the telephone and other information, and 41 a voice synthesis speaker.
[0005]
The operation of the speech recognition device configured as described above will now be described. FIG. 6 shows an example in which speech recognition technology is applied to a telephone. To control the device by speech recognition, the user first turns on the microphone switch 33 of the speech recognition device shown in FIG. 6 and makes a predetermined utterance toward the speech recognition microphone. Examples of uttered words include the names of registered destinations and telephone control words. When the utterance is finished, the speech recognition unit 34 performs recognition and transfers the recognition result to the control unit 35, and the voice synthesis unit 36 and the voice synthesis speaker 41 announce the recognition result by voice. If the recognition result is wrong, the user repeats the utterance and continues voice input until the result is correct; once it is correct, the user turns off the microphone switch 33. If the uttered word is a destination name, the control unit 35 sends a control signal to the line control unit 32, the number is dialed automatically, and the telephone line is connected by speech recognition. In other words, the microphone switch 33 is turned on only while speech is to be recognized and is turned off when recognition is finished, which completes the series of speech recognition processes.
[0006]
[Patent Document 1]
JP-A-3-52442
[0007]
[Problems to be solved by the invention]
The conventional configuration described above is the simplest and most easily realized countermeasure against malfunctions caused by ambient conversation and household noise in the home, but it is insufficient as a measure for improving the speech recognition rate. A consumer device must recognize voice input from unspecified speakers, and speakers talk in many different ways: some speak quickly, and there are low-pitched male voices, high-pitched female voices, children's voices, and so on. In addition, when all home AV equipment and electric appliances are controlled by speech recognition, the number of words to be uttered becomes large, so many problems remain in achieving a high speech recognition rate while also speeding up recognition processing.
[0008]
[Means for Solving the Problems]
To solve the above problem, the speech recognition device of the present invention treats only the time during which a dictionary selection talk switch is pressed as the speech segmentation section that supplies the speech recognition data, and performs speech recognition by referring to the speech dictionary memory data corresponding to that dictionary selection talk switch, thereby improving the speech recognition rate and shortening the processing time.
[0009]
According to the present invention, the user presses a dictionary selection talk switch when speaking for speech recognition. Only the time during which the dictionary selection talk switch is pressed is treated as the speech segmentation section that supplies the speech recognition data, so the input is not affected by surrounding noise. In addition, only the speech dictionary memory data corresponding to the selected switch, out of the plurality of dictionary selection talk switches, is selected and referenced, so the number of reference words decreases. It is therefore possible to provide a speech recognition device that improves the speech recognition rate and shortens the processing time.
[0010]
[Embodiments of the invention]
The speech recognition device according to claim 1 of the present invention comprises a microphone that converts an input speech signal into an electric signal, an amplifier that amplifies the microphone output, an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal, a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter, a speech dictionary memory data unit recorded in advance as reference data for speech recognition, and a dictionary selection talk switch that generates a speech segmentation section detection signal and a switching signal for the speech dictionary memory data unit and outputs them to the speech recognition main processing unit. With this configuration, only the time during which the dictionary selection talk switch is pressed is treated as the speech segmentation section that supplies the speech recognition data, and speech recognition is performed by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch, which realizes speech recognition with an improved recognition rate and a shorter processing time.
[0011]
Next, the speech recognition device according to claim 2 of the present invention comprises a voice transmission unit and a voice reception recognition unit. The voice transmission unit consists of a microphone that converts an input speech signal into an electric signal, an amplifier that amplifies the microphone output, a modulator that modulates the speech signal amplified by the amplifier, a transmission output device that wirelessly transmits the modulated speech signal, a dictionary selection talk switch that generates a speech segmentation section detection signal and a dictionary switching signal, and a carrier frequency switching unit that switches the carrier frequency of the modulator according to the dictionary selection talk switch. The voice reception recognition unit consists of a receiver that receives the speech signal transmitted from the voice transmission unit, a detection unit made up of a plurality of detectors that detect the receiver output signal and generate a carrier detection signal, a speech signal selection switch that selects, from among the speech signals detected and output by the plurality of detectors, the speech signal that was modulated on the carrier controlled by the dictionary selection talk switch, an A/D converter that converts the analog speech signal selected by the speech signal selection switch into a digital signal, a speech recognition main processing unit that receives the digitized speech signal and the carrier detection signal, and a speech dictionary memory data unit recorded in advance as reference data for speech recognition. With this configuration, only the time during which the dictionary selection talk switch is pressed is treated as the speech segmentation section that supplies the speech recognition data, and speech recognition is performed by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch, which realizes speech recognition with an improved recognition rate and a shorter processing time.
[0012]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 5.
[0013]
(Embodiment 1)
An embodiment of the invention described in claim 1 will be described below with reference to FIGS. 1, 2, and 3.
[0014]
FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a microphone that converts an input speech signal into an electric signal, 2 an amplifier that amplifies the microphone output, 3 an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal, 4 a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter, 5 a speech dictionary memory data unit recorded in advance as reference data for speech recognition, and 6 a plurality of dictionary selection talk switches that serve as control signals for detecting the speech segmentation section and for switching the speech dictionary memory data.
[0015]
The operation of the speech recognition device configured as described above will now be described. To control a device such as a TV or VTR by speech recognition, the user first presses and holds a dictionary selection talk switch 6 and utters a word registered for speech recognition. The dictionary selection talk switch 6 consists of a plurality of switches; for the sake of explanation, assume there are two, one a male-voice dictionary selection talk switch and the other a female-voice dictionary selection talk switch. If the user has a male voice, the user presses the male-voice dictionary selection talk switch and makes the predetermined utterance. The uttered speech signal is amplified by the amplifier 2 and passed through the A/D converter 3, which converts it from an analog signal to a digital signal, and the resulting speech data is input to the speech recognition main processing unit 4. The dictionary selection talk switch 6 is also connected to the speech recognition main processing unit 4, which detects the number of the pressed switch and the section during which it is pressed. When the user finishes the utterance and releases the dictionary selection talk switch 6, the speech recognition main processing unit 4 stops collecting speech data. In other words, only the section during which the dictionary selection talk switch 6 is pressed is cut out as the speech data section. The speech recognition main processing unit 4, usually a microcomputer or a DSP (digital signal processor), analyzes the input speech data, and the analyzed data is compared against a word dictionary registered in advance in the speech recognition memory data unit 5. Based on the number of the dictionary selection talk switch 6 that the user pressed, the dictionary memory number corresponding to that switch number is selected within the speech recognition memory data unit 5; under the assumption above, recognition is performed with reference to the male-voice dictionary data. FIG. 2 is a flowchart illustrating this series of processes.
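The flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the processing of FIG. 2 itself: the `Frame` type, the `DICTIONARIES` table, the numeric reference values, and the mean-value "feature" are all hypothetical stand-ins, since the description does not specify how the signal analysis or the dictionary comparison is carried out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Frame:
    samples: Sequence[float]  # digitized audio for one frame (A/D converter output)
    switch: Optional[int]     # number of the talk switch held during this frame, or None


def segment_utterance(frames):
    """Collect samples only while one talk switch is held down."""
    active = None
    collected = []
    for frame in frames:
        if active is None:
            if frame.switch is None:
                continue           # no switch pressed: ambient sound is ignored
            active = frame.switch  # switch pressed: the segmentation section starts
        elif frame.switch != active:
            break                  # switch released: the segmentation section ends
        collected.extend(frame.samples)
    return active, collected


# Hypothetical dictionaries keyed by switch number: word -> toy reference feature.
DICTIONARIES = {
    1: {"power on": 2.0, "power off": 8.0},   # male-voice dictionary
    2: {"power on": 4.0, "power off": 12.0},  # female-voice dictionary
}


def recognize(frames):
    """Match the segmented utterance against the dictionary chosen by the switch."""
    switch, samples = segment_utterance(frames)
    if switch is None or not samples:
        return None
    feature = sum(samples) / len(samples)  # placeholder for the real signal analysis
    dictionary = DICTIONARIES[switch]      # only the selected dictionary is searched
    return min(dictionary, key=lambda word: abs(dictionary[word] - feature))


frames = [
    Frame((9, 9), None),  # household noise before the switch is pressed
    Frame((1, 2), 1),     # male-voice switch held
    Frame((3,), 1),
    Frame((7, 7), None),  # switch released
]
print(recognize(frames))
```

The point of the sketch is that the switch state does two jobs at once: it marks where the utterance begins and ends, and it picks which dictionary is searched.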
[0016]
Normally, the basic data for creating a speech recognition dictionary is obtained by sampling both male and female voices and registering the combined frequency data as dictionary memory data. However, as shown in FIG. 3, male voices lie on the low-pitched side and female voices on the high-pitched side. If both are placed in the same dictionary, the tuning range becomes too wide, and the recognition rate is known to drop for somewhat low male voices and somewhat high female voices. To improve this, the male-voice dictionary data and the female-voice dictionary data are registered separately in advance, and the recognition rate improves when the appropriate one is referenced. If the user specifies the dictionary at the same moment as pressing the dictionary selection talk switch 6, the speech segmentation section is detected reliably and the recognition rate improves as well.
[0017]
The use of a plurality of dictionary switches has been described with male-voice and female-voice dictionary data as one example, but the switches may instead select a dictionary for each household appliance. A single word dictionary that collects the words for every appliance would be enormous, but with one word dictionary per appliance only the words recognized by the selected appliance are referenced, which realizes speech recognition with an improved recognition rate and faster processing.
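A small Python sketch can make the reduction in reference words concrete. The appliance names and word lists below are hypothetical examples chosen for illustration; the description does not list any actual vocabulary.

```python
# Hypothetical per-appliance word dictionaries (not taken from the patent).
DEVICE_DICTIONARIES = {
    "tv": ["power", "channel up", "channel down", "volume up", "volume down"],
    "vtr": ["power", "play", "stop", "rewind", "record"],
    "air conditioner": ["power", "warmer", "cooler"],
}


def candidate_words(selected_device=None):
    """Return the words the recognizer would have to compare against."""
    if selected_device is not None:
        # A talk switch selected one appliance: search only its dictionary.
        return list(DEVICE_DICTIONARIES[selected_device])
    # No selection: every word of every appliance is a candidate.
    merged = []
    for words in DEVICE_DICTIONARIES.values():
        for word in words:
            if word not in merged:
                merged.append(word)
    return merged


print(len(candidate_words()))       # size of the single merged dictionary
print(len(candidate_words("vtr")))  # size of the dictionary for one appliance
```

With these example lists the merged dictionary holds 11 distinct words while the VTR dictionary holds 5, so selecting the appliance by switch cuts the number of comparisons roughly in half even for this tiny vocabulary.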
[0018]
(Embodiment 2)
Next, an embodiment of the invention described in claim 2 will be described with reference to FIGS. 4 and 5.
[0019]
FIG. 4 is a block diagram of the voice transmission unit and the voice reception recognition unit of a speech recognition device according to an embodiment of the present invention. In FIG. 4, reference numeral 10 denotes a voice transmission unit, 11 a microphone that converts an input speech signal into an electric signal, 12 an amplifier that amplifies the microphone output, 13 a modulator that modulates the speech signal amplified by the amplifier, 14 a transmission output device that wirelessly transmits the modulated speech signal, 15 a dictionary selection talk switch that serves as a control signal for detecting the speech segmentation section and for switching dictionaries, 16 a carrier frequency switching unit that switches the carrier frequency of the modulator according to the dictionary selection talk switch, 17 a voice reception recognition unit that receives and detects the wireless speech signal transmitted by the voice transmission unit and performs recognition, 18 a receiver that receives the speech signal transmitted from the voice transmission unit, 19 a detection unit made up of a plurality of detectors that detect the receiver output signal, 20 a detector that detects the carrier received by the receiver and outputs a speech signal and a carrier detection signal, 21 a speech signal selection switch that selects, from among the speech signals detected and output by the plurality of detectors, the speech signal that was modulated on the carrier controlled by the dictionary selection talk switch, 22 an A/D converter that converts the analog speech signal selected by the speech signal selection switch into a digital signal, 23 a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter, and 24 a speech dictionary memory data unit recorded in advance as reference data for speech recognition.
[0020]
The operation of the speech recognition device configured as described above will now be described. When a device such as a TV or VTR is controlled by speech recognition, it is usually operated from a distance, as remote controls have become widespread. For this reason, the voice transmission unit 10, which is operated away from the device body, transmits the speech signal wirelessly, and the voice reception recognition unit 17 provided in the device body performs detection and recognition. The user first presses and holds a dictionary selection talk switch 15 on the voice transmission unit 10 and utters a word registered for speech recognition. The dictionary selection talk switch 15 consists of a plurality of switches; as in Embodiment 1, assume for the sake of explanation that there are two, one a male-voice dictionary selection talk switch and the other a female-voice dictionary selection talk switch. If the user has a male voice, the user presses the male-voice dictionary selection talk switch and makes the predetermined utterance. The uttered speech signal is amplified by the amplifier 12 and modulated by the modulator 13. FM modulation or the like is normally used, which can be realized with a simple circuit or IC. Meanwhile, the state of the dictionary selection talk switch 15 is input to the carrier frequency switching unit 16, which switches the carrier frequency of the modulator 13 according to the talk switch number. The speech signal modulated in this way is transmitted wirelessly to the device body by infrared light or radio waves. In the device body, the receiver 18 receives the infrared light or radio waves, and the detection unit 19, which consists of as many detectors 20 as there are dictionary selection talk switches 15, can detect each carrier. The detected speech signal is selected by the speech signal selection switch 21 and passed through the A/D converter 22, which converts it from an analog signal to a digital signal, and the resulting speech data is input to the speech recognition main processing unit 23. Each detector 20 outputs the detected speech signal and, at the same time, a carrier detection signal that is input to the speech recognition main processing unit 23, which thereby detects the number of the detector 20 that detected a carrier and the section during which the carrier is present. The speech recognition main processing unit 23 switches the speech signal selection switch 21 according to the carrier detection number. When the user finishes the utterance and releases the dictionary selection talk switch 15 of the voice transmission unit 10, the detector 20 signals the end of carrier detection to the speech recognition main processing unit 23, which stops collecting speech data. In other words, only the section during which the dictionary selection talk switch 15 on the voice transmission unit 10 is pressed is cut out as the speech utterance section. The speech recognition main processing unit 23, usually a microcomputer or a DSP (digital signal processor), analyzes the input speech data, and the analyzed data is compared against a word dictionary registered in advance in the speech recognition memory data unit 24. Based on the number of the dictionary selection talk switch 15 that the user pressed, the dictionary memory number corresponding to that switch number is selected within the speech recognition memory data unit 24; under the assumption above, recognition is performed with reference to the male-voice dictionary data. FIG. 5 is a flowchart illustrating this series of processes.
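The way the carrier frequency carries both the switch number and the utterance boundaries can be sketched as follows. This is a simplified model under stated assumptions rather than the processing of FIG. 5: modulation and detection are reduced to tagging each audio frame with a carrier value, and the frequencies in `CARRIER_HZ` are hypothetical, since the description gives no actual values.

```python
# Hypothetical carrier frequency assigned to each talk switch number.
CARRIER_HZ = {1: 455_000, 2: 480_000}


def transmit(switch, audio_frames):
    """Transmitter side: send each frame on the carrier chosen by the held switch."""
    carrier = CARRIER_HZ[switch]
    return [(carrier, frame) for frame in audio_frames]


def receive(channel):
    """Receiver side: one detector per carrier; carrier presence marks the utterance.

    `channel` is a sequence of (carrier_or_None, frame) pairs, where None means
    that no known carrier is on the air during that frame.
    """
    detector_for = {hz: number for number, hz in CARRIER_HZ.items()}
    selected = None
    audio = []
    for carrier, frame in channel:
        detected = detector_for.get(carrier)  # carrier detection signal (detector number)
        if selected is None:
            if detected is None:
                continue                      # no carrier yet: nothing to collect
            selected = detected               # carrier appears: select that detector
        elif detected != selected:
            break                             # carrier gone: the switch was released
        audio.append(frame)
    return selected, audio


channel = [(None, "noise")] + transmit(2, ["a", "b"]) + [(None, "noise")]
print(receive(channel))
```

The detector number returned by `receive` plays the role of the talk switch number from Embodiment 1, so the dictionary can be chosen on the receiving side without sending any separate control data.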
[0021]
The description uses separate male-voice and female-voice dictionary data for the same reason as in Embodiment 1: registering the male-voice dictionary data and the female-voice dictionary data separately in advance and referencing the appropriate one improves the recognition rate. If the user specifies the dictionary at the same moment as pressing the dictionary selection talk switch 15, the cut-out data is reliable and the recognition rate improves as well. The use of a plurality of dictionary switches has been described with male-voice and female-voice dictionary data as one example, but the switches may instead select a dictionary for each household appliance. A single word dictionary that collects the words for every appliance would be enormous, but with one word dictionary per appliance only the words recognized by the selected appliance are referenced, which realizes speech recognition with an improved recognition rate and faster processing, as in Embodiment 1. Embodiment 2 adds remote operation by wireless voice transmission.
[0022]
[Effect of the invention]
As described above, the speech recognition device of the present invention treats only the time during which the dictionary selection talk switch is pressed as the speech segmentation section that supplies the speech recognition data, and performs speech recognition by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch. This makes it possible to provide a speech recognition device that improves the speech recognition rate and shortens the processing time.
[Brief description of the drawings]
FIG. 1 is a block diagram of the speech recognition device according to the first embodiment of the present invention.
FIG. 2 is a flowchart illustrating the processing flow of the speech recognition device shown in FIG. 1.
FIG. 3 is a diagram showing the frequency spectrum bands of male and female voices, illustrating an example of switching between male-voice and female-voice dictionaries.
FIG. 4 is a block diagram of the voice transmission unit and the voice reception recognition unit of the speech recognition device according to the second embodiment of the present invention.
FIG. 5 is a flowchart illustrating the processing flow of the speech recognition device shown in FIG. 4.
FIG. 6 is a block diagram of a conventional speech recognition device.
[Reference signs list]
1 microphone
2 amplifier
3 A/D converter
4 voice recognition main processing unit
5 voice recognition memory data unit
6 dictionary selection talk switch
10 voice transmission unit
11 microphone
12 amplifier
13 modulator
14 transmission output unit
15 dictionary selection talk switch
16 carrier frequency switching unit
17 voice reception recognition unit
18 receiver
19 detection unit
20 detector
21 voice signal selection switch
22 A/D converter
23 voice recognition main processing unit
24 voice dictionary memory data unit
30 transmission unit
31 receiver unit
32 line control unit
33 microphone switch
34 voice recognition unit
35 control unit
36 voice synthesis unit
37 public line
38 call microphone
39 call speaker
40 voice recognition microphone
41 voice synthesis speaker

Claims (2)

A speech recognition device comprising: a microphone that converts an input voice signal into an electric signal; an amplifier that amplifies the microphone output; an A/D converter that converts the voice signal amplified by the amplifier from an analog signal into a digital signal; a voice recognition main processing unit that receives as input the voice signal converted into a digital signal by the A/D converter; a voice dictionary memory data unit in which reference data for voice recognition is recorded in advance; and a dictionary selection talk switch that generates a cut-out section detection signal for a voice utterance and a switching signal for the voice dictionary memory data unit and outputs them to the voice recognition main processing unit.
A speech recognition device comprising: a voice transmission unit made up of a microphone that converts an input voice signal into an electric signal, an amplifier that amplifies the microphone output, a modulator that modulates the voice signal amplified by the amplifier, a transmission output unit that wirelessly transmits the modulated voice signal, a dictionary selection talk switch that generates a cut-out section detection signal for a voice utterance and a dictionary switching signal, and a carrier frequency switching unit that switches the carrier frequency of the modulator in accordance with the dictionary selection talk switch; and a voice reception recognition unit made up of a receiver that receives the voice signal transmitted from the voice transmission unit, a detection unit consisting of a plurality of detectors that detect the receiver output signal and generate a carrier detection signal, a voice signal selection switch that selects, from among the voice signals detected and output by the plurality of detectors, the voice signal that was modulated on the carrier controlled by the dictionary selection talk switch, an A/D converter that converts the analog voice signal selected by the voice signal selection switch into a digital signal, a voice recognition main processing unit that receives as input the voice signal converted into a digital signal by the A/D converter and the carrier detection signal, and a voice dictionary memory data unit in which reference data for voice recognition is recorded in advance.
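The receiving side of the second claim can be pictured with a short Python sketch. It is only an illustration under assumptions: the carrier frequencies, the mapping `CARRIER_TO_DICTIONARY`, the `DetectorOutput` type, and the rule that zero or several detected carriers yield no selection are all hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical carrier assignment in Hz; the claims give no frequencies.
CARRIER_TO_DICTIONARY = {40_000: "male", 45_000: "female"}


@dataclass
class DetectorOutput:
    carrier_hz: int          # carrier this detector is tuned to
    carrier_detected: bool   # carrier detection signal
    audio: List[float]       # detected (demodulated) voice signal


def select_voice_signal(
    outputs: List[DetectorOutput],
) -> Optional[Tuple[str, List[float]]]:
    """Model of the voice signal selection switch.

    Picks the detector whose carrier is present and returns the matching
    dictionary id together with its audio. Returning None when zero or
    several carriers are detected is an assumption of this sketch.
    """
    active = [o for o in outputs if o.carrier_detected]
    if len(active) != 1:
        return None
    chosen = active[0]
    return CARRIER_TO_DICTIONARY[chosen.carrier_hz], chosen.audio
```

In this model the carrier detection signal does double duty: its presence marks the cut-out section, and the carrier on which it appears identifies the dictionary to reference.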
JP2003007158A 2003-01-15 2003-01-15 Speech recognition device Withdrawn JP2004219728A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003007158A JP2004219728A (en) 2003-01-15 2003-01-15 Speech recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003007158A JP2004219728A (en) 2003-01-15 2003-01-15 Speech recognition device

Publications (1)

Publication Number Publication Date
JP2004219728A true JP2004219728A (en) 2004-08-05

Family

ID=32897335

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003007158A Withdrawn JP2004219728A (en) 2003-01-15 2003-01-15 Speech recognition device

Country Status (1)

Country Link
JP (1) JP2004219728A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006162782A (en) * 2004-12-03 2006-06-22 Mitsubishi Electric Corp Voice recognizer
JP4498906B2 (en) * 2004-12-03 2010-07-07 三菱電機株式会社 Voice recognition device
JP2007010971A (en) * 2005-06-30 2007-01-18 Canon Inc Speech recognition method and speech recognition apparatus
JP4667138B2 (en) * 2005-06-30 2011-04-06 キヤノン株式会社 Speech recognition method and speech recognition apparatus
JP2007267331A (en) * 2006-03-30 2007-10-11 Railway Technical Res Inst Combination microphone system for speaking voice collection
JP5677650B2 (en) * 2012-11-05 2015-02-25 三菱電機株式会社 Voice recognition device
WO2014068788A1 (en) * 2012-11-05 2014-05-08 三菱電機株式会社 Speech recognition device
CN104756185A (en) * 2012-11-05 2015-07-01 三菱电机株式会社 Speech recognition device
US9378737B2 (en) 2012-11-05 2016-06-28 Mitsubishi Electric Corporation Voice recognition device
US10313796B2 (en) 2013-05-23 2019-06-04 Knowles Electronics, Llc VAD detection microphone and method of operating the same
US10020008B2 (en) 2013-05-23 2018-07-10 Knowles Electronics, Llc Microphone and corresponding digital interface
US9711166B2 (en) 2013-05-23 2017-07-18 Knowles Electronics, Llc Decimation synchronization in a microphone
US9712923B2 (en) 2013-05-23 2017-07-18 Knowles Electronics, Llc VAD detection microphone and method of operating the same
US9502028B2 (en) 2013-10-18 2016-11-22 Knowles Electronics, Llc Acoustic activity detection apparatus and method
US9830913B2 (en) 2013-10-29 2017-11-28 Knowles Electronics, Llc VAD detection apparatus and method of operation the same
US9830080B2 (en) 2015-01-21 2017-11-28 Knowles Electronics, Llc Low power voice trigger for acoustic apparatus and method
WO2016118480A1 (en) * 2015-01-21 2016-07-28 Knowles Electronics, Llc Low power voice trigger for acoustic apparatus and method
US10121472B2 (en) 2015-02-13 2018-11-06 Knowles Electronics, Llc Audio buffer catch-up apparatus and method with two microphones
CN104868915B (en) * 2015-05-30 2017-12-01 宁波摩米创新工场电子科技有限公司 A kind of Linear drive speech recognition system based on analog to digital conversion circuit
CN104868915A (en) * 2015-05-30 2015-08-26 宁波摩米创新工场电子科技有限公司 Linear drive type speech recognition system based on analog to digital conversion circuit
CN114080641A (en) * 2019-07-17 2022-02-22 星电株式会社 Microphone unit

Legal Events

Date Code Title Description
2005-10-17 A621 Written request for application examination (Free format text: JAPANESE INTERMEDIATE CODE: A621)
2005-11-14 RD01 Notification of change of attorney (Free format text: JAPANESE INTERMEDIATE CODE: A7421)
2007-06-13 A761 Written withdrawal of application (Free format text: JAPANESE INTERMEDIATE CODE: A761)