JP2004219728A

JP2004219728A - Speech recognition device

Info

Publication number: JP2004219728A
Application number: JP2003007158A
Authority: JP
Inventors: Mikio Oda; 幹夫小田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-01-15
Filing date: 2003-01-15
Publication date: 2004-08-05

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that improves the speech recognition rate and shortens the processing time by reliably detecting the speech segmentation section.
SOLUTION: The speech recognition device is equipped with: a microphone that converts an input speech signal into an electric signal; an amplifier that amplifies the microphone output; an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal; a speech recognition main processing unit that receives the speech signal converted into the digital signal by the A/D converter; and a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance. A plurality of dictionary selection talk switches are input to the speech recognition main processing unit as control signals for detecting the segmentation section of an utterance and for switching the speech dictionary memory data unit, and only the section in which a dictionary selection talk switch is pressed is regarded as the speech segmentation section. Speech recognition is carried out by referring only to the speech dictionary memory data corresponding to the pressed dictionary selection talk switch, which improves the speech recognition rate and shortens the processing time.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a speech recognition device that improves the speech recognition rate and shortens the processing time.
[0002]
[Prior art]
In recent years, speech recognition technology has been introduced into many consumer devices thanks to improvements in digital signal processing technology and to higher-performance, lower-cost processing LSIs, and it has helped to improve the operability of those devices.
[0003]
However, a consumer device receives speech input in a home full of surrounding conversation and household noise. Leaving the device permanently in the speech recognition input state allows hands-free operation, which is certainly convenient, but even ordinary conversation can put the device into speech recognition mode and cause unintended control. Current technology cannot distinguish conversation from speech recognition input. A speech recognition device that addresses this problem has been proposed (see, for example, Patent Document 1).
[0004]
An example of such a speech recognition device is described below with reference to FIG. 6.
In FIG. 6, reference numeral 38 denotes a telephone call microphone; 30, a transmitting unit that converts a speech signal into an electric signal; 32, a line control unit that selects and controls a public line; 37, the public line; 31, a receiving unit that converts the other party's voice, obtained as an electric signal, into a speech signal; 39, a telephone call speaker; 40, a speech recognition microphone; 34, a speech recognition unit; 33, a microphone switch; 35, a control unit that controls speech synthesis, line selection, and the like based on the speech recognition result; 36, a speech synthesis unit that notifies the user of the recognition result and of the state and information of the telephone; and 41, a speech synthesis speaker.
[0005]
The operation of the speech recognition device configured as described above will now be described. FIG. 6 shows an example in which speech recognition technology is applied to a telephone. To control the device by speech recognition, the user first turns on the microphone switch 33 of the speech recognition device shown in FIG. 6 and makes a predetermined utterance toward the speech recognition microphone. Examples of utterance words include registered destination names and telephone control words. When the predetermined utterance is completed, the speech recognition unit 34 performs recognition processing and transfers the recognition result to the control unit 35, and the speech synthesis unit 36 and the speech synthesis speaker 41 announce the recognition result by voice. If the recognition result is wrong, the user repeats the utterance and continues speech input until the result is correct; once it is correct, the user turns off the microphone switch 33. If the uttered word is a destination name, the control unit 35 sends a control signal to the line control unit 32, the number is dialed automatically, and the telephone line can be connected by speech recognition. In other words, the microphone switch 33 is turned on only while speech recognition is to be performed and is turned off when the recognition processing ends, which completes the series of speech recognition processing.
[0006]
[Patent Document 1]
JP-A-3-52442
[0007]
[Problems to be solved by the invention]
However, although the conventional configuration described above is the simplest and most easily implemented countermeasure against malfunctions caused by surrounding conversation and household noise in the home, it is insufficient as a measure for improving the speech recognition rate. A consumer device must recognize speech input from unspecified speakers, and speakers talk in many different ways: some speak quickly, and there are low-pitched male voices, high-pitched female voices, children's voices, and so on. Moreover, when all home AV equipment and electric appliances are to be controlled by speech recognition, the number of words to be uttered becomes large, and many problems remain in achieving a high speech recognition rate while also speeding up the recognition processing.
[0008]
[Means for Solving the Problems]
To solve the above problem, the speech recognition device of the present invention treats only the time during which a dictionary selection talk switch is pressed as the speech segmentation section and uses it as the data for speech recognition, and performs speech recognition by referring to the speech dictionary memory data corresponding to that dictionary selection talk switch, thereby improving the speech recognition rate and shortening the processing time.
[0009]
According to the present invention, the user presses a dictionary selection talk switch when making an utterance for speech recognition. Only the time during which the switch is pressed is treated as the speech segmentation section and used as the data for speech recognition, so the result is not affected by surrounding noise. In addition, of the plurality of dictionary selection talk switches, only the speech dictionary memory data corresponding to the selected switch is selected and referenced, which reduces the number of reference words. It is therefore possible to provide a speech recognition device that improves the speech recognition rate and shortens the processing time.
[0010]
BEST MODE FOR CARRYING OUT THE INVENTION
The speech recognition device according to claim 1 of the present invention comprises: a microphone that converts an input speech signal into an electric signal; an amplifier that amplifies the microphone output; an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal; a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter; a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance; and a dictionary selection talk switch that generates a segmentation section detection signal for the utterance and a switching signal for the speech dictionary memory data unit and outputs them to the speech recognition main processing unit. With this configuration, only the time during which the dictionary selection talk switch is pressed is treated as the speech segmentation section and used as the data for speech recognition, and speech recognition is performed by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch, so that speech recognition with an improved recognition rate and a shorter processing time can be realized.
[0011]
Next, the speech recognition device according to claim 2 of the present invention comprises a speech transmission unit and a speech reception and recognition unit. The speech transmission unit consists of: a microphone that converts an input speech signal into an electric signal; an amplifier that amplifies the microphone output; a modulator that modulates the speech signal amplified by the amplifier; a transmission output device that wirelessly transmits the modulated speech signal; a dictionary selection talk switch that generates a segmentation section detection signal for the utterance and a dictionary switching signal; and a carrier frequency switching unit that switches the carrier frequency of the modulator according to the dictionary selection talk switch. The speech reception and recognition unit consists of: a receiver that receives the speech signal transmitted from the speech transmission unit; a detection unit made up of a plurality of detectors that detect the receiver output signal and generate a carrier detection signal; a speech signal selection switch that selects, from among the speech signals output by the plurality of detectors, the speech signal that was modulated with the carrier controlled by the dictionary selection talk switch and then detected; an A/D converter that converts the analog speech signal selected by the speech signal selection switch into a digital signal; a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter together with the carrier detection signal; and a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance. With this configuration, only the time during which the dictionary selection talk switch is pressed is treated as the speech segmentation section and used as the data for speech recognition, and speech recognition is performed by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch, so that speech recognition with an improved recognition rate and a shorter processing time can be realized.
[0012]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 5.
[0013]
(Embodiment 1)
Hereinafter, an embodiment of the invention described in claim 1 of the present invention will be described with reference to FIGS. 1, 2, and 3.
[0014]
FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a microphone that converts an input speech signal into an electric signal; 2, an amplifier that amplifies the microphone output; 3, an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal; 4, a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter; 5, a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance; and 6, a plurality of dictionary selection talk switches that provide the control signals for detecting the segmentation section of the utterance and for switching the speech dictionary memory data.
[0015]
The operation of the speech recognition device configured as described above will now be described. To control the operation of a device such as a TV or VTR by speech recognition, the user first holds down a dictionary selection talk switch 6 and utters a word registered for speech recognition. The dictionary selection talk switch 6 consists of a plurality of switches; for the sake of explanation, assume there are two, one a male-voice dictionary selection talk switch and the other a female-voice dictionary selection talk switch. If the user has a male voice, the user presses the male-voice dictionary selection talk switch and makes the predetermined utterance. The uttered speech signal is amplified by the amplifier 2 and passes through the A/D converter 3, which converts it from an analog signal to a digital signal, and the resulting speech data is input to the speech recognition main processing unit 4. The dictionary selection talk switch 6 is also connected to the speech recognition main processing unit 4, which detects the number of the pressed dictionary selection talk switch and the section during which it is pressed. When the user finishes the predetermined utterance and releases the dictionary selection talk switch 6, the speech recognition main processing unit 4 stops collecting speech data. In other words, only the section in which the dictionary selection talk switch 6 is pressed is cut out as the speech data section. The speech recognition main processing unit 4 is usually a microcomputer or a DSP (digital signal processor) and performs signal analysis on the input speech data. The analyzed data is compared against the word dictionary registered in advance in the speech recognition memory data unit 5. Based on the number information indicating which of the dictionary selection talk switches 6 the user pressed, the dictionary memory number corresponding to that switch number is selected in the speech recognition memory data unit 5; under the assumption above, recognition processing is performed with reference to the male-voice dictionary data. The flowchart in FIG. 2 illustrates this series of processing steps.
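The flow described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in the patent: the `Frame` type, the single-number "feature", the switch numbering (1 for the male-voice switch, 2 for the female-voice switch), and the nearest-template matching are all hypothetical stand-ins for the signal analysis and dictionary reference that the text leaves unspecified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Frame:
    """One analysed frame of A/D-converted speech plus the switch state."""
    feature: float            # toy stand-in for the analysed speech data
    switch: Optional[int]     # number of the pressed talk switch, or None


def cut_out_section(frames: Sequence[Frame]) -> Tuple[Optional[int], List[float]]:
    """Collect data only while a dictionary selection talk switch is held."""
    switch_no: Optional[int] = None
    section: List[float] = []
    for frame in frames:
        if frame.switch is None:
            if switch_no is not None:
                break             # switch released: stop collecting
            continue              # not pressed yet: ignore ambient sound
        if switch_no is None:
            switch_no = frame.switch   # remember which switch was pressed
        section.append(frame.feature)
    return switch_no, section


def recognize(frames: Sequence[Frame],
              dictionaries: Dict[int, Dict[str, float]]) -> Optional[str]:
    """Match the cut-out section against the dictionary chosen by the switch."""
    switch_no, section = cut_out_section(frames)
    if switch_no is None or not section:
        return None
    dictionary = dictionaries[switch_no]      # only this dictionary is referenced
    mean = sum(section) / len(section)        # toy "signal analysis"
    return min(dictionary, key=lambda word: abs(dictionary[word] - mean))


# Hypothetical data: switch 1 = male-voice dictionary, switch 2 = female-voice.
DICTIONARIES = {
    1: {"power": 1.0, "volume": 3.0},
    2: {"power": 2.0, "volume": 4.0},
}
FRAMES = [Frame(9.0, None), Frame(1.0, 1), Frame(1.5, 1), Frame(9.0, None)]
```

With these sample frames, the loud frames captured before and after the press are discarded, and only the two frames recorded while switch 1 was held are matched against dictionary 1.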
[0016]
Normally, when the basic data for creating a speech recognition dictionary are prepared, male and female voice data are sampled and registered in the dictionary memory as combined frequency data. As shown in FIG. 3, however, male voices lie on the low-frequency side and female voices on the high-frequency side. It is known that, if both are placed in the same dictionary, the tuning range becomes too wide and the recognition rate drops for somewhat low male voices and somewhat high female voices. To improve this, dictionary data for male voices and dictionary data for female voices are registered separately in advance and referenced separately, which improves the recognition rate. If the user specifies which one to use at the moment of pressing the dictionary selection talk switch 6, the utterance segmentation section is determined reliably and the recognition rate also improves.
[0017]
The use of multiple dictionary selection switches has been described above using male-voice and female-voice dictionary data as one example, but the switches could also select a dictionary for each home appliance. A single word dictionary collecting the words of every home appliance would be enormous, whereas with a word dictionary for each device only the recognition words for that device are referenced, so speech recognition with an improved recognition rate and faster processing can be realized.
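The effect of per-device dictionaries on the number of reference words can be illustrated with a small Python sketch. The device names follow the TV and VTR mentioned in the text, but the word lists and the `reference_words` helper are hypothetical; the source does not list any vocabulary.

```python
from typing import Dict, List, Optional

# Hypothetical vocabularies; the source names TV and VTR but lists no words.
DEVICE_WORDS: Dict[str, List[str]] = {
    "tv": ["power", "channel up", "channel down", "volume up", "volume down"],
    "vtr": ["power", "play", "stop", "rewind", "record"],
}


def reference_words(device: Optional[str] = None) -> List[str]:
    """Return the words the recogniser would have to compare against."""
    if device is not None:
        return list(DEVICE_WORDS[device])     # per-device dictionary only
    merged: List[str] = []                    # one dictionary for every device
    for words in DEVICE_WORDS.values():
        for word in words:
            if word not in merged:
                merged.append(word)
    return merged
```

In this toy setup the merged dictionary holds nine distinct words, while selecting the VTR dictionary leaves five candidates to compare, and the gap widens as more appliances are added.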
[0018]
(Embodiment 2)
Next, an embodiment of the invention described in claim 2 of the present invention will be described with reference to FIGS. 4 and 5.
[0019]
FIG. 4 is a block diagram of the speech transmission unit and the reception and recognition unit of a speech recognition device according to an embodiment of the present invention. In FIG. 4, reference numeral 10 denotes a speech transmission unit; 11, a microphone that converts an input speech signal into an electric signal; 12, an amplifier that amplifies the microphone output; 13, a modulator that modulates the speech signal amplified by the amplifier; 14, a transmission output device that wirelessly transmits the modulated speech signal; 15, a dictionary selection talk switch that provides the control signals for detecting the segmentation section of the utterance and for dictionary switching; 16, a carrier frequency switching unit that switches the carrier frequency of the modulator according to the dictionary selection talk switch; 17, a speech reception and recognition unit that receives and detects the wireless speech signal transmitted by the speech transmission unit and performs recognition processing; 18, a receiver that receives the speech signal transmitted from the speech transmission unit; 19, a detection unit made up of a plurality of detectors that detect the receiver output signal; 20, a detector that demodulates the carrier received by the receiver and outputs a speech signal and a carrier detection signal; 21, a speech signal selection switch that selects, from among the speech signals output by the plurality of detectors, the speech signal that was modulated with the carrier controlled by the dictionary selection talk switch and then detected; 22, an A/D converter that converts the analog speech signal selected by the speech signal selection switch into a digital signal; 23, a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter; and 24, a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance.
[0020]
The operation of the speech recognition device configured as described above will now be described. When a device such as a TV or VTR is controlled by speech recognition, it is often operated from a distance because remote controls are in widespread use. The configuration therefore has the speech transmission unit 10, which is operated away from the device body, transmit the speech signal wirelessly, and has the speech reception and recognition unit 17 provided in the device body perform speech detection and recognition processing. The user first holds down a dictionary selection talk switch 15 provided on the speech transmission unit 10 and utters a word registered for speech recognition. The dictionary selection talk switch 15 consists of a plurality of switches; as in Embodiment 1, assume for the sake of explanation that there are two, one a male-voice dictionary selection talk switch and the other a female-voice dictionary selection talk switch. If the user has a male voice, the user presses the male-voice dictionary selection talk switch and makes the predetermined utterance. The uttered speech signal is amplified by the amplifier 12 and modulated by the modulator 13. FM modulation or the like is normally used, which can be implemented with a simple circuit or IC. Meanwhile, the state of the dictionary selection talk switch 15 is input to the carrier frequency switching unit 16, which switches the carrier frequency of the modulator 13 according to the talk switch number. The speech signal modulated in this way is transmitted wirelessly to the device body by infrared light or radio waves. In the device body, the receiver 18 receives the infrared light or radio waves, and the detection unit 19, made up of as many detectors 20 as there are dictionary selection talk switches 15, can detect each carrier. The detected speech signal is selected by the speech signal selection switch 21 and passes through the A/D converter 22, which converts it from an analog signal to a digital signal, and the resulting speech data is input to the speech recognition main processing unit 23. Each detector 20 outputs the detected speech signal and, at the same time, a carrier detection signal, which is input to the speech recognition main processing unit 23; from this signal the unit detects the number of the detector 20 that detected the carrier and the section during which the carrier is present. The speech recognition main processing unit 23 controls the switching of the speech signal selection switch 21 according to the carrier detection number. When the user finishes the predetermined utterance and releases the dictionary selection talk switch 15 of the speech transmission unit 10, the detector 20 signals the end of carrier detection to the speech recognition main processing unit 23, which stops collecting speech data. In other words, only the section in which the dictionary selection talk switch 15 provided on the speech transmission unit 10 is pressed is cut out as the utterance section. The speech recognition main processing unit 23 is usually a microcomputer or a DSP (digital signal processor) and performs signal analysis on the input speech data. The analyzed data is compared against the word dictionary registered in advance in the speech recognition memory data unit 24. Based on the number information indicating which of the dictionary selection talk switches 15 the user pressed, the dictionary memory number corresponding to that switch number is selected in the speech recognition memory data unit 24; under the assumption above, recognition processing is performed with reference to the male-voice dictionary data. The flowchart in FIG. 5 illustrates this series of processing steps.
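The signalling scheme of this embodiment, in which the carrier frequency carries the switch number and the presence of a carrier marks the utterance section, can be sketched in Python as follows. This is a logical model only and performs no actual FM modulation; the carrier frequencies in `CARRIER_HZ` and the function names are assumptions made for illustration, since the source gives no concrete values.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical carrier frequencies in Hz, one per dictionary selection talk switch.
CARRIER_HZ: Dict[int, int] = {1: 38_000, 2: 56_000}

Signal = Optional[Tuple[int, List[float]]]   # (carrier frequency, speech) or silence


def transmit(switch_no: Optional[int], speech: List[float]) -> Signal:
    """Speech transmission unit: send a carrier only while a switch is pressed."""
    if switch_no is None:
        return None                           # no switch pressed, no carrier
    return CARRIER_HZ[switch_no], speech      # carrier chosen by the switch number


def detect(tuned_hz: int, signal: Signal) -> Tuple[bool, List[float]]:
    """One detector: report carrier presence and the demodulated speech."""
    if signal is not None and signal[0] == tuned_hz:
        return True, signal[1]
    return False, []


def receive(signal: Signal) -> Tuple[Optional[int], List[float]]:
    """Detection unit plus selection switch: one detector per talk switch."""
    for detector_no, tuned_hz in CARRIER_HZ.items():
        carrier_detected, speech = detect(tuned_hz, signal)
        if carrier_detected:
            return detector_no, speech        # detector number = dictionary number
    return None, []
```

Because the detector number that reports a carrier equals the switch number, the receiving side needs no separate control channel to learn which dictionary to reference or when the utterance starts and ends.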
[0021]
The description was divided into male-voice and female-voice dictionary data for the same reason as in Embodiment 1: registering dictionary data for male voices and dictionary data for female voices separately in advance and referencing them separately improves the recognition rate. If the user specifies which one to use at the moment of pressing the dictionary selection talk switch 15, the cut-out data is reliable and the recognition rate also improves. The use of multiple dictionary selection switches has again been described using male-voice and female-voice dictionary data as one example, but the switches could also select a dictionary for each home appliance. A single word dictionary collecting the words of every home appliance would be enormous, whereas with a word dictionary for each device only the recognition words for that device are referenced, so speech recognition with an improved recognition rate and faster processing can be realized, just as in Embodiment 1. Embodiment 2 adds the ability to operate the device remotely by wireless speech transmission.
[0022]
[Effect of the Invention]
As described above, the speech recognition device of the present invention treats only the time during which the dictionary selection talk switch is pressed as the speech segmentation section and uses it as the data for speech recognition, and performs speech recognition by referring to the speech dictionary memory data corresponding to the dictionary selection talk switch. It is therefore possible to provide a speech recognition device that improves the speech recognition rate and shortens the processing time.
[Brief description of the drawings]
FIG. 1 is a block diagram of the speech recognition device according to the first embodiment of the present invention.
FIG. 2 is a flowchart illustrating the processing flow of the speech recognition device shown in FIG. 1.
FIG. 3 is a diagram showing the frequency spectrum bands of male and female voices, illustrating an example of switching between male-voice and female-voice dictionaries.
FIG. 4 is a block diagram of the speech transmission unit and the speech reception and recognition unit of the speech recognition device according to the second embodiment of the present invention.
FIG. 5 is a flowchart illustrating the processing flow of the speech recognition device shown in FIG. 4.
FIG. 6 is a block diagram of a conventional speech recognition device.
[Explanation of reference signs]
1 Microphone
2 Amplifier
3 A/D converter
4 Speech recognition main processing unit
5 Speech recognition memory data unit
6 Dictionary selection talk switch
10 Speech transmission unit
11 Microphone
12 Amplifier
13 Modulator
14 Transmission output device
15 Dictionary selection talk switch
16 Carrier frequency switching unit
17 Speech reception and recognition unit
18 Receiver
19 Detection unit
20 Detector
21 Speech signal selection switch
22 A/D converter
23 Speech recognition main processing unit
24 Speech dictionary memory data unit
30 Transmitting unit
31 Receiving unit
32 Line control unit
33 Microphone switch
34 Speech recognition unit
35 Control unit
36 Speech synthesis unit
37 Public line
38 Call microphone
39 Call speaker
40 Speech recognition microphone
41 Speech synthesis speaker

Claims

A speech recognition device comprising: a microphone that converts an input speech signal into an electric signal; an amplifier that amplifies the microphone output; an A/D converter that converts the speech signal amplified by the amplifier from an analog signal to a digital signal; a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter; a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance; and a dictionary selection talk switch that generates a segmentation section detection signal for the utterance and a switching signal for the speech dictionary memory data unit and outputs them to the speech recognition main processing unit.

A speech recognition device comprising: a speech transmission unit consisting of a microphone that converts an input speech signal into an electric signal, an amplifier that amplifies the microphone output, a modulator that modulates the speech signal amplified by the amplifier, a transmission output device that wirelessly transmits the modulated speech signal, a dictionary selection talk switch that generates a segmentation section detection signal for the utterance and a dictionary switching signal, and a carrier frequency switching unit that switches the carrier frequency of the modulator according to the dictionary selection talk switch; and a speech reception and recognition unit consisting of a receiver that receives the speech signal transmitted from the speech transmission unit, a detection unit made up of a plurality of detectors that detect the receiver output signal and generate a carrier detection signal, a speech signal selection switch that selects, from among the speech signals output by the plurality of detectors, the speech signal that was modulated with the carrier controlled by the dictionary selection talk switch and then detected, an A/D converter that converts the analog speech signal selected by the speech signal selection switch into a digital signal, a speech recognition main processing unit that receives the speech signal converted into a digital signal by the A/D converter together with the carrier detection signal, and a speech dictionary memory data unit in which reference data for speech recognition are recorded in advance.