JPS59195739A

JPS59195739A - Audio response unit

Info

Publication number: JPS59195739A
Application number: JP58070289A
Authority: JP
Inventors: Hitoshi Takase; 高瀬　均
Original assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Current assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Priority date: 1983-04-20
Filing date: 1983-04-20
Publication date: 1984-11-06

Abstract

PURPOSE: To allow a voice spoken into the microphone to be recognized even while a synthesized voice is being output, by correcting the voice signal from the microphone with the synthesized voice signal from the voice synthesis unit before recognition. CONSTITUTION: At the request of an information processing unit 1, a voice synthesis unit 6 synthesizes the required voice signal S1 and outputs it from a speaker 8 through an amplifier 7. A voice signal V1 obtained from a microphone 3 through an amplifier 4 is corrected by a correction means 5 to become a signal V2. A voice recognition unit 2 recognizes V2 and inputs the corresponding command and data signals to the information processing unit 1. If the user speaks toward the microphone 3 while a synthesized voice S is being output from the speaker 8, the voice S is superimposed on the user's voice at the microphone, and V1 becomes a superimposed voice signal. A tuning circuit 51 in the correction means 5 converts S1 into a tuned synthesized voice signal S2 matched to the synthesized voice component S, and a subtractor 52 subtracts S2 from the superimposed voice signal V1 to obtain the voice signal V2, which is free of that noise component.

Description

Detailed description of the invention

(a) Field of industrial application

The present invention relates to a voice response device capable of both voice input and voice output.

(b) Prior art

Conventional voice response devices of this type include devices in which an information processing unit is combined with a voice recognition unit that recognizes human speech and a voice synthesis unit that synthesizes speech. The voice recognition unit recognizes the operator's voice in order to operate the information processing unit and to input data to it, while the voice synthesis unit outputs messages that prompt operation of the information processing unit and outputs the processing results from the information processing unit.

When such a voice response device is operated, the voice synthesis unit outputs messages such as 'Please say "YES" or "NO".' or 'Please say the number.' Following the message, the operator speaks "YES" or "NO", or a number, into the voice recognition unit, and in this way can operate the information processing unit and enter data. This is very convenient for operators who are unfamiliar with operating such a device. However, once the operator becomes even somewhat accustomed to the operation, such messages are often unnecessary.

While a synthesized voice message is being uttered, however, the message is superimposed as noise on the operator's input voice, and there was a risk that the voice recognition unit could not perform accurate recognition. Conventional devices were therefore configured not to accept the next voice input until the synthesized voice had finished, which had the drawback of wasting time on waiting.

(c) Object of the invention

The object of the present invention is to provide a voice response device that can always accept the operator's input voice, whether a synthesized voice message is still being uttered or has already finished, and that thereby shortens the operating time.

(d) Constitution of the invention

In the voice response device of the present invention, a correction means is interposed between the microphone and the voice recognition unit. The correction means corrects the voice signal from the microphone using the synthesized voice signal that the voice synthesis unit synthesizes at the request of the information processing unit. As a result, even while the voice synthesis unit is uttering synthesized voice from the speaker, voice can be input to the microphone and recognized by the voice recognition unit.

(e) Embodiment

Fig. 1 shows one embodiment of the voice response device of the present invention. In the figure, (1) is an information processing unit. An example is a home controller that consists of sensors, a memory circuit, a timer circuit, and the like, and that centrally controls the various electrical appliances used in daily life at home.

(2) is a voice recognition unit connected to the information processing unit (1). It recognizes the voice signal V2, which is obtained by correcting, in the correction means (5), the voice signal V1 obtained from the microphone (3) through the amplifier (4), and it inputs the command and data signals corresponding to V2 to the information processing unit (1). A voice recognition board made by Sanyo Electric Co., Ltd., product number 5RB-64, can be used as the voice recognition unit (2), for example.

(6) is a voice synthesis unit connected to the information processing unit (1). At the request of the information processing unit (1), it synthesizes the required voice signal S1 and outputs it from the speaker (8) through the amplifier (7). An LSI made by Sanyo Electric Co., Ltd., product number LC8100, can be used as the voice synthesis unit (6), for example.

(9) is an integrator that integrates and shapes the waveform of the signal obtained by clipping the voice signal V2 from the correction means (5) in the amplifying clip circuit (10). The output of the integrator (9) serves as the voice detection signal V8, and this signal V8 raises an interrupt to the information processing unit (1). On this interrupt, the information processing unit (1) becomes ready to accept a command or data signal from the voice recognition unit (2).

The correction means (5), which is the characteristic feature of the device of this embodiment, is now described in detail. The correction means (5) consists of a tuner (51), which adjusts the gain and phase of the synthesized voice signal S1 obtained from the voice synthesis unit (6), and a subtractor (52), which subtracts the tuned synthesized voice signal S2 from the tuner (51) from the voice signal V1 obtained from the microphone (3) through the amplifier (4). The voice signal V2 from the subtractor (52) is input to the voice recognition unit (2). V2 is also passed through the amplifying clip circuit (10) and the integrator (9) to produce the voice detection signal V8, which raises an interrupt to the information processing unit. Fig. 2 shows the waveforms of the signals S1, S2, V1, V2, and V8, and the operation of the device is described below with reference to that figure.
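The gain and phase adjustment followed by subtraction can be illustrated with a minimal discrete-time sketch. The patent describes an analog circuit and gives no numerical values, so everything here is an assumption made for illustration: the function names `tune` and `subtract`, the modeling of the phase adjustment as a whole-sample delay, the test signals, and the `gain` and `delay` values. The sketch also assumes the tuner is matched exactly to the acoustic path from speaker to microphone, which a real circuit would only approximate.

```python
import math

def tune(s1, gain, delay):
    """Hypothetical tuner (51): scale S1 by `gain` and delay it by `delay` samples."""
    return [gain * s1[n - delay] if 0 <= n - delay < len(s1) else 0.0
            for n in range(len(s1))]

def subtract(v1, s2):
    """Subtractor (52): V2 = V1 - S2, sample by sample."""
    return [a - b for a, b in zip(v1, s2)]

# Illustrative signals; all values are assumptions for this sketch.
N = 200
s1 = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]      # synthesized signal S1
voice = [0.5 * math.sin(2 * math.pi * 13 * n / N) if n >= 80 else 0.0
         for n in range(N)]                                      # operator's voice V
gain, delay = 0.6, 3                                             # assumed speaker-to-microphone path

s_component = tune(s1, gain, delay)                  # synthesized voice S as picked up by the microphone
v1 = [v + s for v, s in zip(voice, s_component)]     # superimposed signal V1

s2 = tune(s1, gain, delay)                           # tuned signal S2 (perfectly matched here)
v2 = subtract(v1, s2)                                # corrected signal V2

residual = max(abs(a - b) for a, b in zip(v2, voice))
print(f"max |V2 - V| = {residual:.2e}")
```

With a perfectly matched tuner the residual is limited only by floating-point rounding. Any mismatch in gain or delay would leave part of the synthesized voice in V2.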

At the request of the information processing unit (1), the voice synthesis unit (6) outputs the synthesized voice signal S1 shown in Fig. 2 for a message such as 'Please say "YES" or "NO".' Suppose that while this message is being uttered from the speaker (8), that is, at a time t1 after the start of the utterance, the operator says "NO" toward the microphone (3). The microphone (3) then receives the operator's voice V with the synthesized voice S from the speaker (8) superimposed on it, and the amplifier (4) outputs the superimposed voice signal V1 shown in Fig. 2. Because the synthesized voice component S contained in V1 acts as noise on the voice V, the superimposed voice signal V1 cannot be recognized as it is.

In the correction means (5), the tuning circuit (51) therefore tunes the synthesized voice signal S1 at that moment to the synthesized voice component S contained in the superimposed voice signal V1, producing the tuned synthesized voice signal S2 shown in Fig. 2. The subtractor (52) subtracts S2 from V1 to obtain the voice signal V2 shown in Fig. 2, which is due to the operator's true voice V alone.

The voice signal V2 obtained in this way, free of the noise component, is recognized by the voice recognition unit (2). At the same time, the amplifying clip circuit (10) and the integrator (9) produce the voice detection signal V8 shown in Fig. 2, which raises an interrupt to the information processing unit (1) and makes it ready to accept the command or data signal that is the recognition result from the voice recognition unit (2).
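The path from V2 to the voice detection signal (amplify, clip, then integrate) can be sketched in the same discrete-time style. This is only a rough stand-in for the analog circuits (10) and (9). The rectification step, the leaky-integrator form, the threshold comparison, and all numerical values are assumptions and do not come from the patent.

```python
import math

def clip_amplify(v2, gain=50.0, limit=1.0):
    """Hypothetical amplifying clip circuit (10): rectify, amplify, and clip to `limit`."""
    return [min(abs(x) * gain, limit) for x in v2]

def integrate(clipped, alpha=0.1):
    """Hypothetical integrator (9): leaky integration that smooths the clipped signal."""
    out, y = [], 0.0
    for x in clipped:
        y += alpha * (x - y)   # first-order low-pass step
        out.append(y)
    return out

def detect(v2, threshold=0.5):
    """Boolean stand-in for the voice detection signal: True while voice is judged present."""
    return [y > threshold for y in integrate(clip_amplify(v2))]

# Illustrative input: silence followed by a voiced segment (assumed values).
silence = [0.0] * 50
speech = [0.3 * math.sin(2 * math.pi * n / 20) for n in range(100)]
flags = detect(silence + speech)
print("voice first detected at sample", flags.index(True))
```

The smoothing delays the detection by a few samples after the voice starts, and in exchange the flag does not drop at each zero crossing of the waveform.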

Although it is not shown in the signal diagram of Fig. 2, the device may also be configured so that, once the interrupt has been raised to the information processing unit (1) and the recognition in the voice recognition unit (2) has been performed correctly, the synthesis operation in the voice synthesis unit (6) is stopped immediately and the synthesized voice output from the speaker (8) is cut off. After that, only the operator's voice V is input to the microphone (3). Because the synthesized voice signal S1 supplied to the correction means (5) is then silent, the voice signal V2 from the correction means (5) still depends only on the operator's voice V, and the voice recognition unit (2) performs reliable recognition.
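The optional behavior described above, stopping synthesis as soon as an utterance has been recognized, amounts to a small piece of control logic. The sketch below is a hypothetical software model, not the patent's implementation. The class `ResponseController`, its methods, and the word-by-word message are invented for illustration.

```python
class ResponseController:
    """Hypothetical stand-in for the information processing unit (1)."""

    def __init__(self, message):
        self.pending = list(message)  # words the synthesizer has not uttered yet
        self.spoken = []              # words already sent to the speaker
        self.result = None            # last recognized command, if any

    def tick(self):
        """Utter the next word of the message if synthesis is still running."""
        if self.pending:
            self.spoken.append(self.pending.pop(0))

    def on_voice_interrupt(self, recognized):
        """Handle the interrupt raised by the voice detection signal."""
        if recognized is not None:    # recognition completed correctly
            self.result = recognized
            self.pending.clear()      # stop the rest of the synthesized message

# Illustrative run: the operator answers after the first three words.
ctrl = ResponseController(["YES", "or", "NO,", "please", "say", "one"])
for step in range(6):
    ctrl.tick()
    if step == 2:
        ctrl.on_voice_interrupt("NO")

print(ctrl.spoken, ctrl.result)
```

In this model the message is cut off after the words already uttered, and the recognized answer is kept as the input to the controller.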

The operator can therefore learn the expected form of the next voice input just by hearing the beginning of the synthesized voice message, for example '"YES" or "NO"', and, if the answer is "NO", can say "NO" immediately. This "NO" input is then accepted by the information processing unit (1).

(f) Effects of the invention

As is clear from the above description, the voice response device of the present invention has a correction means interposed between the microphone and the voice recognition unit, which corrects the voice signal from the microphone using the synthesized voice signal from the voice synthesis unit. Even while a synthesized voice message is being uttered from the voice synthesis unit through the speaker, the operator's voice input to the microphone can therefore be recognized accurately by the voice recognition unit without being affected by the synthesized voice. An operator who is accustomed to operating such a device can thus make the next voice input without waiting for the synthesized voice message to finish, which greatly shortens the operating time.

Brief description of the drawings

Fig. 1 is a block diagram of one embodiment of the voice response device of the present invention, and Fig. 2 is a signal waveform diagram of the device. (1) is the information processing unit, (2) is the voice recognition unit, (3) is the microphone, (5) is the correction means, (6) is the voice synthesis unit, and (8) is the speaker.

Claims

(1) A voice response device comprising: a microphone; a voice recognition unit that recognizes a human voice signal obtained from the microphone; an information processing unit that processes the recognition result of the voice recognition unit as input information; a voice synthesis unit that synthesizes a voice signal at the request of the information processing unit; and a speaker that utters synthesized voice based on the synthesized voice signal from the voice synthesis unit, characterized in that a correction means, which corrects the voice signal from the microphone using the synthesized voice signal of the voice synthesis unit, is interposed between the microphone and the voice recognition unit, so that even while the voice synthesis unit is uttering synthesized voice from the speaker, voice can be input to the microphone and recognized by the voice recognition unit.