JPH02200046A - Telephone system with speech recognizing function

Info

Publication number: JPH02200046A
Application number: JP1020050A
Authority: JP
Inventors: Noboru Nashiki; 登梨木; Mutsuo Takematsu; 竹松　睦男; Yasumi Sada; 佐田　保美
Original assignee: Tamura Electric Works Ltd
Current assignee: Tamura Electric Works Ltd
Priority date: 1989-01-30
Filing date: 1989-01-30
Publication date: 1990-08-08

Abstract

PURPOSE: To reduce the misrecognition rate by treating a recognition result as correct, and performing the connecting operation, only when the speech recognizing part produces the same recognition output several times in succession while a timer is running. CONSTITUTION: The system is provided with a detection means that detects a state change of a telephone set, a timer started by the output of the detection means, and a deciding means that decides that the same recognition output has been produced several times in succession by the speech recognizing part while the timer is running. When the master equipment decides that reception data is present, that the data is voice data, and that it is not an error code, it successively stores an echo-back (parrot-back) code in a transmission buffer, connects the speech synthesizing part 17 to a speech path, outputs the recognized word to the speech path, and carries out the control process. A voice with the same content as the voice uttered at slave equipment 4 is thus produced by the speech synthesizing part 17 of master equipment 1 and sent back to slave equipment 4, so the user at slave equipment 4 can confirm the recognized content, and control is performed in accordance with that content. As a result, the misrecognition rate is minimized.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a telephone device with a voice recognition function, in which a voice recognition section recognizes speech related to telephone operation and control operations are performed accordingly.

[Prior Art] Japanese Unexamined Patent Publication No. 59-111494 discloses a telephone device in which telephone operations are instructed by voice, the voice is recognized, and control is performed according to the recognized content, so that the device can be controlled without troubling the user's hands.

[Problem to Be Solved by the Invention] However, the misrecognition rate of voice recognition devices is about 5 percent. While that may be tolerable for extension calls, it is still not reliable enough for connection operations that incur charges, such as outside-line calls, and is therefore not suited to practical use.

[Means for Solving the Problems] To solve this problem, the first invention comprises a timer started by a detection means that detects a change in the state of the telephone, and a determining means that determines that the same recognition output of the voice recognition section has occurred multiple times in succession while the timer is running; control operations are performed based on the output of the determining means.

The second invention synthesizes the voice recognized by the voice recognition section, based on the output of the determining means, and emits it from the telephone.

[Operation] When the same content is recognized multiple times, the recognition is judged to be correct and the connection operation is performed.

[Embodiment] FIG. 1 is a block diagram showing an embodiment of a device to which the present invention is applied. In the figure, 1 is the main device, the elements numbered 2 are outside lines, those numbered 3 are extension lines, and those numbered 4 are key telephones serving as slave units; individual lines and telephones are distinguished by subscripts. The main device 1 comprises outside line interfaces 11, a speech path switch 12, telephone interfaces 13, a CPU 14, a ROM 15, a RAM 16, and a voice synthesis section 17, which uses a voice synthesizer to synthesize a voice with the same content as the voice confirmed at the slave unit and sends it back to the slave unit.

The key telephone 4 comprises a speech circuit 41 that sends and receives speech information, a changeover switch 42 formed of, for example, an analog switch, a voice recognition processor 43, a transmission circuit 44 that sends and receives control information, a display circuit 46, a keyboard 47, a control section 45 that controls these, a receiver R, a microphone T, and a hook switch.

FIGS. 2 to 7 are flowcharts showing the operation of this device: FIGS. 2 to 4 show the operation of the slave unit, and FIG. 5 onward shows the operation of the main device. In FIG. 2, after initial processing is performed as shown in step 100, transmission processing, hook/key switch sense processing, tone control processing, and display control processing are performed in sequence as shown in steps 101 to 104. Transmission processing sends control data from the transmission circuit 44 of slave unit 4 to the main device. Hook/key switch sense processing detects whether the hook switch or a key switch has been operated. Tone control processing, described later with reference to FIG. 4, determines whether the tone code sent from the main device is a busy tone code or an echo-back (parrot-back) code. Display control processing drives the display circuit 46 to show the required display.

When the display control processing of step 104 ends, step 105 determines whether the slave unit is a voice recognition slave unit. If NO, the flow returns to step 101; if YES, voice recognition unit control processing is performed as shown in step 106.
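One pass through steps 101 to 106 of FIG. 2 can be sketched as follows. This is only an illustrative outline: the function name, the handler names, and the dictionary-of-callbacks structure are assumptions, not taken from the patent. The text does not say where the flow goes after step 106, so the sketch simply ends the pass and leaves looping to the caller.

```python
from typing import Callable, Dict


def slave_main_cycle(
    handlers: Dict[str, Callable[[], None]],
    is_voice_recognition_unit: bool,
) -> None:
    """Run one pass of the slave unit's main routine (hypothetical sketch)."""
    handlers["transmission"]()        # step 101: send control data to the main device
    handlers["hook_key_sense"]()      # step 102: sense hook switch and key switches
    handlers["tone_control"]()        # step 103: handle the tone code from the main device
    handlers["display_control"]()     # step 104: update the display circuit
    if is_voice_recognition_unit:     # step 105: is this a voice recognition slave unit?
        handlers["voice_recognition_control"]()  # step 106
```

A caller would run this after the initial processing of step 100 and repeat it, since a NO at step 105 returns the flow to step 101.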

The voice recognition unit control processing is shown in FIG. 3. When step 110 determines that the telephone is off-hook, step 111 determines that the voice dictionary registration key has been pressed, step 112 determines that there is voice input, and step 113 determines that the same voice content has been input three times, the input voice is registered in the dictionary as shown in step 114. If step 111 determines that the dictionary registration key has not been pressed, the voice reception timer is started as shown in step 115. Then, when step 116 determines that there is voice input, step 117 determines that the input voice is already registered in the dictionary, and step 118 determines that the registered content has been correctly recognized twice in succession, the corresponding code is stored in the transmission buffer as shown in step 119.

On the other hand, if step 117 determines that the voice is not registered in the dictionary, an error code is stored in the transmission buffer as shown in step 120. If step 121 determines that the timer has expired, the voice recognition unit control processing ends. The processing also ends when step 114, 119, or 120 has completed.
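The recognition branch of FIG. 3 (steps 115 to 121) can be sketched roughly as below. The function name, the `(elapsed_seconds, word)` input format, the `ERROR_CODE` value, and the 5-second default window are all assumptions made for illustration; the patent gives no timer value or data format. The sketch also assumes that an intervening different word resets the count, which is one reading of "twice in succession".

```python
from typing import Dict, Iterable, Optional, Tuple

ERROR_CODE = "ERROR"  # hypothetical stand-in for the error code of step 120


def recognize_with_confirmation(
    utterances: Iterable[Tuple[float, Optional[str]]],
    dictionary: Dict[str, str],
    timeout_s: float = 5.0,      # assumed window; the source gives no value
    required_repeats: int = 2,   # the embodiment uses two consecutive recognitions
) -> Optional[str]:
    """Return the code to store in the transmission buffer, or None on time-up.

    `utterances` yields (seconds since the timer started in step 115,
    recognized word or None when no voice was input).
    """
    last_word: Optional[str] = None
    run_length = 0
    for elapsed, word in utterances:
        if elapsed > timeout_s:            # step 121: time-up ends the processing
            return None
        if word is None:                   # step 116: no voice input yet
            continue
        if word not in dictionary:         # step 117: not registered in the dictionary
            return ERROR_CODE              # step 120: error code goes to the buffer
        if word == last_word:
            run_length += 1
        else:
            last_word, run_length = word, 1
        if run_length >= required_repeats:  # step 118: same content in succession
            return dictionary[word]         # step 119: corresponding code
    return None
```

For example, with a dictionary mapping "patent" to some code, two consecutive recognitions of "patent" inside the window return that code, while a single recognition followed by time-up returns `None`.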

FIG. 4 shows the tone control processing of step 103 in FIG. 2, which examines the tone code sent from the main device. If step 125 determines that it is a busy tone code, a busy tone is sent to the receiver as shown in step 126 so that the user hears the busy signal. If step 125 determines that it is not a busy tone code and step 126a determines that it is an echo-back code, a speech circuit loop to the receiver is formed as shown in step 127. If step 126a determines that it is not an echo-back code, other tone code control processing is performed as shown in step 128.
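The three-way decision of FIG. 4 amounts to a small dispatch on the received tone code. The sketch below is an assumption-laden illustration: the `ToneCode` enum, its member names, and the returned action strings are hypothetical labels, not identifiers from the patent.

```python
from enum import Enum


class ToneCode(Enum):
    """Hypothetical labels for the tone codes sent by the main device."""
    BUSY = "busy"
    ECHO_BACK = "echo_back"
    OTHER = "other"


def tone_control(code: ToneCode) -> str:
    """Return the action the slave unit takes for a received tone code."""
    if code is ToneCode.BUSY:                   # step 125: busy tone code?
        return "send_busy_tone_to_receiver"     # step 126
    if code is ToneCode.ECHO_BACK:              # step 126a: echo-back code?
        return "form_speech_loop_to_receiver"   # step 127
    return "other_tone_code_processing"         # step 128
```

Forming the speech loop on the echo-back code is what lets the caller hear the synthesized word coming from the main device.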

FIG. 5 shows the operation of the main device. When step 150 determines that received data is present, step 151 determines that it is voice data, and step 152 determines that it is not an error code, the following are performed in sequence as shown in steps 153 to 156: storing the echo-back code in the transmission buffer, connecting the voice synthesis section to the speech path, outputting the word in question to the speech path, and control processing.

As a result, a voice with the same content as the caller's own utterance is produced by the voice synthesis section in the main device and echoed back to the slave unit, so the caller can confirm what was recognized, and the control corresponding to that content is performed in step 156. For example, if the caller at the slave unit says "patent" twice and it is recognized correctly, the word is echoed back to the caller and a connection to the Patent Office is made.
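The main device's handling of received data in FIG. 5 can be outlined as follows, including the two side branches that the description attaches to steps 157 and 158. Everything concrete here is assumed for illustration: the message layout (`kind`, `code`), the `vocabulary` mapping from code to a word and a destination, the `ERROR_CODE` value, and the action tuples that stand in for hardware operations.

```python
from typing import Dict, List, Optional, Tuple

ERROR_CODE = "ERROR"  # hypothetical stand-in for the slave unit's error code


def handle_received(
    data: Optional[dict],
    vocabulary: Dict[str, Tuple[str, str]],  # code -> (word, destination); assumed layout
) -> List[Tuple[str, str]]:
    """Return the ordered actions the main device takes for one received item."""
    actions: List[Tuple[str, str]] = []
    if data is None:                          # step 150: no received data
        return actions
    if data.get("kind") != "voice":           # step 151: not voice data
        actions.append(("other", "received_data_response"))  # step 157
        return actions
    code = data["code"]
    if code == ERROR_CODE:                    # step 152: error code
        actions.append(("tx_buffer", "BUSY_TONE"))            # step 158
        return actions
    word, destination = vocabulary[code]
    actions.append(("tx_buffer", "ECHO_BACK"))                # step 153
    actions.append(("speech_path", "connect_synthesizer"))    # step 154
    actions.append(("speech_path", "say:" + word))            # step 155
    actions.append(("control", "connect:" + destination))     # step 156
    return actions
```

With a vocabulary entry such as `{"C1": ("patent", "patent office")}`, a voice message carrying code `C1` yields the echo-back code, the synthesizer connection, the spoken word, and the connection, in that order.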

In general, the misrecognition rate of voice recognition at the present stage is about 5 percent. Used as is, this would cause about 5 percent of connections to be wrong; but the misrecognition rate when the same content is recognized twice is 0.25 percent, far smaller than the original rate and low enough for practical use.

If step 151 determines that the received data is not voice data, other received data response processing is performed as shown in step 157. If step 152 determines that it is an error code, a busy tone is stored in the transmission buffer as shown in step 158, so that on a misrecognition the caller hears a busy tone.

FIG. 6 shows the main program of the main device. After initial processing is performed as shown in step 160, received data response processing is performed as shown in step 161. When step 162 determines that all steps have finished, input signal response processing and timer count-up response processing are performed as shown in steps 163 and 164. When step 162 results in NO, and when the timer count-up response processing of step 164 has finished, the flow returns to the received data response processing of step 161.

FIG. 7 shows the timer interrupt program, in which transmission processing, input signal sense processing, and timer count processing are performed in sequence in steps 170 to 172.

The change in telephone state may also be an off-hook, an incoming outside-line call, or an incoming extension call; in that case the user may simply answer by voice. Although the embodiment was described for a key telephone system, a stand-alone telephone may also be used. The number of recognitions was set to two, but any predetermined number of two or more will do. The echoed-back synthesized voice may also be output from a speaker instead of the receiver.

[Effects of the Invention] As explained above, in the first invention, control processing is performed on the assumption that recognition is correct only when speech with the same content is recognized consecutively, so the misrecognition rate falls sharply and the device becomes fully usable in practice.
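The size of this reduction can be checked against the figures in the embodiment (about 5 percent for a single recognition, 0.25 percent for two). The derivation below assumes that errors on successive utterances are independent, which the source does not state; the symbols p and n are introduced here for illustration.

```latex
% p : per-utterance misrecognition probability (about 5 percent in the text)
% Assumption (not stated in the source): errors on successive utterances are independent
\[
  P(\text{two consecutive misrecognitions}) = p^{2} = (0.05)^{2} = 0.0025 = 0.25\%
\]
% Generalization to the "predetermined number of two or more" mentioned in the text
\[
  P(n \text{ consecutive misrecognitions}) = p^{n}, \qquad n \ge 2
\]
```

Because a wrong connection also requires both misrecognitions to yield the same wrong word, 0.25 percent is best read as an upper bound under this assumption.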

In the second invention, the recognized speech is reproduced as synthesized speech and sent back to the slave unit, so the caller can confirm that the speech was recognized correctly.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of a device to which the present invention is applied, and FIGS. 2 to 7 are flowcharts for explaining its operation. 1: main device; 2: outside line; 4: slave unit (key telephone); 14: CPU; 17: voice synthesis section; 42: switch; 43: voice recognition processor; 45: control circuit. Patent applicant: Tamura Electric Works Ltd. Agent: Kazuki Yamakawa (and 2 others).

Claims

[Claims]

(1) A telephone device with a voice recognition function, in which a voice recognition section recognizes speech related to telephone operation and control operations are performed based on the recognition result, the device comprising: a detection means for detecting a change in the state of the telephone; a timer started by the output of the detection means; and a determining means for determining that the same recognition output of the voice recognition section has occurred multiple times in succession while the timer is running, wherein control operations are performed based on the output of the determining means.

(2) The telephone device with a voice recognition function according to claim 1, further comprising a voice synthesis section that, based on the output of the determining means, synthesizes the voice recognized by the voice recognition section and emits it from the telephone.