JPH02250099A

JPH02250099A - Speech recognition system

Info

Publication number: JPH02250099A
Application number: JP1070939A
Authority: JP
Inventors: Yasutomo Onishi; 大西　康友
Original assignee: Matsushita Refrigeration Co
Current assignee: Panasonic Holdings Corp
Priority date: 1989-03-23
Filing date: 1989-03-23
Publication date: 1990-10-05

Abstract

PURPOSE: To correctly recognize speech uttered by a customer while synthesized speech is being output, by subtracting the return signal of the synthesized speech from the input signal so that only the customer's speech signal is extracted and input to a speech recognition means. CONSTITUTION: An adaptive filter 16 and a subtracter 17 remove from the input signal only the return signal of the synthesized speech that travels from a speaker 4 to a microphone 1, and speech recognition processing is permitted before the synthesized speech is output. As a result, only the customer's speech signal is input to a speech recognition means 2 even while the synthesized speech is being output, and it is recognized correctly. Likewise, after the synthesized speech has finished, only the customer's speech signal is input to the speech recognition means 2 and is recognized correctly.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a speech recognition system for vending machines in which a customer selects a desired product through spoken dialogue, using speech synthesis and speech recognition technology.

Prior Art

An example of the configuration and operation of a vending machine for cup beverages and the like (hereinafter simply called a cup vending machine) using a conventional speech recognition system is described with reference to FIGS. 3, 4, and 5.

FIG. 3 is a functional block diagram of a conventional configuration. Reference numeral 1 denotes a microphone, an acoustoelectric transducer through which the customer speaks to select a desired beverage. Numeral 2 denotes a speech recognition means that recognizes the input speech by extracting feature parameters from it, comparing them with the pre-registered feature parameters of a set of recognition words (the words to be recognized), and selecting the word whose feature parameters are closest. Numeral 3 denotes a speech synthesis means that synthesizes speech for greeting the customer and prompting the customer to choose a beverage and say its name, and 4 denotes a speaker, an electroacoustic transducer that outputs the synthesized speech. Numeral 5 denotes a value receiving means that accepts money from the customer and returns any change, and 6 denotes a beverage dispensing means that delivers the beverage the customer selected. Numeral 7 denotes a control means, built mainly around a microcomputer, that controls each of the above means, and 8 denotes the speech recognition system.

Because the microphone 1 and the speaker 4 are both mounted on the front of the cup vending machine, the synthesized speech emitted from the speaker 4 also leaks into the microphone 1. If the customer speaks while synthesized speech is being output, the two sounds are mixed and cannot be recognized correctly. For this reason the control means 7 prohibits speech recognition processing by the speech recognition means 2 while synthesized speech is being output and permits it after the output has finished.

The speech recognition means 2 described above is of the speaker-independent type. The underlying technology has been widely published, for example in "Speaker-Independent Isolated Word Recognition for a Moderate Size (54 Word) Vocabulary" (IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 6, December 1979). FIG. 4 shows a block diagram of the speaker-independent speech recognition means 2.

The input signal (an analog value) is sampled at a fixed period and quantized into digital values by an A/D converter 9, and a feature parameter extraction unit 10 extracts feature parameters (mainly LPC cepstrum coefficients) for each short fixed-length interval. The extracted feature parameters are recorded, via a speech recognition control unit 11, in a RAM (1) 12 organized as a ring buffer. A ROM 13 holds, as templates (several templates per recognition word), the feature parameters of the recognition words extracted in advance by the same method; these are copied to a RAM (2) 14 via the speech recognition control unit 11 as the processing requires. A pattern matching unit 15 calculates the inter-pattern distance between the feature parameters of the speech interval among those recorded in the RAM (1) 12 and the group of templates in the RAM (2) 14, and outputs the word corresponding to the closest template to the speech recognition control unit 11.
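The pattern matching step can be illustrated with a minimal sketch. It assumes fixed-length feature vectors and a Euclidean distance, neither of which is specified in the source; the function names, the template values, and the threshold are hypothetical. The reject rule follows the behavior described in the operation walkthrough, where an input whose distance is at or above a certain value matches no recognition word.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates, reject_threshold):
    """Return the word whose template is nearest to `features`, or None for a reject.

    templates: dict mapping each recognition word to a list of template vectors
               (several templates per word, as in the ROM described above).
    """
    best_word, best_dist = None, float("inf")
    for word, vectors in templates.items():
        for t in vectors:
            d = distance(features, t)
            if d < best_dist:
                best_word, best_dist = word, d
    # Reject when even the closest template is at or beyond the threshold.
    if best_dist >= reject_threshold:
        return None
    return best_word

# Hypothetical two-dimensional templates for the two flavor names.
templates = {
    "coffee": [[1.0, 0.0], [0.9, 0.1]],
    "juice": [[0.0, 1.0], [0.1, 0.9]],
}
print(recognize([0.95, 0.05], templates, reject_threshold=0.5))
print(recognize([5.0, 5.0], templates, reject_threshold=0.5))
```

A practical isolated-word recognizer would also have to align utterances of different lengths in time before comparing them; that step is left out of this sketch.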

Feature parameter extraction from the input signal runs in real time, and the results are output to the speech recognition control unit 11. To prohibit speech recognition processing, recording of the feature parameters into the RAM (1) 12 is stopped; to permit it, recording of the feature parameters into the RAM (1) 12 is started.
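The permit/prohibit mechanism amounts to a ring buffer whose writes can be switched on and off. The following is a minimal sketch of that idea; the class and method names are hypothetical and the capacity is arbitrary.

```python
class FeatureRingBuffer:
    """Fixed-size ring buffer whose writes can be switched on and off."""

    def __init__(self, capacity):
        self.frames = [None] * capacity
        self.write_pos = 0
        self.count = 0
        self.recording = False  # True while recognition is permitted

    def permit(self):
        """Start recording feature frames (recognition permitted)."""
        self.recording = True

    def prohibit(self):
        """Stop recording feature frames (recognition prohibited)."""
        self.recording = False

    def push(self, frame):
        """Store one feature frame; frames arriving while prohibited are dropped."""
        if not self.recording:
            return
        self.frames[self.write_pos] = frame
        self.write_pos = (self.write_pos + 1) % len(self.frames)
        self.count = min(self.count + 1, len(self.frames))

    def contents(self):
        """Return the stored frames, oldest first."""
        size = len(self.frames)
        start = (self.write_pos - self.count) % size
        return [self.frames[(start + i) % size] for i in range(self.count)]

buf = FeatureRingBuffer(capacity=3)
buf.push("a")            # dropped: recording has not been permitted yet
buf.permit()
for frame in ["b", "c", "d", "e"]:
    buf.push(frame)      # the oldest frame is overwritten once the buffer is full
print(buf.contents())
```

Frames that arrive while recording is off are lost for good, which is why an utterance that starts before recognition is permitted cannot be recovered afterwards.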

An example of the operation of the conventional cup vending machine configured as above is described with reference to the flowchart of FIG. 5. To keep the explanation simple, this example assumes that two flavors (beverage names) are sold, coffee and juice, and describes the case in which coffee is selected.

First, step 101 determines whether money has been inserted into the value receiving means 5; once the amount inserted is at least the price of the beverage, the process proceeds to step 102. Step 102 prohibits recognition of input speech by the speech recognition means 2. In step 103 the speech synthesis means 3 and the speaker 4 output synthesized speech: the machine greets the customer with "Welcome." and then asks "What would you like?", prompting the customer to choose a flavor and say its name. The synthesized speech enters the microphone 1 during this time, but no recognition is performed. After the synthesized speech has finished, step 104 permits speech recognition processing, and in step 105 the customer's utterance is picked up by the microphone 1 and recognized by the speech recognition means 2.

Because flavor names are to be recognized here, the pre-registered feature parameters of the two recognition words "coffee" and "juice" are compared with the feature parameters extracted from the input speech, and the word with the closest feature parameters is selected (speech recognition processing (1)). Step 106 then evaluates the recognition result: for "coffee" the process proceeds to step 107, and for a reject (the distance between feature parameters is at or above a certain value, that is, the input speech matches none of the recognition words) it returns to step 102. In step 107, to confirm the recognized word, speech recognition processing is prohibited as in step 102, and in step 108 the synthesized speech "Coffee?" is output to prompt the customer to answer. After the synthesized speech has finished, step 109 permits speech recognition processing, and step 110 recognizes the customer's utterance. The recognition words here are "yes" and "no" (speech recognition processing (2)). Step 111 then evaluates the result: for "yes" the process proceeds to step 112, for "no" it returns to step 102, and for a reject it returns to step 107. In step 112, because the beverage selection is complete, speech recognition processing is prohibited. In step 113 the machine outputs the synthesized speech "Your drink is now being poured. Please wait a moment.", and in step 114 the beverage dispensing means 6 pours the selected beverage (coffee) into a cup and delivers the cup. If change is due, it is returned in step 115. After the cup has been delivered, step 116 outputs the synthesized speech "Thank you very much." and the sales sequence ends.

The processing when "juice" is recognized in step 106 is the same as above and is not described.

Problems to Be Solved by the Invention

With the method described above, however, if an impatient customer speaks before the prompting synthesized speech has finished, speech recognition processing has not yet been permitted, so all or part of the customer's utterance is lost and it cannot be recognized correctly.

In view of this problem, an object of the present invention is to provide a speech recognition system that correctly recognizes the customer's speech even while synthesized speech is being output.
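The loss described above can be made concrete with a small timing sketch. The function name and all timings are hypothetical; the sketch only computes which part of an utterance reaches the recognizer, given the moment at which recognition is permitted.

```python
def captured_interval(utterance_start, utterance_end, recognition_enabled_at):
    """Return the (start, end) part of an utterance that reaches the recognizer,
    or None if the utterance has ended before recognition is permitted."""
    start = max(utterance_start, recognition_enabled_at)
    if start >= utterance_end:
        return None
    return (start, utterance_end)

# Hypothetical timings in seconds: the prompt plays from t = 0.0 to t = 2.0.
prompt_end = 2.0

# Recognition permitted only after the prompt: the start of the utterance is lost.
print(captured_interval(1.5, 2.4, prompt_end))
# Recognition permitted before the prompt starts: the whole utterance is kept.
print(captured_interval(1.5, 2.4, 0.0))
# The utterance ends before the prompt does: nothing reaches the recognizer.
print(captured_interval(0.5, 1.2, prompt_end))
```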

Means for Solving the Problems

To solve the above problem, the speech recognition system of the present invention comprises: a speech synthesis means that synthesizes arbitrary speech; a speaker that outputs the synthesized speech; a microphone that picks up the voice of the person speaking; an adaptive filter that produces a replica of the return signal, to the microphone, of the synthesized speech output from the speaker; a subtracter that subtracts the output signal of the adaptive filter from the output signal of the microphone; and a speech recognition means that takes the output signal of the subtracter as input and performs speech recognition.

Operation

With the above configuration, even when the customer speaks while synthesized speech is being output, the return signal of the synthesized speech is subtracted from the input signal, so that only the customer's speech signal is extracted and input to the speech recognition means.
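The subtraction can be written compactly. The symbols below are introduced here for illustration and do not appear in the source: $d(n)$ is the microphone output, $s(n)$ the customer's speech, $r(n)$ the return signal of the synthesized speech, $\hat{r}(n)$ the adaptive filter's replica of it, and $e(n)$ the subtracter output passed to the speech recognition means.

```latex
\begin{aligned}
d(n) &= s(n) + r(n) \\
e(n) &= d(n) - \hat{r}(n) = s(n) + \bigl(r(n) - \hat{r}(n)\bigr) \approx s(n)
\quad \text{when } \hat{r}(n) \approx r(n)
\end{aligned}
```

How closely $e(n)$ matches $s(n)$ therefore depends entirely on how well the replica tracks the actual return signal.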

Embodiment

A speech recognition system according to one embodiment of the present invention is described below with reference to the drawings. Components and processing identical to those of the conventional example are given the same reference numerals and step numbers, and their description is omitted.

FIG. 1 is a functional block diagram of a cup vending machine using the speech recognition system of one embodiment of the present invention, in which 16 denotes an adaptive filter and 17 denotes a subtracter. The adaptive filter 16 receives the output signal of the speech synthesis means 3 and the output signal of the subtracter 17, builds by learning a replica of the return signal of the synthesized speech that travels from the speaker 4 to the microphone 1, and outputs the replica to the subtracter 17. The subtracter 17 subtracts the output signal of the adaptive filter 16 from the output signal of the microphone 1 and outputs the result to the speech recognition means 2. The synthesized speech signal is thereby removed from the input signal, leaving only the customer's speech signal.

The learning algorithm of the adaptive filter has been published in "Learning Identification Method for Systems" (システムの学習的同定法, Keisoku to Seigyo [Measurement and Control], Vol. 7, No. 9, September 1968, from p. 697).
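The adaptive filter and subtracter can be sketched as follows. The sketch assumes that the cited learning identification method corresponds to what is now usually called the normalized LMS (NLMS) update; the source does not give the update rule, filter length, or step size, so `taps`, `mu`, `eps`, the echo path, and the test signals are all hypothetical.

```python
import math
import random

def nlms_echo_cancel(reference, mic, taps=8, mu=0.1, eps=1e-8):
    """Subtract an adaptive estimate of the speaker-to-microphone return from `mic`.

    reference: samples sent to the speaker (output of the speech synthesis means).
    mic:       samples from the microphone (return signal plus customer speech).
    Returns the subtracter output, i.e. the signal passed on to the recognizer.
    """
    w = [0.0] * taps   # adaptive filter coefficients
    x = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for ref, d in zip(reference, mic):
        x = [ref] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))   # replica of the return signal
        e = d - y                                  # subtracter output
        norm = sum(xi * xi for xi in x) + eps
        # Assumed normalized LMS update, driven by the subtracter output.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

# Hypothetical signals: a noise-like "synthesized voice" and a short echo path.
random.seed(0)
n = 4000
reference = [random.uniform(-1.0, 1.0) for _ in range(n)]
echo_path = [0.6, 0.3, -0.2, 0.1]
echo = [sum(h * reference[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)]
# The "customer" speaks only in the second half, while the prompt is still playing.
customer = [0.5 * math.sin(2 * math.pi * 0.01 * i) if i >= n // 2 else 0.0
            for i in range(n)]
mic = [r + s for r, s in zip(echo, customer)]

cleaned = nlms_echo_cancel(reference, mic)

def power(samples):
    """Mean squared value of a list of samples."""
    return sum(v * v for v in samples) / len(samples)

residual = [c - s for c, s in zip(cleaned, customer)]
print("return-signal power at the microphone:", power(echo[n // 2:]))
print("residual return after subtraction:    ", power(residual[n // 2:]))
```

The filter inputs mirror the description above: the synthesized-speech signal drives the filter, and the subtracter output drives the coefficient update. A small step size is used because the customer's voice acts as a disturbance to the adaptation while both signals are present.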

The operation of the cup vending machine configured as above is described with reference to the flowchart of FIG. 2. After money is inserted in step 101, step 201 permits speech recognition processing and step 202 outputs the synthesized speech "Welcome. What would you like?". When the customer speaks, step 203 performs speech recognition processing (1) (recognition of the flavor name), and step 204 evaluates the result: for "coffee" the process proceeds to step 205, and for a reject it returns to step 201.

In step 205, to confirm the recognized word, the synthesized speech "Coffee?" is output.

When the customer speaks, step 206 performs speech recognition processing (2) (recognition of "yes" and "no"), and step 207 evaluates the result: for "yes" the process proceeds to step 208, for "no" it returns to step 201, and for a reject it returns to step 205. In step 208, because the beverage selection is complete, speech recognition processing is prohibited. The subsequent processing is the same as in the conventional example and is not described. This completes the example of the sales operation of the cup vending machine of this embodiment.
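The selection dialogue of steps 201 to 208 can be sketched as a small control loop. The callbacks `recognize` and `speak` are hypothetical stand-ins for the speech recognition means and the speech synthesis means; the sketch treats "juice" the same way as "coffee", and it assumes recognition stays permitted for the whole dialogue, so there are no permit/prohibit calls between prompts.

```python
def run_selection_dialogue(recognize, speak):
    """Run the flavor selection dialogue and return the confirmed flavor.

    recognize(vocabulary) -> one of the vocabulary words, or None for a reject.
    speak(text)           -> starts a synthesized prompt.
    Both callbacks are hypothetical. Recognition is permitted from step 201
    until step 208, so the customer may answer while a prompt is still playing.
    """
    while True:                                       # step 201: recognition permitted
        speak("Welcome. What would you like?")        # step 202
        flavor = recognize(["coffee", "juice"])       # step 203
        if flavor is None:                            # step 204: reject -> step 201
            continue
        while True:
            speak(f"{flavor.capitalize()}?")          # step 205
            answer = recognize(["yes", "no"])         # step 206
            if answer is not None:                    # step 207: reject -> step 205
                break
        if answer == "yes":
            return flavor                             # step 208: recognition prohibited
        # "no" -> back to step 201

# Scripted recognition results standing in for a real recognizer.
script = iter([None, "coffee", None, "no", "juice", "yes"])
spoken = []
result = run_selection_dialogue(lambda vocabulary: next(script), spoken.append)
print(result)
print(spoken)
```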

As described above, according to this embodiment the adaptive filter 16 and the subtracter 17 remove from the input signal only the return signal of the synthesized speech that travels from the speaker 4 to the microphone 1, and speech recognition processing is permitted before the synthesized speech is output. Consequently, even while synthesized speech is being output, only the customer's speech signal is input to the speech recognition means 2, and it can be recognized correctly. Naturally, after the synthesized speech has finished, only the customer's speech signal is input to the speech recognition means 2, and it is likewise recognized correctly.

Although this embodiment describes the speech recognition system of the present invention applied to a cup vending machine, it goes without saying that the same effect is obtained when it is applied to other devices and systems that use spoken dialogue based on a speech synthesis means and a speech recognition means.

Effects of the Invention

As described above, the speech recognition system of the present invention is provided with a speech synthesis means that synthesizes arbitrary speech, a speaker that outputs the synthesized speech, a microphone that picks up the voice of the person speaking, an adaptive filter that produces a replica of the return signal, to the microphone, of the synthesized speech output from the speaker, a subtracter that subtracts the output signal of the adaptive filter from the output signal of the microphone, and a speech recognition means that takes the output signal of the subtracter as input and performs speech recognition. As a result, even if the customer speaks while the synthesized speech prompting an utterance is being output, only the customer's speech signal is input to the speech recognition means, and it can be recognized correctly.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram of a cup vending machine using one embodiment of the present invention, FIG. 2 is a flowchart of its sales operation, FIG. 3 is a functional block diagram of a conventional cup vending machine, FIG. 4 is a block diagram of the speech recognition means, and FIG. 5 is a flowchart of the conventional sales operation.

1: microphone; 2: speech recognition means; 3: speech synthesis means; 4: speaker; 8: speech recognition system.

Agent: Patent Attorney Shigetaka Awano and one other

Claims

[Claims]

A speech recognition system comprising: a speech synthesis means that synthesizes arbitrary speech; a speaker that outputs the synthesized speech; a microphone that picks up the voice of the person speaking; an adaptive filter that produces a replica of the return signal, to the microphone, of the synthesized speech output from the speaker; a subtracter that subtracts the output signal of the adaptive filter from the output signal of the microphone; and a speech recognition means that takes the output signal of the subtracter as input and performs speech recognition.