JP4418903B2 - Voice recognition device

Info

Publication number: JP4418903B2
Application number: JP2004075488A
Authority: JP
Inventors: 紀子鈴木; 恭弘片桐
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2004-03-17
Filing date: 2004-03-17
Publication date: 2010-02-24
Anticipated expiration: 2024-03-17
Also published as: JP2005266020A

Description

The present invention relates to a speech recognition device, and more particularly to a speech recognition device that recognizes the content of utterances by a subject who converses with a computer, for example.

An example of a conventional speech recognition device of this kind is disclosed in Patent Document 1. In this prior art, when the user speaks too fast, the system slows down its own speech rate, thereby guiding the user's speech rate into an appropriate range. The same kind of guidance is applied to the volume of the utterance and the politeness of the wording.
Patent Document 1: JP 2003-150194 A

However, the prior art described above guides only the speed and volume of the utterance, so there was a limit to how far speech recognition accuracy could be improved.

Therefore, the main object of the present invention is to improve the accuracy of speech recognition by also guiding the other prosodic parameters of the utterance, namely the pitch, pitch range, and utterance interval of the uttered speech, into ranges suitable for speech recognition.

A speech recognition device according to the invention of claim 1 comprises: output means for outputting pseudo speech; capture means for capturing the uttered speech of a subject; pitch detection means for detecting the pitch of the uttered speech captured by the capture means; pitch raising means for raising the pitch of the pseudo speech output by the output means when the pitch detected by the pitch detection means falls below a first threshold; and pitch lowering means for lowering the pitch of the pseudo speech output by the output means when the pitch detected by the pitch detection means exceeds a second threshold that is greater than the first threshold.

When the pitch of the subject's uttered speech falls below the first threshold, the pitch of the pseudo speech is raised, which guides the pitch of the subject's speech upward. When the pitch of the subject's uttered speech exceeds the second threshold, the pitch of the pseudo speech is lowered, which guides the pitch of the subject's speech downward. The pitch of the subject's speech can thus be kept within the range in which speech recognition is possible, improving recognition accuracy.
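The threshold rule of claim 1 can be sketched as a small decision function. This is only an illustration under stated assumptions, not the patented implementation: the function name, the use of Hz, and the numeric thresholds are hypothetical and do not appear in the source.

```python
def pitch_adjustment_direction(detected_pitch: float,
                               first_threshold: float,
                               second_threshold: float) -> int:
    """Return +1 to raise, -1 to lower, or 0 to keep the pseudo-speech pitch."""
    if first_threshold >= second_threshold:
        raise ValueError("the second threshold must be greater than the first")
    if detected_pitch < first_threshold:
        return +1   # subject's pitch is too low: raise the pseudo-speech pitch
    if detected_pitch > second_threshold:
        return -1   # subject's pitch is too high: lower the pseudo-speech pitch
    return 0        # already within the recognizable range


# Hypothetical thresholds in Hz, chosen only for illustration.
for pitch in (90.0, 180.0, 320.0):
    print(pitch, pitch_adjustment_direction(pitch, 100.0, 300.0))
```

Both comparisons are strict, so a pitch exactly equal to either threshold leaves the pseudo speech unchanged, matching the "falls below" and "exceeds" wording of the claim.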

A speech recognition device according to the invention of claim 2 depends on claim 1 and further comprises: pitch range detection means for detecting the pitch range of the uttered speech captured by the capture means; pitch range expansion means for expanding the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means falls below a third threshold; and pitch range reduction means for reducing the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means exceeds a fourth threshold that is greater than the third threshold.

When the pitch range of the subject's uttered speech falls below the third threshold, the pitch range of the pseudo speech is expanded, which guides the pitch range of the subject's speech toward expansion. When the pitch range of the subject's uttered speech exceeds the fourth threshold, the pitch range of the pseudo speech is reduced, which guides the pitch range of the subject's speech toward reduction. The pitch range of the subject's speech can thus be kept within the range in which speech recognition is possible, improving recognition accuracy.

A speech recognition device according to the invention of claim 3 depends on claim 1 or 2 and further comprises: utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means; utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a fifth threshold; and utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a sixth threshold that is greater than the fifth threshold.

When the utterance interval of the subject's uttered speech falls below the fifth threshold, the utterance interval of the pseudo speech is extended, which guides the subject's utterance interval toward extension. When the utterance interval of the subject's uttered speech exceeds the sixth threshold, the utterance interval of the pseudo speech is shortened, which guides the subject's utterance interval toward shortening. The speed of the subject's speech can thus be kept within the range in which speech recognition is possible, improving recognition accuracy.

A speech recognition device according to the invention of claim 4 comprises: output means for outputting pseudo speech; capture means for capturing the uttered speech of a subject; pitch range detection means for detecting the pitch range of the uttered speech captured by the capture means; pitch range expansion means for expanding the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means falls below a first threshold; and pitch range reduction means for reducing the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means exceeds a second threshold that is greater than the first threshold.

A speech recognition device according to the invention of claim 5 depends on claim 4 and further comprises: utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means; utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a third threshold; and utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a fourth threshold that is greater than the third threshold.

A speech recognition device according to the invention of claim 6 comprises: output means for outputting pseudo speech; capture means for capturing the uttered speech of a subject; utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means; utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a first threshold; and utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a second threshold that is greater than the first threshold.

A speech recognition program according to the invention of claim 7 is a speech recognition program executed by a processor of a speech recognition device, and comprises: a pitch detection step of detecting the pitch of a subject's uttered speech; a pitch raising step of raising the pitch of the pseudo speech output from a speaker when the pitch detected by the pitch detection step falls below a first threshold; and a pitch lowering step of lowering the pitch of the pseudo speech output from the speaker when the pitch detected by the pitch detection means exceeds a second threshold that is greater than the first threshold.

A speech recognition program according to the invention of claim 8 depends on claim 7 and further comprises: a pitch range detection step of detecting the pitch range of the subject's uttered speech; a pitch range expansion step of expanding the pitch range of the pseudo speech output from the speaker when the pitch range detected by the pitch range detection step falls below a third threshold; and a pitch range reduction step of reducing the pitch range of the pseudo speech output from the speaker when the pitch range detected by the pitch range detection step exceeds a fourth threshold that is greater than the third threshold.

A speech recognition program according to the invention of claim 9 depends on claim 7 or 8 and further comprises: an utterance interval detection step of detecting the utterance interval of the subject's uttered speech; an utterance interval extension step of extending the utterance interval of the pseudo speech output from the speaker when the utterance interval detected by the utterance interval detection step falls below a fifth threshold; and an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected by the utterance interval detection step exceeds a sixth threshold that is greater than the fifth threshold.

A speech recognition program according to the invention of claim 10 is a speech recognition program executed by a processor of a speech recognition device, and comprises: a pitch range detection step of detecting the pitch range of a subject's uttered speech; a pitch range expansion step of expanding the pitch range of the pseudo speech output from a speaker when the pitch range detected by the pitch range detection step falls below a first threshold; and a pitch range reduction step of reducing the pitch range of the pseudo speech output from the speaker when the pitch range detected by the pitch range detection step exceeds a second threshold that is greater than the first threshold.

A speech recognition program according to the invention of claim 11 depends on claim 10 and further comprises: an utterance interval detection step of detecting the utterance interval of the subject's uttered speech; an utterance interval extension step of extending the utterance interval of the pseudo speech output from the speaker when the utterance interval detected by the utterance interval detection step falls below a third threshold; and an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected by the utterance interval detection step exceeds a fourth threshold that is greater than the third threshold.

A speech recognition program according to the invention of claim 12 is a speech recognition program executed by a processor of a speech recognition device, and comprises: an utterance interval detection step of detecting the utterance interval of a subject's uttered speech; an utterance interval extension step of extending the utterance interval of the pseudo speech output from a speaker when the utterance interval detected by the utterance interval detection step falls below a first threshold; and an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected by the utterance interval detection step exceeds a second threshold that is greater than the first threshold.

According to the present invention, the pitch, pitch range, or utterance interval of the subject's uttered speech is guided in the desired direction by raising or lowering the pitch of the pseudo speech, expanding or reducing its pitch range, or extending or shortening its utterance interval, so the accuracy of speech recognition can be improved.

The above object and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of an embodiment, given with reference to the drawings.

Referring to FIG. 1, the speech recognition device 10 of this embodiment includes a microphone 18 that captures the subject's speech signal. The speech signal captured by the microphone 18 is supplied to the CPU 12 via the A/D converter 14. The CPU 12 analyzes the content of the speech data received from the A/D converter 14 and creates pseudo speech data for conversing with the subject. The created pseudo speech data is output from the speaker 20 via the D/A converter 16. The pseudo speech here is synthesized speech produced by a well-known speech synthesis means.

To improve the recognition accuracy for the subject's speech, the CPU 12 executes the processing shown in the flowcharts of FIGS. 3 to 5 each time one phrase of speech data is captured. The control program corresponding to these flowcharts is stored in the memory 22.

First, referring to FIG. 3, in step S1 the height of the subject's voice, that is, the pitch PIh, is detected from the captured speech data. In steps S3 and S5, the detected pitch PIh is compared with the thresholds PIh1 and PIh2. As shown in FIG. 2, the thresholds PIh1 and PIh2 are the lower limit and the upper limit, respectively, of the range in which speech recognition is possible.
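The source does not say how step S1 detects the pitch. As one possible illustration, the sketch below uses autocorrelation peak picking, a common textbook technique that is swapped in here as an assumption; the function name, the 50 to 400 Hz search band, and the synthetic test tone are all hypothetical.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Estimate the fundamental frequency from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the phrase at this lag.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 200 Hz tone standing in for one captured phrase (assumption).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
pih = estimate_pitch_hz(tone, sample_rate)
print(round(pih, 1))
```

Real speech would need voicing detection and smoothing across frames; this sketch only shows how a single pitch value PIh could be obtained for the comparisons in steps S3 and S5.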

If the pitch PIh is at least the threshold PIh1 and at most the threshold PIh2, speech recognition is deemed possible and the process proceeds from step S3 to step S11. When PIh is below PIh1, the result is NO in step S3 and YES in step S5, and in step S7 the pitch PIc of the pseudo speech is raised by Δα. When PIh exceeds PIh2, the result is NO in both step S3 and step S5, and in step S9 the pitch PIc of the pseudo speech is lowered by Δα. When step S7 or S9 completes, the process proceeds to step S11.

Note that the variable range of the pitch PIc is "initial value ± 2α"; an update in step S7 or S9 that would move PIc outside this range is meaningless.

In this way, if the pitch PIh of the uttered speech is too low, the pitch PIc of the pseudo speech is raised, and if PIh is too high, PIc is lowered. The pitch PIh of the uttered speech is thus guided into the recognizable range by the alignment tendency, that is, the subject's tendency to match the prosody of the pseudo speech.
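Steps S3 to S9, together with the "initial value ± 2α" limit, can be sketched as follows. This is an illustration under assumptions: the class and field names are hypothetical, the step Δα is taken to equal α, and an out-of-range update is modeled by clamping, which is one reading of the source's statement that such an update is meaningless.

```python
from dataclasses import dataclass, field


@dataclass
class PitchGuide:
    pih1: float          # lower limit of the recognizable pitch range
    pih2: float          # upper limit of the recognizable pitch range
    initial_pic: float   # initial pitch of the pseudo speech
    alpha: float         # step size (assumed equal to Δα)
    pic: float = field(init=False)

    def __post_init__(self):
        self.pic = self.initial_pic

    def update(self, pih: float) -> float:
        """Adjust PIc from the detected pitch PIh (steps S3 to S9)."""
        if self.pih1 <= pih <= self.pih2:
            return self.pic                       # in range: go on to S11
        step = self.alpha if pih < self.pih1 else -self.alpha   # S7 or S9
        low = self.initial_pic - 2 * self.alpha
        high = self.initial_pic + 2 * self.alpha
        self.pic = min(max(self.pic + step, low), high)   # stay in initial ± 2α
        return self.pic


# Hypothetical values in Hz, one update per captured phrase.
guide = PitchGuide(pih1=100.0, pih2=300.0, initial_pic=200.0, alpha=10.0)
print([guide.update(p) for p in (80.0, 80.0, 80.0, 200.0, 350.0)])
```

With these values the third low-pitched phrase no longer changes PIc, because the pseudo speech has already reached the upper end of its variable range.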

In step S11, the intonation range of the subject's voice, that is, the pitch range PRh, is detected from the captured speech data. In steps S13 and S15, the detected pitch range PRh is compared with the thresholds PRh1 and PRh2. As shown in FIG. 2, the thresholds PRh1 and PRh2 are likewise the lower limit and the upper limit of the range in which speech recognition is possible.

If the pitch range PRh satisfies PRh1 ≦ PRh ≦ PRh2, speech recognition is deemed possible and the process proceeds from step S13 to step S21. When PRh is below the threshold PRh1, the pitch range PRc of the pseudo speech is expanded by Δβ in step S17. When PRh exceeds the threshold PRh2, the pitch range PRc of the pseudo speech is reduced by Δβ in step S19. When step S17 or S19 completes, the process proceeds to step S21.

As above, the variable range of the pseudo-speech pitch range PRc is "initial value ± 2β"; an update in step S17 or S19 that would move PRc outside this range is meaningless.

In this way, if the pitch range PRh of the uttered speech is too narrow, the pitch range PRc of the pseudo speech is expanded, and if PRh is too wide, PRc is reduced. The pitch range PRh of the uttered speech is thus guided into the recognizable range by the alignment tendency.
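The pitch range branch (steps S11 to S19) can be sketched in the same spirit. The source does not define how PRh is measured; the example assumes it is the spread between the highest and lowest voiced pitch values of the phrase. The function names, the frame-wise contour, and all numbers are hypothetical.

```python
def pitch_range(f0_contour):
    """Spread between the highest and lowest voiced pitch (0 marks an unvoiced frame)."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the phrase")
    return max(voiced) - min(voiced)


def next_prc(prh, prc, prh1, prh2, beta, initial_prc):
    """Widen or narrow PRc (steps S13 to S19), staying within initial ± 2β."""
    if prh < prh1:
        prc += beta       # S17: subject's intonation is too flat
    elif prh > prh2:
        prc -= beta       # S19: subject's intonation is too wide
    return min(max(prc, initial_prc - 2 * beta), initial_prc + 2 * beta)


# Hypothetical pitch contour in Hz for a rather flat phrase.
contour = [0.0, 180.0, 195.0, 0.0, 190.0, 185.0]
prh = pitch_range(contour)
print(prh, next_prc(prh, prc=60.0, prh1=30.0, prh2=120.0, beta=10.0, initial_prc=60.0))
```

Here the measured range of 15 Hz is below the assumed lower limit of 30 Hz, so the pseudo-speech pitch range is widened by one step.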

In step S21, the subject's speech rate Sh is detected from the captured speech data. In steps S23 and S25, the detected speech rate Sh is compared with the thresholds Sh1 and Sh2. As above, the thresholds Sh1 and Sh2 are the lower limit and the upper limit of the range in which speech recognition is possible. The unit of the speech rate detected in step S21 is mora/sec.

If the speech rate Sh satisfies Sh1 ≦ Sh ≦ Sh2, speech recognition is deemed possible and the process proceeds from step S23 to step S31. When Sh is below the threshold Sh1, the speech rate Sc of the pseudo speech is increased by Δγ in step S27. When Sh exceeds the threshold Sh2, the speech rate Sc of the pseudo speech is decreased by Δγ in step S29. When step S27 or S29 completes, the process proceeds to step S31.

As above, the variable range of the pseudo-speech speech rate Sc is "initial value ± 2γ"; an update in step S27 or S29 that would move Sc outside this range is meaningless.

In this way, if the subject's speech rate Sh is too low, the speech rate Sc of the pseudo speech is increased, and if Sh is too high, Sc is decreased. The subject's speech rate Sh is thus guided into the recognizable range by the alignment tendency.
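Because the speech rate is expressed in mora/sec, one way to picture step S21 is to count morae in a kana transcript of the phrase and divide by its duration. The sketch below assumes such a transcript is available and contains only kana; the source does not say how Sh is measured, and the function names and thresholds are hypothetical. Small kana such as ゃ, ゅ, ょ combine with the preceding character into one mora, while っ, ん, and ー each count as a mora of their own.

```python
# Small kana that merge with the preceding character into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a kana-only string."""
    return sum(1 for ch in kana if ch not in SMALL_KANA and not ch.isspace())


def speech_rate(kana: str, duration_sec: float) -> float:
    """Speech rate Sh in mora/sec for one phrase."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    return count_morae(kana) / duration_sec


def next_sc(sh, sc, sh1, sh2, gamma):
    """Steps S23 to S29: speed up or slow down the pseudo speech by one step."""
    if sh < sh1:
        return sc + gamma   # S27: subject speaks too slowly
    if sh > sh2:
        return sc - gamma   # S29: subject speaks too fast
    return sc


# Hypothetical phrase spoken in one second, with assumed thresholds.
sh = speech_rate("きょうはいいてんきです", 1.0)
print(sh, next_sc(sh, sc=7.0, sh1=5.0, sh2=8.0, gamma=0.5))
```

The limit of "initial value ± 2γ" is left out here to keep the example focused on the unit of measurement.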

In step S31, the subject's utterance interval Th (the interval from the end time of the other party's pseudo speech to the start time of the subject's own response) is detected from the captured speech data. In steps S33 and S35, the detected utterance interval Th is compared with the thresholds Th1 and Th2. As above, the thresholds Th1 and Th2 are the lower limit and the upper limit of the range in which speech recognition is possible.

If the utterance interval Th satisfies Th1 ≦ Th ≦ Th2, speech recognition is deemed possible and the process proceeds from step S33 to step S41. When Th is below the threshold Th1, the utterance interval Tc of the pseudo speech is extended by Δδ in step S37. When Th exceeds the threshold Th2, the utterance interval Tc of the pseudo speech is shortened by Δδ in step S39. When step S37 or S39 completes, the process proceeds to step S41.

As above, the variable range of the pseudo-speech utterance interval Tc is "initial value ± 2δ"; an update in step S37 or S39 that would move Tc outside this range is meaningless.

In this way, if the subject's utterance interval Th is too short, the utterance interval Tc of the pseudo speech is extended, and if Th is too long, Tc is shortened. The subject's utterance interval Th is thus guided into the recognizable range by the alignment tendency.
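The utterance interval is defined by two timestamps, so steps S31 to S39 can be illustrated directly. In this sketch the function names, the use of seconds, and the numeric values are assumptions; clamping to "initial value ± 2δ" is one reading of the source, with the step Δδ taken to equal δ.

```python
def utterance_interval(pseudo_speech_end: float, response_start: float) -> float:
    """Th: time from the end of the pseudo speech to the start of the subject's reply."""
    return response_start - pseudo_speech_end


def next_tc(th, tc, th1, th2, delta, initial_tc):
    """Steps S33 to S39: lengthen or shorten Tc, staying within initial ± 2δ."""
    if th < th1:
        tc += delta     # S37: the subject answers too quickly
    elif th > th2:
        tc -= delta     # S39: the subject waits too long
    return min(max(tc, initial_tc - 2 * delta), initial_tc + 2 * delta)


# Hypothetical timestamps in seconds on a shared clock.
th = utterance_interval(pseudo_speech_end=10.0, response_start=10.25)
print(th, next_tc(th, tc=0.75, th1=0.5, th2=1.5, delta=0.25, initial_tc=0.75))
```

If the subject starts speaking before the pseudo speech has ended, Th becomes negative and is treated like any other value below Th1.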

In step S41, the subject's voice volume Vh is detected from the captured speech data. In steps S43 and S45, the detected volume Vh is compared with the thresholds Vh1 and Vh2. As shown in FIG. 2, the thresholds Vh1 and Vh2 are likewise the lower limit and the upper limit of the range in which speech recognition is possible.

If the volume Vh satisfies Vh1 ≦ Vh ≦ Vh2, speech recognition is deemed possible and the process ends. When Vh is below the threshold Vh1, the volume Vc of the pseudo speech is increased by Δε in step S47. When Vh exceeds the threshold Vh2, the volume Vc of the pseudo speech is decreased by Δε in step S49. When step S47 or S49 completes, the process ends.

As above, the variable range of the pseudo-speech volume Vc is "initial value ± 2ε"; an update in step S47 or S49 that would move Vc outside this range is meaningless.

In this way, if the volume Vh of the uttered speech is too low, the volume Vc of the pseudo speech is increased, and if Vh is too high, Vc is decreased. The volume Vh of the uttered speech is thus guided into the recognizable range by the alignment tendency.
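The source does not state how the volume Vh is measured in step S41. As an assumption for illustration, the sketch below uses the RMS level of the phrase in decibels relative to full scale and then applies the rule of steps S43 to S49; the function names, thresholds, and test signal are hypothetical.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to full scale (samples between -1.0 and 1.0)."""
    if not samples:
        raise ValueError("empty phrase")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def next_vc(vh, vc, vh1, vh2, epsilon):
    """Steps S43 to S49: raise or lower the pseudo-speech volume by one step."""
    if vh < vh1:
        return vc + epsilon   # S47: subject is too quiet
    if vh > vh2:
        return vc - epsilon   # S49: subject is too loud
    return vc


# A very quiet synthetic signal standing in for one captured phrase.
quiet = [0.01 * math.sin(0.1 * n) for n in range(1000)]
vh = rms_level_db(quiet)
print(round(vh), next_vc(vh, vc=-20.0, vh1=-30.0, vh2=-6.0, epsilon=3.0))
```

A decibel scale is used only because it is a familiar way to express loudness; any monotonic volume measure would work with the same threshold rule.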

As can be seen from the above description, the pseudo speech is output from the speaker 20, and the subject's uttered speech is captured by the microphone 18. The prosodic parameter values of the captured speech (pitch, pitch range, speech rate, utterance interval, and volume) are detected by the CPU 12. If a detected prosodic parameter value falls below the lower limit of the recognizable range (PIh1, PRh1, Sh1, Th1, Vh1), the corresponding prosodic parameter value of the pseudo speech output from the speaker 20 is raised. Conversely, if a detected prosodic parameter value exceeds the upper limit of the recognizable range (PIh2, PRh2, Sh2, Th2, Vh2), the corresponding prosodic parameter value of the pseudo speech output from the speaker 20 is lowered. The prosodic parameter values of the uttered speech are thereby guided into the recognizable range, improving the accuracy of speech recognition.
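Since all five parameters follow the same rule, the whole per-phrase pass can be sketched as a table-driven loop. This is an illustrative arrangement rather than the structure of the actual control program: the class, the dictionary keys, and every numeric value are assumptions, and the "initial value ± 2" step limit is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    lower: float    # lower limit of the recognizable range (e.g. PIh1)
    upper: float    # upper limit of the recognizable range (e.g. PIh2)
    step: float     # adjustment applied per phrase (e.g. Δα)
    pseudo: float   # current value used for the pseudo speech (e.g. PIc)


def guide_phrase(measured: dict, guides: dict) -> dict:
    """Run one pass over every parameter, in the order of the flowcharts."""
    for name, guide in guides.items():
        value = measured[name]
        if value < guide.lower:
            guide.pseudo += guide.step   # raise the pseudo-speech parameter
        elif value > guide.upper:
            guide.pseudo -= guide.step   # lower the pseudo-speech parameter
    return {name: guide.pseudo for name, guide in guides.items()}


# Hypothetical limits, steps, and starting values.
guides = {
    "pitch": Guide(100.0, 300.0, 10.0, 200.0),
    "pitch_range": Guide(30.0, 120.0, 10.0, 60.0),
    "speech_rate": Guide(5.0, 8.0, 0.5, 7.0),
    "utterance_interval": Guide(0.5, 1.5, 0.25, 0.75),
    "volume": Guide(-30.0, -6.0, 3.0, -20.0),
}
measured = {"pitch": 90.0, "pitch_range": 50.0, "speech_rate": 9.0,
            "utterance_interval": 0.25, "volume": -40.0}
result = guide_phrase(measured, guides)
print(result)
```

Keeping the limits and steps in one table makes it easy to see that each parameter is adjusted independently of the others within a single pass.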

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention.
FIG. 2 is an illustrative view showing part of the operation of the FIG. 1 embodiment.
FIG. 3 is a flowchart showing part of the operation of the CPU applied to the FIG. 1 embodiment.
FIG. 4 is a flowchart showing another part of the operation of the CPU applied to the FIG. 1 embodiment.
FIG. 5 is a flowchart showing yet another part of the operation of the CPU applied to the FIG. 1 embodiment.

Explanation of symbols

10: Speech recognition device
12: CPU
18: Microphone
20: Speaker
22: Memory

Claims

A speech recognition device comprising:
output means for outputting pseudo speech;
capture means for capturing the uttered speech of a subject;
pitch detection means for detecting the pitch of the uttered speech captured by the capture means;
pitch raising means for raising the pitch of the pseudo speech output by the output means when the pitch detected by the pitch detection means falls below a first threshold; and
pitch lowering means for lowering the pitch of the pseudo speech output by the output means when the pitch detected by the pitch detection means exceeds a second threshold that is greater than the first threshold.

The speech recognition device according to claim 1, further comprising:
pitch range detection means for detecting the pitch range of the uttered speech captured by the capture means;
pitch range expansion means for expanding the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means falls below a third threshold; and
pitch range reduction means for reducing the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means exceeds a fourth threshold that is greater than the third threshold.

The speech recognition apparatus according to claim 1 or 2, further comprising:
utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means;
utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a fifth threshold; and
utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a sixth threshold that is greater than the fifth threshold.

A speech recognition apparatus comprising:
output means for outputting pseudo speech;
capture means for capturing a subject's uttered speech;
pitch range detection means for detecting the pitch range of the uttered speech captured by the capture means;
pitch range expansion means for expanding the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means falls below a first threshold; and
pitch range reduction means for reducing the pitch range of the pseudo speech output by the output means when the pitch range detected by the pitch range detection means exceeds a second threshold that is greater than the first threshold.

The speech recognition apparatus according to claim 4, further comprising:
utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means;
utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a third threshold; and
utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a fourth threshold that is greater than the third threshold.

A speech recognition apparatus comprising:
output means for outputting pseudo speech;
capture means for capturing a subject's uttered speech;
utterance interval detection means for detecting the utterance interval of the uttered speech captured by the capture means;
utterance interval extension means for extending the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means falls below a first threshold; and
utterance interval shortening means for shortening the utterance interval of the pseudo speech output by the output means when the utterance interval detected by the utterance interval detection means exceeds a second threshold that is greater than the first threshold.

A speech recognition program executed by a processor of a speech recognition apparatus, the program comprising:
a pitch detection step of detecting the pitch of a subject's uttered speech;
a pitch raising step of raising the pitch of pseudo speech output from a speaker when the pitch detected in the pitch detection step falls below a first threshold; and
a pitch lowering step of lowering the pitch of the pseudo speech output from the speaker when the pitch detected in the pitch detection step exceeds a second threshold that is greater than the first threshold.

The speech recognition program according to claim 7, further comprising:
a pitch range detection step of detecting the pitch range of the subject's uttered speech;
a pitch range expansion step of expanding the pitch range of the pseudo speech output from the speaker when the pitch range detected in the pitch range detection step falls below a third threshold; and
a pitch range reduction step of reducing the pitch range of the pseudo speech output from the speaker when the pitch range detected in the pitch range detection step exceeds a fourth threshold that is greater than the third threshold.

The speech recognition program according to claim 7 or 8, further comprising:
an utterance interval detection step of detecting the utterance interval of the subject's uttered speech;
an utterance interval extension step of extending the utterance interval of the pseudo speech output from the speaker when the utterance interval detected in the utterance interval detection step falls below a fifth threshold; and
an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected in the utterance interval detection step exceeds a sixth threshold that is greater than the fifth threshold.

A speech recognition program executed by a processor of a speech recognition apparatus, the program comprising:
a pitch range detection step of detecting the pitch range of a subject's uttered speech;
a pitch range expansion step of expanding the pitch range of pseudo speech output from a speaker when the pitch range detected in the pitch range detection step falls below a first threshold; and
a pitch range reduction step of reducing the pitch range of the pseudo speech output from the speaker when the pitch range detected in the pitch range detection step exceeds a second threshold that is greater than the first threshold.

The speech recognition program according to claim 10, further comprising:
an utterance interval detection step of detecting the utterance interval of the subject's uttered speech;
an utterance interval extension step of extending the utterance interval of the pseudo speech output from the speaker when the utterance interval detected in the utterance interval detection step falls below a third threshold; and
an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected in the utterance interval detection step exceeds a fourth threshold that is greater than the third threshold.

A speech recognition program executed by a processor of a speech recognition apparatus, the program comprising:
an utterance interval detection step of detecting the utterance interval of a subject's uttered speech;
an utterance interval extension step of extending the utterance interval of pseudo speech output from a speaker when the utterance interval detected in the utterance interval detection step falls below a first threshold; and
an utterance interval shortening step of shortening the utterance interval of the pseudo speech output from the speaker when the utterance interval detected in the utterance interval detection step exceeds a second threshold that is greater than the first threshold.