JP3266157B2

JP3266157B2 - Voice enhancement device

Info

Publication number: JP3266157B2
Application number: JP18081291A
Authority: JP
Inventors: 洋浜田; 克彦小川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1991-07-22
Filing date: 1991-07-22
Publication date: 2002-03-18
Anticipated expiration: 2017-03-18
Also published as: JPH0527792A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a voice emphasizing device that, in a communication system in which people converse with one another, emphasizes the important words of the topic when conveying them, so that the people can communicate smoothly.

[0002]

[Prior Art] With the development of telecommunications, opportunities for people to converse through a communication channel without meeting face to face are increasing. For example, in consulting, order taking, fault reporting, reservations, and complaint handling, most of the work is now carried out through communication means such as the telephone. Furthermore, with advances in image storage and transmission technology and in large-capacity communication technology, communication that combines still images and moving images with voice, rather than voice alone, has also come into use, and communication is expected to become still more multimedia-oriented. However, when people communicate through any of these communication means, it is known that the most important and most effective channel is the telephone, that is, voice communication [see, for example, A. Chapanis, "Studies in Interactive Communication: II. The Effects of Four Communication Modes on the Linguistic Performance of Teams during Cooperative Problem Solving", Human Factors, 19(2), pp. 101-126 (1977)].

[0003] In a conversation between people, when the two parties do not share the same topic or background knowledge, problems arise: it takes time to establish smooth communication, and misunderstandings can occur. In addition, a person who addresses another with a particular intention utters the subject words and keywords with emphasis, but when the two parties' assumptions or knowledge differ, or when they converse while doing other work, the keywords do not necessarily coincide.

[0004]

[Means for Solving the Problem] According to the present invention, an important word in the input voice, that is, a word to be emphasized (a keyword), is detected by a keyword detection unit; the detected word is subjected to emphasis processing by an emphasis processing unit; and the input voice is output from a voice output unit with the corresponding portion replaced by the emphasized word.

[0005]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 shows one embodiment of the invention. The voice input unit 11 takes in voice through a telephone line, a microphone, or the like, converts the analog signal into a digital signal, and supplies it to the keyword detection unit 12. The keyword detection unit 12 detects the words to be emphasized in the input voice. To this end, in this example, the feature extraction unit 13 extracts from the digitized voice the spectral feature parameters used for keyword extraction. In this example it also extracts the prosodic feature parameters used for emphasis processing. Various methods have been proposed for analyzing parameters that represent the spectral features of voice, such as band-pass filter analysis, linear prediction analysis, and FFT (fast Fourier transform) analysis; it suffices to select one that suits the keyword extraction method applied later. With linear prediction analysis, for example, the LPC cepstrum and the autocorrelation function are commonly used as parameters. As prosodic features for emphasis processing, the voice power and the fundamental frequency (pitch) are extracted. Note that the analysis method must be one that allows a voice signal to be synthesized from the parameters representing the spectral features together with the parameters representing the prosodic features as modified by the emphasis processing.
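The extraction of the two prosodic features named above (voice power and fundamental frequency) can be illustrated with a minimal Python sketch. The description does not say how the pitch is estimated; the autocorrelation peak search used here, the function names, the 60 to 400 Hz search range, and the floor value are all assumptions made for illustration.

```python
import math

def frame_log_power(frame):
    """Natural log of the mean squared amplitude of one frame."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(max(power, 1e-12))  # floor avoids log(0) on silence

def frame_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Assumed method: the description only says that the pitch is extracted.
    Returns 0.0 when no positive peak is found (treated as unvoiced).
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0

# Usage: a synthetic 200 Hz tone sampled at 8 kHz, one 50 ms frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
pitch = frame_pitch_hz(frame, sample_rate)
log_power = frame_log_power(frame)
```

A real front end would run this frame by frame and keep the resulting contours alongside the spectral parameters, since the later emphasis steps modify the contours only.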

[0006] The keyword extraction unit 14 extracts from the input voice the keywords registered in advance in the keyword dictionary 15. Keyword extraction can be performed with word spotting, one of the techniques of speech recognition. That is, the parameter time series of the speech to be extracted is registered in the keyword dictionary 15 in advance, and the time series of spectral feature parameters obtained by the feature extraction unit 13 is compared successively, by pattern matching, with the parameter time series representing the spectral features of the keywords registered in the keyword dictionary 15, thereby detecting the keywords contained in the input voice, that is, the words to be emphasized. For the pattern matching, it is advisable to use a matching method that absorbs nonlinear expansion and contraction, in view of the temporal stretching and compression of speech. The keywords to be registered in the keyword dictionary 15 are decided in advance according to the task concerned, and the parameters representing the spectral features of their speech are stored. For telephone shopping order taking, for example, the keywords would be words denoting product names and order quantities.
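As a rough illustration of word spotting with a matching method that absorbs nonlinear time stretching, the sketch below uses dynamic time warping (DTW). DTW is one common choice of such a method, not one named in the description, and the brute-force window search, the function names, and the threshold are assumptions made for illustration.

```python
import math

def dtw_distance(template, segment):
    """Length-normalised DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(template), len(segment)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(template[i - 1], segment[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # template frame repeated
                                     cost[i][j - 1],      # input frame repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[rows][cols] / (rows + cols)

def spot_keywords(frames, dictionary, threshold):
    """Return (keyword, start, end) for the best-matching section of each
    dictionary entry whose DTW distance is within the threshold."""
    hits = []
    for word, template in dictionary.items():
        best = None
        for start in range(len(frames)):
            # Try sections from half to double the template length.
            for length in range(max(1, len(template) // 2), 2 * len(template) + 1):
                end = start + length
                if end > len(frames):
                    break
                d = dtw_distance(template, frames[start:end])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, start, end)
        if best is not None:
            hits.append((word, best[1], best[2]))
    return hits

# Usage with one-dimensional toy "spectral" features.
frames = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0], [0.0]]
dictionary = {"tie": [[1.0], [2.0], [3.0]]}
hits = spot_keywords(frames, dictionary, threshold=0.01)
```

The exhaustive search is quadratic in the input length and is only meant to show the idea; the returned start and end frame indices play the role of the keyword section information used by the emphasis stage.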

[0007] The emphasis processing unit 16 performs emphasis processing on the keywords extracted from the input voice by the keyword detection unit 12. For this purpose, the prosodic features extracted by the feature extraction unit 13, such as the fundamental frequency and the voice power, are stored in the prosodic feature storage unit 17. The spectral features extracted by the feature extraction unit 13 are stored in the spectral feature storage unit 18 after being used for keyword extraction. The information on the keyword sections extracted by the keyword extraction unit 14 is stored in the keyword section storage unit 19. The feature modification unit 21 emphasizes each extracted keyword section by modifying its prosodic features, spectral features, and so on.

[0008] The feature modification processing for the case, according to the invention of claim 2, in which a pause (silent section) is inserted at least before the keyword is described with reference to the example of FIG. 2. When a keyword 23 is detected in the input voice 22 as shown in FIG. 2A, a pause 25 of length X₁ and a pause 26 of length X₂ (X₁ > 0, X₂ ≥ 0) are inserted before and after the voice section 24 corresponding to the keyword 23, as shown in FIG. 2B, and the voice power is then smoothed to remove the power discontinuity between the speech on either side of each of the pauses 25 and 26 (FIG. 2C). It is well known that human perception of voice power is proportional to the logarithm of the power, so the smoothing is better performed on the logarithmic power. The emphasis effect is obtained even if the pause 26 after the keyword is not inserted. When it is inserted, it may be shorter than the length X₁ of the preceding pause 25. X₁ and X₂ are suitably about 0.5 to 1.5 seconds, for example, and about 1.5 to 2.5 seconds is preferable when the pause 25 falls at a normal break in the speech.
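The pause insertion and log-power smoothing described above can be sketched on a per-frame log-power contour as follows. The representation as a list of frames, the silence floor value, the moving-average smoother, and its width are assumptions for illustration; the description specifies only that the smoothing is preferably done on the logarithmic power.

```python
def insert_pauses(log_power, start, end, before, after, floor=-12.0, width=2):
    """Insert silent frames around log_power[start:end] and smooth the edges.

    before and after are the pause lengths X1 and X2 in frames
    (before > 0, after >= 0). Smoothing is a moving average taken directly
    on the log-power values, applied only near the pause boundaries.
    """
    if before <= 0 or after < 0:
        raise ValueError("need before > 0 and after >= 0")
    out = (log_power[:start] + [floor] * before
           + log_power[start:end] + [floor] * after + log_power[end:])
    key_start = start + before
    key_end = key_start + (end - start)
    edges = [start, key_start]            # both ends of the leading pause
    if after > 0:
        edges += [key_end, key_end + after]
    smoothed = list(out)
    for edge in edges:
        for i in range(max(0, edge - width), min(len(out), edge + width)):
            lo, hi = max(0, i - width), min(len(out), i + width + 1)
            smoothed[i] = sum(out[lo:hi]) / (hi - lo)
    return smoothed

# Usage: a flat contour with a keyword at frames 4..5 and a 5-frame leading pause.
contour = insert_pauses([0.0] * 10, start=4, end=6, before=5, after=0)
```

With a 10 ms frame period (an assumed value), pauses of 0.5 to 1.5 seconds would correspond to 50 to 150 frames.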

[0009] The case in which emphasis is achieved, according to the invention of claim 3, by raising the fundamental frequency of the voice section corresponding to the keyword is described with reference to the example of FIG. 3. When a keyword 23 is detected in the input voice 22 as shown in FIG. 3A, the fundamental frequency (pitch) of the voice section 24 corresponding to the keyword is set higher, as shown in FIG. 3B, and the fundamental frequency is then smoothed to remove its discontinuities at the start and end of the keyword (FIG. 3C). Here too, since human perception of the fundamental frequency is proportional to its logarithm, it is desirable to perform the processing on the logarithmic axis. One method of raising the fundamental frequency is to multiply the logarithm of the fundamental frequency, $\log F_i$, by a predetermined coefficient $a$:

$$\log F_i' = a \times \log F_i$$

where $F_i$ and $F_i'$ are the fundamental frequencies at time $i$ before and after emphasis, respectively.

[0010] Another method is to add a constant value $b$ to the logarithm of the fundamental frequency, $\log F_i$:

$$\log F_i' = \log F_i + b$$

Various other methods exist, and the choice may be made in view of the amount of computation and similar considerations. The degree of emphasis can be controlled by changing the values of a and b. A value of about 1.05 is suitable for a, and about 0.1 for b. When the processing is not performed on the logarithmic axis, a multiplication factor of 1.1 to 1.2 is preferable, and for addition an increment of about 20 to 30 Hz for male voices and about 40 to 50 Hz for female voices.
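The two log-axis methods for raising the fundamental frequency can be sketched as below. The base of the logarithm is not stated in the description; natural logarithms are assumed here, and the function name and the convention that a value of 0 marks an unvoiced frame are also assumptions. Smoothing at the keyword boundaries is left out of this sketch.

```python
import math

def raise_pitch(f0, start, end, a=None, b=None):
    """Raise f0[start:end] (in Hz) on the log axis.

    Give exactly one of:
      a: multiply log(F) by a, i.e. log F' = a * log F
      b: add b to log(F),      i.e. log F' = log F + b
    Natural logarithms are assumed.
    """
    if (a is None) == (b is None):
        raise ValueError("give exactly one of a or b")
    out = list(f0)
    for i in range(start, end):
        if f0[i] <= 0:          # unvoiced frame: nothing to raise
            continue
        log_f = math.log(f0[i])
        log_f = a * log_f if a is not None else log_f + b
        out[i] = math.exp(log_f)
    return out

# Usage: a flat 100 Hz contour with the keyword at frames 1..2.
f0 = [100.0] * 5
added = raise_pitch(f0, 1, 3, b=0.1)        # each keyword frame scaled by e**0.1
multiplied = raise_pitch(f0, 1, 3, a=1.05)  # each keyword frame becomes F**1.05
```

Under the natural-logarithm assumption, b = 0.1 scales the frequency by about 1.105, which falls within the 1.1 to 1.2 range given for processing on the linear axis; with a = 1.05 a 100 Hz frame becomes about 126 Hz, and the effect of a depends on the frequency itself.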

[0011] The case in which emphasis is achieved, according to the invention of claim 4, by increasing the voice power of the voice section corresponding to the keyword is described with reference to the example of FIG. 4. When a keyword 23 is detected in the input voice 22 as shown in FIG. 4A, the voice power of the voice section 24 corresponding to the keyword 23 is set larger, as shown in FIG. 4B, and the voice power is then smoothed to remove its discontinuities at the start and end of the keyword 23 (FIG. 4C). Since human perception of voice power is proportional to the logarithm of the power, the processing is more effective when performed on the logarithmic axis. One method of increasing the voice power is to multiply the logarithm of the voice power, $\log P_i$, by a predetermined coefficient $c$:

$$\log P_i' = c \times \log P_i$$

where $P_i$ and $P_i'$ are the voice powers at time $i$ before and after emphasis, respectively.

[0012] Another method is to add a constant value $d$ to the logarithm of the voice power:

$$\log P_i' = \log P_i + d$$

Various other methods exist, and a similar effect is also obtained without the logarithmic representation; the method to adopt may be decided in view of the amount of computation and similar considerations. The degree of emphasis can be changed by controlling the values of c and d. In either case the values are chosen appropriately according to the input power; for example, the logarithmic power is made about one or two times its original value.
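For the additive method, adding d to the log power is the same as multiplying the power by a constant, so it can be applied directly to waveform samples as an amplitude gain. The sketch below assumes natural logarithms (power gain e^d, amplitude gain e^(d/2)) and replaces the smoothing step by a linear fade of the offset on the log-power axis; the function name and the ramp length are assumptions.

```python
import math

def boost_power(samples, start, end, d, ramp):
    """Add d to the natural-log power of samples[start:end].

    The offset is faded in and out linearly over `ramp` samples on each side
    of the keyword, i.e. it is interpolated on the log-power axis so that the
    level does not jump at the keyword boundaries.
    """
    out = list(samples)
    for n in range(max(0, start - ramp), min(len(samples), end + ramp)):
        if n < start:
            offset = d * (n - (start - ramp)) / ramp        # fade in
        elif n >= end:
            offset = d * ((end + ramp - 1) - n) / ramp      # fade out
        else:
            offset = d                                      # full emphasis
        out[n] = samples[n] * math.exp(offset / 2.0)        # power gain e**offset
    return out

# Usage: quadruple the power (double the amplitude) of samples 4..5.
boosted = boost_power([1.0] * 10, start=4, end=6, d=math.log(4.0), ramp=2)
```

In practice the ramp would span many samples (for example a few tens of milliseconds, an assumed figure), and the same gain could instead be applied to the stored power contour before resynthesis.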

[0013] The case in which emphasis is achieved, according to the invention of claim 5, by inserting a warning sound before the voice section corresponding to the keyword is described with reference to the example of FIG. 2. First, when a keyword is detected in the input voice as shown in FIG. 2A, silent sections are inserted before and after the keyword as described above (FIG. 2B), and the power on either side of each is smoothed (FIG. 2C). Next, as shown in FIG. 2D, warning sounds 27 and 28 that draw the listener's attention, such as a buzzer or a chime, are inserted into the preceding and following silent sections 25 and 26, respectively. The emphasis effect is also obtained when the warning sound is inserted only before the keyword.
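Placing a warning sound inside an inserted silent section can be sketched as follows. The description only mentions a buzzer or chime; the plain sine tone, its frequency, duration, and amplitude, and the function names are assumptions chosen for illustration.

```python
import math

def pause_with_tone(pause_len, sample_rate, tone_hz=1000.0, tone_sec=0.25,
                    amplitude=0.3):
    """Return pause_len samples of silence with a short sine tone centred in it."""
    pause = [0.0] * pause_len
    tone_len = min(pause_len, int(tone_sec * sample_rate))
    offset = (pause_len - tone_len) // 2
    for n in range(tone_len):
        pause[offset + n] = amplitude * math.sin(
            2 * math.pi * tone_hz * n / sample_rate)
    return pause

def insert_before(samples, start, segment):
    """Insert a segment (for example a pause with a tone) before samples[start]."""
    return samples[:start] + segment + samples[start:]

# Usage: a one-second pause at 8 kHz with a 0.25 s tone, placed before sample 3.
sample_rate = 8000
pause = pause_with_tone(8000, sample_rate)
speech = [0.1] * 6
with_alert = insert_before(speech, 3, pause)
```

Keeping the tone shorter than the pause leaves silence on both sides of it, so the warning sound does not run directly into the keyword.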

[0014] The case in which emphasis is achieved, according to the invention of claim 6, by lengthening the voice section corresponding to the keyword is described with reference to the example of FIG. 5. First, when a keyword is detected as shown in FIG. 5A, output (playback) is slowed for that keyword section only, by using as the frame length the analysis frame length $T_i$ of the feature analysis in the feature extraction unit 13 multiplied by a predetermined coefficient $e$ (FIG. 5B):

$$T_i' = e \times T_i$$

That is, the features stored in the prosodic feature storage unit 17 and the spectral feature storage unit 18 are interpolated and output so that the length of the keyword section becomes e times its original length Ta. As a result, the keyword section sounds as if it had been uttered slowly, and only the keyword section is emphasized. If the fundamental frequency at output is kept the same as that of the original voice and only the section length is changed, the naturalness of the voice is preserved. A value of about 1.2 to 1.4 is suitable for e.
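The lengthening of the keyword section by interpolating the stored feature frames can be sketched as below. Linear interpolation between neighbouring frames is an assumption, since the description says only that the features are interpolated; the function name and the list-of-vectors layout are also illustrative.

```python
def stretch_frames(frames, start, end, e):
    """Lengthen frames[start:end] by a factor e using linear interpolation.

    Each frame is a list of feature values (spectral or prosodic). Values are
    interpolated in time only, so pitch values keep their original range.
    """
    section = frames[start:end]
    if not section:
        return list(frames)
    new_len = max(1, round(len(section) * e))
    stretched = []
    for k in range(new_len):
        # Position of output frame k on the original time axis of the section.
        pos = k * (len(section) - 1) / (new_len - 1) if new_len > 1 else 0.0
        i = int(pos)
        frac = pos - i
        j = min(i + 1, len(section) - 1)
        stretched.append([(1 - frac) * x + frac * y
                          for x, y in zip(section[i], section[j])])
    return frames[:start] + stretched + frames[end:]

# Usage: stretch the 4-frame keyword section (frames 1..4) by e = 1.5.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
slowed = stretch_frames(frames, start=1, end=5, e=1.5)
```

Resynthesizing from the stretched frame sequence at the original frame period yields the keyword section played back e times more slowly, while frames outside the section are untouched.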

[0015] A still greater emphasis effect can be expected by combining two or more of the emphasis processes according to the inventions of claims 2 to 6. The parameters of the keyword emphasized by the emphasis processing unit 16 as described above are inserted in place of the corresponding portion of the input voice parameters; the resulting voice parameters are synthesized back into a voice signal by the voice synthesis unit 31, and the voice output unit 32 converts the digital signal into an analog signal and outputs it as voice.

[0016] When the processing shown in FIG. 1 is applied to telephone shopping (order taking), a customer (user) utterance such as "I saw it in the newspaper, and I would like to buy the tie that was in the advertisement. Three of the same, please." becomes "I saw it in the newspaper, and I would like to buy the [pause] tie [pause] that was in the advertisement. [pause] Three [pause] of the same, please." (where [pause] denotes a pause (silent section) in the speech). Because pauses are inserted before and after "tie" and "three", the operator taking the order hears speech in which the product name and the quantity are emphasized. In this example the emphasis is produced by inserting pauses before and after the keywords, but when the emphasis is produced by increasing the voice power or by raising the fundamental frequency, "tie" and "three" are likewise emphasized by being uttered strongly or at a high pitch, and the operator can easily make out the content of the order.

[0017]

[Effect of the Invention] As described above, the voice emphasizing device of the present invention can automatically emphasize the keywords relevant to the topic in a conversation between people. Communication therefore becomes easier even when the parties' background knowledge differs or their topics do not coincide, which has the advantage of shortening the time required for conversation-based work and reducing the burden on the people involved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the voice emphasizing device according to the present invention.

FIG. 2 is a diagram showing an example in which pauses are inserted before and after a keyword and an example in which warning sounds are inserted before and after a keyword.

FIG. 3 is a diagram showing an example in which the fundamental frequency of a keyword section is set higher.

FIG. 4 is a diagram showing an example in which the voice power of a keyword section is set larger.

FIG. 5 is a diagram showing an example in which the voice corresponding to a keyword section is played back slowly.

Continuation of front page
(56) References: JP-A-62-102296 (JP, A); JP-A-63-173100 (JP, A); JP-A-64-88599 (JP, A); JP-A-1-140369 (JP, A); JP-A-1-204100 (JP, A); JP-A-1-255925 (JP, A); JP-A-3-78800 (JP, A); JP-A-3-196197 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 13/00 - 13/08; G10L 15/10

Claims

(57) [Claims]

1. A voice emphasizing device comprising: a keyword detection unit that detects a word to be emphasized in an input voice; an emphasis processing unit that performs emphasis processing on the extracted word to be emphasized; and a voice output unit that outputs the input voice as voice with the detected word to be emphasized replaced by the word subjected to the emphasis processing.

2. The voice emphasizing device according to claim 1, wherein the emphasis processing unit inserts a pause before, or before and after, the voice section corresponding to the word to be emphasized.

3. The voice emphasizing device according to claim 1, wherein the emphasis processing unit raises the fundamental frequency of the voice section corresponding to the word to be emphasized.

4. The voice emphasizing device according to claim 1, wherein the emphasis processing unit increases the power of the voice section corresponding to the word to be emphasized.

5. The voice emphasizing device according to claim 1, wherein the emphasis processing unit inserts a warning sound before the voice section corresponding to the word to be emphasized.

6. The voice emphasizing device according to claim 1, wherein the emphasis processing unit lengthens the voice section corresponding to the word to be emphasized by a predetermined constant factor.