JP2001343985A

JP2001343985A - Method of voice switching and voice switch

Info

Publication number: JP2001343985A
Application number: JP2000165695A
Authority: JP
Inventors: Akira Emura (江村 暁); Suehiro Shimauchi (島内 末廣); Shigeaki Aoki (青木 茂明)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-06-02
Filing date: 2000-06-02
Publication date: 2001-12-14

Abstract

PROBLEM TO BE SOLVED: To provide a voice switching method that prevents clipping of the beginning of speech and improves call quality. SOLUTION: In the voice switching method, a received signal is attenuated by a first attenuation amount and output as a reproduced signal; a residual signal is generated by subtracting, from the sound pickup signal, a pseudo echo signal generated from the reproduced signal; the residual signal is attenuated by a second attenuation amount to form the transmitted signal; and the first and second attenuation amounts are controlled based on information in the received signal and the residual signal. For each short-time segment, the probability that the received signal contains received speech is computed, and the probability that the residual signal contains transmitted speech is computed from the received and residual signals. Based on these probabilities, the distribution of the first and second attenuation amounts is computed in a single frequency band or in multiple frequency bands, and the two attenuation amounts are controlled accordingly.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention is used for voice communication in loudspeaker (hands-free) telephony systems. Its object is to ensure call quality by suppressing the echo that arises when reproduced received speech is sent back to the originating party.

[0002]

[Prior Art] In a loudspeaker telephony system, the echo that arises when received speech emitted from the loudspeaker is picked up by the microphone is a problem. This phenomenon is also called acoustic echo. When the loop gain of the closed loop formed together with the far-end loudspeaker telephony system is greater than 1, howling occurs and conversation becomes impossible. Even when the loop gain is less than 1, the echo has adverse effects such as impeding conversation and causing discomfort.

[0003] Two methods are used to solve these problems of loudspeaker telephony systems. The voice switch method computes the signal power of the speech, treats high-power portions as speech segments in order to detect receiving and sending, and switches the attenuation of the received or transmitted signal accordingly to suppress the echo. The echo cancellation (echo canceller) method removes the echo by simulating it. Because echo cancellation by echo simulation cannot remove the echo completely, it is normally used in combination with the voice switch method.
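The power-comparison voice switch described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the circuit of any cited system: the function names, the 20 dB insertion loss, and the frame-by-frame hard decision are all hypothetical choices made for the example.

```python
import math

def frame_power_db(frame):
    """Mean power of one frame of samples in dB (floored to avoid log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def power_switch(rx_frame, tx_frame, loss_db=20.0):
    """Hard-switch the whole insertion loss onto the weaker direction.

    Returns (rx_gain, tx_gain) as linear amplitude gains.
    loss_db is a hypothetical insertion loss, not a value from the text.
    """
    loss = 10.0 ** (-loss_db / 20.0)
    if frame_power_db(rx_frame) >= frame_power_db(tx_frame):
        return 1.0, loss   # receiving judged active: attenuate the send path
    return loss, 1.0       # sending judged active: attenuate the receive path
```

With a decision of this kind, a low-power frame on the send side loses the comparison against any louder received signal or residual echo and is attenuated by the full loss.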

[0004]

[Problems to Be Solved by the Invention] However, loudspeaker telephony systems and video conferencing systems that use a voice switch method which switches the send-side and receive-side attenuation by simple power comparison suffer from clipping of the beginning of speech. In a call environment with surrounding noise sources, or one in which the amount of echo cancellation is insufficient, the send-side and receive-side attenuation switches abruptly whenever the send/receive decision is wrong. In particular, because the speech power of a consonant is 1/10 or less of that of a vowel, speech that begins with a consonant is especially prone to clipping at its start. The relative power of the individual sounds in speech is described in, for example, "Hearing and Speech" (聴覚と音声; supervised by T. Miura, edited by the Institute of Electronics, Information and Communication Engineers, 1979, pp. 295-297).

[0005] In speech segment detection methods for speech recognition, on the other hand, it is known that performing detection based on a feature vector containing speech parameters such as speech pitch information or LPC (linear predictive coding) spectral analysis, instead of speech power, greatly increases the amount of computation but improves the ability to detect consonants and the accuracy of speech segment detection. For example, the speech segment detection method of Japanese Unexamined Patent Application Publication No. H4-251299 uses as features the spectral peak in the low-frequency band and the average spectrum in the high-frequency band after LPC spectral analysis.

[0006]

[Means for Solving the Problems] To prevent clipping at the start of speech even when the speech begins with a consonant, the present invention extracts feature vectors from the received signal and the transmitted signal, determines for each short-time segment the probability that received speech and transmitted speech are present, and, according to these probabilities, controls the distribution of the attenuation inserted into the received signal and the transmitted signal in a single band or in multiple bands.

[0007]

[Embodiments of the Invention] (First embodiment) FIG. 1 shows a configuration example of the voice switch of the present invention. The received signal input to voice switch unit 1 is split into bands by N-band division filter 141 in receive attenuation unit 14, attenuated band by band in N-band receive attenuation unit 142, and recombined by N-band synthesis filter 143 to give the reproduced signal, which reproducer 3 converts into an acoustic signal. In echo cancellation unit 2, the pseudo echo signal generated from the reproduced signal by pseudo echo generation unit 21 is subtracted by subtractor 22 from the sound pickup signal of sound pickup device 4, which reduces the echo component of the received speech and yields the residual signal. The residual signal is split into bands by N-band division filter 151 in transmit attenuation unit 15, attenuated band by band in N-band transmit attenuation unit 152, and recombined by N-band synthesis filter 153 to become the transmitted signal.
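The signal path of FIG. 1 can be sketched for a single band as follows. This is a simplified illustration under assumptions: `echo_path` is a hypothetical fixed FIR estimate of the loudspeaker-to-microphone path (the text does not say how the pseudo echo is generated), band splitting is omitted, and no filter state is carried between blocks.

```python
def process_block(received, picked_up, echo_path, rx_gain, tx_gain):
    """One block through the FIG. 1 signal path, single band for brevity.

    echo_path is a hypothetical FIR estimate of the echo path; how it is
    obtained or adapted is outside this sketch.
    """
    # Receive attenuation unit 14: attenuate the received signal.
    reproduced = [rx_gain * x for x in received]
    # Pseudo echo generation unit 21: filter the reproduced signal.
    pseudo_echo = [
        sum(h * reproduced[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(reproduced))
    ]
    # Subtractor 22: sound pickup signal minus pseudo echo gives the residual.
    residual = [y - e for y, e in zip(picked_up, pseudo_echo)]
    # Transmit attenuation unit 15: attenuate the residual.
    transmitted = [tx_gain * r for r in residual]
    return reproduced, residual, transmitted
```

Note that the pseudo echo is computed from the already attenuated reproduced signal, as in the text, so the receive-side attenuation also reduces the echo that has to be cancelled.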

[0008] Received speech probability calculation unit 11 calculates the probability that the received signal contains received speech within the current short-time segment. Transmitted speech probability calculation unit 12 calculates, from the received signal and the residual signal, the probability that the residual signal contains transmitted speech within the current short-time segment. Send/receive attenuation distribution calculation unit 13 determines the attenuation distribution ratio p for the receiving side from the probability r that received speech is present in the short-time segment of the received signal and the probability s that transmitted speech is present in the short-time segment of the transmitted signal, for example with the function p = s(1 − r/2) shown in FIG. 2. In a voice switch, the product of the attenuation of receive attenuation unit 14 and the attenuation of transmit attenuation unit 15 is normally held constant. For example, the attenuation of receive attenuation unit 14 is set to the reference value multiplied by the attenuation distribution ratio, and the attenuation of transmit attenuation unit 15 is set to the reference value divided by the attenuation distribution ratio. When the distribution of attenuation is controlled in this way, the send-side attenuation is at its maximum when there is no transmitted speech (s = 0) and decreases as the probability that transmitted speech is present increases. This processing is carried out independently in each band after the input signal has been split into N frequency bands, and the attenuated components of the individual bands are then recombined into the full band. With this configuration, the attenuation can be controlled so that the loop gain does not exceed 1 in just some of the bands, which gives finer-grained control of the attenuation.
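The distribution rule can be sketched for one band as follows. This is a minimal sketch under assumptions: the attenuations are treated as linear factors whose product stays at the square of the reference value, and `p_min` is a hypothetical floor (not in the text) that keeps the division finite when s = 0.

```python
def distribute_attenuation(r, s, base, p_min=1e-3):
    """Split attenuation between the receive and send paths of one band.

    r, s  : probabilities in [0, 1] of received / transmitted speech.
    base  : reference attenuation (linear factor).
    p_min : hypothetical floor that avoids dividing by zero when s = 0.
    """
    p = max(s * (1.0 - r / 2.0), p_min)   # distribution ratio, p = s(1 - r/2)
    rx_att = base * p                     # receive attenuation unit 14
    tx_att = base / p                     # transmit attenuation unit 15
    return p, rx_att, tx_att
```

For example, `distribute_attenuation(1.0, 1.0, 10.0)` gives p = 0.5, a receive-side attenuation of 5 and a send-side attenuation of 20, whose product is still 100. In a multi-band configuration the same function would be called once per band with that band's r and s.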

[0009] FIG. 3 shows the configuration of received speech probability calculation unit 11 and transmitted speech probability calculation unit 12, which calculate the probability that the received signal contains received speech and the probability that the residual signal contains transmitted speech. On the receiving side, M-dimensional feature vector extraction unit 11B converts the k-th short-time segment of the received signal into the four-dimensional feature vector of [Equation 1], which consists of the full-band power PT obtained from the short-time segment, the power ratio γ between the high band and the low band, the partial autocorrelation coefficient K1, and the maximum value ρ of the autocorrelation coefficients of the linear prediction residual.

[0010]

[Equation 1]

According to Japanese Unexamined Patent Application Publication No. H4-100099, when the noise level NS is known, the short-time segment of interest can be classified as voiced, unvoiced, or silent from the full-band power PT, the high-band to low-band power ratio γ, the partial autocorrelation coefficient K1, and the maximum value ρ of the autocorrelation coefficients of the linear prediction residual, all obtained from the short-time segment of the input signal, as follows.

(1) Because voiced sound generally has higher power than unvoiced sound, PT is compared with the previously measured noise level NS, and the segment is judged to be voiced when γ > 0 and PT > NS + 30 dB. This condition does not work effectively unless the S/N ratio is sufficient, so in that case attention is paid to the periodicity of the signal. Voiced sound has a periodicity that comes from the vibration of the vocal cords: the time lag (period) corresponding to ρmax corresponds to the fundamental frequency of the vocal cords, and the magnitude of ρmax depends on how periodic the vocal cord vibration is. The segment is therefore judged to be voiced when ρmax > 0.25.

(2) The partial autocorrelation coefficient K1 indicates the slope of the spectral envelope of the speech, and the smaller its value, the flatter the spectrum. The segment is therefore judged to be unvoiced when K1 < 0.3.

(3) The condition for a silent segment is PT < NS; when it is satisfied, the frame is judged to be silent.
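The three rules can be sketched as a small classifier. This is a sketch under assumptions: the text does not state the order in which the rules are tested or what happens when none of them applies, so the precedence (silent, then voiced, then unvoiced) and the fallback to "N" below are choices made for this example, and PT and NS are taken to be in dB.

```python
def classify_segment(pt_db, gamma, k1, rho_max, ns_db):
    """Label one short-time segment as 'V' (voiced), 'U' (unvoiced) or 'N' (silent).

    pt_db   : full-band power PT in dB
    gamma   : power ratio between the high and low bands, as defined in the text
    k1      : partial autocorrelation coefficient K1
    rho_max : maximum autocorrelation of the linear prediction residual
    ns_db   : previously measured noise level NS in dB
    The test order and the final fallback are assumptions of this sketch.
    """
    if pt_db < ns_db:                        # rule (3): silent
        return "N"
    if gamma > 0 and pt_db > ns_db + 30.0:   # rule (1): power test
        return "V"
    if rho_max > 0.25:                       # rule (1): periodicity test
        return "V"
    if k1 < 0.3:                             # rule (2): flat spectrum
        return "U"
    return "N"                               # no rule matched: treated as non-speech here
```

Applied frame by frame, the function yields the V/U/N symbol sequence used for the probability calculation.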

[0011] Feature vector time-series pattern determination unit 11D classifies the feature vector of [Equation 1] as a voiced segment (V), an unvoiced segment (U), or a silent segment (N) by the determination method described above. The probability that the k-th short-time segment of the received signal contains received speech is calculated, for example, from the time-series pattern of voiced segments (V), unvoiced segments (U), and silent segments (N) and the hidden Markov model shown in FIG. 4. On the sending side, M-dimensional feature vector extraction unit 12B converts the residual signal into the feature vector of [Equation 2].

[0012]

[Equation 2]

Feature vector time-series pattern determination unit 12D computes [Equation 3], for example the 16-dimensional vector obtained by combining the feature vectors of the four most recent blocks.

[0013]

[Equation 3]

If this is at or below a predetermined threshold, the residual signal is judged to consist almost entirely of received echo and to contain no transmitted signal. If it exceeds the threshold, the residual signal is judged to contain a transmitted signal, and the probability that the current short-time segment contains transmitted speech is calculated from the hidden Markov model and the time-series pattern of voiced segments (V), unvoiced segments (U), and silent segments (N) extracted from the residual signal by the determination method described above.

[0014] This embodiment uses a four-dimensional feature vector to classify speech segments into three classes, but a feature vector of higher dimension may also be used. FIG. 4 shows the state transitions of the hidden Markov model used in the embodiment of the present invention. The probability that a short-time segment contains speech is calculated with the hidden Markov model of FIG. 4 as follows. This Markov model moves among four states S0, S1, S2, and S3, and at each transition it outputs a silent segment (N), an unvoiced segment (U), or a voiced segment (V). For each of the states S0 to S3 of the hidden Markov model, the probability that speech is present while the model is in that state is set in advance. In the Markov model of FIG. 4, for example, the speech probability is set to 0% for S0, 30% for S1, 70% for S2, and 100% for S3.

[0015] The numbers on the arrows in FIG. 4 indicate the state transition probabilities and the output probabilities of a silent segment (N), an unvoiced segment (U), or a voiced segment (V). For example, the transition probability from state S0 to state S1 is 0.7, and during this transition the probability of outputting a voiced segment (V) is PV = 0.0, the probability of outputting an unvoiced segment (U) is PU = 1.0, and the probability of outputting a silent segment (N) is PN = 0.0. The transition probability from state S0 to state S2 is 0.3, and during this transition a voiced segment (V) is output with a probability of 100%. The transition probability from state S1 to state S2 is 0.6, and during this transition the probabilities of outputting a voiced segment (V), an unvoiced segment (U), and a silent segment (N) are PV = 0.8, PU = 0.2, and PN = 0.0, respectively.

[0016] Now suppose that the time-series pattern U→U→V is extracted, and write the probabilities that the pattern so far has been extracted and the model is in each of the states S0 to S3 as Pn = (S0n S1n S2n S3n). For example, P1 holds the probabilities that U is extracted first and the model is in each of the states S0 to S3, and P3 holds the probabilities that the time-series pattern UUV is extracted and the model is in each of the states S0 to S3. With the initial state set to P0 = (1 0 0 0), the result is:

P1 = (0, 1.0×0.7×S0₀, 0.0×0.3×S0₀, 0) = (0 0.7 0 0)
P2 = (0, 0.8×0.4×S1₁, 0.2×0.6×S1₁, 0) = (0 0.224 0.12 0)
P3 = (0, 0, 0.8×0.6×S1₂ + 0.8×0.4×S2₂, 1.0×0.6×S2₂) = (0 0 0.1459 0.072)

[0017] At the point where the time series UU has been extracted, the probability of being in state S1 is 0.224/(0.224 + 0.12) and the probability of being in state S2 is 0.12/(0.224 + 0.12). The probability that the short-time segment contains speech can then be calculated, for example, as follows:

30% × (0.224/(0.224 + 0.12)) + 70% × (0.12/(0.224 + 0.12)) ≈ 44%

Probability estimation from an observed symbol sequence with a hidden Markov model is explained in detail in, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models" (確率モデルによる音声認識), edited by the Institute of Electronics, Information and Communication Engineers, Chapter 3.
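The recursion and the weighted speech probability can be sketched as below. This is a sketch under assumptions: only the transition and output probabilities that the text quotes or implies for FIG. 4 are included, and every entry the text does not give is treated as zero. Note that the S2 term of P2, 0.2×0.6×S1₁ with S1₁ = 0.7, evaluates to 0.084 rather than the printed 0.12, so the values this sketch produces from the second symbol onward differ from the printed ones (about 41% instead of 44% for the speech probability after UU).

```python
# Transition and output probabilities quoted or implied in the text for FIG. 4.
# Entries the text does not give are omitted and treated as zero (assumption).
TRANS = {(0, 1): 0.7, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.6,
         (2, 2): 0.4, (2, 3): 0.6}
EMIT = {(0, 1): {"U": 1.0}, (0, 2): {"V": 1.0},
        (1, 1): {"U": 0.8}, (1, 2): {"V": 0.8, "U": 0.2},
        (2, 2): {"V": 0.8}, (2, 3): {"V": 1.0}}
SPEECH_PROB = [0.0, 0.3, 0.7, 1.0]   # per-state speech probability, S0..S3

def forward_step(p, symbol):
    """One forward update: joint probability of the symbols so far and each state."""
    nxt = [0.0] * 4
    for (i, j), a in TRANS.items():
        nxt[j] += p[i] * a * EMIT[(i, j)].get(symbol, 0.0)
    return nxt

def speech_probability(p):
    """Speech probability weighted by the normalised state probabilities."""
    total = sum(p)
    if total == 0.0:
        return 0.0
    return sum(w * x / total for w, x in zip(SPEECH_PROB, p))

p = [1.0, 0.0, 0.0, 0.0]             # P0: start in S0
for sym in "UU":
    p = forward_step(p, sym)
print([round(x, 4) for x in p], round(speech_probability(p), 3))
```

Normalising by the sum of the state probabilities is what turns the joint probabilities Pn into the per-state probabilities used in the weighted sum.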

[0018] A Markov model used for speech processing such as that of FIG. 4 generally has an initial state and a final state and has no backward transition to state S0. Consequently, some time after speech has started, the probability of being in any state other than the final state becomes almost zero, and the model would remain in the final state even in the next speech segment. To prevent this, the hidden Markov model is initialized when it is detected that silent segments (N) have continued for a given number of segments in the received signal. Likewise, when it is detected from the residual signal that silent segments (N) have continued for a given number of segments, the send-side hidden Markov model is initialized. This returns a hidden Markov model that has already reached its final state to the initial state, so that the voice switch can be controlled from the very beginning of the next speech segment.
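The re-initialization rule can be sketched as a thin wrapper around the state probabilities. This is a sketch under assumptions: `reset_after` is a hypothetical count (the text only says "a given number"), and `step` stands for whatever forward update the caller uses; one instance would be kept for the receive side and another for the send side.

```python
class ResettableHmmState:
    """Holds the HMM state probabilities and re-initialises them after a run of 'N'.

    reset_after is a hypothetical count; the text does not give a value.
    """
    def __init__(self, n_states=4, reset_after=10):
        self.n_states = n_states
        self.reset_after = reset_after
        self.silence_run = 0
        self.reset()

    def reset(self):
        # Back to the initial state S0 with probability 1.
        self.p = [1.0] + [0.0] * (self.n_states - 1)

    def observe(self, symbol, step):
        """Feed one V/U/N symbol; step(p, symbol) is the caller's forward update."""
        self.silence_run = self.silence_run + 1 if symbol == "N" else 0
        if self.silence_run >= self.reset_after:
            # Long enough silence: hold the model in its initial state.
            self.reset()
        else:
            self.p = step(self.p, symbol)
        return self.p
```

Holding the model in S0 for as long as the silence lasts means the first non-silent symbol of the next utterance is evaluated from the initial state.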

[0019] This embodiment uses a hidden Markov model with four states that outputs three symbols, but a hidden Markov model with more states and more output symbols may also be used. Furthermore, when speech that has been coded with high efficiency, for example by linear prediction, is carried over the transmission path, the elements of the speech code, which consist of the speech spectral envelope and pitch period parameters, are used as the features for the voice switch. This makes it possible to realize the voice switch of the present invention with less processing than a method that extracts multiple features from the decoded signal.

[0020]

[Effects of the Invention] As described above, the present invention extracts, at fixed time intervals, feature vectors from the received signal and the residual signal that better reflect the characteristics of speech, calculates from their time series the probability that the current short-time segment contains speech, and controls the distribution of receive-side and send-side attenuation based on this probability. Compared with the conventional voice switch method, which switches the send-side and receive-side attenuation by simple power comparison, the voice switch of the present invention can therefore prevent clipping at the start of speech even when the speech begins with a low-power consonant.

[Brief Description of the Drawings]

[FIG. 1] A diagram showing the configuration of the voice switch of the present invention.

[FIG. 2] A diagram showing the function that determines the receive-side attenuation distribution ratio p from the received speech presence probability r and the transmitted speech presence probability s.

[FIG. 3] A diagram showing the configuration of the received speech probability calculation unit and the transmitted speech probability calculation unit.

[FIG. 4] A diagram showing an example of the hidden Markov model used in the embodiment.

Continuation of the front page
(51) Int. Cl.⁷ / Identification symbol / FI / Theme code (reference): H04M 9/08 G10L 9/00 C H04R 3/02 D
(72) Inventor: Shigeaki Aoki, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-terms (reference): 5D015 DD03; 5D020 CC05; 5K027 BB03 DD07 DD10; 5K038 CC00 DD06 FF13

Claims

[Claims]

[Claim 1] A voice switching method in which a received signal is attenuated by a first attenuation amount to output a reproduced signal, a pseudo echo signal is generated based on the reproduced signal, the pseudo echo signal is subtracted from a sound pickup signal to generate a residual signal, the residual signal is attenuated by a second attenuation amount to form a transmitted signal, and the first and second attenuation amounts are controlled based on information in the received signal and the residual signal, characterized in that the control of the first and second attenuation amounts comprises: a step of calculating, for each short-time segment, the probability that the received signal contains received speech; a step of calculating, for each short-time segment, from the received signal and the residual signal, the probability that the residual signal contains transmitted speech; and a step of calculating, based on said probabilities, the distribution of the first and second attenuation amounts in a single frequency band or in multiple frequency bands.

[Claim 2] The voice switching method according to claim 1, wherein the step of calculating, for each short-time segment, the probability that the received signal contains received speech comprises a step of extracting a feature vector from a short-time segment of the received signal and a step of calculating, from the time series of the feature vectors, the probability that received speech is contained; and the step of calculating, for each short-time segment, the probability that the residual signal contains transmitted speech comprises a step of extracting feature vectors from short-time segments of the received signal and the residual signal and a step of calculating, from the time series of the feature vectors, the probability that transmitted speech is contained.

[Claim 3] The voice switching method according to claim 2, wherein the step of calculating, from the time series of the feature vectors, the probability that speech is contained in each short-time segment comprises a step of calculating the probability using a hidden Markov model, and the hidden Markov model used for the probability calculation is initialized when a non-speech segment is detected.

[Claim 4] The voice switching method according to claim 2 or 3, wherein the received signal and the transmitted signal are coded signals and the feature vector consists of code elements.

[Claim 5] A voice switch comprising: a receive attenuation unit that attenuates a received signal by a first attenuation amount and outputs a reproduced signal; an echo cancellation unit that generates a pseudo echo signal with reference to the reproduced signal, subtracts the pseudo echo signal from a sound pickup signal, and outputs a residual signal; a transmit attenuation unit that attenuates the residual signal by a second attenuation amount and outputs a transmitted signal; and a control unit that receives the received signal and the residual signal and controls the first and second attenuation amounts, characterized in that the control unit comprises: a received speech probability calculation unit that calculates, for each short-time segment, the probability that the received signal contains received speech; a transmitted speech probability calculation unit that calculates, for each short-time segment, from the received signal and the residual signal, the probability that the residual signal contains transmitted speech; and a send/receive attenuation distribution calculation unit that calculates, based on the calculated probabilities, the distribution of the first and second attenuation amounts in a single frequency band or in multiple frequency bands.

[Claim 6] The voice switch according to claim 5, wherein the received speech probability calculation unit has means for extracting a feature vector from a short-time segment of the received signal and means for calculating, from the time series of the feature vectors, the probability that received speech is contained; and the transmitted speech probability calculation unit has means for extracting feature vectors from short-time segments of the received signal and the residual signal and means for calculating, from the time series of the feature vectors, the probability that transmitted speech is contained.

[Claim 7] The voice switch according to claim 6, wherein the means for calculating, from the time series of the feature vectors, the probability that speech is contained in each short-time segment calculates the probability using a hidden Markov model and comprises means for initializing the hidden Markov model used for the probability calculation when a non-speech segment is detected.

[Claim 8] The voice switch according to claim 6 or 7, wherein the received signal and the transmitted signal are coded signals and the feature vector consists of code elements.