JP2010124207A - Volume adjusting device, and method, program, and recording medium of the same

Info

Publication number: JP2010124207A
Application number: JP2008295634A
Authority: JP
Inventors: Tasuku Shinozaki; 翼篠崎; Taichi Asami; 太一浅見
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-11-19
Filing date: 2008-11-19
Publication date: 2010-06-03

Abstract

PROBLEM TO BE SOLVED: To allow an echo suppression unit to suppress echo properly even when the volumes of the input sound and the reference sound are adjusted.

SOLUTION: A first sound collection unit collects sound as an input sound signal, and the gain of that signal is adjusted using input sound gain information to produce an adjusted input sound signal. The gain of a reference sound signal is adjusted using reference sound gain information to produce an adjusted reference sound signal. The adjusted reference sound signal convolved with adaptive filter coefficients is subtracted from the adjusted input sound signal to produce a suppressed input sound signal. The input sound gain information, the reference sound gain information, and gain adjustment information, which indicates how far the change in the input sound gain information departs from the change in the reference sound gain information, are computed from the adjusted reference sound signal and the adjusted input sound signal. The adaptive filter coefficients are set to a value corresponding to the gain adjustment information, and the suppressed input sound signal and the adjusted reference sound signal are output.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a volume adjusting device that adjusts the volume of input sound, and to a method, a program, and a recording medium for the same.

There are various volume adjusting devices that bring the volume of input sound to an appropriate level (see, for example, Patent Document 1). FIG. 1 shows an example in which conventional volume adjusting devices 6 and 8 are used with a telephone, and FIG. 2 shows a simplified version of FIG. 1. The telephone 50 and the handset 42 are assumed to be connected by a cord (not shown), and the volume adjusting devices 6 and 8 are provided between the telephone 50 and the handset 42. This description covers the case where an echo suppression device 10 is applied. As a premise of FIG. 1, the second sound G from a second sound source is collected by a second sound collection unit 4 as a reference sound Q, and the first sound F from a first sound source together with the second sound G is collected by a first sound collection unit 2 as an input sound P. The echo suppression device 10 then outputs an echo-suppressed signal, that is, a signal in which the echo signal described later has been suppressed.

Consider a call over the telephone between a speaker at the local point and a speaker at another point. The first sound source is the mouth of the speaker at the other point, and the second sound source is the mouth of the speaker at the local point. The electric signals obtained by converting the first sound and the second sound after they enter the handset are called the first sound signal F(x) and the second sound signal G(x), respectively, where x is time. The first sound F and the first sound signal F(x) are drawn with solid lines, and the second sound G and the second sound signal G(x) with broken lines. The handset 42 consists of a transmitting unit 46, into which the transmission signal is input, and a listening unit 44, which outputs the received signal. The second sound signal G(x) from the second sound source is converted into a transmission signal and passes through the transmitting unit 46; it is collected by the second sound collection unit 4 and also input to an adder 52, and the transmission signal is sent to the speaker at the other point via a network (not shown).

Meanwhile, the first sound signal F(x) arriving via the network is input to the adder 52. The adder 52 adds the first sound signal F(x) to the second sound signal G(x) multiplied by γ, a coefficient determined by the acoustic and electrical system; the product γG(x) is the echo sound signal (sidetone signal). That is, the adder 52 computes F(x) + γG(x) and outputs it toward the first sound collection unit 2. The output signal F(x) + γG(x) is collected by the first sound collection unit 2 and reproduced from the listening unit 44. The listening unit 44 reproduces F(x) + γG(x) rather than F(x) alone so that the call sounds natural: the speaker's own voice (the sound of the second sound signal G(x)) is superimposed on the first sound (the sound of the first sound signal F(x)). In the following description, the signal for the input sound P is the input sound signal P(x) (= F(x) + γG(x)), and the signal for the reference sound Q is the reference sound signal Q(x) (= G(x)).
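The signal model in this paragraph can be illustrated with a minimal Python sketch. The function name `mix_sidetone`, the sample values, and γ = 0.5 are illustrative assumptions and do not come from the patent; signals are modelled as plain lists of samples.

```python
def mix_sidetone(f, g, gamma):
    """Return (p, q) for the model P(x) = F(x) + gamma * G(x), Q(x) = G(x)."""
    if len(f) != len(g):
        raise ValueError("f and g must have the same length")
    p = [fx + gamma * gx for fx, gx in zip(f, g)]  # input sound signal P(x)
    q = list(g)                                    # reference sound signal Q(x)
    return p, q


# Hypothetical far-end samples F(x) and near-end samples G(x).
f = [0.0, 1.0, 0.0, -1.0]
g = [2.0, 2.0, -2.0, -2.0]
p, q = mix_sidetone(f, g, gamma=0.5)
```

Here `p` carries both the far-end speech and the sidetone, while `q` carries only the near-end speech that the suppressor later uses as its reference.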

The input sound signal P(x) from the first sound collection unit 2 is input to the volume adjusting device 6, and the reference sound signal Q(x) from the second sound collection unit 4 is input to the volume adjusting device 8. The volume adjusting devices 6 and 8 adjust the gains of the input sound signal P(x) and the reference sound signal Q(x) (that is, the volumes of the input sound and the reference sound) to appropriate values. The echo suppression device 10 takes the adjusted input sound signal P(x) and the adjusted reference sound signal Q(x) as inputs and suppresses the echo sound signal. The gains are adjusted because, if the amplitudes of P(x) and Q(x) are so large that the signals are distorted, problems arise such as the adaptive filter coefficients not being learned properly.

Patent Document 1: JP S58-141018 A

With the above configuration, the volume adjusting devices adjust the volumes of the input sound and the reference sound independently of each other, so the adaptation of the echo suppression device must be recomputed every time a volume changes. Until that recomputation finishes, the echo cannot be suppressed and remains in the output of the echo suppression device. In other words, if the volume adjusting devices keep changing the volume, the echo can never be cancelled.

The example above places the volume adjusting devices before the echo suppression device, but as shown in FIG. 3, a volume adjusting device 12 can instead be placed after the echo suppression device 10. This configuration solves the above problem. In the configuration of FIG. 3, however, if the waveform of the input sound or of the reference sound (or both) exceeds the maximum value (the input peak of the echo suppression device 10) at the input of the echo suppression device 10 and is distorted, the echo suppression device can no longer suppress the echo properly.
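Why clipping at the suppressor input defeats echo suppression can be seen in a small Python sketch. The clipping limit, the sample values, and the helper names `clip` and `suppress` are assumptions made for illustration; the subtraction uses the simple scalar model P(x) = F(x) + γG(x) from this description rather than a full adaptive filter.

```python
def clip(samples, limit):
    """Hard-limit each sample to [-limit, +limit], as an overdriven input would."""
    return [max(-limit, min(limit, s)) for s in samples]


def suppress(p, q, alpha):
    """Subtract alpha * q from p, sample by sample."""
    return [px - alpha * qx for px, qx in zip(p, q)]


gamma = 0.5
f = [0.25, 0.5, 0.25, 0.0]   # hypothetical first sound signal F(x)
g = [0.5, 1.5, 0.5, 0.0]     # hypothetical second sound signal G(x)
p = [fx + gamma * gx for fx, gx in zip(f, g)]  # input sound signal P(x)

clean = suppress(p, g, alpha=gamma)                        # undistorted input
distorted = suppress(clip(p, limit=1.0), g, alpha=gamma)   # clipped input
```

With the undistorted input the subtraction recovers F(x); once the second sample of P(x) is clipped, the same subtraction no longer does, because the clipped signal is no longer a linear mix of F(x) and G(x).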

An object of the present application is to provide a volume adjusting device, and a method, a program, and a recording medium for the same, that can suppress echo properly even when the volumes of the input sound and the reference sound are changed.

The volume adjusting device of the present invention includes an input sound gain adjustment unit, a reference sound gain adjustment unit, an echo suppression unit, a gain calculation unit, and an output unit. A first sound from a first sound source and a second sound from a second sound source are collected by a first sound collection unit as an input sound signal, and the input sound gain adjustment unit adjusts the gain of that input sound signal using input sound gain information and outputs an adjusted input sound signal. The second sound is collected by a second sound collection unit as a reference sound signal, and the reference sound gain adjustment unit adjusts the gain of that reference sound signal using reference sound gain information and outputs an adjusted reference sound signal. The echo suppression unit subtracts, from the adjusted input sound signal, the adjusted reference sound signal convolved with adaptive filter coefficients, and outputs a suppressed input sound signal. The gain calculation unit uses the adjusted reference sound signal and the adjusted input sound signal to calculate the input sound gain information, the reference sound gain information, and gain adjustment information indicating how far the change in the input sound gain information departs from the change in the reference sound gain information, and sets the adaptive filter coefficients to a value corresponding to the gain adjustment information. The output unit outputs the suppressed input sound signal and the adjusted reference sound signal.

According to this invention, the echo suppression unit can suppress echo properly even when the volumes of the input sound and the reference sound are adjusted.

The best mode for carrying out the invention is described below. Components with the same function and steps that perform the same processing are given the same reference numerals, and duplicate description is omitted.

FIG. 4 shows an example functional configuration of the volume adjusting device 20 of Embodiment 1, FIG. 5 shows an example functional configuration when the volume adjusting device 20 is applied to a telephone, and FIG. 6 shows the processing flow.

The volume adjusting device 20 has an input sound gain adjustment unit 24, a reference sound gain adjustment unit 22, a gain calculation unit 26, an echo suppression unit 28, and an output unit 21. As described above, the sound from the first sound source is the first sound F and the sound from the second sound source is the second sound G. The first sound F and the second sound G are converted into an electric signal by the first sound collection unit 2 and collected as the input sound signal P(x), and the second sound is converted into an electric signal by the second sound collection unit 4 and collected as the reference sound signal Q(x). A sound collection unit is, for example, a microphone. A transmit/receive adapter installed between the telephone and the receiver or handset, which extracts the transmitted voice, the received voice, or a mix of both, may also be used as a sound collection unit. Writing the electric signals for the first sound and the second sound as the first sound signal F(x) and the second sound signal G(x),

Input sound signal P(x) = F(x) + γG(x)
Reference sound signal Q(x) = G(x)

where x is time. The aim of Embodiment 1 is for the echo suppression unit 28 to suppress the echo sound signal γG(x) superimposed on the input sound signal P(x), and for the output unit 21 to output the first sound signal F(x) and the second sound signal G(x). FIG. 7 mainly shows an example functional configuration of the gain calculation unit 26. As shown in FIG. 7, the processing is split between the input sound signal and the reference sound signal. For the input sound signal there are AD conversion means 262, frame division means 264, a buffer 266, DC bias calculation means 268, subtraction means 270, volume calculation means 272, and input sound gain adjustment instruction means 274. For the reference sound signal there are AD conversion means 282, frame division means 284, a buffer 286, DC bias calculation means 288, subtraction means 290, volume calculation means 292, and reference sound gain adjustment instruction means 294. Gain determination means 276 is provided in addition. Corresponding components for the input sound signal and the reference sound signal may be merged.

The input sound signal P(x) and the reference sound signal Q(x) are input to the input sound gain adjustment unit 24 and the reference sound gain adjustment unit 22, respectively. An AD conversion unit 27 may be placed before the input sound gain adjustment unit 24 and the reference sound gain adjustment unit 22 so that processing is done on digital signals, or the AD conversion unit 27 may be omitted and processing done on analog signals. The input sound gain adjustment unit 24 adjusts the gain of the input sound signal P(x) using the input sound gain information β₁ (that is, multiplies the signal by the gain) and outputs the adjusted input sound signal P'(x) (step S2). The input sound gain information β₁ is the gain by which the input sound signal P(x) is multiplied, and it is supplied by the input sound gain adjustment instruction means 274 (described later). The adjusted input sound signal P'(x) is input to the echo suppression unit 28.

The echo suppression unit 28 is a commonly used echo suppression device, described for example in Nobuhiko Kitawaki, "Future Net Technology Series: Digital Speech and Audio Technology" (未来ねっと技術シリーズ ディジタル音声・オーディオ技術), Telecommunications Association, first edition published December 15, 1999, pp. 218-255. The processing performed by the echo suppression unit 28 is described later. The output signal of the echo suppression unit 28 is input to the gain calculation unit 26.

The gain calculation unit 26 uses the adjusted reference sound signal Q'(x) and the suppressed input sound signal P''(x) to calculate the input sound gain information β₁, the reference sound gain information β₂, and gain adjustment information Ω indicating how far the change in the input sound gain information departs from the change in the reference sound gain information, and sets the adaptive filter coefficient α to a value corresponding to the gain adjustment information Ω (step S8). The details are as follows.

The AD conversion means 262 in the gain calculation unit 26 digitizes the analog signal of the input sound by quantizing it at a predetermined sampling frequency and sends the result to the frame division means 264. The frame division means 264 divides the input sound into frames of a fixed duration. For example, one frame is 100 ms long (1600 samples per frame at a sampling frequency of 16 kHz). Making the frame sufficiently longer than, for example, the fundamental period of a male voice waveform and of power-supply noise allows the gain to be adjusted stably regardless of voice pitch and power-supply noise. The framed sound signal is sent to the buffer 266.
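The frame division step can be sketched in Python as follows. The function name `split_into_frames` and the handling of a partial final frame are assumptions; the text only specifies fixed-length frames, with 100 ms at 16 kHz as an example.

```python
def split_into_frames(samples, sampling_rate_hz=16000, frame_ms=100):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped (an assumption;
    the text does not say how a partial final frame is handled).
    """
    frame_len = sampling_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


one_second = [0.0] * 16000          # one second of silence at 16 kHz
frames = split_into_frames(one_second)
```

With the default parameters each frame holds 1600 samples, matching the example figures in the text.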

The buffer 266 temporarily stores a predetermined number A₁ of frames, where A₁ is 1 or more. The DC bias calculation means 268 reads the framed input sound signal stored in the buffer 266 and calculates the average amplitude of the input sound signal observed over a long time. That average, the value of the DC component, is sent to the subtraction means 270. The subtraction means 270 subtracts the DC component value calculated by the DC bias calculation means 268 from the input sound signal read from the buffer 266, producing an input sound signal with no bias. The resulting input sound signal is input to the volume calculation means 272, which calculates the volume of the input sound signal; the calculated volume is input to the input sound gain adjustment instruction means 274. The input sound gain adjustment instruction means 274 determines the gain β₁ used by the input sound gain adjustment unit 24 and outputs it to the input sound gain adjustment unit 24. Known techniques may be used for the processing in the volume calculation means 272 and the input sound gain adjustment instruction means 274; preferred ways of determining the gain are described from Embodiment 2 onward.
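The DC bias calculation and subtraction can be sketched in Python as below. The function names and the tiny two-frame buffer are illustrative assumptions; in particular, averaging over exactly the buffered frames is one reading of "observed over a long time".

```python
def dc_bias(frames):
    """Average sample value over all buffered frames (the estimated DC component)."""
    total = sum(sum(frame) for frame in frames)
    count = sum(len(frame) for frame in frames)
    if count == 0:
        raise ValueError("buffer is empty")
    return total / count


def remove_dc_bias(frames):
    """Subtract the estimated DC component from every buffered sample."""
    bias = dc_bias(frames)
    return [[s - bias for s in frame] for frame in frames]


buffered = [[1.5, 0.5], [1.0, 1.0]]   # hypothetical buffer holding A1 = 2 short frames
unbiased = remove_dc_bias(buffered)
```

After subtraction the buffered samples average to zero, so a later volume calculation is not inflated by a constant offset.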

Meanwhile, the reference sound gain adjustment unit 22 adjusts the volume of the reference sound Q (the amplitude level of the reference sound signal) using the reference sound gain information β₂ and outputs the adjusted reference sound signal Q'(x) (step S4). The adjusted reference sound signal Q'(x) is input to the AD conversion means 282. The reference sound gain adjustment instruction means 294 then determines the gain β₂ used by the reference sound gain adjustment unit 22 and outputs it to the reference sound gain adjustment unit 22. The details are the same as for the input sound signal and are not repeated.

The echo suppression unit 28 subtracts, from the adjusted input sound signal P'(x), the adjusted reference sound signal Q'(x) convolved with the adaptive filter coefficient α, and outputs the suppressed input sound signal P''(x), in which the echo sound signal is suppressed (step S6). That is, it evaluates

P''(x) = P'(x) − αQ'(x)    (1)

As described above,

P(x) = F(x) + γG(x)
Q(x) = G(x)    (2)

so that

P'(x) = β₁P(x) = β₁(F(x) + γG(x))
Q'(x) = β₂Q(x) = β₂G(x)    (3)

Substituting equation (3) into equation (1) gives

P''(x) = β₁(F(x) + γG(x)) − αβ₂G(x)    (4)

Here P''(x) is the echo-suppressed input sound signal output from the echo suppression unit 28. As described above, the echo suppression unit 28 must output only the first sound signal after adjustment by the input sound gain adjustment unit 24, β₁F(x) (hereinafter the "adjusted first sound signal"), so the following must hold:

P''(x) = β₁F(x)    (5)

Substituting equation (5) into equation (4) gives

β₁F(x) = β₁(F(x) + γG(x)) − αβ₂G(x)    (6)

and solving for α gives

α = β₁γ/β₂    (7)
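The cancellation condition can be checked numerically with a short Python sketch. It takes equation (1) as the suppressor output and treats α as a single scalar rather than a full adaptive filter; the function names and all numeric values are illustrative assumptions. With the subtraction in equation (1), the echo term in equation (4) vanishes when αβ₂ = β₁γ.

```python
def adjust_gain(samples, beta):
    """Multiply every sample by the gain beta."""
    return [beta * s for s in samples]


def suppress_echo(p_adj, q_adj, alpha):
    """Equation (1): P''(x) = P'(x) - alpha * Q'(x)."""
    return [p - alpha * q for p, q in zip(p_adj, q_adj)]


gamma, beta1, beta2 = 0.5, 4.0, 2.0
f = [0.25, -0.5, 1.0]                          # hypothetical F(x)
g = [1.0, 2.0, -1.0]                           # hypothetical G(x)
p = [fx + gamma * gx for fx, gx in zip(f, g)]  # P(x) = F(x) + gamma * G(x)

p_adj = adjust_gain(p, beta1)                  # P'(x)
q_adj = adjust_gain(g, beta2)                  # Q'(x)

alpha = beta1 * gamma / beta2                  # value that cancels the echo term
cancelled = suppress_echo(p_adj, q_adj, alpha)
stale = suppress_echo(p_adj, q_adj, gamma)     # alpha left at its pre-adjustment value
expected = adjust_gain(f, beta1)               # adjusted first sound signal beta1 * F(x)
```

`cancelled` equals the adjusted first sound signal, while `stale` still contains a residual echo because α was not updated after the gains changed.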

The gain calculation unit 26 therefore only needs to generate gain adjustment information Ω that makes the adaptive filter coefficient α satisfy equation (7), and send it to the echo suppression unit 28.

The gain adjustment information Ω is now described in detail. In equation (7), γ is, as stated above, a coefficient determined by the acoustic and electrical system and is a constant. The gain adjustment information Ω is therefore determined by β₁ and β₂. For example, a user in a telephone conversation who has trouble hearing the received sound may raise its volume. In that case the gain of the input sound signal P(x) rises at the first sound collection unit 2. The input sound gain adjustment unit 24 or the reference sound gain adjustment unit 22 may also raise the gain of the input sound signal P(x) or the reference sound signal Q(x). Suppose, for example, that the input sound gain adjustment unit 24 raises the gain of P(x) by a factor of 4 and the reference sound gain adjustment unit 22 raises the gain of Q(x) by a factor of 2. If the adaptive filter coefficient α does not change, equation (4) shows that part of the echo sound signal remains. To suppress the echo sound signal completely, equation (7) with β₁/β₂ = 2 shows that α must be doubled in this case.

Conversely, if the gain of the input sound signal P(x) is doubled and the gain of the reference sound signal Q(x) is quadrupled while α is left unchanged, equation (4) shows that too much is subtracted: a signal with the opposite phase of the reference sound appears in the output and is heard as an echo. In this case equation (7) with β₁/β₂ = 1/2 shows that α must be halved.

In logarithmic terms, if β₁ changes by +6 dB and β₂ changes by +3 dB, then β₁ − β₂ = 3 dB, and the adaptive filter coefficient α must be raised by 3 dB.

Thus the gain adjustment information Ω indicates how far the input sound gain information β₁ departs from the reference sound gain information β₂. When the change in β₁ and the change in β₂ are expressed as real numbers, the gain determination means 276 computes

Ω = (change in input sound gain information β₁) / (change in reference sound gain information β₂)

When the change in β₁ and the change in β₂ are expressed logarithmically (in dB), the gain determination means 276 computes

Ω = (change in input sound gain information β₁) − (change in reference sound gain information β₂)

The gain adjustment information Ω from the gain determination means 276 is then sent to the echo suppression unit 28, which sets the adaptive filter coefficient α to a value corresponding to Ω.
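The two forms of Ω can be sketched in Python as below. The function names are hypothetical. How α is obtained from Ω is not spelled out beyond "a value corresponding to Ω", so scaling the previous α by Ω (or by 10^(Ω/20) for dB values, assuming the amplitude convention for decibels) is an assumption drawn from the worked examples in the text.

```python
def omega_linear(beta1_change, beta2_change):
    """Omega when both gain changes are given as real-valued factors."""
    return beta1_change / beta2_change


def omega_db(beta1_change_db, beta2_change_db):
    """Omega when both gain changes are given in dB."""
    return beta1_change_db - beta2_change_db


def scale_alpha(alpha, omega, in_db=False):
    """Scale the previous alpha by Omega (assumed update rule, see lead-in)."""
    factor = 10 ** (omega / 20.0) if in_db else omega
    return alpha * factor


# Worked examples from the text: 4x vs 2x, 2x vs 4x, and +6 dB vs +3 dB.
omega_a = omega_linear(4.0, 2.0)
omega_b = omega_linear(2.0, 4.0)
omega_c = omega_db(6.0, 3.0)
```

Under this reading, `scale_alpha(alpha, omega_a)` doubles α and `scale_alpha(alpha, omega_b)` halves it, matching the two linear examples.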

Strictly speaking, there is a delay difference τ for the echo sound signal γG(x) in the input sound signal P(x) and the adjusted input sound signal P'(x), so that

P(x) = F(x) + γG(x − τ)
P'(x) = β₁(F(x) + γG(x − τ))    (8)

Equation (6) therefore becomes

β₁F(x) = β₁(F(x) + γG(x − τ)) − αβ₂G(x − τ')
β₁γG(x − τ) = αβ₂G(x − τ')    (6')

where τ' is the delay difference that the echo suppression unit 28 should learn. The gain determination means 276 then only needs to generate gain adjustment information Ω such that the echo suppression unit 28 learns values of α and τ' that satisfy equation (6'), and send it to the echo suppression unit 28.
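The role of the learned delay τ' can be illustrated with a Python sketch. It assumes integer sample delays, zero padding before the first sample, a scalar α applied with the subtraction of equation (1), and hypothetical helper names; none of these details come from the patent.

```python
def delay(samples, n):
    """Delay a signal by n samples, padding the start with zeros."""
    if n < 0:
        raise ValueError("delay must be non-negative")
    return ([0.0] * n + list(samples))[:len(samples)]


def suppress_echo(p_adj, q_adj, alpha, tau_prime):
    """P''(x) = P'(x) - alpha * Q'(x - tau_prime)."""
    q_delayed = delay(q_adj, tau_prime)
    return [p - alpha * q for p, q in zip(p_adj, q_delayed)]


gamma, beta1, beta2, tau = 0.5, 2.0, 1.0, 2
f = [1.0, 0.0, -1.0, 0.0, 1.0]     # hypothetical F(x)
g = [2.0, 4.0, -2.0, 0.0, 0.0]     # hypothetical G(x)

# Equation (8): P'(x) = beta1 * (F(x) + gamma * G(x - tau)).
p_adj = [beta1 * (fx + gamma * gx) for fx, gx in zip(f, delay(g, tau))]
q_adj = [beta2 * gx for gx in g]   # Q'(x) = beta2 * G(x)

alpha = beta1 * gamma / beta2
matched = suppress_echo(p_adj, q_adj, alpha, tau_prime=tau)   # tau' equals tau
mismatched = suppress_echo(p_adj, q_adj, alpha, tau_prime=0)  # delay not learned
expected = [beta1 * fx for fx in f]                           # beta1 * F(x)
```

The echo is removed only when both the scale α and the delay τ' match the echo path; with the wrong delay the subtraction adds a second copy of the reference instead of cancelling the first.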

The output unit 21 outputs the suppressed input sound signal P''(x) (step S10). The output unit 21 may also output the suppressed input sound signal P''(x) and the adjusted reference sound signal Q'(x) separately. The above processing continues until all of the input sound signal and the reference sound signal has been collected (step S12).

In this way, by setting the adaptive filter coefficient α to a value corresponding to the gain adjustment information Ω, which indicates how far the change in the input sound gain information departs from the change in the reference sound gain information, the echo sound signal can be cancelled completely even when the gain (volume) of the input sound or the reference sound is adjusted.

In addition, as shown in FIG. 1, the conventional volume adjusting device 8 could mistakenly adjust the volume to match the echo sound signal contained in the input sound. In Embodiment 1, by contrast, the gain calculation unit 26 of the volume adjusting device 20 has been described as using the adjusted reference sound signal Q'(x) and the suppressed input sound signal P''(x) to determine the input sound gain information β₁, the reference sound gain information β₂, and the gain adjustment information Ω. This configuration has the notable effect of avoiding erroneous volume adjustment such as matching the volume to the echo sound signal. Where this effect is not needed, the gain calculation unit 26 may instead use the adjusted reference sound signal Q'(x) and the adjusted input sound signal P'(x) to determine β₁, β₂, and Ω.

Embodiment 2 and the following embodiments describe methods for obtaining a suitable gain in the input sound gain adjustment instruction means 274 and the reference sound gain adjustment means 294. FIG. 8 shows a functional configuration example of the volume calculation means 272 and the input sound gain adjustment instruction means 274. The volume calculation means 272 has outer shape value determining means 2722, start/end determination means 2724, sound/silence frame determination means 2726, and sound/silence section determination means 2728. The input sound gain adjustment instruction means 274 has input sound first gain adjustment instruction means 2742 and input sound second gain adjustment instruction means 2744 (described in Embodiment 2). The input sound gain adjustment unit 24 has first input sound gain adjustment means 242 and second input sound gain adjustment means 244 (described in Embodiment 3). Embodiments 2 to 4 describe the processing on the input sound side; the processing on the reference sound side is the same and is omitted. The content of Embodiments 2 to 4 is described in Japanese Patent Application No. 2007-293743, titled "Volume adjusting device, method and program", but is explained here for completeness.

The input sound signal from the subtracting means 270 is input to the outer shape value determining means 2722 and the start/end determination means 2724. The start/end determination means 2724 determines the start and end of sound generation by observing the average of the absolute values of the sound signal for each frame. The sound section from the start of sound generation to its end is defined as a call section. When the sound is speech such as a telephone call, the start and end of sound generation are the beginning and end of the call, and the sound generation then corresponds to a so-called call section.

Specifically, average value calculation means (not shown) in the start/end determination means 2724 calculates, for each frame, the average of the absolute values of the amplitude of the input sound signal. The start/end determination means 2724 then determines in sequence whether the calculated average is larger than a predetermined seventh threshold A_2 and, if so, determines that sound generation has started. When the calculated average is determined to be larger than the seventh threshold A_2, sound generation may instead be deemed to have started a fixed time (for example, 0.5 seconds) before the time of that determination.

The start/end determination means 2724 also determines that sound generation has ended when the calculated average of the absolute amplitude values remains smaller than a predetermined eighth threshold A_3 (A_3 is smaller than A_2) for a predetermined length of time, or for a predetermined number A_4 of frames, and sends a signal to that effect to each component.
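The start and end determination described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the list-of-frames input, and the choice to count quiet frames (the A_4 variant rather than a time length) are hypothetical, and the optional 0.5-second back-dating of the start is not modeled.

```python
def detect_call_section(frames, a2, a3, a4):
    """Return (start_frame, end_frame) for the first call section.

    frames: list of non-empty frames, each a list of signed sample values.
    a2: start threshold, a3: end threshold (a3 < a2),
    a4: number of consecutive quiet frames that ends the section.
    end_frame is None if no end was detected; start_frame is None if
    sound generation never started.
    """
    start = None
    quiet_run = 0
    for i, frame in enumerate(frames):
        # Average of the absolute sample values in this frame.
        mean_abs = sum(abs(s) for s in frame) / len(frame)
        if start is None:
            if mean_abs > a2:
                start = i
        else:
            # Count consecutive frames below the end threshold.
            quiet_run = quiet_run + 1 if mean_abs < a3 else 0
            if quiet_run >= a4:
                return start, i
    return start, None
```

Using a lower end threshold A_3 than the start threshold A_2 gives hysteresis, so a level hovering near one threshold does not toggle the decision every frame.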

On receiving the signal indicating that sound generation has started, the outer shape value determining means 2722 obtains, for each frame, an outer shape value, a feature quantity representing the loudness of the frame. For example, the outer shape value is the maximum of the absolute values of the amplitude of the input sound signal; in other words, it is the maximum among the values of the samples making up the frame. The outer shape value obtained for each frame is sent to the sound/silence frame determination means 2726 and the input sound first gain adjustment instruction means 2742. FIGS. 9A and 9B show a specific example of outer shape value extraction. FIG. 9A shows the waveform of a sound signal with no bias (that is, the output signal of the subtracting means 270). FIG. 9B plots the maximum absolute amplitude (outer shape value) obtained for each frame from the waveform in FIG. 9A.
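As a minimal sketch of the per-frame outer shape value, assuming the signal is a flat list of signed samples and that frames are consecutive, non-overlapping blocks (the function name and the framing scheme are assumptions, not taken from the source):

```python
def outer_shape_values(samples, frame_len):
    """Return the maximum absolute sample value of each frame.

    samples: flat list of signed sample values (bias already removed).
    frame_len: number of samples per frame; the last frame may be shorter.
    """
    return [
        max(abs(s) for s in samples[i:i + frame_len])
        for i in range(0, len(samples), frame_len)
    ]
```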

The description again refers to FIG. 8. The sound/silence frame determination means 2726 compares the outer shape value with a predetermined second threshold A_5; if the outer shape value is larger, it determines the frame to be a sounded frame, and otherwise a silent frame. Instead of a predetermined value, the second threshold A_5 may be varied dynamically, for example as a constant multiple (for example, 3 times) of the minimum outer shape value of the silent frames in the past 10 seconds. Information on whether each frame is a sounded frame or a silent frame is sent to the sound/silence section determination means 2728.

When silent frames continue for a predetermined first threshold A_6 or more (for example, 5; A_6 is set so that the duration is 0.5 seconds), the sound/silence section determination means 2728 determines the sound section made up of those consecutive frames to be a silent section, and determines sound sections made up of the other frames to be sounded sections. Information on the sounded and silent sections is sent to the input sound first gain adjustment instruction means 2742.
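The frame and section decisions above can be illustrated roughly as follows. The function name and the boolean-list output are assumptions for illustration; a fixed A_5 is used, and the dynamic threshold variant is not modeled.

```python
def label_sections(outer_values, a5, a6):
    """Label each frame True (sounded section) or False (silent section).

    A frame is a sounded frame when its outer shape value exceeds a5.
    Only runs of at least a6 consecutive silent frames become silent
    sections; shorter silent runs stay inside the sounded section.
    """
    voiced = [v > a5 for v in outer_values]
    labels = [True] * len(voiced)
    run_start = None
    # The trailing True is a sentinel that closes a silent run at the end.
    for i, is_voiced in enumerate(voiced + [True]):
        if not is_voiced:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= a6:
                for j in range(run_start, i):
                    labels[j] = False
            run_start = None
    return labels
```

With the figures quoted in the text (A_6 = 5 frames for 0.5 seconds), a brief pause between words would not split a sounded section.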

FIG. 10 shows a functional configuration example of the input sound first gain adjustment instruction means 2742. It consists of first sound section extraction means 2802, first sound section outer shape value extraction means 2803, and first determining means 2808. The first sound section outer shape value extraction means 2803 in turn consists of excluding means 2804 and maximum value determining means 2806. Information on the sounded and silent sections is input to the first sound section extraction means 2802. When a determined sounded section is longer than a predetermined time length A_7 (for example, 2 seconds), or when the number of frames making up the section is larger than a predetermined number A_8 (for example, 20 frames), the first sound section extraction means 2802 sets that sounded section as a first sound section. When the input sound is speech such as a telephone call, the first sound section corresponds to a so-called utterance section, the span of sound a person produces in one breath. Extracting first sound sections in this way cuts out sound sections whose length is close to human perception, such as "Hello" or "I have a quick question". FIG. 9B shows a specific example of first sound section extraction: as shown in FIG. 9B, silent sections of 0.5 seconds or longer are used to delimit blocks of sounded sections of 2 seconds or longer, which are extracted as first sound sections.
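A rough sketch of first sound section extraction, assuming per-frame section labels are already available as booleans (True for frames in a sounded section) and using the frame-count criterion A_8; the function name and the half-open index pairs are illustrative assumptions:

```python
def first_sound_sections(labels, a8):
    """Return (start, end) frame index pairs, end exclusive, for every
    sounded section that is longer than a8 frames."""
    sections = []
    start = None
    # The trailing False is a sentinel that closes a section at the end.
    for i, sounded in enumerate(labels + [False]):
        if sounded and start is None:
            start = i
        elif not sounded and start is not None:
            if i - start > a8:
                sections.append((start, i))
            start = None
    return sections
```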

For example, the first sound section extraction means 2802 sends information on the frames making up the first sound section and on their outer shape values to the excluding means 2804 in the first sound section outer shape value extraction means 2803. For the outer shape values of those frames, it uses the per-frame outer shape value information it received from the outer shape value determining means 2722.

The excluding means 2804 excludes several of the largest outer shape values from the outer shape values of the frames making up the first sound section. The number of excluded values is preferably larger the more frames the first sound section contains. For example, the number of frames in the first sound section is multiplied by a preset ratio A_9 (for example, 10 to 30%; here 20%), the result is rounded down, rounded to nearest, or rounded up to an integer, and that many outer shape values are excluded. Alternatively, a predetermined number A_10 of outer shape values may be excluded. The outer shape values that remain are sent to the maximum value determining means 2806.

The maximum value determining means 2806 obtains the maximum of the remaining outer shape values and stores it as the outer shape value of the first sound section. The outer shape value of the first sound section is sent to the first determining means 2808.
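The exclusion and maximum steps together amount to taking a high percentile of the per-frame values instead of the raw maximum. A minimal sketch, assuming the ratio variant (A_9) with rounding down; the function name and the default of 0.2 are illustrative choices:

```python
import math

def section_outer_value(frame_values, ratio=0.2):
    """Outer shape value of a first sound section.

    Drops the largest floor(len * ratio) per-frame outer shape values
    and returns the maximum of those that remain.
    Assumes a non-empty list and 0 <= ratio < 1.
    """
    n_excluded = math.floor(len(frame_values) * ratio)
    kept = sorted(frame_values, reverse=True)[n_excluded:]
    return max(kept)
```

A single loud outlier frame, such as a cough, then falls among the excluded values rather than setting the level for the whole section.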

The first determining means 2808 determines information for adjusting the input sound so that the outer shape value of the first sound section falls within a predetermined range (hereinafter, first input sound gain adjustment information), and sends it to the input sound gain adjustment unit 24. For example, the input peak is supplied to the first determining means 2808, which determines the gain so that the outer shape value of the first sound section falls within the range obtained by multiplying the input peak by a predetermined ratio A_11 (for example, 10% to 25%).

When the first gain adjustment information has been determined, the first gain adjustment instruction means 2742 does not perform the above processing for the frames corresponding to the delay time of the buffer 15.

A specific example is described with reference to FIG. 9C. The excluding means 2804 excludes a predetermined number (seven in this example) of the largest outer shape values from among the outer shape values of the frames making up the first sound section. The outer shape values shown in white in FIG. 9C are the excluded ones. The maximum value determining means 2806 selects the largest of the remaining outer shape values as the outer shape value of the first sound section. The remaining outer shape values are those shown in black and with hatching in FIG. 9C, and their maximum, the outer shape value of the first sound section, is the hatched one.

If the predetermined range in which the outer shape value of the first sound section should fall is 3000 to 8000, then in this example the outer shape value of the first sound section is outside that range. The first determining means 2808 calculates the difference between the outer shape value of the first sound section and the range, and determines the gain so that the outer shape value falls within the range. When the outer shape value is already within the range, no processing is performed. The range of 3000 to 8000 applies when the number of quantization bits is 16 and the maximum amplitude is 2 to the 15th power (32768).

Another specific example follows. Suppose the outer shape value of the first sound section is 5% of the input peak and the predetermined range in which it should fall is 10% to 25% of the input peak. In this case the first determining means 2808 determines the gain so that the outer shape value of the first sound section becomes 10% of the input peak. Determining the gain so that the outer shape value after gain adjustment equals whichever of the upper and lower limits of the predetermined range is closer to the outer shape value before adjustment makes the gain adjustment amount as small as possible, and so minimizes the change in the predetermined feature quantities of the sound.

Furthermore, by providing a predetermined range for the outer shape value of the first sound section and skipping the above gain calculation whenever the value is already within that range, the number of gain changes can be reduced. This reduces the number of times the sound waveform is distorted, and therefore reduces the change in the predetermined feature quantities of the sound.
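The nearest-bound rule can be sketched as below. This assumes the gain is a linear multiplier applied to sample amplitudes, which the text does not state explicitly; the function name is likewise hypothetical.

```python
def first_gain(section_value, lower, upper):
    """Linear gain that moves section_value to the nearest bound of
    [lower, upper]; returns 1.0 (no change) when already in range.

    Assumption: gain is a multiplicative factor on amplitude.
    """
    if section_value <= 0:
        raise ValueError("outer shape value must be positive")
    if section_value < lower:
        return lower / section_value   # raise to the lower limit
    if section_value > upper:
        return upper / section_value   # lower to the upper limit
    return 1.0
```

For the example in the text, a section at 5% of the input peak with a 10% to 25% target range would receive a gain of 2 under this assumption, bringing it to exactly 10%.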

This method bases gain adjustment not on short sound sections with unstable volume such as "yes", "ah", or "er", but on sound sections of a certain length and stable volume, such as "Thank you for calling." or "There is something I would like to ask." In addition, several of the largest outer shape values are excluded from those of the frames making up the first sound section, the maximum of the remaining values is taken as the outer shape value of the first sound section, and the gain is adjusted using that value.

As a result, the adjustment is less susceptible to sudden noises such as coughs and sneezes, and the volume after gain adjustment does not exceed the input peak regardless of how large or small the variance of the target sound's amplitude is.

In the above example, the largest 20% of the outer shape values of the frames making up the first sound section are excluded, and the predetermined range in which the outer shape value of the first sound section should fall is 10% to 20% of the input peak. This is because experiments showed that, excluding sudden noises, the peak of the input was less than about 4 times the outer shape value of the first sound section.

The description again refers to FIG. 8. The first input sound gain adjustment means 242 of the input sound gain adjustment unit 24 adjusts the volume of the input sound using the first input sound gain adjustment information determined by the input sound first gain adjustment instruction means 2742, and outputs the result. Until new first input sound gain adjustment information arrives from the input sound first gain adjustment instruction means 2742, the first input sound gain adjustment means 242 performs gain adjustment based on the information already received.

With this configuration, the predetermined feature quantities of the sound are less likely to be lost than when the gain used to adjust the volume changes frequently.

Embodiment 3 describes an embodiment that has input sound second gain adjustment instruction means 2744 and second input sound gain adjustment means 244, which adjust the gain on the basis of a sound section (second sound section) shorter than the first sound section.

FIG. 11 illustrates the input sound second gain adjustment instruction means 2744. The input sound signal output from the subtracting means 270 (see FIG. 8) is input to the excessive input sample number determining means 2746 of the input sound second gain adjustment instruction means 2744. For each frame, the excessive input sample number determining means 2746 determines the number of samples larger than a predetermined third threshold A_12 (for example, 90% of the upper limit of the values a sample can represent); this count is hereinafter called the excessive input sample number. The excessive input sample number determined for each frame is sent to the excessive input frame determining means 2748 and the storage means 2750.

The excessive input frame determining means 2748 determines, for each frame, whether the excessive input sample number is larger than a predetermined fourth threshold A_13 (30% of the number of samples in one frame). A frame whose excessive input sample number is larger than A_13 is hereinafter called an excessive input frame. Information on excessive input frames (for example, a flag indicating that a frame is an excessive input frame) is sent to the storage means 2750.

The second sound section excessive input sample number determining means 2752 takes as the second sound section a sound section made up of a number A_14 of frames (for example, 10, or 1 second in duration) smaller than the number of frames making up the first sound section, calculates the total excessive input sample number over the frames of the second sound section, and sends that total to the second determining means 2756. Specifically, when the second sound section is the past 10 frames, it reads the excessive input sample number of each of the past 10 frames from the storage means 2750 and adds them to obtain the total.

The second sound section excessive input frame number determining means 2754 determines the number of excessive input frames among the frames making up the second sound section and sends that number to the second determining means 2756. Specifically, when the second sound section is the past 10 frames, it reads the information on excessive input frames for the past 10 frames from the storage means 2750 and determines the number of excessive input frames.

The second determining means 2756 sends information for lowering the volume of the input sound by a predetermined amount (hereinafter, second input sound gain adjustment information) to the second input sound gain adjustment means 244 in the input sound gain adjustment unit 24 when the total excessive input sample number is larger than a predetermined fifth threshold A_15 (for example, 20% of the total number of samples in the second sound section) and the number of excessive input frames is larger than a predetermined sixth threshold A_16 (for example, 3 when the second sound section is 10 frames). The second input sound gain adjustment information may be a specific gain value (for example, 0.7, or 3 dB in volume), or may simply be an instruction to lower the volume with no specific numerical value.
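The two-condition overload test over the second sound section can be sketched as follows. The function name and argument layout are hypothetical, and comparing the absolute sample value against A_12 is an assumption (the text only says samples "larger than" the threshold).

```python
def needs_volume_reduction(frames, a12, a13, a15, a16):
    """Decide whether the volume should be lowered for a second sound section.

    frames: the most recent frames (for example, the past 10), each a
            list of signed sample values.
    a12: per-sample amplitude threshold; a13: per-frame count threshold;
    a15: threshold on the total excessive sample count;
    a16: threshold on the number of excessive input frames.
    """
    # Excessive input sample number of each frame (abs() is an assumption).
    counts = [sum(1 for s in frame if abs(s) > a12) for frame in frames]
    total_excess = sum(counts)
    excess_frames = sum(1 for c in counts if c > a13)
    return total_excess > a15 and excess_frames > a16
```

Requiring both conditions means a single clipped frame does not trigger a reduction on its own. For reference, a linear gain of 0.7 corresponds to roughly -3 dB, since 20·log10(0.7) is about -3.1.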

The second input sound gain adjustment means 244 of the input sound gain adjustment unit 24 lowers the volume of the input sound based on the second input sound gain adjustment information. When it lowers the gain, it sets a short-time gain adjustment flag on the frame and thereafter performs no processing for the frames corresponding to the delay time of the buffer 15.

This makes it possible to collect speech without the waveform distortion that occurs when the waveform of the voiced portion exceeds the input peak.

In Embodiment 4, input sound end gain adjustment means 246 is provided. After the start/end determination means 2724 detects the start of sound generation, the gain is adjusted according to the instructions of the input sound first gain adjustment instruction means 2742 and the second gain adjustment instruction means 2744, as described above. When the start/end determination means 2724 detects the end of sound generation, information to that effect is sent to the input sound end gain adjustment means 246.

On receiving the information that sound generation has ended, the input sound end gain adjustment means 246 reads the gain set in the input sound gain adjustment unit 24 at the end of sound generation and stores it in its storage means 2462. It then reads from the storage means 2462 the end-of-sound-generation gains of the most recent predetermined number A_17 of sound generations, obtains their average, and sets that average in the input sound gain adjustment unit 24.
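The averaging step can be illustrated with a bounded history. The class and method names are hypothetical, and averaging over fewer than A_17 entries while the history is still filling is an assumption the text does not address.

```python
from collections import deque

class EndOfSoundGain:
    """Keeps the gains in effect at the end of the last a17 sound generations."""

    def __init__(self, a17):
        # deque with maxlen discards the oldest entry automatically.
        self.history = deque(maxlen=a17)

    def on_sound_end(self, current_gain):
        """Store the gain at the end of sound generation and return the
        average of the stored gains, to be set as the next starting gain."""
        self.history.append(current_gain)
        return sum(self.history) / len(self.history)
```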

When the current gain value cannot be obtained from the input sound gain adjustment unit 24, the input sound end gain adjustment means 246 sets the gain in the input sound gain adjustment unit 24 as follows. The current gain value cannot be obtained when, for example, the input sound gain adjustment unit 24 has only relative means of specifying gain, such as raising or lowering the gain by 3 dB, and has no means of reporting that the adjustment range of the device was exceeded or that the adjustment could not be made.

1. When the gain was not changed to adjust the volume by an instruction of the input sound first gain adjustment instruction means 2742, the input sound end gain adjustment means 246 does nothing.
2. When the gain was lowered to reduce the volume by an instruction of the input sound first gain adjustment instruction means 2742, the input sound end gain adjustment means 246 sets in the input sound gain adjustment unit 24 a gain lowered from the current gain by a preset value A_18.
3. When the gain was raised to increase the volume by an instruction of the input sound first gain adjustment instruction means 2742, the input sound end gain adjustment means 246 performs the following processing.
3-1. When the gain was lowered to reduce the volume by an instruction of the input sound second gain adjustment instruction means 2744, the input sound end gain adjustment means 246 does nothing.
3-2. In cases other than 3-1, the input sound end gain adjustment means 246 sets in the input sound gain adjustment unit 24 a gain raised from the current gain by a preset value A_19.
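The rule list above reduces to a small decision function. In this sketch the function name, the string codes for the first-stage outcome, and the signed return value (a relative change to apply to the current gain) are all illustrative assumptions.

```python
def end_of_sound_step(first_change, second_lowered, a18, a19):
    """Relative gain change to apply at the end of sound generation.

    first_change: 'none', 'lowered' or 'raised', the effect of the
                  first gain adjustment instruction during the sound.
    second_lowered: True if the second gain adjustment instruction
                    lowered the gain during the sound.
    Returns a signed step: negative lowers the gain, 0.0 leaves it as is.
    """
    if first_change == "none":        # rule 1
        return 0.0
    if first_change == "lowered":     # rule 2
        return -a18
    if first_change == "raised":      # rule 3
        return 0.0 if second_lowered else a19   # rules 3-1 and 3-2
    raise ValueError("unknown first_change: %r" % (first_change,))
```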

By adjusting the volume at the end of sound generation in this way, the volume at the start of the next sound generation can be brought close to an appropriate value, and the volume can be adjusted appropriately to follow changes in sound collection conditions such as the speaker, microphone position, and voice level.
The input sound second gain adjustment instruction means 2744 and the second input sound gain adjustment means 244 may be omitted. The end gain adjustment means 246 may also be omitted.

FIGS. 12 and 13 show a system in Embodiment 4 that uses the volume adjusting device 140 at a call center to record conversations between an operator and a user.

The operator wears a headset 43 connected to the telephone 50 and talks with the user. A handset branch adapter 136 having the volume adjustment unit 20 (identical to the volume adjusting device described in Embodiments 1 to 4) is connected between the headset 43 and the telephone 50, and the audio is captured into a PC 137 via audio input or USB. The operator and user audio captured into the PC 137 is passed through the echo suppression unit 28, which suppresses the operator's voice that enters the user-side audio as sidetone. As shown in FIG. 13, the echo suppression unit 28 may instead be attached to the handset separation adapter.

When the start/end determination means 2724 detects the beginning of a call based on the respective audio sent from the echo suppression unit 28, the transmission-side volume adjusting device 140a adjusts the volume of the operator's voice in the same way as the volume adjusting device 20 described above, and the reception-side volume adjusting device 140b likewise adjusts the volume of the user's voice. The transmission-side volume adjusting device 140a and the reception-side volume adjusting device 140b do not themselves have a volume adjustment unit 20 or start/end determination means 2724; instead, the volume adjustment unit 20 of the handset branch adapter 136 and the start/end determination means 2724 of the PC 137 serve those roles for both devices. In other respects they are the same as the volume adjusting device 140.

The operator voice is collected under nearly the same conditions as long as the operator stays the same, so it can be brought to an appropriate volume within a few calls. The user voice, by contrast, arrives through a different telephone, transmission path, and so on for each call, so gain settings carried over from earlier calls would not suit the current one. For this reason, the reception-side volume adjustment device 140b does not issue volume adjustment instructions via the input sound end gain adjustment means 246 or the reference sound end gain adjustment means 226.

When the start/end determination means 2724 detects the end of the call, the volume-adjusted audio is stored on the disk 150 of the PC 137 via the recording unit 139.

<Hardware configuration>
The present invention is not limited to the embodiments described above. The various processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus executing them or as needed. It goes without saying that other modifications may be made as appropriate without departing from the spirit of the present invention.

When the above configuration is implemented by a computer, the processing of the functions that the echo canceling apparatus 300 should have is described by a program. Executing this program on the computer realizes those processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEPROM (Electrically Erasable Programmable Read-Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. When executing a process, the computer reads the program from its own storage device and performs processing according to it. Alternatively, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and result retrieval. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).
In this embodiment, the apparatus is configured by executing a predetermined program on a computer, but at least part of the processing may instead be implemented in hardware.

The echo canceling apparatus 300 described in this embodiment includes a CPU (Central Processing Unit), an input unit, an output unit, an auxiliary storage device, a RAM (Random Access Memory), a ROM (Read Only Memory), and a bus (none of which are shown).

The CPU executes various arithmetic processes according to the programs loaded into it. The auxiliary storage device is, for example, a hard disk, an MO (Magneto-Optical disc), or a semiconductor memory, and the RAM is, for example, an SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory). The bus connects the CPU, the input unit, the output unit, the auxiliary storage device, the RAM, and the ROM so that they can communicate with one another.

<Cooperation between hardware and software>
The word adding device of this embodiment is constructed by loading a predetermined program into the hardware described above and having the CPU execute it. The functional configuration of each device constructed in this way is described below.
The reference sound gain adjustment unit 22, the input sound gain adjustment unit 24, the gain calculation unit 26, and the echo suppression unit 28 of the volume adjustment device 20 are arithmetic units constructed by loading a predetermined program into the CPU and executing it. The storage unit (not shown) of the volume adjustment device 20 functions as the auxiliary storage device described above.

FIG. 1 is a diagram showing an application example of a conventional volume adjustment device.
FIG. 2 is a simplified version of FIG. 1.
FIG. 3 is a diagram showing a modification of the conventional volume adjustment device.
FIG. 4 is a diagram showing an example of the functional configuration of the volume adjustment device of the present embodiment.
FIG. 5 is a diagram showing an application example of the volume adjustment device of the present embodiment.
FIG. 6 is a diagram showing the processing flow of the volume adjustment device of the present embodiment.
FIG. 7 is a diagram showing an example of the functional configuration of the gain calculation unit of the present embodiment.
FIG. 8 is a diagram showing an example of the functional configuration of the volume calculation means of the present embodiment.
FIG. 9: A illustrates the waveform of a sound signal, B illustrates a first sound section, and C illustrates the outline value of the first sound section.
FIG. 10 is a diagram showing an example of the functional configuration of the first gain adjustment instruction means of the present embodiment.
FIG. 11 is a diagram showing an example of the functional configuration of the second gain adjustment instruction means of the present embodiment.
FIG. 12 is a diagram illustrating a system that records a conversation between an operator and a user.
FIG. 13 is a diagram illustrating another form of the system that records a conversation between an operator and a user.

Claims

A volume adjustment device comprising:
an input sound gain adjustment unit that, where a first sound from a first sound source and a second sound from a second sound source are collected by a first sound collection unit as an input sound signal, adjusts the gain of the input sound signal using input sound gain information and outputs an adjusted input sound signal;
a reference sound gain adjustment unit that, where the second sound is collected by a second sound collection unit as a reference sound signal, adjusts the gain of the reference sound signal using reference sound gain information and outputs an adjusted reference sound signal;
an echo suppression unit that outputs a suppressed input sound signal by subtracting, from the adjusted input sound signal, the result of convolving adaptive filter coefficients with the adjusted reference sound signal;
a gain calculation unit that uses the adjusted reference sound signal and the adjusted input sound signal to calculate the input sound gain information, the reference sound gain information, and gain adjustment information indicating the degree to which the amount of change in the input sound gain information departs from the amount of change in the reference sound gain information, and that sets the adaptive filter coefficients to values corresponding to the gain adjustment information; and
an output unit that outputs the suppressed input sound signal.
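
The echo suppression and coefficient update recited in the claim above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the form of the gain adjustment information (the ratio of the input gain change to the reference gain change) is an assumption chosen because it keeps a converged filter valid when the two gains change by different amounts.

```python
def filter_reference(adj_reference, coeffs):
    """Convolve the filter coefficients with the adjusted reference signal."""
    estimate = []
    for n in range(len(adj_reference)):
        acc = 0.0
        for k, w in enumerate(coeffs):
            if n - k >= 0:
                acc += w * adj_reference[n - k]
        estimate.append(acc)
    return estimate


def suppress_echo(adj_input, adj_reference, coeffs):
    """Subtract the filtered reference from the adjusted input signal."""
    estimate = filter_reference(adj_reference, coeffs)
    return [p - e for p, e in zip(adj_input, estimate)]


def rescale_coeffs(coeffs, old_in_gain, new_in_gain, old_ref_gain, new_ref_gain):
    """Rescale coefficients after a gain change.

    ASSUMPTION: the gain adjustment information is modeled as the ratio of
    the input gain change to the reference gain change.
    """
    gain_adjustment = (new_in_gain / old_in_gain) / (new_ref_gain / old_ref_gain)
    return [w * gain_adjustment for w in coeffs]
```

If the input gain doubles while the reference gain quadruples, the echo seen by the filter halves relative to the reference, so the coefficients are halved; without this step the filter would over-subtract until it re-converged.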

The volume adjustment device according to claim 1, wherein the output unit outputs the suppressed input sound signal and the adjusted reference sound signal.

The volume adjustment device according to claim 1 or 2, wherein the gain calculation unit calculates the input sound gain information, the reference sound gain information, and the gain adjustment information from the adjusted reference sound signal and the suppressed input sound signal.

The volume adjustment device according to any one of claims 1 to 3, wherein
the gain calculation unit comprises:
frame dividing means for dividing the input sound signal and the reference sound signal into frames of a fixed time length;
outline value determining means for obtaining, for each frame, an outline value, which is a feature quantity representing the magnitude of the input sound signal and the reference sound signal contained in the frame;
first sound section outline value extraction means for, taking as a first sound section a sound section that is sandwiched between silent frames continuing for at least a predetermined first threshold and that consists of frames at or above a predetermined second threshold, excluding a plurality of the largest outline values from the outline values of the frames constituting the first sound section, and obtaining the maximum of the remaining outline values as the outline value of that first sound section; and
first determining means for determining and outputting first input sound gain information and first reference sound gain information such that the outline values of the first sound section for the input sound signal and the reference sound signal fall within a predetermined range,
the reference sound gain adjustment unit comprises first reference sound gain adjusting means for adjusting the volume of the reference sound signal using the first reference sound gain information, and
the input sound gain information adjustment unit comprises first input sound gain adjusting means for adjusting the volume of the input sound signal using the first input sound gain information.

The volume adjustment device according to claim 4, wherein the outline value of a frame is the maximum of the absolute values of the samples contained in that frame.
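
The per-frame outline value (the maximum absolute sample value in a frame) and the outline value of a first sound section can be sketched as below. The function names, the number of excluded peaks, and the target range are hypothetical parameters introduced for illustration; the claims only state that several of the largest values are excluded and that the result should fall within a predetermined range.

```python
def frame_outline_values(samples, frame_len):
    """Split samples into fixed-length frames and return max |sample| per frame.

    A shorter final frame is kept as-is (an assumption of this sketch).
    """
    return [max(abs(s) for s in samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]


def section_outline_value(frame_values, n_exclude):
    """Drop the n_exclude largest frame values and return the max of the rest."""
    remaining = sorted(frame_values, reverse=True)[n_exclude:]
    if not remaining:
        raise ValueError("section has too few frames for n_exclude")
    return remaining[0]


def first_gain(section_value, lower, upper):
    """Return a multiplicative gain that moves section_value into [lower, upper]."""
    if section_value <= 0:
        return 1.0  # nothing to scale
    if section_value < lower:
        return lower / section_value
    if section_value > upper:
        return upper / section_value
    return 1.0  # already inside the target range
```

Excluding the few largest values before taking the maximum makes the section's level estimate less sensitive to isolated spikes such as clicks.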

The volume adjustment device according to claim 4 or 5, wherein
the gain calculation unit comprises:
voiced/silent frame determining means for determining a frame to be a voiced frame if its outline value is greater than the predetermined second threshold, and otherwise determining the frame to be a silent frame;
voiced/silent section determining means for determining a sound section consisting of silent frames continuing for at least the first threshold to be a silent section, and determining any other sound section to be a voiced section; and
first sound section extraction means for taking, among the voiced sections so determined, a voiced section longer than a predetermined time length as the first sound section.
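
The voiced/silent classification and first sound section extraction in the claim above can be sketched as follows. The function and parameter names are hypothetical, lengths are counted in frames rather than time, and treating the start and end of the data as section boundaries is an assumption of this sketch.

```python
def first_sound_sections(frame_values, level_threshold,
                         min_silence_frames, min_section_frames):
    """Return (start, end) frame index pairs, end exclusive, of first sound sections."""
    # A frame is voiced if its outline value exceeds the level threshold.
    voiced = [v > level_threshold for v in frame_values]
    n = len(voiced)

    # Mark runs of silent frames that are long enough to count as a silent section.
    in_silence = [False] * n
    i = 0
    while i < n:
        if voiced[i]:
            i += 1
            continue
        j = i
        while j < n and not voiced[j]:
            j += 1
        if j - i >= min_silence_frames:
            for k in range(i, j):
                in_silence[k] = True
        i = j

    # Everything outside a silent section belongs to a voiced section;
    # keep only voiced sections longer than min_section_frames.
    sections = []
    start = None
    for idx in range(n + 1):
        inside = idx < n and not in_silence[idx]
        if inside and start is None:
            start = idx
        elif not inside and start is not None:
            if idx - start > min_section_frames:
                sections.append((start, idx))
            start = None
    return sections
```

Short pauses inside speech stay part of the surrounding voiced section, because only sufficiently long runs of silent frames split sections.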

The volume adjustment device according to any one of claims 1 to 6, wherein
the gain calculation unit comprises:
excessive input sample count determining means for determining, for each frame, the number of samples whose absolute value is greater than a predetermined third threshold (hereinafter, the "excessive input sample count");
excessive input frame determining means for determining, for each frame, whether the excessive input sample count is greater than a predetermined fourth threshold (such a frame is hereinafter called an "excessive input frame"); and
second determining means for, taking as a second sound section a sound section consisting of fewer frames than the first sound section, outputting information for lowering the gains of the collected input sound signal and reference sound signal by a predetermined amount (hereinafter, "second input sound gain information" and "second reference sound gain information", respectively) when the total of the excessive input sample counts determined for the frames constituting the second sound section is greater than a predetermined fifth threshold and the number of excessive input frames among those frames is greater than a sixth threshold,
the reference sound gain adjustment unit comprises second reference sound gain adjusting means for adjusting the volume of the reference sound signal using the second reference sound gain information, and
the input sound gain information adjustment unit comprises second input sound gain adjusting means for adjusting the volume of the input sound signal using the second input sound gain information.
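
The excessive-input test of the claim above can be sketched as a single decision function. The names and the way thresholds are passed are hypothetical; the sketch only reports whether the gain should be lowered, and leaves the "predetermined amount" of the reduction to the caller.

```python
def needs_gain_reduction(frames, clip_level, per_frame_limit,
                         total_limit, frame_count_limit):
    """Decide whether a second sound section shows excessive input.

    frames: list of frames, each a list of samples.
    clip_level: third threshold (sample magnitude).
    per_frame_limit: fourth threshold (excessive samples per frame).
    total_limit: fifth threshold (excessive samples over the section).
    frame_count_limit: sixth threshold (excessive frames in the section).
    """
    # Excessive input sample count for each frame.
    counts = [sum(1 for s in frame if abs(s) > clip_level) for frame in frames]
    # Number of frames classified as excessive input frames.
    excessive_frames = sum(1 for c in counts if c > per_frame_limit)
    # Both conditions must hold before the gain is lowered.
    return sum(counts) > total_limit and excessive_frames > frame_count_limit
```

Requiring both a large total count and several excessive frames avoids lowering the gain because of one isolated loud frame.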

The volume adjustment device according to any one of claims 1 to 7, further comprising:
start/end determination means for obtaining, for each frame, the average of the absolute values of the amplitudes of the input sound signal and the reference sound signal, determining that an utterance has started when a frame whose average is greater than a predetermined seventh threshold is detected, and determining that the utterance has ended when frames whose average is smaller than a predetermined eighth threshold continue for a predetermined number of frames;
input sound end gain adjustment means for, when the utterance is determined to have ended, storing the first input sound gain adjustment information and/or the second input sound gain adjustment information at the end of the utterance in storage means, reading from the storage means the first input sound gain adjustment information and/or the second input sound gain adjustment information at the ends of a predetermined number of past utterances counting back from the most recent utterance, obtaining their average, and setting the average in the first input sound gain adjusting means and/or the second input sound gain adjusting means; and
reference sound end gain adjustment means for, when the utterance is determined to have ended, storing the first reference sound gain adjustment information and/or the second reference sound gain adjustment information at the end of the utterance in storage means, reading from the storage means the first reference sound gain adjustment information and/or the second reference sound gain adjustment information at the ends of a predetermined number of past utterances counting back from the most recent utterance, obtaining their average, and setting the average in the first reference sound gain adjusting means and/or the second reference sound gain adjusting means.
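
The start/end determination and the end-of-utterance gain averaging in the claim above can be sketched as follows. The function and class names are hypothetical, the gain is treated as a single number per utterance, and the history is held in memory rather than in a separate storage means.

```python
from collections import deque


def detect_start_end(frame_means, start_level, end_level, end_run):
    """Return (start_frame, end_frame); either may be None if not detected.

    frame_means: mean absolute amplitude of each frame.
    start_level: seventh threshold; end_level: eighth threshold.
    end_run: number of consecutive quiet frames that ends the utterance.
    """
    start = None
    quiet = 0
    for i, mean in enumerate(frame_means):
        if start is None:
            if mean > start_level:
                start = i
        else:
            quiet = quiet + 1 if mean < end_level else 0
            if quiet >= end_run:
                return start, i
    return start, None


class EndGainAverager:
    """Keep the end-of-utterance gains of the most recent utterances."""

    def __init__(self, history):
        # deque(maxlen=...) discards the oldest gain automatically.
        self._gains = deque(maxlen=history)

    def on_utterance_end(self, gain):
        """Store this utterance's final gain and return the averaged gain to set."""
        self._gains.append(gain)
        return sum(self._gains) / len(self._gains)
```

Averaging over several past utterances gives a starting gain that reflects stable collection conditions, which fits the operator side of a call better than the user side, where conditions change from call to call.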

A volume adjustment method comprising:
an input sound gain adjustment step in which, where a first sound from a first sound source and a second sound from a second sound source are collected in a first sound collection step as an input sound signal, the gain of the input sound signal is adjusted using input sound gain information and an adjusted input sound signal is output;
a reference sound gain adjustment step in which, where the second sound is collected in a second sound collection step as a reference sound signal, the gain of the reference sound signal is adjusted using reference sound gain information and an adjusted reference sound signal is output;
an echo suppression step of outputting a suppressed input sound signal by subtracting, from the adjusted input sound signal, the result of convolving adaptive filter coefficients with the adjusted reference sound signal;
a gain calculation step of using the adjusted reference sound signal and the adjusted input sound signal to calculate the input sound gain information, the reference sound gain information, and gain adjustment information indicating the degree to which the amount of change in the input sound gain information departs from the amount of change in the reference sound gain information, and setting the adaptive filter coefficients to values corresponding to the gain adjustment information; and
an output step of outputting the suppressed input sound signal and the adjusted reference sound signal.

A program that causes a computer to operate as the volume adjustment device according to any one of claims 1 to 8.

A computer-readable recording medium on which the program according to claim 10 is recorded so as to be executed by a computer.