JP2007174190A - Audio system

Info

Publication number: JP2007174190A
Application number: JP2005368053A
Authority: JP
Inventors: Takuro Sone; 卓朗曽根
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-12-21
Filing date: 2005-12-21
Publication date: 2007-07-05
Anticipated expiration: 2025-12-21
Also published as: JP4835151B2

Abstract

PROBLEM TO BE SOLVED: To provide an audio system capable of detecting the position of a microphone in real time by signal processing alone, without using an additional device. SOLUTION: An analysis control section 14 refers to the peaks of the filter coefficients of an adaptive filter 13. The filter coefficients simulate the sound propagation path from speakers 18 to the microphone 11, and the positions of the peaks (the components corresponding to direct sound) on the time axis indicate the distance between the speakers 18 and the microphone 11. Thus, the positional relation among the speakers 18L and 18R and the microphone 11 can be detected. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an audio system that detects the positional relationship between a speaker and a microphone in real time.

Some karaoke apparatuses equipped with speakers and a microphone use the stereo effect of a plurality of speakers to give the sound a sense of spatial depth. Obtaining this spatial impression depends on the positional relationship between the speakers and the microphone; that is, the acoustic effect is applied so that the spatial impression is strongest at the singer's position. When the singer moves, however, the optimal acoustic effect can no longer be provided.

To address this, a method has been proposed in which an audio signal at a frequency outside the audible range (ultrasound) is emitted from the singing microphone and the distance is measured with an ultrasonic receiving microphone installed in the karaoke apparatus body (see, for example, Patent Document 1).
Patent Document 1: JP 2000-59880 A

However, the configuration of Patent Document 1 requires an ultrasonic generator in the microphone and an ultrasonic receiving microphone in the karaoke apparatus body. These ultrasonic generators and receivers have nothing to do with the original functions of the karaoke apparatus (amplifying the singing voice, reproducing karaoke music, and so on) and are used only for position detection. An additional device is therefore needed solely for position detection, which increases cost. Furthermore, ultrasound must be emitted continuously in order to detect the position in real time.

Likewise, when the configuration of Patent Document 1 is applied to other fields (for example, a teleconferencing apparatus), it requires an additional device unrelated to the original functions (transmitting the sound picked up by the microphone and emitting the received sound from the speaker), which increases cost.

An object of the present invention is to provide an audio system that can detect the position of a microphone in real time by signal processing alone, without using an additional device.

The audio system of the present invention comprises: a microphone that picks up sound and outputs a picked-up sound signal; a speaker that emits an output audio signal as sound; an adaptive filter including a coefficient estimation unit that calculates filter coefficients simulating the sound propagation path from the speaker to the microphone, a filter in which the filter coefficients are set and which filters the output audio signal to output a simulated signal of the feedback audio signal returned via the sound propagation path, and a subtraction unit that subtracts the simulated signal from the picked-up sound signal to output a residual signal from which the feedback audio signal component of the picked-up sound signal has been removed; an output audio signal generation unit that generates an output audio signal containing the residual signal; and microphone position estimation means that estimates the distance between the speaker and the microphone based on the position of a peak of the filter coefficients on the time axis.

In the present invention, the adaptive filter removes the feedback audio signal component from the sound picked up by the microphone. The adaptive filter estimates the transfer function of the transmission system from the speaker to the microphone (the acoustic transmission system) and, from the output audio signal input to the speaker, generates a simulated signal that imitates the feedback sound (wraparound signal). The adaptive filter subtracts this simulated signal from the signal picked up by the microphone. The microphone position estimation means refers to the transfer function (filter coefficients) estimated by the adaptive filter and detects the position on the time axis at which a peak of at least a predetermined level occurs. Of the sound travelling from the speaker to the microphone, the direct sound normally has the highest level, so the peak of the filter coefficients corresponds to the direct sound from the speaker to the microphone. By detecting the position of this peak, the microphone position estimation means can therefore detect the arrival time of the direct sound. Multiplying this time by the speed of sound gives an estimate of the distance between the speaker and the microphone.
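The peak-to-distance step described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `estimate_distance_m`, the sample rate, the threshold value, and the speed of sound of 343 m/s are all assumptions, and any fixed latency in the converters is ignored.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def estimate_distance_m(coeffs, sample_rate_hz, threshold):
    """Return the distance implied by the strongest filter tap, or None.

    coeffs is the list of FIR filter coefficients (one per tap). The tap
    index of the largest-magnitude coefficient is treated as the arrival
    time of the direct sound, expressed in samples.
    """
    peak_tap = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    if abs(coeffs[peak_tap]) < threshold:
        return None  # no coefficient reaches the predetermined level
    arrival_time_s = peak_tap / sample_rate_hz
    return arrival_time_s * SPEED_OF_SOUND_M_S
```

With a 48 kHz sample rate, one tap corresponds to about 7 mm of path length, so under these assumptions the resolution of the estimate is limited by the sample rate rather than by the arithmetic.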

The present invention is further characterized in that a plurality of the speakers are provided, and the microphone position estimation means detects a plurality of peaks in the filter coefficients and estimates the distance between each speaker and the microphone based on the position of each peak.

In the present invention, when there are a plurality of speakers at different distances from the microphone, a plurality of peaks are detected in the filter coefficients. The microphone position estimation means associates these peaks with the sound output from the respective speakers and estimates the distance between each speaker and the microphone.

The present invention further comprises an output audio signal delay unit that delays the output audio signal input to a specific speaker, and is characterized in that the microphone position estimation means changes the delay time of the delay unit and identifies the peak corresponding to the specific speaker by detecting the movement of a filter coefficient peak caused by this change in delay time.

In the present invention, a delay time is applied to the output audio signal input to one of the speakers. The microphone position estimation means controls this delay time and detects the change over time of the filter coefficient peaks. That is, when there are a plurality of speakers at different distances from the microphone, a plurality of peaks are detected in the filter coefficients; by changing the delay time of the output audio signal input to one of the speakers, the filter coefficient peak corresponding to that speaker can be identified. This allows the positional relationship between the speakers and the microphone to be detected in more detail.

The present invention is further characterized in that the output audio signal generation unit is provided independently for each of the plurality of speakers, a residual signal delay unit that delays the residual signal and inputs it to the output audio signal generation unit is provided independently for each output audio signal generation unit, and the microphone position estimation means changes the delay times of the residual signal delay units and identifies the peak corresponding to each speaker by detecting the movement of the filter coefficient peaks caused by these changes in delay time.

In the present invention, a delay time is applied to the output audio signal input to each of the plurality of speakers. The microphone position estimation means controls each of these delay times and detects the change over time of the filter coefficient peaks. By changing the delay time of each output audio signal, the speaker corresponding to each filter coefficient peak can be identified. This allows the positional relationship between the speakers and the microphone to be detected in more detail.

The present invention is further characterized in that the microphone position estimation means sets the delay times of the residual signal delay units so that the filter coefficient peaks corresponding to the respective speakers coincide on the time axis.

In the present invention, the delay amount of the delay circuit is further varied so that the peak that moved coincides on the time axis with the other peak. That is, the audio signal input to one of the speakers is delayed so that the distances between the respective speakers and the microphone are virtually equalized. As a result, of the sound emitted from the speakers, the sound picked up by the microphone arrives at the singer's position simultaneously from each speaker, and the singing voice can be concentrated on the singer.
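One way to picture the peak alignment is to compute, for each speaker, how many samples of extra delay would move its peak onto the latest one. The sketch below assumes the peak tap for each speaker has already been identified; the function name `alignment_delays` and the dictionary layout are hypothetical and are not taken from the specification.

```python
def alignment_delays(peak_taps):
    """Return the extra delay (in samples) per speaker that aligns all peaks.

    peak_taps maps a speaker label to the tap index of its direct-sound
    peak. The speaker whose peak is latest (the farthest speaker) gets no
    extra delay; nearer speakers are delayed by the difference.
    """
    latest = max(peak_taps.values())
    return {name: latest - tap for name, tap in peak_taps.items()}


# Example: the L speaker is nearer, so its signal is delayed by 60 samples.
delays = alignment_delays({"L": 120, "R": 180})
```

Only delaying the nearer speaker is considered here because a signal cannot be advanced in time.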

According to the present invention, the arrival time of the direct sound from the speaker to the microphone is estimated by detecting the position of the peak of the adaptive filter's coefficients on the time axis. The distance between the speaker and the microphone can thus be detected, and the position of the microphone can be detected in real time by signal processing alone, without an additional device.

Embodiments of the present invention will be described with reference to the drawings. The present invention is applicable to almost any system that uses a microphone and a speaker; a karaoke apparatus is described here. FIG. 1 is a block diagram showing the main part of a karaoke apparatus according to the embodiment. As shown in the figure, the karaoke apparatus includes a microphone 11, an A/D converter 12, an adaptive filter 13, an analysis control unit 14, an adder 15, a D/A converter 16, an amplifier 17, and speakers 18L and 18R.

The microphone 11 picks up the singer's voice together with the sound in the space where the karaoke apparatus is installed, and outputs an audio signal corresponding to the picked-up sound to the A/D converter 12. The picked-up sound also includes sound emitted from the speakers 18L and 18R that returns to the microphone (wraparound sound). The microphone 11 typically uses a dynamic microphone unit, but other types, such as a condenser microphone unit, may be used. The microphone 11 may be either unidirectional or omnidirectional.

The A/D converter 12 converts the audio signal output from the microphone 11 into a digital signal and outputs it to the adaptive filter 13.

The adaptive filter 13 includes a digital filter such as an FIR filter. It removes the wraparound sound from the picked-up signal of the microphone 11, which has been converted into a digital signal by the A/D converter 12, and outputs the result to the adder 15.

FIG. 2 shows a detailed block diagram of the adaptive filter 13. The adaptive filter 13 includes an FIR filter 131, an adder 132, and a coefficient estimation unit 133. The coefficient estimation unit 133 estimates the transfer function of the acoustic transmission system (the acoustic propagation path from the speakers 18 to the microphone 11), and calculates and sets the filter coefficients of the FIR filter 131 so as to simulate the estimated transfer function. The transfer function is estimated and the filter coefficients are calculated by an adaptive algorithm, based on the signal input from the adder 15 (the output audio signal input to the speakers 18) and using the residual signal output from the adder 132 as a reference signal. The adaptive algorithm calculates the filter coefficients so that the residual signal becomes as small as possible.

As a result, the FIR filter 131 generates a signal that simulates the wraparound signal of the acoustic transmission system (the audio signal travelling from the speakers 18 to the microphone 11), and the adder 132 subtracts this simulated signal from the output signal of the microphone 11, so that only the wraparound signal is attenuated efficiently. The adaptive filter 13 can thereby prevent the howling and echo caused by the looping of the wraparound signal.
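The text does not name a specific adaptive algorithm. As a minimal sketch, the loop below uses normalized LMS (NLMS), which is an assumed choice, to show how the three blocks of FIG. 2 could interact. The function name `nlms_identify` and the parameters `step` and `eps` are hypothetical.

```python
def nlms_identify(speaker_signal, mic_signal, num_taps, step=0.5, eps=1e-8):
    """Adapt FIR coefficients so the filtered speaker signal matches the mic signal.

    Returns (coeffs, residual). NLMS is an assumed algorithm; the source
    only requires that the residual be made as small as possible.
    """
    coeffs = [0.0] * num_taps
    history = [0.0] * num_taps  # recent speaker samples, newest first
    residual = []
    for x_n, d_n in zip(speaker_signal, mic_signal):
        history = [x_n] + history[:-1]
        # Role of FIR filter 131: simulate the wraparound signal.
        simulated = sum(w * x for w, x in zip(coeffs, history))
        # Role of adder 132: subtract the simulated signal from the mic signal.
        e_n = d_n - simulated
        # Role of coefficient estimation unit 133: normalized gradient step.
        norm = sum(x * x for x in history) + eps
        gain = step * e_n / norm
        coeffs = [w + gain * x for w, x in zip(coeffs, history)]
        residual.append(e_n)
    return coeffs, residual
```

Once the coefficients have converged, their largest magnitudes sit at the taps corresponding to the propagation delays, which is the property the analysis control unit 14 relies on. In a real system the microphone signal also contains the singing voice, which would disturb the adaptation; this sketch ignores that.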

The adder 15 adds the singing voice signal, which is the output signal of the adaptive filter, to the (monaural) musical sound signal output from the musical sound reproduction unit (not shown) of the karaoke apparatus, and outputs the sum to the D/A converter 16.

The D/A converter 16 converts the output signal of the adder 15 into an analog audio signal and outputs it to the amplifier 17.

The amplifier 17 is a so-called power amplifier. It amplifies the audio signal output from the D/A converter 16 and outputs it, split into two branches, to the speaker 18L and the speaker 18R. In this example, the monaural musical sound signal and the singing voice signal are added by the adder 15 and output equally (center-localized) to the speakers 18L and 18R.

The speakers 18L and 18R each emit sound based on the amplified audio signal output from the amplifier 17. They typically use cone speaker units, but other types, such as horn speaker units, may be used.

As shown in FIGS. 1 and 2, the analysis control unit 14 is connected to the coefficient estimation unit 133 of the adaptive filter 13 and reads out the filter coefficients set in the FIR filter 131. Based on the peak positions of these filter coefficients, it detects the distances between the microphone 11 and the speakers 18L and 18R.

The method by which the analysis control unit 14 detects the distance will now be described in detail. FIG. 3 shows the filter coefficients of the adaptive filter 13 along the time axis. The horizontal axis of the graph represents the tap number of the FIR filter, that is, time, and the vertical axis represents the gain of each tap, that is, the level of the wraparound sound. FIG. 3(A) shows an example of the filter coefficients for one speaker and one microphone. Although the adaptive filter 13 is a digital filter and its coefficients form a discrete signal, they are drawn as a continuous signal in the figure for ease of explanation.

As described above, the adaptive filter 13 estimates the transfer function of the acoustic transmission system based on the output audio signal input to the speaker, using the residual signal as a reference signal, and calculates the filter coefficients to match the estimated transfer function. The filter coefficients along the time axis shown in the figure therefore correspond to the feedback signal from the speaker to the microphone. When the level of the feedback signal from the speaker to the microphone is high, the level of the filter coefficients becomes high in order to cancel it.

As shown in the figure, the filter coefficients have one peak of at least a predetermined level. Here, a peak means the coefficient with the highest level among the components at or above a predetermined level (threshold). Of the sound travelling from the speaker to the microphone, the directly arriving sound has the highest level, so the peak of the filter coefficients corresponds to the direct sound from the speaker to the microphone. The time position of the peak therefore indicates the arrival time of the direct sound from the speaker to the microphone. The analysis control unit 14, which is connected to the adaptive filter 13, detects the time position of this peak and calculates the distance between the speaker and the microphone by multiplying it by the speed of sound. Since the speaker is fixed, the position of the microphone can be detected from this distance.

In this example, however, a plurality of speakers 18L and 18R are provided, and the filter coefficients actually have two peaks of at least the predetermined level, as shown in FIG. 3(B). Because the speakers are at different distances from the microphone, the direct sound from each speaker arrives at the microphone at a different time, and the filter coefficients have two peaks at different positions on the time axis. The earlier peak corresponds to the speaker closer to the microphone, and the later peak to the speaker farther from it. By determining in advance which side the microphone 11 is on, speaker 18L or speaker 18R (that is, which speaker is nearer and which is farther), the analysis control unit 14 can determine which speaker's output sound each filter coefficient peak corresponds to. The side on which the microphone 11 is located may be preset on the basis of the expected usage, or may be set by the user through the operation unit or the like of the karaoke apparatus.

For example, when the speaker 18L is closer to the microphone 11, the earlier peak shown in FIG. 3(B) corresponds to the output sound of the speaker 18L, and the later peak corresponds to the output sound of the speaker 18R. The distance between each speaker 18 and the microphone 11 can thus be detected, and from these distances the position of the microphone 11 can be detected. That is, when the speaker installation positions are fixed and the distance between the speaker 18L and the speaker 18R is known, the position of the microphone 11 can be determined by triangulation.
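The triangulation step can be illustrated with a small two-dimensional sketch. The coordinate convention is an assumption made here for illustration: speaker 18L is placed at the origin, speaker 18R on the positive x axis, and the microphone is assumed to be in front of the speakers (non-negative y). The function name `locate_microphone` is hypothetical.

```python
import math


def locate_microphone(dist_left_m, dist_right_m, speaker_spacing_m):
    """Return (x, y) of the microphone, or None if the distances are inconsistent.

    Assumed layout: left speaker at (0, 0), right speaker at
    (speaker_spacing_m, 0), microphone at y >= 0. The position is the
    intersection of two circles centered on the speakers.
    """
    x = (dist_left_m ** 2 - dist_right_m ** 2 + speaker_spacing_m ** 2) / (
        2.0 * speaker_spacing_m
    )
    y_squared = dist_left_m ** 2 - x ** 2
    if y_squared < 0.0:
        return None  # the two circles do not intersect
    return x, math.sqrt(y_squared)
```

Two distances leave a mirror ambiguity between the front and the back of the speaker line; the sketch resolves it by assumption rather than by measurement.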

The microphone distance information detected in this way is output to, for example, the acoustic control unit of the karaoke apparatus, where it is used to apply acoustic effects. Based on the distance information for the speakers 18 and the microphone 11, the acoustic control unit controls, for example, the volume of the speakers 18 so that the optimal acoustic effect is obtained at the position of the singer (microphone). The information may also be used for other effects, such as directing a spotlight at the singer's position.

To make it easier to identify the peaks due to wraparound sound from the left and right speakers, the following modification of the present invention is possible. FIG. 4 is a block diagram showing the main part of a karaoke apparatus according to the modification. Components in common with the block diagram of the main part of the karaoke apparatus shown in FIG. 1 are given the same reference numerals, and their description is omitted.

This karaoke apparatus includes a microphone 11, an A/D converter 12, an adaptive filter 13, an analysis control unit 14, adders 21L and 21R, a mixer 22, a delay 23, a D/A converter 16, an amplifier 17, and speakers 18L and 18R.

In this karaoke apparatus, the adaptive filter 13 is connected to the adders 21L and 21R and the mixer 22. The adder 21L is connected to the delay 23, and the D/A converter 16 is connected to the adder 21R and the delay 23. The analysis control unit 14 is connected to the adaptive filter 13 and the delay 23.

The output signal of the adaptive filter 13 is input to the adders 21L and 21R. The adders 21L and 21R each add the output signal of the adaptive filter 13 (the singing voice signal) to the corresponding channel (L or R) of the stereo musical sound signal output from the musical sound reproduction unit (not shown) of the karaoke apparatus. The output signal of the adder 21R is input to the D/A converter 16, and the output signal of the adder 21L is input to the delay 23. The output signals of the adders 21L and 21R are also each branched to the mixer 22. The mixer 22 mixes these signals and outputs them to the adaptive filter 13 as a monaural signal. Based on this monaural signal, the adaptive filter 13 updates the filter coefficients using the residual signal described above as a reference signal.

The delay 23 applies the delay time set by the analysis control unit 14 to the output signal of the adder 21L and outputs the result, which is input to the D/A converter 16. The D/A converter 16 converts the output signal of the adder 21R and the output signal of the delay 23 into analog audio signals and outputs them to the amplifier 17. The amplifier 17 amplifies each analog signal (L channel and R channel) and outputs it to the speakers 18L and 18R, which emit sound according to the signals of their respective channels.

The analysis control unit 14 changes the delay amount of the delay 23. This makes the L-channel path appear longer in the system outside the adaptive filter 13. The analysis control unit 14 can determine that the filter coefficient peak that moves in response to this change in delay corresponds to the output sound of the speaker 18L. The position of the microphone 11 (its distances from the speakers 18L and 18R) can therefore be detected.

The method by which the analysis control unit 14 detects the position of the microphone 11 will now be described in detail. FIG. 5 shows the filter coefficients of the adaptive filter 13 in this karaoke apparatus along the time axis. The horizontal axis of the graph represents time and the vertical axis represents level. The figure shows an example of the filter coefficients for two speakers and one microphone, with the speakers at different distances from the microphone. As before, the filter coefficients are drawn as a continuous signal for ease of explanation.

As described above, the filter coefficients of the adaptive filter 13 along the time axis correspond to the feedback signal from the speaker to the microphone. When the level of the feedback signal from the speaker to the microphone is high, the level of the filter coefficients becomes high in order to cancel it. In the karaoke apparatus shown in FIG. 1, the filter coefficients normally have two peaks of at least the predetermined level, as shown in FIG. 3(B). That is, the karaoke apparatus of FIG. 1 has two speakers and one microphone, and because the speakers are at different distances from the microphone, the filter coefficients have two peaks at different positions on the time axis. The earlier peak corresponds to the speaker closer to the microphone, and the later peak to the speaker farther from it.

The analysis control unit 14 in FIG. 4 changes the delay amount of the delay 23. As shown in FIG. 5, the position on the time axis of one of the two filter coefficient peaks then moves.

That is, because the L-channel output audio signal is emitted from the speaker after being delayed, the adaptive filter 13 adapts the filter coefficients to the delayed audio signal and the position of the peak moves. The peak that moved can be judged to be the peak corresponding to the direct sound of the L channel. By detecting this movement, the analysis control unit 14 can determine whether a peak corresponds to the audio signal output from the L-channel speaker. The delay amount may be fixed (the analysis control unit 14 merely switching the delay 23 on and off) or may be varied. Although FIG. 4 shows the delay 23 connected to the L-channel signal, it may instead be connected to the R channel.
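The peak-movement check can be sketched as a comparison of the peak positions before and after the delay is changed. The helper names `find_peaks` and `identify_delayed_peak`, the local-maximum peak picking, and the one-tap tolerance are all assumptions made for illustration; the specification does not prescribe how the movement is detected.

```python
def find_peaks(coeffs, threshold):
    """Return tap indices that are local maxima of |coeff| at or above threshold."""
    mags = [abs(c) for c in coeffs]
    peaks = []
    for i, m in enumerate(mags):
        left = mags[i - 1] if i > 0 else 0.0
        right = mags[i + 1] if i + 1 < len(mags) else 0.0
        if m >= threshold and m > left and m >= right:
            peaks.append(i)
    return peaks


def identify_delayed_peak(peaks_before, peaks_after, added_delay_taps, tolerance=1):
    """Return (tap_before, tap_after) for the peak that moved by the added delay.

    A peak that is still present at the same tap afterwards is treated as
    belonging to the undelayed channel. Returns None if no peak moved by
    approximately added_delay_taps.
    """
    for before in peaks_before:
        if before in peaks_after:
            continue  # this peak did not move
        for after in peaks_after:
            if abs((after - before) - added_delay_taps) <= tolerance:
                return before, after
    return None
```

The returned `tap_before` is the one to use for the distance of the delayed channel, since `tap_after` includes the artificial delay.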

In this way, the analysis control unit 14 determines which speaker's output audio signal each peak of the filter coefficients corresponds to, and detects the distance between each speaker and the microphone. Because each speaker is identified when the distance to the microphone is detected, the microphone position can be detected without confusing the left and right speakers. The detected microphone position information is output to an acoustic control unit or the like of the karaoke apparatus and used to apply acoustic effects. The microphone position information may of course be used for other purposes.

The following configuration is also conceivable as another modification. FIG. 6 is a block diagram showing the main part of a karaoke apparatus according to this other modification. Components in common with the block diagram of the main part of the karaoke apparatus shown in FIG. 1 are given the same reference numerals, and their description is omitted. As shown in the figure, this karaoke apparatus includes a microphone 11, an A/D converter 12, an adaptive filter 13, an analysis control unit 14, delays 31L and 31R, adders 21L and 21R, a D/A converter 16, an amplifier 17, speakers 18L and 18R, and a mixer 32.

In this example, the adaptive filter 13 is connected to the analysis control unit 14, the delays 31L and 31R, and the mixer 32. The delay 31L is connected to the adder 21L and the analysis control unit 14, and the delay 31R is connected to the adder 21R and the analysis control unit 14. The mixer 32 mixes the stereo musical tone signal (L channel and R channel) output from the musical tone reproduction unit (not shown) of the karaoke apparatus with the output signal of the adaptive filter 13, and outputs the result to the adaptive filter 13.

The delays 31L and 31R each apply the delay amount set by the analysis control unit 14 to the output signal of the adaptive filter and output the result. The output signals of the delays 31L and 31R are input to the adders 21L and 21R, respectively. The adders 21L and 21R mix the output signals of the delays 31L and 31R with the L-channel and R-channel musical tone signals, respectively, and output the results to the D/A converter 16. The D/A converter 16 converts the output signals of the adders 21L and 21R into analog audio signals and outputs them to the amplifier 17. The amplifier 17 amplifies each analog signal (L channel and R channel) and outputs it to the speakers 18L and 18R. The speakers 18L and 18R each emit sound according to the signal of their respective channel.

The analysis control unit 14 changes the delay amounts of the delays 31L and 31R, refers to the filter coefficients of the adaptive filter 13, which are updated in response to the change in delay amount, and detects the position of the microphone 11 (its distance from the speakers 18L and 18R). As illustrated in FIG. 5, the analysis control unit 14 detects the position of the microphone 11 by changing the delay time of the delay 31L (or the delay 31R) and referring to the filter coefficient peak that moved on the time axis. The peak that moves with the delay of the delay 31L is the peak corresponding to the speaker 18L, and the peak that moves with the delay of the delay 31R is the peak corresponding to the speaker 18R. The analysis control unit 14 also changes the delay amounts of the delays 31L and 31R so that the plural peaks of the filter coefficients coincide on the time axis. FIG. 7 shows the time-axis component of the filter coefficients of the adaptive filter in this modification. In the graphs of this figure as well, the horizontal axis represents time and the vertical axis represents level. The figure shows an example of the filter coefficients for two speakers and one microphone, where the distance from each speaker to the microphone differs. Here too, the filter coefficients are drawn as a continuous signal for ease of explanation.

As shown in FIG. 7(A), in this example as well there are two speakers and one microphone, and the distance from each speaker to the microphone differs, so the filter coefficients have two peaks at different positions on the time axis. The peak at the smaller time-axis position corresponds to the speaker closer to the microphone, and the peak at the larger time-axis position corresponds to the speaker farther from the microphone. The analysis control unit 14 changes the delay amount of the delay 31 (L or R) and determines which speaker's output audio signal the peak that moved corresponds to. The position of the microphone is thereby detected. The detected microphone position information is output to an acoustic control unit or the like of the karaoke apparatus and used to apply acoustic effects.

In this example, the analysis control unit 14 further changes the delay amount of the delay 31 (L or R) so that the peak that moved coincides with the other peak on the time axis. That is, the audio signal input to one of the speakers is delayed so that the distances from the two speakers to the microphone become virtually equal. As a result, of the sound emitted from each speaker, the component picked up by the microphone (the amplified voice) arrives at the singer's position at the same time from both speakers.
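
The peak alignment described above amounts to delaying the nearer speaker's signal by the difference between the two peak positions. A minimal sketch follows; the function names, the 48 kHz sample rate, and the sample peak positions are assumptions for illustration only.

```python
def alignment_delays(peak_l, peak_r):
    """Return (extra_delay_l, extra_delay_r) in samples so both peaks coincide.

    The channel whose direct sound arrives earlier (smaller peak position)
    is delayed by the difference; the later channel is left untouched.
    """
    latest = max(peak_l, peak_r)
    return latest - peak_l, latest - peak_r


def samples_to_ms(samples, sample_rate_hz):
    """Convert a delay in samples to milliseconds."""
    return samples / sample_rate_hz * 1000.0


# Hypothetical peaks: speaker 18L at tap 140, speaker 18R at tap 280.
delay_l, delay_r = alignment_delays(140, 280)
delay_l_ms = samples_to_ms(delay_l, sample_rate_hz=48000)
```

With these values only the L-channel delay is nonzero, at 140 samples (a little under 3 ms at the assumed sample rate), which makes the nearer speaker behave as if it were as far from the microphone as the farther one.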

In this way, the L-channel or R-channel signal (amplified voice) is delayed so that both arrive at the microphone 11 at the same time. The sound image of the microphone voice is therefore localized at the singer's position, and the singing voice can be concentrated on the singer. Furthermore, in this example the karaoke musical tone signal is not delayed (only the signal picked up by the microphone is input to the delays 31), so compared with the example shown in FIG. 4 described above, the sound image localization of the karaoke music is less likely to be disturbed, and a further improvement in sound quality can be expected.

In this modification as well, the microphone position information may be output to the acoustic control unit of the karaoke apparatus and used there to apply acoustic effects, or it may be used for other purposes.

Although the above examples describe a karaoke apparatus, application of the present invention is not limited to this, and it can also be used in a PA system (public address apparatus). When used in a public address apparatus, the adder 15 in FIG. 1 is omitted and no musical tone signal is input. In this case, the output audio signal generation unit, which is a constituent element of the present invention, corresponds to, for example, the amplifier.

The audio system of the present invention is also not limited to application to a karaoke apparatus (public address apparatus) as described above. For example, as shown in FIG. 8, it can be applied to a voice input/output device (a so-called teleconference device). FIG. 8 is a block diagram showing the configuration of the voice input/output device. Components in common with the block diagram of the main part of the karaoke apparatus shown in FIG. 1 are given the same reference numerals, and their description is omitted. As shown in the figure, this voice input/output device includes a microphone 11, an A/D converter 12, an adaptive filter 13A, an adaptive filter 13B, an analysis control unit 14, a D/A converter 16, an amplifier 17, speakers 18L and 18R, and an input/output interface 51.

In this example, the adaptive filter 13A and the adaptive filter 13B are connected to the analysis control unit 14 and the input/output interface 51. The input/output interface 51 is connected to the D/A converter 16 and to other voice input/output devices and the like (not shown).

The input/output interface 51 receives audio signals (speech signals) from other voice input/output devices connected via a network or the like. The received signals are emitted as sound from the speakers 18 via the D/A converter 16 and the amplifier 17. The input/output interface 51 also outputs to the other voice input/output devices the audio signals that were picked up by the microphone 11 and input via the A/D converter 12 and the adaptive filters 13A and 13B.

In this example, the adaptive filter 13A and the adaptive filter 13B each take as input one of the plural speech signals received by the input/output interface (received signals A and B in FIG. 8) and cancel the feedback component of that speech signal from the sound picked up by the microphone 11. That is, in this example the adaptive filters 13 cancel the sound component (echo) that returns from the speakers 18 to the microphone 11. A different speech signal (received signal A or B) is input to each of the speakers 18L and 18R, so each speaker emits different sound. The adaptive filter 13A and the adaptive filter 13B each cancel the feedback component of their respective speech signal.

The analysis control unit 14 refers to the filter coefficients of the adaptive filter 13A and the adaptive filter 13B and, as described above, detects the arrival time of the direct sound from each speaker 18 to the microphone 11. In this example the sound emitted from each speaker is different (each speaker has its own adaptive filter in a one-to-one correspondence), so the analysis control unit 14 can calculate the distance between each speaker and the microphone. In this voice input/output device, the microphone position can therefore be detected in detail (two-dimensionally) without applying a delay time to the audio signals.
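
The text does not spell out how the two distances are turned into a two-dimensional position. One possible approach, sketched below purely as an illustration, is to intersect two circles centred on the speakers. The coordinate layout (speakers on the x axis, symmetric about the origin, microphone assumed to be in front of them at y >= 0), the known speaker spacing, and the function name are all assumptions not found in the source.

```python
import math


def mic_position_2d(dist_l_m, dist_r_m, speaker_spacing_m):
    """Estimate (x, y) of the microphone from its distance to each speaker.

    Assumed layout: speaker L at (-s/2, 0), speaker R at (+s/2, 0), and the
    microphone somewhere in the half-plane y >= 0. Returns None when the two
    distances are geometrically inconsistent with the spacing.
    """
    s = speaker_spacing_m
    # Subtracting the two circle equations eliminates y and gives x directly.
    x = (dist_l_m ** 2 - dist_r_m ** 2) / (2.0 * s)
    y_squared = dist_l_m ** 2 - (x + s / 2.0) ** 2
    if y_squared < 0:
        return None
    return x, math.sqrt(y_squared)


# Hypothetical case: speakers 2 m apart, microphone actually at (0.5, 1.5).
d_l = math.hypot(0.5 + 1.0, 1.5)
d_r = math.hypot(0.5 - 1.0, 1.5)
position = mic_position_2d(d_l, d_r, speaker_spacing_m=2.0)
```

Two distances alone leave a front/back ambiguity about the line through the speakers, which is why the sketch fixes the microphone to one side of that line.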

In this example as well, the microphone position information may be output to an acoustic control unit of the voice input/output device and used there to apply acoustic effects, may be output to another voice input/output device, or may be used for any other purpose.

FIG. 1: Block diagram showing the configuration of the main part of the karaoke apparatus
FIG. 2: Block diagram showing the detailed configuration of the adaptive filter
FIG. 3: Diagram showing the time-axis component of the filter coefficients of the adaptive filter
FIG. 4: Block diagram showing the configuration of the main part of the karaoke apparatus according to a modification
FIG. 5: Diagram showing the time-axis component of the filter coefficients of the adaptive filter in the modification
FIG. 6: Block diagram showing the configuration of the main part of the karaoke apparatus according to another modification
FIG. 7: Diagram showing the time-axis component of the filter coefficients of the adaptive filter in the other modification
FIG. 8: Block diagram showing the configuration of the voice input/output device

Explanation of reference numerals

11 - Microphone
12 - A/D converter
13 - Adaptive filter
14 - Analysis control unit
15 - Adder
16 - D/A converter
17 - Amplifier
18 - Speaker

Claims

An audio system comprising:
a microphone that picks up sound and outputs a picked-up signal;
a speaker that emits an output audio signal as sound;
an adaptive filter comprising a coefficient estimation unit that calculates filter coefficients modeling a sound propagation path from the speaker to the microphone, a filter in which the filter coefficients are set and which filters the output audio signal to output a simulated signal of the feedback audio signal returned via the sound propagation path, and a subtraction unit that outputs a residual signal obtained by subtracting the simulated signal from the picked-up signal so as to remove the feedback audio signal component from the picked-up signal;
an output audio signal generation unit that generates the output audio signal including the residual signal; and
microphone position estimation means for estimating a distance between the speaker and the microphone based on a peak position of the filter coefficients on the time axis.

The audio system according to claim 1, comprising a plurality of the speakers,
wherein the microphone position estimation means detects a plurality of peaks from the filter coefficients and estimates the distance between each speaker and the microphone based on each peak position.

The audio system according to claim 2, further comprising an output audio signal delay unit that delays the output audio signal input to a specific speaker,
wherein the microphone position estimation means changes the delay time of the delay unit and identifies the peak corresponding to the specific speaker by detecting the shift of a peak of the filter coefficients caused by the change in the delay time.

The audio system according to claim 2, wherein the output audio signal generation unit is provided independently for each of the plurality of speakers,
the audio system further comprising, independently for each output audio signal generation unit, a residual signal delay unit that delays the residual signal and inputs it to that output audio signal generation unit,
wherein the microphone position estimation means changes the delay time of the residual signal delay unit and identifies the peak corresponding to each speaker by detecting the shift of a peak of the filter coefficients caused by the change in the delay time.

The audio system according to claim 4, wherein the microphone position estimation means sets the delay time of the residual signal delay unit so that the peaks of the filter coefficients corresponding to the respective speakers coincide on the time axis.