CN103995252A - Three-dimensional space sound source positioning method - Google Patents
- Publication number: CN103995252A (application CN201410202062.0A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
- Classification: G01S 5/18 — position-fixing by co-ordinating two or more direction or position line determinations using ultrasonic, sonic, or infrasonic waves; G01S 5/22 — position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
Abstract
The invention provides a three-dimensional sound source localization method, belonging to the technical field of sound source localization. The method comprises: establishing a double-L-shaped microphone array; introducing a penalty function on the basis of the normalized multichannel frequency-domain least-mean-square method to correct the spectrum energy, then adaptively estimating the impulse responses of the different array elements and computing the time-delay differences; and determining the position coordinates of the sound source from the relation between the obtained time-delay differences and the positions of the double-L-shaped microphones. Compared with traditional localization methods, the array structure is simple and the computational load is small, while noise robustness, reverberation resistance, and localization accuracy are effectively improved; the method is well suited to indoor three-dimensional sound source localization and can be widely applied in vehicle hands-free telephony, video conferencing systems, speech recognition systems, intelligent robots, and other fields.
Description
Technical field
The invention belongs to the technical field of sound source localization, and in particular relates to a method for three-dimensional sound source localization using a microphone array.
Background technology
At present, sound source localization based on microphone arrays is a major topic in the field of acoustic signal processing. Compared with traditional array signal processing, the speech signals handled by a microphone array carry no carrier wave, cover a wide signal range, and are highly adaptable, so such systems are widely used in vehicle hands-free telephony, video conferencing systems, speech recognition systems, intelligent robots, and related fields. Microphone-array localization methods fall mainly into three classes: methods based on steerable beamforming, methods based on the time difference of arrival (TDOA), and high-resolution methods. Steerable-beamforming methods filter and weight the signals received by the array, then steer the array toward the direction that maximizes the beam output power; their high computational complexity makes them unusable for real-time systems. High-resolution methods estimate the direction of arrival from the correlation matrix of the microphone signals and infer the source position from it; although successfully applied to some array-processing problems, they perform poorly in practice, being restricted by the array structure, and their computational load multiplies when the signal is non-stationary. TDOA-based methods, by contrast, estimate the time-delay differences between the source and the microphones and then estimate the source position from the microphone positions; they are not restricted by the array structure and require little computation.
TDOA-based localization proceeds in two stages: time-delay estimation and localization. The main time-delay estimation methods are the generalized cross-correlation (GCC) method, the least-mean-square (LMS) method, and the adaptive eigenvalue decomposition (AED) method. The performance of GCC degrades badly under reverberation, and LMS performs roughly the same. AED suppresses reverberation by estimating the two-channel impulse responses, but it requires the two channels to be coprime, i.e. to share no common zeros. In a real indoor environment the impulse responses are generally very long, the two channels are very unlikely to be coprime, and AED is no longer applicable. To improve the likelihood of coprimeness, Y. (Arden) Huang et al. generalized the two-channel AED estimator to an adaptive multichannel (AMC) estimator, proposing the normalized multichannel frequency-domain least-mean-square (NMCFLMS) method to estimate the impulse response of each array element. The basic idea of NMCFLMS is to divide the observed signal into consecutive blocks and perform channel estimation in the frequency domain with a normalized frequency-domain LMS update; it combines the low complexity of multichannel frequency-domain LMS with the fast convergence of Newton's method, but in the presence of channel additive noise it may diverge and fail to estimate the channels effectively.
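The GCC method mentioned above can be sketched in a few lines. The following is a generic PHAT-weighted implementation under assumed conventions (a positive result means the second signal lags the first); it is a baseline illustration, not the patent's method.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the time delay of x2 relative to x1 (in seconds) with the
    PHAT-weighted generalized cross-correlation. Positive result means
    x2 lags x1."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X2 * np.conj(X1)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(n // 2, int(max_tau * fs))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Under reverberation the cross-correlation peak spreads and spurious peaks appear, which is exactly the degradation the text describes.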
Localization methods are mainly the least-squares method and the geometric method. The former is computationally complex and sensitive to the initial value. The latter uses the microphone position parameters and the time-delay estimates to define hyperbolic functional relations and takes the intersection of several hyperboloids as the source position; it is simple to compute, but non-unique closed-form solutions easily arise, making effective localization difficult.
Summary of the invention
To address the defects of prior-art localization methods — possible divergence in the presence of channel additive noise, failure to estimate the channels effectively, and difficulty in locating the source effectively — the invention provides an improved three-dimensional sound source localization method that performs channel estimation with a modified normalized multichannel frequency-domain least-mean-square method (MNMCFLMS) and a limit on the channel impulse response length, effectively improving localization accuracy.
To achieve the above object, the invention provides the following technical scheme.
A three-dimensional sound source localization method comprises the following steps:
Step A: in a three-dimensional Cartesian coordinate system, establish two L-shaped microphone arrays lying in the same plane and facing each other;
Step B: estimate the time-delay differences of the source signal arriving at each microphone with the modified normalized multichannel frequency-domain least-mean-square method, comprising the following steps:
Step B-1: the source signal s(n) passes through the channel impulse response h_i(n) of the i-th microphone and combines with channel additive noise v_i(n), giving the received signal of the i-th microphone:
x_i(n) = s^T(n) h_i(n) + v_i(n)
where T denotes the matrix transpose operation and n is the integer time index.
When channel additive noise is neglected, the received signals of the i-th and j-th microphones satisfy the cross-relation x_i(n) * h_j(n) = x_j(n) * h_i(n), where * denotes convolution.
Step B-2: apply a rectangular window function w(n) of length 2L_h to the received signal x_i(n) of the i-th microphone to obtain the n-th frame of the i-th microphone's received signal. Here L_h is the integer length of the channel impulse response h_i(n); the frame index runs over 1, ..., int(N/L_h − 1); N is the integer total length of the received signal sequence; and int(N/L_h − 1) denotes N/L_h − 1 rounded down to an integer.
Step B-3: transform the time-domain signal obtained in step B-2 into the frequency domain with the Fourier transform, using the 2L_h × 2L_h Fourier transform matrix F.
Step B-4: introduce a penalty function that corrects the spectrum energy and adaptively estimate the channel frequency response.
When channel additive noise v_i(n) is present, the frequency-domain error function is defined over the windowed frequency-domain signals; in the definition, i ≠ j; i, j = 1, 2, ..., M; F and F^{-1} are the L_h × L_h Fourier transform matrix and its inverse; diag denotes a diagonal matrix; I denotes the L_h × L_h identity matrix; and 0 denotes the L_h × L_h zero matrix.
The penalty function J_p(n) modifies the NMCFLMS cost function J_f(n) through a Lagrange multiplier β(n), giving the modified cost function
J_mod(n) = J_f(n) + β(n){−J_p(n)}
The value of β(n) is obtained from the condition that the gradient of the modified cost function vanishes at steady state, i.e. when ∇J_f(n) = β(n) ∇J_p(n); H denotes the conjugate transpose.
From the modified cost function J_mod(n), the update formula for the channel frequency response is obtained. In the update, K(n) is a diagonal matrix; P_i(n) is the spectrum energy of the multichannel output signal, and a more stable spectrum-energy estimate is obtained through a forgetting factor λ; the parameter δ is a small positive constant that effectively prevents the noise amplification caused when the spectrum energy is small.
Step B-5: apply the inverse Fourier transform to the channel frequency response to obtain the channel impulse response estimate, using the 2L_h × 2L_h inverse Fourier transform matrix F^{-1}.
Step B-6: in the adaptive process, the time delay corresponding to the peak appearing in the estimated channel impulse response is the direct-path delay of the i-th microphone, and the time-delay difference of the source signal between the i-th and j-th microphones follows from the peak positions, where f_s is the sampling frequency of the signal and max denotes taking the maximum value.
Step C: multiply the time-delay differences of step B by the speed of sound to obtain the range differences of the source signal to each microphone, and determine the position of the source from the positional relations of the microphones.
Further, said J_p(n) is the penalty function under a constraint condition; the constraint condition holds when J_p(n) is maximized, and s.t. denotes the constraint condition.
Further, the length L_h of the channel impulse response in step B-2 is restricted in terms of τ_ij,max = d_ij / c, the maximum delay difference between the i-th and j-th microphones, where d_ij is the distance between the i-th and j-th microphones and c is the propagation speed of sound.
Further, the two L-shaped arrays in step A are laid out as follows: the first microphone, mic1, is placed on the X axis at coordinates (d, 0, 0); the second, mic2, coincides with the origin at (0, 0, 0); the third, mic3, is placed on the Y axis at (0, d, 0), where d, a real number, is the distance between two adjacent microphones — the first, second, and third microphones form the first L-shaped microphone array. Symmetrically about the axis x = D/2, a second L-shaped microphone array is set up facing the first: the fourth microphone, mic4, is at (D, 0, 0); the fifth, mic5, at (D − d, 0, 0); and the sixth, mic6, at (D, d, 0), where D is the distance between the two facing L-shaped arrays.
Further, the source position in step C is determined as follows:
Step C-1: for a source at position (x, y, z), the time-delay difference of the source signal arriving at the i-th and j-th microphones is related to the inter-microphone distance d_ij. This relation splits into two groups: the first group involves the time-delay difference of the source signal between the second and first microphones and that between the second and third microphones; the second group involves the time-delay difference between the fourth and fifth microphones and that between the fourth and sixth microphones.
Step C-2: from the first and second groups of relations in step C-1, the projection coordinates of the source in the xoy plane are obtained.
Step C-3: substituting the projection coordinates of the source in the xoy plane from step C-2 into the four relations of the first and second groups of step C-1 yields four estimates of the source's z coordinate; the mean of these four z coordinates is taken as the z-coordinate estimate of the source.
Compared with the prior art, the invention has the following advantages and beneficial effects:
On the basis of the normalized frequency-domain least-mean-square method, the invention corrects the spectrum energy by introducing a penalty function, so the impulse responses of the different array elements are estimated adaptively and the time-delay differences computed without the channel estimation deteriorating. Using the two intersecting lines in the plane to determine the source's planar projection uniquely avoids the problem of non-unique closed-form solutions. In addition, limiting the channel impulse response length strengthens the coprimeness of the two channels and improves resistance to reverberation. Compared with traditional localization methods, the array structure of the invention is simple and the computational load small, while noise robustness, reverberation resistance, and localization accuracy are effectively improved; the method is better suited to indoor three-dimensional sound source localization and can be widely applied in vehicle hands-free telephony, video conferencing systems, speech recognition systems, intelligent robots, and other fields.
Brief description of the drawings
Fig. 1 is the step flow chart of the three-dimensional sound source localization method provided by the invention;
Fig. 2 is the layout schematic of the double-L-shaped microphone array;
Fig. 3 is the schematic of the modified normalized multichannel frequency-domain least-mean-square method;
Fig. 4 is the schematic of the microphone array placement in the room in the embodiment;
Fig. 5 is the normalized projection misalignment convergence curve.
Embodiment
The technical scheme provided by the invention is described in detail below with reference to specific embodiments; it should be understood that the following embodiments only illustrate the invention and do not limit its scope.
The invention first improves the normalized multichannel frequency-domain least-mean-square method (NMCFLMS) to increase its performance; at the same time it limits the channel impulse response length to strengthen the coprimeness of the two channels; and it performs channel estimation with the modified NMCFLMS. Specifically, with the flow chart shown in Fig. 1, the method comprises the following steps:
Step A: set up a three-dimensional coordinate system and place 6 microphones, in one plane, into two facing L-shaped arrays separated by a distance D, as shown in Fig. 2. The first microphone, mic1, is placed on the X axis at (d, 0, 0); the second, mic2, coincides with the origin at (0, 0, 0); the third, mic3, is placed on the Y axis at (0, d, 0), where d, a real number, is the distance between two adjacent microphones — the first, second, and third microphones form the first L-shaped microphone array. Symmetrically about the axis x = D/2, the second L-shaped microphone array is set up facing the first: the fourth microphone, mic4, is at (D, 0, 0); the fifth, mic5, at (D − d, 0, 0); and the sixth, mic6, at (D, d, 0), where D is the distance between the two facing L-shaped arrays. The projection of the source coordinates (x, y, z) onto the xoy plane is (x, y, 0); the distance from this projection point to the origin is γ, and the angle between their connecting line and the positive x axis is θ.
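The step A layout can be written down directly. The sketch below merely builds the six microphone coordinates described above (units are whatever d and D are given in):

```python
import numpy as np

def double_l_array(d, D):
    """Coordinates of the six microphones of the double-L-shaped array of
    step A: mic1-mic3 form the first L, mic4-mic6 the second, mirrored
    about the axis x = D/2."""
    return np.array([
        [d,     0.0, 0.0],   # mic1 on the X axis
        [0.0,   0.0, 0.0],   # mic2 at the origin
        [0.0,   d,   0.0],   # mic3 on the Y axis
        [D,     0.0, 0.0],   # mic4
        [D - d, 0.0, 0.0],   # mic5
        [D,     d,   0.0],   # mic6
    ])
```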
Step B: estimate the time-delay differences of the source signal arriving at each microphone with the modified normalized multichannel frequency-domain least-mean-square method (MNMCFLMS), whose schematic is shown in Fig. 3.
Step B-1: the source signal s(n), the channel impulse response h_i(n) of the i-th microphone, the channel additive noise v_i(n), and the received signal x_i(n) of the i-th microphone are related by
x_i(n) = s^T(n) h_i(n) + v_i(n)   (1)
where T denotes the matrix transpose operation and n is the integer time index.
When channel additive noise is neglected, the received signals of the i-th and j-th microphones satisfy the cross-relation x_i(n) * h_j(n) = x_j(n) * h_i(n), where * denotes convolution, i, j = 1, 2, ..., M, and M, a positive integer, is the number of microphones.
Step B-2: window the received signal of the i-th microphone with a frame shift of L_h: multiply it by a rectangular window function w(n) of length 2L_h to obtain the n-th frame of the i-th microphone's received signal, as in equation (3). Here L_h is the integer length of the channel impulse response h_i; the frame index runs over 1, ..., int(N/L_h − 1); N is the integer total length of the received signal sequence; and int(N/L_h − 1) denotes N/L_h − 1 rounded down to an integer.
In a real indoor environment, to overcome the very small likelihood of two-channel coprimeness caused by very long channel impulse responses, and thus to keep the two channels strongly coprime, the channel impulse response length L_h can be further restricted in terms of τ_ij,max, the maximum delay difference between the i-th and j-th microphones. Limiting the channel impulse response length effectively strengthens the coprimeness of the two channels.
Step B-3: apply the Fourier transform to equation (3) to obtain the frequency-domain signal, where F is the 2L_h × 2L_h Fourier transform matrix.
Step B-4: introduce a penalty function that corrects the spectrum energy and adaptively estimate the channel frequency response.
When channel additive noise v_i(n) is present, the frequency-domain error function is defined over the windowed frequency-domain signals; in the definition, i ≠ j; i, j = 1, 2, ..., M; F and F^{-1} are the L_h × L_h Fourier transform matrix and its inverse; diag denotes a diagonal matrix; I denotes the L_h × L_h identity matrix; and 0 denotes the L_h × L_h zero matrix.
The cost function of the normalized multichannel frequency-domain least-mean-square (NMCFLMS) method is given by equation (7). When channel additive noise is present, estimating the channel frequency response with equation (7) may diverge, and the channel estimation fails. To give NMCFLMS good channel-estimation performance in the presence of channel additive noise, the property that the spectrum energy of an acoustic signal is approximately uniformly distributed can be exploited: to make the sound spectrum uniform, the band energies corresponding to the channel impulse response are constrained by the mean-value theorem, with the constraint condition of equation (8). Here J_p(n) is the penalty function under the constraint condition; the constraint holds when J_p(n) is maximized, and s.t. denotes the constraint condition.
To make full use of the advantages of NMCFLMS while keeping the spectrum energy uniformly distributed, the penalty function J_p(n) modifies the NMCFLMS cost function J_f(n) through a Lagrange multiplier β(n), giving the cost function of the modified NMCFLMS (MNMCFLMS):
J_mod(n) = J_f(n) + β(n){−J_p(n)}   (9)
The value of β(n) is obtained from the condition that the gradient of the modified cost function vanishes at steady state, i.e. when ∇J_f(n) = β(n) ∇J_p(n); H denotes the conjugate transpose.
From the modified cost function J_mod(n), the update formula for the channel frequency response is obtained. In the update, K(n) is a diagonal matrix; P_i(n) is the spectrum energy of the multichannel output signal, and a more stable spectrum-energy estimate is obtained through a forgetting factor λ; the parameter δ is a small positive constant that effectively prevents the noise amplification caused when the spectrum energy is small.
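The smoothed spectrum energy P_i(n) with forgetting factor λ and regularizer δ can be sketched like this. The exact recursion is in the patent figures, so the form below — the first-order recursion commonly used in NMCFLMS-style normalization — is an assumption:

```python
import numpy as np

def update_spectrum_energy(P_prev, X_frame, lam=0.9, delta=1e-6):
    """One recursive update of the smoothed spectrum energy:
    P(n) = lam * P(n-1) + (1 - lam) * |X(n)|^2, and the delta-regularized
    normalizer so the update never divides by a vanishing energy."""
    P = lam * P_prev + (1.0 - lam) * np.abs(X_frame) ** 2
    return P, P + delta   # raw energy, regularized normalizer
```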
With the normalized multichannel frequency-domain least-mean-square method thus modified, the impulse responses of the different array elements can be estimated adaptively and the time-delay differences computed, avoiding deterioration of the channel estimation.
Step B-5: apply the inverse Fourier transform to the channel frequency response to obtain the channel impulse response estimate, using the 2L_h × 2L_h inverse Fourier transform matrix F^{-1}.
Step B-6: estimate the time delays from the estimated channel impulse responses. In the adaptive process, a peak appears in each estimated impulse response; the time delay corresponding to the peak is the direct-path delay of the i-th microphone, and the time-delay difference of the source signal between the i-th and j-th microphones follows from the peak positions, where f_s is the sampling frequency of the signal and max denotes taking the maximum value.
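The peak-picking of step B-6 can be sketched as follows. The sign convention (positive when the sound reaches the i-th microphone first) is an assumption, since the exact formula is in the patent figures:

```python
import numpy as np

def tdoa_from_impulse_responses(h_i, h_j, fs):
    """Direct-path delay of each microphone = index of the largest peak of
    its estimated impulse response; the TDOA is their difference in seconds
    (positive when microphone i receives the direct sound first)."""
    return (int(np.argmax(np.abs(h_j))) - int(np.argmax(np.abs(h_i)))) / fs
```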
Step C: multiply the time-delay differences of step B by the speed of sound to obtain the range differences of the source signal to each microphone, and determine the position of the source from the positional relations of the microphones.
Let d_ij denote the distance between the i-th and j-th microphones and c the propagation speed of sound. When the source position is (x, y, z), the time-delay difference τ_ij of the source signal between the i-th and j-th microphones is a constant, giving equation (14).
Equation (14) comprises the following concrete equations, in two groups — the first group, equations (15) and (16), and the second group, equations (17) and (18) — involving the time-delay difference of the source signal between the second and first microphones, between the second and third microphones, between the fourth and fifth microphones, and between the fourth and sixth microphones.
From equations (15) and (16),
y = a_1 x + b_1   (19)
From equations (17) and (18),
y = a_2 x + b_2   (22)
where the coefficients a_1, b_1 and a_2, b_2 are given by the accompanying formulas.
From equations (19) and (22), the estimates of the source's projection coordinates in the xoy plane are obtained as equations (25) and (26). Substituting equations (25) and (26) into equations (15) through (18) respectively yields four estimates of the source's Z coordinate; the mean of these four estimates is taken as the refined estimate of the source's Z coordinate, equation (28). The values obtained from equations (25), (26), and (28) are the coordinates of the localized source.
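The geometric relations underlying step C can be checked numerically: at the true source position, every range-difference relation c·τ_ij = d_i − d_j from equation (14) must hold exactly. The sketch below is a generic residual check over those relations, not the patent's closed-form solution (whose coefficient formulas are in the figures):

```python
import numpy as np

def tdoa_residuals(src, mics, taus, c=343.0):
    """Residuals d_i - d_j - c*tau_ij of the range-difference relations of
    equation (14), one per microphone pair in `taus` (a dict mapping
    (i, j) -> tau_ij). All residuals vanish at the true source position."""
    d = np.linalg.norm(mics - np.asarray(src), axis=1)
    return np.array([d[i] - d[j] - c * t for (i, j), t in taus.items()])
```

A least-squares solver over these residuals is a generic alternative when the closed-form projection formulas are unavailable.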
Embodiment 1:
To verify the performance of the method of the invention, we carried out a performance test. Fig. 4 shows the microphone placement model in the room. The acoustic environment is that of an ordinary conference room of size 6 m × 5 m × 3.5 m. The microphone array is mounted about 0.5 m above the floor; the 6 microphones are arranged in the double-L-shaped array shown in Fig. 4, each 0.5 m from the wall and 0.5 m from the floor, with 1 m between adjacent microphones and 5 m between the two L-shaped arrays. Because a microphone placed close to a wall suffers large reverberation and has little practical significance, the shaded area in Fig. 4 is not considered, and the source is placed at an arbitrary position between the two L-shaped arrays. The source signal is a pre-recorded passage read aloud by a male speaker, sampled at 25 kHz. According to the microphone placement, the channel impulse response length is L_h = 370, and the MNMCFLMS parameters are δ = 0.1556 × 10^−5 and μ = 0.5.
Fig. 5 shows the normalized projection misalignment (NPM) convergence curves at SNR = 20 dB and reverberation time RT_60 = 100 ms. In the NPM formula, h(n) is the true channel impulse response and ĥ(n) is the impulse response obtained by adaptive channel estimation.
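The NPM used for Fig. 5 can be computed as follows; this is the standard definition of normalized projection misalignment (assumed to match the formula in the text), which projects the estimate onto the true response to remove the unknown gain before measuring the residual:

```python
import numpy as np

def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB: norm of the part of h_true
    that the optimally scaled estimate cannot explain, relative to |h_true|."""
    h = np.ravel(h_true)
    h_hat = np.ravel(h_est)
    eps = h - (h @ h_hat) / (h_hat @ h_hat) * h_hat   # projection residual
    return 20.0 * np.log10(np.linalg.norm(eps) / np.linalg.norm(h))
```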
Fig. 5 shows that, in the presence of channel additive noise, the NMCFLMS method diverges after about 450 iterations, whereas the MNMCFLMS method of the invention converges stably by about 450 iterations, effectively avoiding the channel-estimation deterioration caused by channel additive noise.
Under different reverberation environments at SNR = 20 dB, the time-delay estimation performance of the MNMCFLMS method of the invention was compared with the classical phase-transform-weighted generalized cross-correlation method (GCC-PHAT); the results are shown in Table 1 below.
Table 1: performance comparison of the inventive method and the GCC-PHAT method under low-reverberation environments
Table 1 uses the percentage of non-abnormal points (PNP) and the root-mean-square error (RMSE) as the indices of algorithm performance. In their definitions, N denotes the number of delay estimates and τ_0 the true delay value. Here a delay estimate differing from the true value by more than 2 sample points is counted as an abnormal point; the higher the percentage of non-abnormal points, the better the method performs. Table 1 shows that, over many experiments, the inventive method performs comparably to GCC-PHAT under low reverberation; when the reverberation time exceeds 250 ms, the PNP of GCC-PHAT drops sharply and its RMSE grows far beyond the sampling precision, so it can no longer estimate delays effectively, while the performance of the inventive method is unaffected, and it still estimates delay values effectively for reverberation times up to 700 ms.
A real sound source was tested under the room model to check the performance of the localization method of the invention, with indoor environment parameters SNR = 20 dB and RT_60 = 150 ms. The localization results are judged by the absolute errors of the estimated source-position coordinates. Many verification experiments were carried out on the invention, with the results shown in Table 2 below.
Table 2: test results of the inventive method
Table 2 shows that the absolute error of the localized distance γ is stably distributed within 10 cm, the absolute error of the horizontal angle θ is controlled within 1°, and the absolute error of the height is stably distributed within 10 cm.
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiments; they also include technical schemes formed by any combination of the above technical features. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the invention, and these improvements and modifications are also regarded as within the protection scope of the invention.
Claims (5)
1. A three-dimensional sound source localization method, characterized by comprising the following steps:
Step A: in a three-dimensional Cartesian coordinate system, establish two L-shaped microphone arrays lying in the same plane and facing each other;
Step B: estimate the time-delay differences of the source signal arriving at each microphone with the modified normalized multichannel frequency-domain least-mean-square method (NMCFLMS), comprising the following steps:
Step B-1: the source signal s(n) passes through the channel impulse response h_i(n) of the i-th microphone and combines with channel additive noise v_i(n), giving the received signal of the i-th microphone:
x_i(n) = s^T(n) h_i(n) + v_i(n)
where T denotes the matrix transpose operation and n is the integer time index.
When channel additive noise is neglected, the received signals of the i-th and j-th microphones satisfy the cross-relation x_i(n) * h_j(n) = x_j(n) * h_i(n), where * denotes convolution.
Step B-2: apply a rectangular window function w(n) of length 2L_h to the received signal x_i(n) of the i-th microphone to obtain the n-th frame of the i-th microphone's received signal. Here L_h is the integer length of the channel impulse response h_i(n); the frame index runs over 1, ..., int(N/L_h − 1); N is the integer total length of the received signal sequence; and int(N/L_h − 1) denotes N/L_h − 1 rounded down to an integer.
Step B-3: transform the time-domain signal obtained in step B-2 into the frequency domain with the Fourier transform, using the 2L_h × 2L_h Fourier transform matrix F.
Step B-4: introduce a penalty function that corrects the spectrum energy and adaptively estimate the channel frequency response.
When channel additive noise v_i(n) is present, the frequency-domain error function is defined over the windowed frequency-domain signals; in the definition, i ≠ j; i, j = 1, 2, ..., M; F and F^{-1} are the L_h × L_h Fourier transform matrix and its inverse; diag denotes a diagonal matrix; I denotes the L_h × L_h identity matrix; and 0 denotes the L_h × L_h zero matrix.
The penalty function J_p(n) modifies the NMCFLMS cost function J_f(n) through a Lagrange multiplier β(n), giving the modified cost function
J_mod(n) = J_f(n) + β(n){−J_p(n)}
The value of β(n) is obtained from the condition that the gradient of the modified cost function vanishes at steady state, i.e. when ∇J_f(n) = β(n) ∇J_p(n); H denotes the conjugate transpose.
From the modified cost function J_mod(n), the update formula for the channel frequency response is obtained. In the update, K(n) is a diagonal matrix; P_i(n) is the spectrum energy of the multichannel output signal, smoothed by a forgetting factor λ; the parameter δ is a positive constant.
Step B-5: apply the inverse Fourier transform to the channel frequency-domain response to obtain the channel impulse-response estimate:
In this formula, F⁻¹ is the 2L_h × 2L_h inverse Fourier-transform matrix;
Step B-6: in the adaptive process, the time delay corresponding to the peak of the estimated channel impulse response ĥ_i is the arrival delay of the direct sound at the i-th microphone; the delay difference of the source signal between the i-th microphone and the j-th microphone is therefore
τ_ij = [arg max_n ĥ_i(n) − arg max_n ĥ_j(n)] / f_s
In this formula, f_s is the sampling frequency of the signal, and max denotes taking the maximum value;
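Step B-6 reduces to locating the dominant peak of each estimated impulse response and differencing the peak times; a sketch with synthetic responses (peak positions chosen arbitrarily), including the conversion to a range difference used in step C:

```python
import numpy as np

def tdoa_from_impulse_responses(h_i, h_j, fs):
    """tau_ij = (peak index of h_i - peak index of h_j) / fs."""
    n_i = int(np.argmax(np.abs(h_i)))  # direct-path peak of channel i
    n_j = int(np.argmax(np.abs(h_j)))  # direct-path peak of channel j
    return (n_i - n_j) / fs

fs = 8000.0
h_i = np.zeros(64); h_i[12] = 1.0      # hypothetical peak at sample 12
h_j = np.zeros(64); h_j[20] = 1.0      # hypothetical peak at sample 20

tau_ij = tdoa_from_impulse_responses(h_i, h_j, fs)  # (12 - 20)/8000 = -1 ms
range_diff = 343.0 * tau_ij  # delay difference times speed of sound, in metres
```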
Step C: multiply the delay differences described in step B by the speed of sound to obtain the differences in distance from the sound source to each microphone, and determine the position of the sound source from the positional relations among the microphones.
2. The three-dimensional sound-source localization method according to claim 1, wherein J_p(n) is the penalty function under a constraint condition, the constraint condition being:
In this formula, the constraint condition holds when the penalty function J_p(n) is maximized; s.t. denotes the constraint condition.
3. The three-dimensional sound-source localization method according to claim 1 or 2, wherein the length L_h of the channel impulse response in step B-2 is subject to the restriction:
In this formula, τ_ij,max = d_ij / c is the maximum delay difference between the i-th and j-th microphones, d_ij is the distance between the i-th and j-th microphones, and c is the propagation speed of the sound.
4. The three-dimensional sound-source localization method according to claim 1, wherein the two L-shaped arrays in step A are laid out as follows: the first microphone mic1 is placed on the X-axis at coordinates (d, 0, 0); the second microphone mic2 coincides with the origin, at coordinates (0, 0, 0); the third microphone mic3 is placed on the Y-axis at coordinates (0, d, 0), where d, a real number, is the spacing between adjacent microphones; the first, second, and third microphones form the first L-shaped microphone array. At the same time, taking the plane x = D/2 as the axis of symmetry, a second L-shaped microphone array is set up opposite the first: the fourth microphone mic4 is at coordinates (D, 0, 0), the fifth microphone mic5 at (D − d, 0, 0), and the sixth microphone mic6 at (D, d, 0), where D is the distance between the two opposing L-shaped microphone arrays.
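The double-L layout of claim 4 translates directly into coordinates; a sketch with arbitrarily chosen d and D:

```python
import numpy as np

def double_l_array(d, D):
    """Microphone coordinates for the double-L layout: mic1-mic3 form the
    first L-shaped array, mic4-mic6 its mirror image about the plane x = D/2."""
    return np.array([
        [d,     0.0, 0.0],   # mic1, on the X-axis
        [0.0,   0.0, 0.0],   # mic2, at the origin
        [0.0,   d,   0.0],   # mic3, on the Y-axis
        [D,     0.0, 0.0],   # mic4
        [D - d, 0.0, 0.0],   # mic5
        [D,     d,   0.0],   # mic6
    ])

mics = double_l_array(d=0.1, D=1.0)
# mic5 is the mirror image of mic1 about the plane x = D/2:
mirror_of_mic5_x = 1.0 - mics[4, 0]
```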
5. The three-dimensional sound-source localization method according to claim 1 or 4, wherein the position of the sound source in step C is determined as follows:
Step C-1: for a sound source at position (x, y, z), the delay difference τ_ij of the source signal between the i-th microphone and the j-th microphone is related to the inter-microphone distance d_ij; in detail, the relations fall into two groups:
First group (two relations):
Second group (two relations):
In these formulas, τ_21 is the delay difference of the source signal between the second microphone and the first microphone; τ_23 is the delay difference between the second microphone and the third microphone; τ_45 is the delay difference between the fourth microphone and the fifth microphone; τ_46 is the delay difference between the fourth microphone and the sixth microphone;
Step C-2: from the first and second groups of relations described in step C-1, the projection coordinates of the sound source in the xoy plane are obtained;
Step C-3: substitute the xoy-plane projection coordinates of the sound source from step C-2 into the four relational expressions of the first and second groups of step C-1, obtaining four estimates ẑ_1, ẑ_2, ẑ_3, ẑ_4 of the sound source's z-axis coordinate; the z-coordinate estimate ẑ of the sound source is the mean of these four values:
ẑ = (ẑ_1 + ẑ_2 + ẑ_3 + ẑ_4) / 4
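The closed-form expressions of claim 5's two groups are in the omitted formulas; as a hedged illustration of how delay differences map onto the array geometry, here is a far-field direction-cosine sketch for one L-shaped sub-array (a textbook approximation, not the patent's exact near-field solution):

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s

def direction_cosines_far_field(tau_21, tau_23, d, c=C):
    """Far-field sketch: across a microphone pair with spacing d, a delay
    difference tau corresponds to a direction cosine c * tau / d along that
    pair's axis (x-axis pair: mic2-mic1; y-axis pair: mic2-mic3)."""
    u = c * tau_21 / d  # cosine of the angle to the x-axis
    v = c * tau_23 / d  # cosine of the angle to the y-axis
    return u, v

# Source far away along the x-axis: full delay across the x pair, none across y.
d = 0.1
u, v = direction_cosines_far_field(tau_21=d / C, tau_23=0.0, d=d)
```

Combining the direction estimates of the two opposing L-arrays is what lets the near-field method in claim 5 recover an absolute (x, y, z) rather than only a bearing.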
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410202062.0A CN103995252B (en) | 2014-05-13 | 2014-05-13 | A kind of sound source localization method of three-dimensional space |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103995252A true CN103995252A (en) | 2014-08-20 |
CN103995252B CN103995252B (en) | 2016-08-24 |
Family
ID=51309468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410202062.0A Expired - Fee Related CN103995252B (en) | 2014-05-13 | 2014-05-13 | A kind of sound source localization method of three-dimensional space |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103995252B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104076331A (en) * | 2014-06-18 | 2014-10-01 | 南京信息工程大学 | Sound source positioning method for seven-element microphone array |
CN104535965A (en) * | 2014-12-29 | 2015-04-22 | 江苏科技大学 | Parallelized sound source positioning system based on embedded GPU system and method |
CN104977564A (en) * | 2015-07-09 | 2015-10-14 | 百度在线网络技术(北京)有限公司 | Microphone array for household intelligent robot based artificial intelligence |
CN105681972A (en) * | 2016-01-14 | 2016-06-15 | 南京信息工程大学 | Linearly constrained minimum variance diagonal loaded robust frequency-invariant beam forming method |
CN106842111A (en) * | 2016-12-28 | 2017-06-13 | 西北工业大学 | Indoor sound localization method based on microphone mirror image |
CN106886010A (en) * | 2017-01-17 | 2017-06-23 | 南京航空航天大学 | A kind of sound bearing recognition methods based on mini microphone array |
CN106950542A (en) * | 2016-01-06 | 2017-07-14 | 中兴通讯股份有限公司 | The localization method of sound source, apparatus and system |
CN107144820A (en) * | 2017-06-21 | 2017-09-08 | 歌尔股份有限公司 | Sound localization method and device |
CN107340498A (en) * | 2016-05-03 | 2017-11-10 | 深圳光启合众科技有限公司 | The determination method and apparatus of robot and sound source position |
CN110068797A (en) * | 2019-04-23 | 2019-07-30 | 浙江大华技术股份有限公司 | A kind of method, sound localization method and relevant device for calibrating microphone array |
CN110161454A (en) * | 2019-06-14 | 2019-08-23 | 哈尔滨工业大学 | Signal frequency and two dimension DOA combined estimation method based on double L-shaped array |
US10491995B1 (en) | 2018-10-11 | 2019-11-26 | Cisco Technology, Inc. | Directional audio pickup in collaboration endpoints |
CN111157951A (en) * | 2020-01-13 | 2020-05-15 | 东北大学秦皇岛分校 | Three-dimensional sound source positioning method based on differential microphone array |
CN111856400A (en) * | 2020-07-29 | 2020-10-30 | 中北大学 | Underwater target sound source positioning method and system |
US11076251B2 (en) | 2019-11-01 | 2021-07-27 | Cisco Technology, Inc. | Audio signal processing based on microphone arrangement |
CN116299147A (en) * | 2023-03-13 | 2023-06-23 | 中国科学院声学研究所 | One-dimensional structure internal sound source positioning method based on acoustic coherence technology |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020052685A1 (en) * | 2000-10-27 | 2002-05-02 | Tsuyoshi Kamiya | Position guiding method and system using sound changes |
JP2011059050A (en) * | 2009-09-14 | 2011-03-24 | Univ Of Tokyo | Apparatus and method for detecting sound source direction |
CN102033223A (en) * | 2010-12-29 | 2011-04-27 | 北京信息科技大学 | Method for positioning sound source by using microphone array |
CN102707262A (en) * | 2012-06-20 | 2012-10-03 | 太仓博天网络科技有限公司 | Sound localization system based on microphone array |
CN102866385A (en) * | 2012-09-10 | 2013-01-09 | 上海大学 | Multi-sound-source locating method based on spherical microphone array |
CN103076593A (en) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | Sound source localization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN103995252B (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103995252A (en) | Three-dimensional space sound source positioning method | |
CN103308889B (en) | Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment | |
CN103439688B (en) | Sound source positioning system and method used for distributed microphone arrays | |
Pavlidi et al. | Real-time multiple sound source localization using a circular microphone array based on single-source confidence measures | |
CN104142492A (en) | SRP-PHAT multi-source spatial positioning method | |
CN107167770B (en) | A kind of microphone array sound source locating device under the conditions of reverberation | |
CN102324237A (en) | Microphone array voice wave beam formation method, speech signal processing device and system | |
Niwa et al. | Post-filter design for speech enhancement in various noisy environments | |
Tellakula | Acoustic source localization using time delay estimation | |
CN105388459A (en) | Robustness sound source space positioning method of distributed microphone array network | |
CN103856877A (en) | Sound control information detection method and electronic device | |
CN109188362A (en) | A kind of microphone array auditory localization signal processing method | |
Pertilä et al. | Passive self-localization of microphones using ambient sounds | |
CN106093920B (en) | It is a kind of based on the adaptive beam-forming algorithm diagonally loaded | |
Gaubitch et al. | Calibration of distributed sound acquisition systems using TOA measurements from a moving acoustic source | |
Saqib et al. | Sound-based distance estimation for indoor navigation in the presence of ego noise | |
Svaizer et al. | Environment aware estimation of the orientation of acoustic sources using a line array | |
Niwa et al. | PSD estimation in beamspace using property of M-matrix | |
CN109254265A (en) | A kind of whistle vehicle positioning method based on microphone array | |
KR20090128221A (en) | Method for sound source localization and system thereof | |
CN101982793B (en) | Mobile sound source positioning method based on stereophonic signals | |
Xu et al. | Sound source localization based on improved adaptive beamforming | |
Dmochowski et al. | Fast steered response power source localization using inverse mapping of relative delays | |
CN109001680A (en) | The sparse optimization algorithm of block in auditory localization | |
Chen et al. | A sound source localization device based on rectangular pyramid structure for mobile robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160824; Termination date: 20190513 |