CN103995252B

CN103995252B - Sound source localization method in three-dimensional space

Info

Publication number: CN103995252B
Application number: CN201410202062.0A
Authority: CN
Inventors: 郭业才; 朱赛男; 张宁; 黄友锐
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2014-05-13
Filing date: 2014-05-13
Publication date: 2016-08-24
Anticipated expiration: 2034-05-13
Also published as: CN103995252A

Abstract

The present invention provides an improved method for localizing a sound source in three-dimensional space and belongs to the technical field of sound source localization. The method includes: setting up a double-L-shaped microphone array; on the basis of the normalized frequency-domain least mean square method, introducing a penalty function that modifies the spectral energy, so that the impulse responses of the different array elements are estimated adaptively and the time delay differences are computed; and determining the position coordinates of the sound source from the relation between the obtained delay differences and the positions of the microphones in the double-L array. Compared with conventional localization methods, the array structure is simple and the computational load is small, while noise robustness, resistance to reverberation, and localization accuracy are effectively improved. The method is well suited to indoor three-dimensional sound source localization and can be widely applied in fields such as in-vehicle hands-free telephony, video conferencing systems, speech recognition systems, and intelligent robots.

Description

Sound source localization method in three-dimensional space

Technical field

The invention belongs to the technical field of sound source localization, and in particular relates to a method for three-dimensional sound source localization using a microphone array.

Background art

At present, sound source localization based on microphone arrays is a major problem in acoustic signal processing. Compared with conventional array signal processing, the speech signals processed by a microphone array have no carrier, the range of signals that can be processed is wide, and the adaptability is strong, so such arrays are widely used in fields such as in-vehicle hands-free telephony, video conferencing systems, speech recognition systems, and intelligent robots. Microphone-array localization methods fall mainly into three classes: methods based on steered beamforming, methods based on time difference of arrival (TDOA), and high-resolution methods. Steered-beamforming methods filter and weight-sum the speech signals received by the array and then steer the array directly toward the direction that maximizes the beam output power; their high computational complexity makes them unsuitable for real-time processing systems. High-resolution methods solve the correlation matrix between the microphone signals to estimate the direction angle and from it the source position; although they have been applied successfully to some array signal processing problems, their localization performance in practice is poor, they are constrained by the array structure, and their computational load multiplies when the signal is non-stationary. TDOA-based methods estimate the delay differences with which the source reaches the microphones and then estimate the source position from the microphone positions; they are not constrained by the array structure and require little computation.

TDOA-based localization consists of two parts: time delay estimation and localization. Time delay estimation methods mainly include the generalized cross-correlation method (Generalized Cross Correlation, GCC), the least mean square method (Least Mean Square, LMS), and the adaptive eigenvalue decomposition method (Adaptive Eigenvalue Decomposition, AED). The performance of GCC degrades considerably in reverberant environments, and LMS performs roughly the same as GCC. AED suppresses reverberation by estimating the two-channel impulse responses, but it requires the two channels to be coprime, that is, to share no common zeros. In real indoor environments, however, the impulse responses are very long and the probability that two channels are coprime is small, so AED is not well suited. To increase the probability of coprimeness, Y. (Arden) Huang et al. generalized the two-channel AED estimator to an adaptive multichannel estimator (Adaptive Multichannel, AMC) and proposed estimating the impulse response of each array element with the normalized multichannel frequency-domain least mean square method (Normalized Multichannel Frequency-domain Least Mean Square, NMCFLMS). The basic idea of NMCFLMS is to divide the observed signals into consecutive blocks and to estimate the channels in the frequency domain with a frequency-domain normalized least mean square algorithm. Although it combines the low complexity of multichannel frequency-domain LMS with the fast convergence of Newton's method, it may diverge in the presence of additive channel noise and then fails to estimate the channels effectively.

Localization methods mainly include the least squares method and the geometric localization method. The former is computationally complex and sensitive to the initial value. The latter uses the microphone position parameters and the time delay estimates to set up hyperbolic functional relations, and the intersection of several hyperboloids gives the source position; it is simple to compute, but non-unique closed-form solutions easily occur, which makes effective localization difficult.

Summary of the invention

To address the defects of prior-art sound source localization methods, which may diverge in the presence of additive channel noise, so that the channels cannot be estimated effectively and the source is difficult to localize, the present invention provides an improved three-dimensional sound source localization method. It estimates the channels using a modified normalized multichannel frequency-domain least mean square method (MNMCFLMS) together with a limit on the channel impulse response length, which effectively increases localization accuracy.

To achieve the above object, the present invention provides the following technical solution:

A sound source localization method in three-dimensional space, comprising the following steps:

Step A: in a three-dimensional Cartesian coordinate system, set up two L-shaped microphone arrays that lie in the same plane and face each other;

Step B: estimate the delay differences with which the source signal reaches the microphones using the modified normalized multichannel frequency-domain least mean square method, comprising the following steps:

Step B-1: the source signal s(n) passes through the channel impulse response h_i(n) of the i-th microphone and is then combined with the additive channel noise v_i(n) to give the received signal x_i(n) of the i-th microphone: x_i(n) = s^T(n) h_i(n) + v_i(n);

where T denotes matrix transposition and n is the (integer) time index;

Neglecting the additive channel noise, the received signal x_i(n) of the i-th microphone and the received signal x_j(n) of the j-th microphone satisfy:

x_{i}^{T} (n) h_{j} (n) = x_{j}^{T} (n) h_{i} (n);

Step B-2: window the received signal x_i(n) of the i-th microphone with a rectangular window function w(n) of length 2L_h; the n-th frame of the received signal of the i-th microphone is:

x_i(n)_{2L_h \times 1} = [x_i(nL_h - L_h), x_i(nL_h - L_h + 1), \dots, x_i(nL_h + L_h - 1)]^T;

where L_h is the length of the channel impulse response h_i(n), an integer; n = 1, ..., int(N/L_h - 1); N is the total length of the received signal sequence, an integer; and int(N/L_h - 1) is N/L_h - 1 rounded down to an integer;

Step B-3: transform the time-domain signal obtained in step B-2 into a frequency-domain signal with the Fourier transform:

X_i(n)_{2L_h \times 1} = F_{2L_h \times 2L_h} x_i(n)_{2L_h \times 1};

where F_{2L_h×2L_h} is the 2L_h × 2L_h Fourier transform matrix;

Step B-4: introduce a penalty function to modify the spectral energy, and adaptively estimate the channel frequency-domain response.

In the presence of the additive channel noise v_i(n), the frequency-domain error function is defined as

\underline{e}_{ij}(n) = W_{L_h \times 2L_h}^{01} D_i(n) W_{2L_h \times L_h}^{10} \hat{\underline{h}}_j(n) - W_{L_h \times 2L_h}^{01} D_j(n) W_{2L_h \times L_h}^{10} \hat{\underline{h}}_i(n)

where

D_i(n) = \mathrm{diag}\left(F_{2L_h \times 2L_h} x_i(n)_{2L_h \times 1}\right);

W_{L_{h} \times 2 L_{h}}^{01} = F_{L_{h} \times L_{h}} [\begin{matrix} 0_{L_{h} \times L_{h}} & I_{L_{h} \times L_{h}} \end{matrix}] F_{2 L_{h} \times 2 L_{h}}^{- 1};

W_{2 L_{h} \times L_{h}}^{10} = F_{2 L_{h} \times 2 L_{h}} {[\begin{matrix} I_{L_{h} \times L_{h}} & 0_{L_{h} \times L_{h}} \end{matrix}]}^{T} F_{L_{h} \times L_{h}}^{- 1};

where i ≠ j; i, j = 1, 2, ..., M; F_{L_h×L_h} and F_{L_h×L_h}^{-1} are the L_h × L_h Fourier transform matrix and inverse Fourier transform matrix, respectively; diag(·) forms a diagonal matrix; I_{L_h×L_h} is the L_h × L_h identity matrix; and 0_{L_h×L_h} is the L_h × L_h zero matrix;

The penalty function J_p(n) is used, through the Lagrange multiplier β(n), to modify the NMCFLMS cost function J_f(n), giving the modified cost function

J_{mod}(n) = J_f(n) - β(n) J_p(n)

where

β(n) = \frac{\left| \nabla J_p^H(n) \nabla J_f(n) \right|}{\left\| \nabla J_p(n) \right\|^2}

The value of the Lagrange multiplier β(n) follows from requiring the gradient of the modified cost function to vanish at steady state, i.e., ∇J_f(n) = β(n) ∇J_p(n); H denotes conjugate transpose;
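The formula for β(n) can be illustrated numerically: without the absolute value, p^H f / ||p||^2 is the least-squares coefficient that best approximates the vector ∇J_f(n) by a multiple of ∇J_p(n), which is the steady-state condition stated above. The sketch below assumes both gradients are plain Python lists of complex numbers; the helper name `lagrange_beta` is hypothetical.

```python
def lagrange_beta(grad_p, grad_f):
    """Return |grad_p^H grad_f| / ||grad_p||^2 for two complex gradient vectors."""
    inner = sum(p.conjugate() * f for p, f in zip(grad_p, grad_f))  # grad_p^H grad_f
    energy = sum(abs(p) ** 2 for p in grad_p)                          # ||grad_p||^2
    return abs(inner) / energy

grad_p = [1 + 1j, 2 - 1j, 0.5j]
grad_f = [2.5 * g for g in grad_p]      # grad_f exactly parallel to grad_p
beta = lagrange_beta(grad_p, grad_f)    # 2.5
```

When the two gradients are orthogonal, the multiplier is zero and the penalty term has no influence on the update.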

From the modified cost function J_mod(n), the update formula for the channel frequency-domain response \hat{\underline{h}}_i^{10}(n) is

\hat{\underline{h}}_i^{10}(n+1) = \hat{\underline{h}}_i^{10}(n) - \nabla J_f^{01}(n) + \hat{β}(n) \nabla J_p^{10}(n)

where

\nabla J_p^{01}(n) = F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T K(n) F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T \hat{\underline{h}}(n)

\hat{β}(n) = \frac{\left| \nabla J_p^{10}(n)^H \nabla J_f^{01}(n) \right|}{\left\| \nabla J_p^{10}(n) \right\|^2}

\nabla J_f^{01}(n) = μ \left(P_i(n) + δ I_{2L_h \times 2L_h}\right)^{-1} \sum_{k=1}^{M} D_k^*(n) \underline{e}_{ki}^{01}(n)

with

P_i(n) = λ P_i(n-1) + (1 - λ) \sum_{k=1, k \neq i}^{M} D_k^*(n) D_k(n)

\underline{e}_{ki}^{01}(n) = F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T F_{L_h \times L_h}^{-1} \underline{e}_{ki}(n)

where K(n) is a diagonal matrix; P_i(n) is the spectral energy of the multichannel output signals, which the forgetting factor λ smooths into a more stable estimate; and the parameter δ is a positive constant that effectively prevents the noise amplification that occurs when the spectral energy is small;

Step B-5: apply the inverse Fourier transform to the channel frequency-domain response \hat{\underline{h}}_i^{10}(n) to obtain the channel impulse response estimate

\hat{h}_i(n) = [\begin{matrix} I_{L_h \times L_h} & 0_{L_h \times L_h} \end{matrix}] F_{2L_h \times 2L_h}^{-1} \hat{\underline{h}}_i^{10}(n)

where F_{2L_h×2L_h}^{-1} is the 2L_h × 2L_h inverse Fourier transform matrix;

Step B-6: during adaptation, the delay corresponding to the peak that appears in the estimated channel impulse response \hat{h}_i(n) is the arrival delay of the direct sound at the i-th microphone, so the delay difference with which the source signal reaches the i-th and j-th microphones is

\hat{τ}_{ij} = \left( \arg\max_{l = 1, \dots, L_h} |\hat{h}_{il}(n)| - \arg\max_{l = 1, \dots, L_h} |\hat{h}_{jl}(n)| \right) / f_s

where f_s is the sampling frequency of the signal and argmax returns the tap index l at which the magnitude is largest;

Step C, the delay inequality described in step B being multiplied with the velocity of sound obtains sound-source signal and arrives the range difference of each mike, And according to the position relationship of each mike, determine the position of described sound source.

Further, described J_pN () is the penalty under the conditions of constraint, constraints is

\{\begin{matrix} \max & J_{P} (n) = Σ_{i = 1}^{M} Σ_{j = 0}^{L - 1} \ln ({| {\hat{\underset{&OverBar;}{h}}}_{i j} (n) |}^{2}) \\ s . t . & Σ_{i = 1}^{M} Σ_{j = 0}^{L_{n} - 1} {| {\hat{\underset{&OverBar;}{h}}}_{i j} (n) |}^{2} = \frac{1}{{ML}_{h}} \end{matrix}

In formula, in penalty J_pWhen () maximizes n, constraintsSet up；S.t. represent about Bundle condition.

Further, the length L_h of the channel impulse response in step B-2 is restricted by:

L_h \leq 2 f_s \max_{i,j = 1, \dots, M} τ_{ij,\max}

where τ_{ij,max} = d_ij / c is the maximum delay difference between the i-th and j-th microphones, d_ij is the distance between the i-th and j-th microphones, and c is the speed of sound.

Further, the two L-shaped arrays in step A are laid out as follows: the first microphone mic1 is placed on the x-axis at (d, 0, 0); the second microphone mic2 coincides with the origin at (0, 0, 0); the third microphone mic3 is placed on the y-axis at (0, d, 0), where d, a real number, is the spacing between two adjacent microphones; mic1, mic2, and mic3 form the first L-shaped microphone array. At the same time, a second L-shaped microphone array facing the first is set up symmetrically about the axis x = D/2: the fourth microphone mic4 is at (D, 0, 0), the fifth microphone mic5 at (D - d, 0, 0), and the sixth microphone mic6 at (D, d, 0), where D is the distance between the two facing L-shaped arrays.

Further, in step C the sound source position is determined as follows:

Step C-1: for a source at (x, y, z), the delay difference τ_ij with which the source signal reaches the i-th and j-th microphones and the difference between the distances from the source to these microphones satisfy \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} - \sqrt{(x - x_j)^2 + (y - y_j)^2 + (z - z_j)^2} = c τ_{ij}, where (x_i, y_i, z_i) are the coordinates of the i-th microphone. Written out in full, the relations form the following two groups:

First group: \sqrt{x^2 + y^2 + z^2} - \sqrt{(x - d)^2 + y^2 + z^2} = c τ_{21} and \sqrt{x^2 + y^2 + z^2} - \sqrt{x^2 + (y - d)^2 + z^2} = c τ_{23};

Second group: \sqrt{(x - D)^2 + y^2 + z^2} - \sqrt{(x - D + d)^2 + y^2 + z^2} = c τ_{45} and \sqrt{(x - D)^2 + y^2 + z^2} - \sqrt{(x - D)^2 + (y - d)^2 + z^2} = c τ_{46};

where τ_21 is the delay difference between the source signal reaching the second and the first microphone, τ_23 that between the second and the third microphone, τ_45 that between the fourth and the fifth microphone, and τ_46 that between the fourth and the sixth microphone;

Step C-2: from the first and second groups of relations in step C-1, the projection of the sound source onto the xoy plane has coordinates

\hat{x} = \frac{b_{2} - b_{1}}{a_{1} - a_{2}}

\hat{y} = \frac{a_{1} b_{2} - a_{2} b_{1}}{a_{1} - a_{2}}

\hat{z} = 0

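where a_1 = τ_{23}/τ_{21}, b_1 = (c^2 τ_{21} τ_{23} + d^2)(τ_{21} - τ_{23}) / (2d τ_{21}), a_2 = -τ_{46}/τ_{45}, and b_2 = ((c^2 τ_{45} τ_{46} + d^2)(τ_{45} - τ_{46}) + 2dD τ_{46}) / (2d τ_{45});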

Step C-3: substitute the projection coordinates of the source on the xoy plane from step C-2 into the four relations of the first and second groups in step C-1 to obtain four z-coordinate estimates \hat{z}_1, \hat{z}_2, \hat{z}_3, \hat{z}_4 of the sound source, and take their mean as the estimate \hat{z} of the source's z coordinate:

\hat{z} = \frac{{\hat{z}}_{1} + {\hat{z}}_{2} + {\hat{z}}_{3} + {\hat{z}}_{4}}{4} .

Compared with the prior art, the present invention has the following advantages and beneficial effects:

On the basis of the normalized frequency-domain least mean square method, the present invention introduces a penalty function that modifies the spectral energy, so that the impulse responses of the different array elements are estimated adaptively and the delay differences are computed, which prevents the channel estimates from deteriorating. Because two intersecting straight lines in a plane determine a unique point, the projection of the source onto the plane is determined uniquely, which avoids non-unique closed-form solutions. In addition, limiting the channel impulse response length strengthens the coprimeness of channel pairs and improves resistance to reverberation. Compared with conventional localization methods, the array structure is simple and the computational load is small, while noise robustness, resistance to reverberation, and localization accuracy are effectively improved. The method is well suited to indoor three-dimensional sound source localization and can be widely applied in fields such as in-vehicle hands-free telephony, video conferencing systems, speech recognition systems, and intelligent robots.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of the three-dimensional sound source localization method provided by the present invention;

Fig. 2 is a schematic diagram of the double-L-shaped microphone array layout;

Fig. 3 is a schematic diagram of the modified normalized multichannel frequency-domain least mean square method;

Fig. 4 is a schematic diagram of the placement of the microphone array in the room in the embodiment;

Fig. 5 shows the convergence curves of the normalized projection misalignment.

Detailed description of the embodiments

The technical solution provided by the present invention is described in detail below with reference to a specific embodiment. It should be understood that the following embodiment only illustrates the present invention and does not limit its scope.

The present invention first improves the normalized multichannel frequency-domain least mean square method (NMCFLMS) to enhance its performance, while limiting the channel impulse response length to strengthen the coprimeness of channel pairs, and then uses the modified NMCFLMS for channel estimation. Specifically, as shown in the flow chart of Fig. 1, the method comprises the following steps:

Step A: set up a three-dimensional coordinate system and place 6 microphones in the same plane as two facing L-shaped arrays separated by a distance D, as shown in Fig. 2. The first microphone mic1 is placed on the x-axis at (d, 0, 0); the second microphone mic2 coincides with the origin at (0, 0, 0); the third microphone mic3 is placed on the y-axis at (0, d, 0), where d, a real number, is the spacing between two adjacent microphones; mic1, mic2, and mic3 form the first L-shaped microphone array. At the same time, a second L-shaped microphone array facing the first is set up symmetrically about the axis x = D/2: the fourth microphone mic4 is at (D, 0, 0), the fifth microphone mic5 at (D - d, 0, 0), and the sixth microphone mic6 at (D, d, 0), where D is the distance between the two facing L-shaped arrays. The projection of the source at (x, y, z) onto the xoy plane is (x, y, 0); the distance from this projected point to the origin is γ, and the angle between the line joining them and the positive x-axis is θ.
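The layout of step A and the polar description (γ, θ) of the projected source can be sketched as follows. The function names and the use of degrees for θ are illustrative assumptions, not part of the patent.

```python
import math

def double_l_array(d, D):
    """Coordinates of mic1..mic6 for the double-L layout of step A (all in the z = 0 plane)."""
    return {
        1: (d, 0.0, 0.0),      # mic1 on the x-axis
        2: (0.0, 0.0, 0.0),    # mic2 at the origin, corner of the first L
        3: (0.0, d, 0.0),      # mic3 on the y-axis
        4: (D, 0.0, 0.0),      # mic4, corner of the second L (mirror of mic2 about x = D/2)
        5: (D - d, 0.0, 0.0),  # mic5, mirror of mic1
        6: (D, d, 0.0),        # mic6, mirror of mic3
    }

def projection_polar(x, y):
    """Distance gamma from the origin and angle theta (degrees, from +x) of the projection (x, y, 0)."""
    return math.hypot(x, y), math.degrees(math.atan2(y, x))

mics = double_l_array(d=1.0, D=5.0)
gamma, theta = projection_polar(3.0, 4.0)   # gamma = 5.0, theta is about 53.13 degrees
```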

Step B: estimate the delay differences with which the source signal reaches the microphones using the modified normalized multichannel frequency-domain least mean square method (MNMCFLMS), as illustrated in Fig. 3:

Step B-1: the source signal s(n), the channel impulse response h_i(n) of the i-th microphone, the additive channel noise v_i(n), and the received signal x_i(n) of the i-th microphone are related by

x_i(n) = s^T(n) h_i(n) + v_i(n) \tag{1}

where T denotes matrix transposition and n is the (integer) time index.

Neglecting the additive channel noise, the received signals x_i(n) and x_j(n) of the i-th and j-th microphones satisfy

x_i^T(n) h_j(n) = x_j^T(n) h_i(n) \tag{2}

where i, j = 1, 2, ..., M, and M, a positive integer, is the number of microphones;
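The noise-free relation (2) is the cross-relation on which multichannel blind channel identification relies: both microphones observe the same source, so filtering x_i by h_j and x_j by h_i gives identical outputs. A minimal pure-Python check with full-length convolutions; the signal and channel values are made up for illustration.

```python
def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

s = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]   # arbitrary source samples
h1 = [1.0, 0.4, -0.2]                   # hypothetical channel to microphone i
h2 = [0.0, 0.8, 0.3]                    # hypothetical channel to microphone j (one-sample delay)

x1 = convolve(s, h1)                    # noise-free received signal at microphone i
x2 = convolve(s, h2)                    # noise-free received signal at microphone j

lhs = convolve(x1, h2)                  # x_i filtered by h_j
rhs = convolve(x2, h1)                  # x_j filtered by h_i; equal to lhs since s*h1*h2 = s*h2*h1
```

With additive noise the two sides no longer match exactly, which is why the adaptive method minimizes the mismatch instead of solving it exactly.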

Step B-2: window the received signal of the i-th microphone with a frame shift of L_h, that is, multiply it by a rectangular window function w(n) of length 2L_h; the n-th frame of the received signal of the i-th microphone is:

x_i(n)_{2L_h \times 1} = [x_i(nL_h - L_h), x_i(nL_h - L_h + 1), \dots, x_i(nL_h + L_h - 1)]^T \tag{3}

where L_h is the length of the channel impulse response h_i, an integer; n = 1, ..., int(N/L_h - 1); N is the total length of the received signal sequence, an integer; and int(N/L_h - 1) is N/L_h - 1 rounded down to an integer;
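The framing of (3) can be sketched on a toy signal: each frame is 2L_h samples long and consecutive frames advance by L_h, so neighbouring frames overlap by half. The function name `frames` is an illustrative assumption.

```python
def frames(x, L_h):
    """Frames of length 2*L_h with hop L_h, following (3).

    Frame n (n = 1, ..., int(N / L_h - 1)) holds x[n*L_h - L_h] ... x[n*L_h + L_h - 1].
    """
    N = len(x)
    count = int(N / L_h - 1)  # number of complete frames, as defined in the text
    return [x[n * L_h - L_h : n * L_h + L_h] for n in range(1, count + 1)]

x = list(range(10))       # toy signal with N = 10 samples
fr = frames(x, L_h=3)     # int(10/3 - 1) = 2 frames
# fr == [[0, 1, 2, 3, 4, 5], [3, 4, 5, 6, 7, 8]]
```

The frame count int(N/L_h - 1) is exactly the number of frames whose last sample still lies inside the signal.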

In real indoor environments, very long channel impulse responses make the probability that two channels are coprime very small. To overcome this and ensure stronger coprimeness, the channel impulse response length L_h can be further restricted by:

L_h \leq 2 f_s \max_{i,j = 1, \dots, M} \hat{τ}_{ij,\max} \tag{4}

where \hat{τ}_{ij,max} = d_ij / c is the maximum delay difference between the i-th and j-th microphones. Limiting the channel impulse response length effectively strengthens the coprimeness of channel pairs.
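As a worked instance of (4), the bound can be evaluated for the double-L layout with the spacings used in the embodiment (d = 1 m, D = 5 m) and f_s = 25 kHz. The speed of sound c = 343 m/s is an assumption; the patent does not state the value it used.

```python
import itertools
import math

def max_channel_length(mics, fs, c=343.0):
    """Upper bound on L_h from (4): 2 * fs * max_ij(d_ij / c).

    c = 343 m/s is an assumed speed of sound, not a value taken from the patent.
    """
    d_max = max(math.dist(p, q) for p, q in itertools.combinations(mics, 2))
    return 2.0 * fs * d_max / c

d, D = 1.0, 5.0   # adjacent-microphone spacing and distance between the two L arrays
mics = [(d, 0, 0), (0, 0, 0), (0, d, 0), (D, 0, 0), (D - d, 0, 0), (D, d, 0)]
bound = max_channel_length(mics, fs=25_000)   # about 743 samples
```

The largest spacing here is between opposite corners such as mic2 and mic6 (sqrt(26) m), and the embodiment's L_h = 370 lies below the resulting bound.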

Step B-3: apply the Fourier transform to (3) to obtain the frequency-domain signal

X_i(n)_{2L_h \times 1} = F_{2L_h \times 2L_h} x_i(n)_{2L_h \times 1} \tag{5}

where F_{2L_h×2L_h} is the 2L_h × 2L_h Fourier transform matrix.

Step B-4: introduce a penalty function to modify the spectral energy, and adaptively estimate the channel frequency-domain response. In the presence of additive channel noise, the frequency-domain error function is

\underline{e}_{ij}(n) = W_{L_h \times 2L_h}^{01} D_i(n) W_{2L_h \times L_h}^{10} \hat{\underline{h}}_j(n) - W_{L_h \times 2L_h}^{01} D_j(n) W_{2L_h \times L_h}^{10} \hat{\underline{h}}_i(n) \tag{6}

where

D_i(n) = \mathrm{diag}\left(F_{2L_h \times 2L_h} x_i(n)_{2L_h \times 1}\right);

W_{L_{h} \times 2 L_{h}}^{01} = F_{L_{h} \times L_{h}} [\begin{matrix} 0_{L_{h} \times L_{h}} & I_{L_{h} \times L_{h}} \end{matrix}] F_{2 L_{h} \times 2 L_{h}}^{- 1};

W_{2 L_{h} \times L_{h}}^{10} = F_{2 L_{h} \times 2 L_{h}} {[\begin{matrix} I_{L_{h} \times L_{h}} & 0_{L_{h} \times L_{h}} \end{matrix}]}^{T} F_{L_{h} \times L_{h}}^{- 1};

where i ≠ j; i, j = 1, 2, ..., M; F_{L_h×L_h} and F_{L_h×L_h}^{-1} are the L_h × L_h Fourier transform matrix and inverse Fourier transform matrix, respectively; diag(·) forms a diagonal matrix; I_{L_h×L_h} is the L_h × L_h identity matrix; and 0_{L_h×L_h} is the L_h × L_h zero matrix.
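In time-domain terms, the product W^{01} D_i(n) W^{10} in (6) is an overlap-save filtering step: the L_h-tap channel is zero-padded to 2L_h (the role of [I 0]^T), multiplied bin by bin with the spectrum of the 2L_h frame (the role of D_i), and only the last L_h output samples are kept (the role of [0 I]), because the first half is corrupted by circular wrap-around. The sketch omits the outer F_{L_h} and F_{L_h}^{-1} factors, which only move the input and output between time and frequency domains, and uses a naive DFT so that it needs no external libraries.

```python
import cmath

def dft(v):
    """Naive DFT: multiplication by the Fourier matrix F."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(V):
    """Naive inverse DFT: multiplication by F^{-1}."""
    n = len(V)
    return [sum(V[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def overlap_save_block(frame, h):
    """Filter one 2*L_h frame by an L_h-tap channel and keep the L_h valid outputs."""
    L = len(h)
    H = dft(list(h) + [0.0] * L)               # F [h; 0]
    X = dft(frame)                             # diagonal of D_i
    y = idft([a * b for a, b in zip(X, H)])    # circular convolution of frame and [h; 0]
    return [v.real for v in y[L:]]            # discard the wrapped-around first half

frame = [0.5, -1.0, 2.0, 1.5, -0.5, 1.0]      # one 2*L_h frame with L_h = 3 (made-up values)
h = [1.0, -0.5, 0.25]                          # hypothetical channel taps
direct = [sum(h[l] * frame[3 + k - l] for l in range(3)) for k in range(3)]  # linear convolution
fast = overlap_save_block(frame, h)            # matches direct up to rounding
```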

The normalized multichannel frequency-domain least mean square method (Normalized Multichannel Frequency-domain Least Mean Square, NMCFLMS) uses the cost function

J_f(n) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \underline{e}_{ij}^H(n) \underline{e}_{ij}(n) \tag{7}

In the presence of additive channel noise, estimating the channel frequency-domain response with (7) may diverge, so that the channel estimation fails. To give NMCFLMS good channel estimation performance in the presence of additive channel noise, one can exploit the property that the spectral energy of acoustic signals is uniformly distributed. To make the spectrum uniformly distributed, the mean-value (arithmetic-geometric mean) inequality is used to constrain the band energies corresponding to the channel impulse responses, with the constraint

\begin{cases} \max & J_p(n) = \sum_{i=1}^{M} \sum_{j=0}^{L_h - 1} \ln\left(|\hat{\underline{h}}_{ij}(n)|^2\right) \\ \text{s.t.} & \sum_{i=1}^{M} \sum_{j=0}^{L_h - 1} |\hat{\underline{h}}_{ij}(n)|^2 = \frac{1}{M L_h} \end{cases} \tag{8}

where J_p(n) is the penalty function under the constraint, the constraint holds when J_p(n) is maximized, and s.t. denotes "subject to".
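The idea behind (8) is that, for a fixed total spectral energy, the sum of log-energies is largest when the energy is spread evenly over all bins (the arithmetic-geometric mean inequality). The sketch compares a uniform and an uneven energy distribution with the same total; the total of 1 and the specific bin energies are illustrative assumptions.

```python
import math

def penalty(h_freq):
    """J_p: sum over channels i and bins j of ln|h_ij|^2, as in (8)."""
    return sum(math.log(abs(v) ** 2) for channel in h_freq for v in channel)

# Two channels (M = 2) with L_h = 4 bins each; total energy is 1 in both cases.
uniform_energy = [[1 / 8] * 4, [1 / 8] * 4]
uneven_energy = [[0.3, 0.1, 0.05, 0.05], [0.2, 0.1, 0.1, 0.1]]

uniform = [[math.sqrt(e) for e in ch] for ch in uniform_energy]
uneven = [[math.sqrt(e) * 1j for e in ch] for ch in uneven_energy]   # phase does not affect J_p

jp_uniform = penalty(uniform)   # 8 * ln(1/8), about -16.64
jp_uneven = penalty(uneven)     # about -18.02, smaller despite the same total energy
```

Maximizing J_p therefore pushes the estimated spectra toward an even energy distribution, which is the property the modified cost function exploits.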

To exploit the advantages of NMCFLMS fully while giving the spectral energy a uniform distribution, the penalty function J_p(n) is used, through the Lagrange multiplier β(n), to modify the NMCFLMS cost function J_f(n), giving the cost function of the modified NMCFLMS (MNMCFLMS)

J_{mod}(n) = J_f(n) - β(n) J_p(n) \tag{9}

where

β(n) = \frac{\left| \nabla J_p^H(n) \nabla J_f(n) \right|}{\left\| \nabla J_p(n) \right\|^2} \tag{10}

and the channel frequency-domain response is updated as

\hat{\underline{h}}_i^{10}(n+1) = \hat{\underline{h}}_i^{10}(n) - \nabla J_f^{01}(n) + \hat{β}(n) \nabla J_p^{10}(n) \tag{11}

where

\nabla J_p^{01}(n) = F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T K(n) F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T \hat{\underline{h}}(n)

\hat{β}(n) = \frac{\left| \nabla J_p^{10}(n)^H \nabla J_f^{01}(n) \right|}{\left\| \nabla J_p^{10}(n) \right\|^2}

\nabla J_f^{01}(n) = μ \left(P_i(n) + δ I_{2L_h \times 2L_h}\right)^{-1} \sum_{k=1}^{M} D_k^*(n) \underline{e}_{ki}^{01}(n)

with

P_i(n) = λ P_i(n-1) + (1 - λ) \sum_{k=1, k \neq i}^{M} D_k^*(n) D_k(n)

\underline{e}_{ki}^{01}(n) = F_{2L_h \times 2L_h} [\begin{matrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{matrix}]^T F_{L_h \times L_h}^{-1} \underline{e}_{ki}(n)

The modified normalized multichannel frequency-domain least mean square method estimates the impulse responses of the different array elements adaptively, from which the delay differences are computed, and it prevents the channel estimates from deteriorating. Step B-5: the channel impulse response estimate is obtained by the inverse Fourier transform

\hat{h}_i(n) = [\begin{matrix} I_{L_h \times L_h} & 0_{L_h \times L_h} \end{matrix}] F_{2L_h \times 2L_h}^{-1} \hat{\underline{h}}_i^{10}(n) \tag{12}

where F_{2L_h×2L_h}^{-1} is the 2L_h × 2L_h inverse Fourier transform matrix;

Step B-6: perform time delay estimation from the estimated channel impulse responses \hat{h}_i(n). During adaptation a peak appears in \hat{h}_i(n); the delay corresponding to this peak is the arrival delay of the direct sound at the i-th microphone, so the delay difference with which the source signal reaches the i-th and j-th microphones is

\hat{τ}_{ij} = \left( \arg\max_{l = 1, \dots, L_h} |\hat{h}_{il}(n)| - \arg\max_{l = 1, \dots, L_h} |\hat{h}_{jl}(n)| \right) / f_s \tag{13}
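Equation (13) reduces to locating the largest-magnitude tap in each estimated response and converting the difference in tap index to seconds. A minimal sketch with made-up impulse response estimates; helper names are illustrative.

```python
def peak_index(h):
    """Index of the largest-magnitude tap, taken as the direct-path arrival."""
    return max(range(len(h)), key=lambda l: abs(h[l]))

def tdoa_from_responses(h_i, h_j, fs):
    """tau_ij = (peak index of h_i - peak index of h_j) / fs, as in (13)."""
    return (peak_index(h_i) - peak_index(h_j)) / fs

h_i = [0.0, 0.05, 0.9, 0.2, -0.1, 0.3]   # hypothetical estimate, direct path at tap 2
h_j = [0.1, 0.0, 0.0, 0.0, -0.8, 0.2]    # direct path at tap 4 (a negative peak also counts)
tau = tdoa_from_responses(h_i, h_j, fs=25_000)   # (2 - 4) / 25000 = -8e-05 s
```

The magnitude is used so that a peak of either sign is detected; the resolution of the estimate is one sample, 1/f_s.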

Let c be the speed of sound and (x_i, y_i, z_i) the coordinates of the i-th microphone. For a source at (x, y, z), the delay difference τ_ij with which the source signal reaches the i-th and j-th microphones is a constant, and the difference between the distances from the source to the two microphones satisfies

\sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} - \sqrt{(x - x_j)^2 + (y - y_j)^2 + (z - z_j)^2} = c \hat{τ}_{ij} \tag{14}

Equation (14) yields the following specific equations:

First group:

\sqrt{x^2 + y^2 + z^2} - \sqrt{(x - d)^2 + y^2 + z^2} = c \hat{τ}_{21} \tag{15}

\sqrt{x^2 + y^2 + z^2} - \sqrt{x^2 + (y - d)^2 + z^2} = c \hat{τ}_{23} \tag{16}

Second group:

\sqrt{(x - D)^2 + y^2 + z^2} - \sqrt{(x - D + d)^2 + y^2 + z^2} = c \hat{τ}_{45} \tag{17}

\sqrt{(x - D)^2 + y^2 + z^2} - \sqrt{(x - D)^2 + (y - d)^2 + z^2} = c \hat{τ}_{46} \tag{18}

where \hat{τ}_{21} is the delay difference between the source signal reaching the second and the first microphone, \hat{τ}_{23} that between the second and the third microphone, \hat{τ}_{45} that between the fourth and the fifth microphone, and \hat{τ}_{46} that between the fourth and the sixth microphone.

From (15) and (16),

y = a_1 x + b_1 \tag{19}

where

a_1 = \hat{τ}_{23} / \hat{τ}_{21} \tag{20}

b_1 = (c^2 \hat{τ}_{21} \hat{τ}_{23} + d^2)(\hat{τ}_{21} - \hat{τ}_{23}) / (2d \hat{τ}_{21}) \tag{21}

From (17) and (18),

y = a_2 x + b_2 \tag{22}

where

a_2 = -\hat{τ}_{46} / \hat{τ}_{45} \tag{23}

b_2 = \left((c^2 \hat{τ}_{45} \hat{τ}_{46} + d^2)(\hat{τ}_{45} - \hat{τ}_{46}) + 2dD \hat{τ}_{46}\right) / (2d \hat{τ}_{45}) \tag{24}

From (19) and (22), the estimated projection coordinates of the source on the xoy plane are

\hat{x} = \frac{b_2 - b_1}{a_1 - a_2} \tag{25}

\hat{y} = \frac{a_1 b_2 - a_2 b_1}{a_1 - a_2} \tag{26}

\hat{z} = 0 \tag{27}
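The projection step can be checked end to end by generating exact delay differences for a known source and recovering (x, y) as the intersection of the two lines (19) and (22). In b_2 the sketch uses the term 2dD τ_46, which is what eliminating the range to mic4 between (17) and (18) yields. The source position and c = 343 m/s are illustrative assumptions; helper names are hypothetical.

```python
import math

def tdoas(src, d, D, c):
    """Exact delay differences tau_ij = (r_i - r_j) / c for a known source position."""
    mics = {1: (d, 0.0, 0.0), 2: (0.0, 0.0, 0.0), 3: (0.0, d, 0.0),
            4: (D, 0.0, 0.0), 5: (D - d, 0.0, 0.0), 6: (D, d, 0.0)}
    r = {k: math.dist(src, p) for k, p in mics.items()}   # source-to-microphone ranges
    return {(i, j): (r[i] - r[j]) / c for i, j in [(2, 1), (2, 3), (4, 5), (4, 6)]}

def project_xy(t, d, D, c):
    """Intersect y = a1*x + b1 (from (15), (16)) with y = a2*x + b2 (from (17), (18))."""
    t21, t23, t45, t46 = t[(2, 1)], t[(2, 3)], t[(4, 5)], t[(4, 6)]
    a1 = t23 / t21
    b1 = (c**2 * t21 * t23 + d**2) * (t21 - t23) / (2 * d * t21)
    a2 = -t46 / t45
    b2 = ((c**2 * t45 * t46 + d**2) * (t45 - t46) + 2 * d * D * t46) / (2 * d * t45)
    x = (b2 - b1) / (a1 - a2)
    y = (a1 * b2 - a2 * b1) / (a1 - a2)
    return x, y

d, D, c = 1.0, 5.0, 343.0                  # embodiment spacings; c is assumed
t = tdoas((2.0, 1.5, 1.2), d, D, c)        # hypothetical source position
x_hat, y_hat = project_xy(t, d, D, c)      # recovers (2.0, 1.5) up to rounding
```

Each line depends only on the delays within one L-shaped sub-array, so the two arrays together fix a single point in the plane; the method requires the lines not to be parallel (a_1 ≠ a_2).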

Substituting (25) and (26) into (15) through (18) gives four estimates of the source z coordinate, denoted \hat{z}_1, \hat{z}_2, \hat{z}_3, and \hat{z}_4; their mean is taken as the accurate estimate of the source z coordinate, i.e.,

\hat{z} = \frac{\hat{z}_1 + \hat{z}_2 + \hat{z}_3 + \hat{z}_4}{4} \tag{28}

The values obtained from (25), (26), and (28) are the coordinates of the localized sound source.
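One way to carry out the substitution is to solve each of (15) to (18) in closed form for the range to its reference microphone (mic2 for the first group, mic4 for the second) and then take z from that range and the horizontal distance. This closed form is a sketch under stated assumptions, not the patent's own derivation: it assumes exact delays, uses the true projection (which (25) and (26) recover exactly in that case), takes c = 343 m/s, and keeps the non-negative root for z.

```python
import math

def tdoas(src, d, D, c):
    """Exact delay differences tau_ij = (r_i - r_j) / c for a known source position."""
    mics = {1: (d, 0.0, 0.0), 2: (0.0, 0.0, 0.0), 3: (0.0, d, 0.0),
            4: (D, 0.0, 0.0), 5: (D - d, 0.0, 0.0), 6: (D, d, 0.0)}
    r = {k: math.dist(src, p) for k, p in mics.items()}
    return {(i, j): (r[i] - r[j]) / c for i, j in [(2, 1), (2, 3), (4, 5), (4, 6)]}

def z_estimates(x, y, t, d, D, c):
    """Four z estimates from (15)-(18), given the projected position (x, y)."""
    t21, t23, t45, t46 = t[(2, 1)], t[(2, 3)], t[(4, 5)], t[(4, 6)]
    r2_a = (c**2 * t21**2 + 2 * d * x - d**2) / (2 * c * t21)        # range to mic2 from (15)
    r2_b = (c**2 * t23**2 + 2 * d * y - d**2) / (2 * c * t23)        # range to mic2 from (16)
    r4_a = (c**2 * t45**2 - 2 * d * (x - D) - d**2) / (2 * c * t45)  # range to mic4 from (17)
    r4_b = (c**2 * t46**2 + 2 * d * y - d**2) / (2 * c * t46)        # range to mic4 from (18)
    g2 = x**2 + y**2             # squared horizontal distance to mic2
    g4 = (x - D)**2 + y**2       # squared horizontal distance to mic4
    # Non-negative root: assumes the source is not below the array plane.
    return [math.sqrt(max(r * r - g, 0.0)) for r, g in
            [(r2_a, g2), (r2_b, g2), (r4_a, g4), (r4_b, g4)]]

d, D, c = 1.0, 5.0, 343.0
src = (2.0, 1.5, 1.2)                                    # hypothetical source position
zs = z_estimates(src[0], src[1], tdoas(src, d, D, c), d, D, c)
z_hat = sum(zs) / 4                                      # the averaging step (28)
```

With noisy delays the four estimates differ, and averaging them as in (28) reduces the spread of the height estimate.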

Embodiment 1:

To verify the performance of the method of the invention, it was tested as follows. Fig. 4 is a simulation diagram of the microphone placement in the room, which models the acoustic environment of an ordinary conference room measuring 6 m × 5 m × 3.5 m. The microphone array is mounted at a height of about 0.5 m, and the 6 microphones are arranged in the double-L-shaped array shown in Fig. 4. The microphones are 0.5 m from the walls and 0.5 m above the floor, the distance between two adjacent microphones is 1 m, and the distance between the two L-shaped arrays is 5 m. Because positions close to the walls suffer from strong reverberation and have little practical value, the shaded region in Fig. 4 is not considered, and the sound source is placed anywhere within the two L-shaped arrays. The source signal is a prerecorded passage read aloud by a male speaker, sampled at 25 kHz. Given the microphone placement, the channel impulse response length is L_h = 370, and the MNMCFLMS parameters are δ = 0.1556 × 10^-5 and μ = 0.5.

Fig. 5 shows the convergence curves of the normalized projection misalignment (Normalized Projection Misalignment, NPM) at SNR = 20 dB and reverberation time RT₆₀ = 100 ms, where the NPM is

NPM(n) = 20 \log_{10}\left( \left\| h(n) - \frac{h(n)^T \hat{h}(n)}{\hat{h}(n)^T \hat{h}(n)} \hat{h}(n) \right\| \Big/ \left\| h(n) \right\| \right) \tag{29}

where h(n) is the true channel impulse response and \hat{h}(n) is the impulse response obtained by adaptive channel estimation.
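The NPM of (29) projects the true response onto the estimate before measuring the error, so a blind estimate that is correct up to an overall gain is not penalized. A minimal sketch for real-valued responses; the vectors are made up and the helper name `npm_db` is illustrative.

```python
import math

def npm_db(h, h_hat):
    """Normalized projection misalignment (29) in dB for real-valued responses."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))
    scale = dot(h, h_hat) / dot(h_hat, h_hat)        # projection coefficient of h onto h_hat
    err = [p - scale * q for p, q in zip(h, h_hat)]   # component of h not explained by h_hat
    return 20 * math.log10(math.sqrt(dot(err, err)) / math.sqrt(dot(h, h)))

h_true = [1.0, 0.0]
npm_a = npm_db(h_true, [1.0, 0.1])   # about -20.04 dB
npm_b = npm_db(h_true, [3.0, 0.3])   # same value: a scaled estimate gives the same NPM
```

An estimate that is exactly proportional to the true response makes the error vector zero, in which case the logarithm is undefined and the NPM tends to minus infinity.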

Fig. 5 shows that, with additive channel noise present, NMCFLMS begins to diverge after about 450 iterations, whereas the MNMCFLMS method of the invention converges stably after about 450 iterations, effectively avoiding the deterioration of channel estimates caused by additive channel noise.

At SNR = 20 dB and under different reverberation conditions, the time delay estimation performance of the MNMCFLMS method of the invention was compared with that of the classical phase-transform-weighted generalized cross-correlation method (Generalized Cross Correlation-Phase Transform, GCC-PHAT); the results are shown in Table 1 below:

Table 1 Performance indicators of the method of the invention and of GCC-PHAT under low reverberation

Table 1 uses the percentage of non-abnormal points (Percentage of Non-abnormal Point, PNP) and the root mean square error (Root Mean Square Error, RMSE) as performance measures. PNP and RMSE are defined as

PNP = \left(1 - \frac{1}{N} \sum_{i=1}^{N} T_P(τ_i - τ_0)\right) \times 100\% \tag{30}

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (τ_i - τ_0)^2} \tag{31}

where

T_P(x) = \begin{cases} 0, & |x| \leq 2/f_s \\ 1, & |x| > 2/f_s \end{cases} \tag{32}

where N is the number of time delay estimates and τ_0 is the true delay. A delay estimate that differs from the true value by more than 2 samples is counted as an abnormal point; the higher the non-abnormal percentage, the better the method. Table 1 shows that, over many experiments, the method of the invention performs on a par with GCC-PHAT under low reverberation. When the reverberation time exceeds 250 ms, the non-abnormal percentage of GCC-PHAT drops markedly and its RMSE far exceeds the sampling precision, so it can no longer estimate the delay effectively. The performance of the method of the invention, by contrast, is unaffected, and it still estimates the delay effectively when the reverberation time is below 700 ms.
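The two measures (30) to (32) can be computed directly from a list of delay estimates. The sketch below uses made-up estimates at an illustrative sampling rate of 10 kHz, so the abnormal-point threshold 2/f_s is 0.2 ms.

```python
import math

def pnp_and_rmse(estimates, true_tau, fs):
    """PNP (30) in percent and RMSE (31) in seconds for a list of delay estimates.

    An estimate is abnormal when it is more than 2 samples (2 / fs) from the true delay, as in (32).
    """
    n = len(estimates)
    abnormal = sum(1 for t in estimates if abs(t - true_tau) > 2 / fs)
    pnp = (1 - abnormal / n) * 100
    rmse = math.sqrt(sum((t - true_tau) ** 2 for t in estimates) / n)
    return pnp, rmse

# Four hypothetical estimates of a 1 ms delay; only 1.5 ms lies beyond the 0.2 ms threshold.
pnp, rmse = pnp_and_rmse([0.001, 0.0011, 0.0015, 0.001], true_tau=0.001, fs=10_000)
# pnp == 75.0; rmse is sqrt((1e-8 + 2.5e-7) / 4), about 2.55e-4 s
```

PNP counts only gross errors, while RMSE also reflects how large those errors are, which is why a method can keep a high PNP yet show a large RMSE when a few outliers are far off.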

A real sound source is tested under a room model to evaluate the performance of the proposed sound source localization method. The indoor environment parameters are SNR = 20 dB and RT₆₀ = 150 ms. The absolute error between the estimated and true positions is used as the criterion of localization performance, where (x̂, ŷ, ẑ) denotes the estimated coordinates of the source position. The method was verified by repeated experiments, and the results are shown in Table 2 below:

Table 2 Test results of the proposed method

Table 2 shows that the absolute error of the distance γ stays within 10 cm, the absolute error of the horizontal angle θ stays within 1°, and the absolute error of the height stays within 10 cm.

The technical means disclosed in the present invention are not limited to those disclosed in the above embodiment. They also include technical solutions formed by any combination of the above technical features. Those skilled in the art may make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims

1. A three-dimensional sound source localization method, characterized by comprising the following steps:

Step A: in a three-dimensional Cartesian coordinate system, establish two L-shaped microphone arrays that lie in the same plane and face each other;

Step B: estimate the time delay differences of the source signal arriving at the microphones using a modified normalized multichannel frequency-domain least-mean-square (NMCFLMS) method, comprising the following steps:

Step B-1: the source signal s(n) passes through the channel impulse response h_i(n) of the i-th microphone and is combined with the channel additive noise v_i(n), giving the received signal x_i(n) of the i-th microphone: x_i(n) = s^T(n) h_i(n) + v_i(n);

When channel additive noise is neglected, the received signal x_i(n) of the i-th microphone and the received signal x_j(n) of the j-th microphone satisfy the relation:

x_{i}^{T} (n) h_{j} (n) = x_{j}^{T} (n) h_{i} (n);
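The cross-relation holds because convolution commutes: x_i * h_j = s * h_i * h_j = x_j * h_i. A minimal numerical check, assuming noise-free synthetic signals and randomly drawn (hypothetical) channels, is:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(200)        # synthetic source signal
h_i = rng.standard_normal(16)       # hypothetical channel to microphone i
h_j = rng.standard_normal(16)       # hypothetical channel to microphone j

x_i = np.convolve(s, h_i)           # noise-free received signals
x_j = np.convolve(s, h_j)

# Filtering each microphone's signal with the *other* channel gives the same output.
lhs = np.convolve(x_i, h_j)
rhs = np.convolve(x_j, h_i)
residual = np.max(np.abs(lhs - rhs))  # round-off level only
```

This identity involves only the received signals and the unknown channels. It therefore lets the channels be identified blindly, without access to s(n), and the adaptive method below drives its violation (the cross-relation error) towards zero.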

Step B-2: apply a rectangular window w(n) of length 2L_h to the received signal x_i(n) of the i-th microphone. The n-th frame of the received signal of the i-th microphone is:

x_i(n)_{2L_h \times 1} = [x_i(nL_h - L_h), x_i(nL_h - L_h + 1), \ldots, x_i(nL_h + L_h - 1)]^T;

where L_h is the length of the channel impulse response h_i(n), an integer; n = 1, ..., int(N/L_h - 1); N is the total length of the received signal sequence, an integer; and int(N/L_h - 1) is N/L_h - 1 rounded down to an integer;

X_i(n)_{2L_h \times 1} = F_{2L_h \times 2L_h}\, x_i(n)_{2L_h \times 1};

where F_{2L_h \times 2L_h} is the 2L_h × 2L_h Fourier transform matrix;
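A minimal sketch of the framing and transform follows. It assumes the Fourier matrix uses NumPy's convention F[k, m] = exp(-j2πkm/2L_h), so that multiplying by F equals `np.fft.fft`. The helper names `frame` and `dft_matrix` are hypothetical. Consecutive frames of length 2L_h advance by L_h samples, so each pair of neighbouring frames overlaps by half.

```python
import numpy as np

def frame(x, n, L_h):
    """n-th frame (n >= 1): x[n*L_h - L_h], ..., x[n*L_h + L_h - 1]."""
    start = n * L_h - L_h
    return x[start:start + 2 * L_h]

def dft_matrix(size):
    """size x size Fourier matrix with entries exp(-2j*pi*k*m/size)."""
    k = np.arange(size)
    return np.exp(-2j * np.pi * np.outer(k, k) / size)

L_h = 8
x = np.arange(64, dtype=float)                  # toy received signal, N = 64
num_frames = int(x.size / L_h - 1)              # n = 1, ..., int(N/L_h - 1)
frames = [frame(x, n, L_h) for n in range(1, num_frames + 1)]

F = dft_matrix(2 * L_h)
X1 = F @ frames[0]                              # X_i(1) = F x_i(1)
```

The upper limit int(N/L_h - 1) is exactly the largest n whose last sample index nL_h + L_h - 1 still lies inside the signal. In practice the explicit matrix product would be replaced by an FFT.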

The frequency-domain cross-relation error between the i-th and j-th channels is

\underline{e}_{ij}(n) = W^{01}_{L_h \times 2L_h} D_i(n) W^{10}_{2L_h \times L_h} \underline{\hat{h}}_j(n) - W^{01}_{L_h \times 2L_h} D_j(n) W^{10}_{2L_h \times L_h} \underline{\hat{h}}_i(n)

where

D_i(n) = \mathrm{diag}\left(F_{2L_h \times 2L_h}\, x_i(n)_{2L_h \times 1}\right);

W^{01}_{L_h \times 2L_h} = F_{L_h \times L_h} \begin{bmatrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{bmatrix} F^{-1}_{2L_h \times 2L_h};

W^{10}_{2L_h \times L_h} = F_{2L_h \times 2L_h} \begin{bmatrix} I_{L_h \times L_h} & 0_{L_h \times L_h} \end{bmatrix}^T F^{-1}_{L_h \times L_h};

where i ≠ j and i, j = 1, 2, ..., M, with M the number of microphones. F_{L_h \times L_h} and F^{-1}_{L_h \times L_h} are the L_h × L_h Fourier transform matrix and its inverse. diag denotes a diagonal matrix, I_{L_h \times L_h} is the L_h × L_h identity matrix, and 0_{L_h \times L_h} is the L_h × L_h zero matrix;
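The operator chain W^{01} D_i(n) W^{10} is an overlap-save block filter. W^{10} zero-pads the L_h-tap channel to 2L_h, D_i(n) performs a 2L_h-point circular convolution with the current frame, and W^{01} keeps the last L_h output samples, which are free of wrap-around. The sketch below assumes the F matrices follow NumPy's `fft`/`ifft` convention. The function names are hypothetical.

```python
import numpy as np

def block_filter(frame_2L, h):
    """W01 diag(F x) W10 (F_L h): L-point spectrum of the last L samples
    of the 2L-point circular convolution of the frame with h (overlap-save)."""
    L = h.size
    H10 = np.fft.fft(np.concatenate([h, np.zeros(L)]))  # W10 applied to F_L h
    y = np.fft.ifft(np.fft.fft(frame_2L) * H10)         # D_i, then back to time
    return np.fft.fft(y[L:])                            # [0 I], then F_L

def cross_relation_error(x_i, x_j, h_i_hat, h_j_hat, n):
    """Frequency-domain e_ij(n) for frame n (n >= 1) of length 2L."""
    L = h_i_hat.size
    start = n * L - L
    f_i, f_j = x_i[start:start + 2 * L], x_j[start:start + 2 * L]
    return block_filter(f_i, h_j_hat) - block_filter(f_j, h_i_hat)

# Hypothetical noise-free data: with the true channels the error vanishes.
rng = np.random.default_rng(1)
L = 16
s = rng.standard_normal(400)
h_i, h_j = rng.standard_normal(L), rng.standard_normal(L)
x_i = np.convolve(s, h_i)[:400]
x_j = np.convolve(s, h_j)[:400]
e_true = cross_relation_error(x_i, x_j, h_i, h_j, n=3)
```

With mismatched channel estimates the error is non-zero, and this is the quantity the adaptive update below minimizes.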

The cost function J_f(n) of NMCFLMS is modified with the penalty function J_p(n) through a Lagrange multiplier β(n), giving the modified cost function

J_{mod}(n) = J_f(n) - β(n) J_p(n)

where

β(n) = \frac{\left| \nabla J_p^H(n) \nabla J_f(n) \right|}{\left\| \nabla J_p(n) \right\|^2}

The value of the Lagrange multiplier β(n) follows from requiring the gradient of the modified cost function to be zero at steady state, i.e. ∇J_f(n) = β(n)∇J_p(n). H denotes the conjugate transpose;
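The expression for β(n) is the least-squares coefficient that best satisfies ∇J_f(n) = β(n)∇J_p(n), taken in absolute value as in the formula above. A minimal sketch follows. The name `lagrange_beta` is hypothetical, and gradients are assumed to be complex NumPy vectors.

```python
import numpy as np

def lagrange_beta(grad_p, grad_f):
    """beta = |grad_p^H grad_f| / ||grad_p||^2.

    np.vdot conjugates its first argument, so np.vdot(a, b) = a^H b.
    """
    return np.abs(np.vdot(grad_p, grad_f)) / np.vdot(grad_p, grad_p).real

gp = np.array([1 + 1j, 2.0, -1j])
beta_parallel = lagrange_beta(gp, 3 * gp)          # grad_f parallel to grad_p
beta_orthogonal = lagrange_beta(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

When the two gradients are parallel, β recovers their ratio exactly. When they are orthogonal, β is zero, so the penalty term has no influence on that update.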

The frequency-domain channel estimate is then updated as

\underline{\hat{h}}_i^{10}(n + 1) = \underline{\hat{h}}_i^{10}(n) - \nabla J_f^{01}(n) + \hat{β}(n) \nabla J_p^{10}(n)

where

\nabla J_p^{01}(n) = F_{2L_h \times 2L_h} \begin{bmatrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{bmatrix}^T K(n) F_{2L_h \times 2L_h} \begin{bmatrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{bmatrix}^T \underline{\hat{h}}(n)

\hat{β}(n) = \frac{\left| \nabla J_p^{10}(n)^H \nabla J_f^{01}(n) \right|}{\left\| \nabla J_p^{10}(n) \right\|^2}

\nabla J_f^{01}(n) = μ \left(P_i(n) + δ I_{2L_h \times 2L_h}\right)^{-1} \sum_{k=1}^{M} D_k^{*}(n)\, \underline{e}_{ki}^{01}(n)

where

P_i(n) = λ P_i(n - 1) + (1 - λ) \sum_{k=1, k \neq i}^{M} D_k^{*}(n) D_k(n)

\underline{e}_{ki}^{01}(n) = F_{2L_h \times 2L_h} \begin{bmatrix} 0_{L_h \times L_h} & I_{L_h \times L_h} \end{bmatrix}^T F^{-1}_{L_h \times L_h}\, \underline{e}_{ki}(n)

where K(n) is a diagonal matrix; P_i(n) is the spectral energy of the multichannel output signal; λ is the forgetting factor; μ is the step size; and the parameter δ is a positive constant;
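Because P_i(n) and every D_k(n) are diagonal, the recursion and the normalized gradient reduce to element-wise operations on their diagonals. The sketch below stores each diagonal as a vector of length 2L_h. The function names and argument layout are hypothetical, and no particular choice of λ, μ or δ is implied.

```python
import numpy as np

def update_psd(P_prev, D, i, lam):
    """P_i(n) = lam*P_i(n-1) + (1-lam) * sum_{k != i} |D_k(n)|^2.

    D has shape (M, 2L): row k holds the diagonal of D_k(n) = diag(F x_k(n)).
    """
    others = np.delete(np.abs(D) ** 2, i, axis=0)   # drop channel i itself
    return lam * P_prev + (1.0 - lam) * others.sum(axis=0)

def normalized_gradient(P_i, D, E01_i, mu, delta):
    """mu * (P_i + delta*I)^{-1} * sum_k D_k^* e_ki^01, all diagonals as vectors.

    E01_i has shape (M, 2L): row k holds e_ki^01(n); row i is zero since e_ii = 0.
    """
    return mu * (np.conj(D) * E01_i).sum(axis=0) / (P_i + delta)

# With a stationary spectrum, P_i converges to the summed energy of the other channels.
rng = np.random.default_rng(2)
D = rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))
P = np.zeros(8)
for _ in range(500):
    P = update_psd(P, D, i=0, lam=0.9)
```

Dividing by P_i(n) + δ normalizes the step per frequency bin, and δ keeps bins with little energy from producing very large steps.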

\hat{h}_i(n) = \begin{bmatrix} I_{L_h \times L_h} & 0_{L_h \times L_h} \end{bmatrix} F^{-1}_{2L_h \times 2L_h}\, \underline{\hat{h}}_i^{10}(n)

where F^{-1}_{2L_h \times 2L_h} is the 2L_h × 2L_h inverse Fourier transform matrix;

Step B-6: during adaptation, the lag at which the peak of the estimated channel impulse response ĥ_i(n) occurs corresponds to the direct-path delay of the signal received by the i-th microphone. The time delay difference of the source signal between the i-th and j-th microphones is then

\hat{τ}_{ij} = \left(\arg\max_{l=1,\ldots,L_h} |\hat{h}_{il}(n)| - \arg\max_{l=1,\ldots,L_h} |\hat{h}_{jl}(n)|\right) / f_s
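Peak picking on the estimated channels can be sketched as below. The name `tdoa_from_channels` is hypothetical. The sketch uses 0-based tap indices, which does not change the difference of the two peak positions.

```python
import numpy as np

def tdoa_from_channels(h_i_hat, h_j_hat, fs):
    """tau_ij = (index of max |h_i| - index of max |h_j|) / fs, in seconds."""
    peak_i = int(np.argmax(np.abs(h_i_hat)))   # direct-path lag at microphone i
    peak_j = int(np.argmax(np.abs(h_j_hat)))   # direct-path lag at microphone j
    return (peak_i - peak_j) / fs

# Hypothetical estimates: direct path at tap 5 (negative polarity) and at tap 12.
h_i = np.zeros(32); h_i[5] = -1.0; h_i[9] = 0.4    # tap 9 mimics a weaker reflection
h_j = np.zeros(32); h_j[12] = 0.8
tau_ij = tdoa_from_channels(h_i, h_j, 16000)
```

Taking the magnitude before the argmax keeps a direct path with negative polarity from being missed. The estimate is quantized to multiples of 1/f_s.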

Step C: multiply the time delay differences obtained in Step B by the speed of sound to obtain the range differences of the source signal to the microphones, and determine the position of the sound source from the positional relationship of the microphones.

2. The three-dimensional sound source localization method according to claim 1, characterized in that J_p(n) is a penalty function under a constraint, the constraint being:

\max \; J_p(n) = \sum_{i=1}^{M} \sum_{j=0}^{L_h - 1} \ln\left(|\underline{\hat{h}}_{ij}(n)|^2\right) \quad \text{s.t.} \quad \sum_{i=1}^{M} \sum_{j=0}^{L_h - 1} |\underline{\hat{h}}_{ij}(n)|^2 = \frac{1}{M L_h}

where the constraint must hold while the penalty function J_p(n) is maximized, and s.t. ("subject to") introduces the constraint condition.
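The effect of the log-sum penalty under the fixed-energy constraint can be checked numerically. By the AM-GM inequality, for a fixed total energy the sum of ln|h|² is largest when the energy is spread evenly over all coefficients. The sketch below compares an even and a concentrated coefficient set with the same energy 1/(ML_h). The helper names and the sizes M = 2 and L_h = 4 are hypothetical.

```python
import numpy as np

def penalty(h_hat):
    """J_p = sum over channels i and taps j of ln(|h_ij|^2)."""
    return np.sum(np.log(np.abs(h_hat) ** 2))

def energy(h_hat):
    """Left-hand side of the constraint in claim 2."""
    return np.sum(np.abs(h_hat) ** 2)

M, L_h = 2, 4
target = 1.0 / (M * L_h)                      # constraint value
per_tap = target / (M * L_h)                  # equal share of energy per coefficient

flat = np.full((M, L_h), np.sqrt(per_tap))    # energy spread evenly
weights = np.full(M * L_h, 0.1)               # concentrated: one dominant tap
weights[0] = M * L_h - 0.1 * (M * L_h - 1)    # weights sum to M*L_h
peaky = np.sqrt(per_tap * weights).reshape(M, L_h)
```

Both sets satisfy the constraint, but the concentrated one receives the lower J_p. Maximizing J_p therefore pushes the estimate away from solutions in which some coefficients collapse towards zero.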

3. The three-dimensional sound source localization method according to claim 1 or 2, characterized in that the length L_h of the channel impulse response in Step B-2 is restricted by:

L_h \le 2 f_s \max_{i, j = 1, \ldots, M} τ_{ij,\max}

where τ_{ij,max} = d_ij / c is the maximum time delay difference between the i-th and j-th microphones, d_ij is the distance between the i-th and j-th microphones, and c is the propagation speed of sound.
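The bound in claim 3 depends only on the largest microphone spacing. A minimal sketch follows. It assumes c = 343 m/s (air at about 20 °C) and positions in metres. The function name `lh_upper_bound` is hypothetical.

```python
import itertools
import math

def lh_upper_bound(positions, fs, c=343.0):
    """2 * fs * max_{i,j} d_ij / c for microphone positions given as (x, y, z) in metres."""
    d_max = max(math.dist(p, q) for p, q in itertools.combinations(positions, 2))
    return 2.0 * fs * d_max / c

# Hypothetical double-L layout with d = 0.2 m, D = 1.0 m, sampled at 16 kHz.
mics = [(0.2, 0, 0), (0, 0, 0), (0, 0.2, 0), (1.0, 0, 0), (0.8, 0, 0), (1.0, 0.2, 0)]
bound = lh_upper_bound(mics, 16000)
```

For this layout the largest spacing is sqrt(D² + d²) (for example between mic3 and mic4), which gives a bound of roughly 95 taps.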

4. The three-dimensional sound source localization method according to claim 1, characterized in that the two L-shaped arrays in Step A are laid out as follows. The first microphone mic1 is placed on the x axis at (d, 0, 0); the second microphone mic2 coincides with the origin at (0, 0, 0); and the third microphone mic3 is placed on the y axis at (0, d, 0). Here d, a real number, is the spacing between two adjacent microphones, and the first, second and third microphones form the first L-shaped microphone array. Taking x = D/2 as the axis of symmetry, a second L-shaped microphone array facing the first is set up, in which the fourth microphone mic4 is at (D, 0, 0), the fifth microphone mic5 is at (D - d, 0, 0), and the sixth microphone mic6 is at (D, d, 0). D is the distance between the two facing L-shaped microphone arrays.
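The layout of claim 4 can be written out and its mirror symmetry about x = D/2 checked as follows. The function names are hypothetical, and all six microphones lie in the z = 0 plane.

```python
import numpy as np

def double_l_array(d, D):
    """Rows are mic1..mic6 as laid out in claim 4."""
    return np.array([
        [d, 0.0, 0.0],       # mic1 on the x axis
        [0.0, 0.0, 0.0],     # mic2 at the origin (corner of the first L)
        [0.0, d, 0.0],       # mic3 on the y axis
        [D, 0.0, 0.0],       # mic4 (corner of the second L)
        [D - d, 0.0, 0.0],   # mic5
        [D, d, 0.0],         # mic6
    ])

def mirror_x(p, D):
    """Reflect point(s) about the plane x = D/2."""
    q = np.array(p, dtype=float)
    q[..., 0] = D - q[..., 0]
    return q

mics = double_l_array(0.2, 1.0)
```

Under the reflection, mic1 maps to mic5, mic2 to mic4 and mic3 to mic6. Each corner microphone (mic2, mic4) therefore has one neighbour along x and one along y, which the delay pairs of Step C-1 use.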

5. The three-dimensional sound source localization method according to claim 1 or 4, characterized in that in Step C the source position is determined as follows:

Step C-1: for a source at (x, y, z), the time delay difference τ̂_ij between the i-th and j-th microphones is related to the spacing d_ij between those microphones. Specifically, the relations fall into the following two groups:

First group:With

\sqrt{x^{2} + y^{2} + z^{2}} = \frac{1}{2 c {\hat{τ}}_{23}} (c^{2} {\hat{τ}}_{23}^{2} + 2 d y - d^{2});

Second group:With

\sqrt{{(x - D)}^{2} + y^{2} + z^{2}} = - \frac{1}{2 c {\hat{τ}}_{46}} (c^{2} {\hat{τ}}_{46}^{2} + 2 D x + D^{2} - 2 d y - d^{2});

where τ̂_21 is the time delay difference of the source signal between the second and first microphones, τ̂_23 is that between the second and third microphones, τ̂_45 is that between the fourth and fifth microphones, and τ̂_46 is that between the fourth and sixth microphones;
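The first-group relation for τ̂_23 can be checked numerically. The sketch assumes the sign convention τ_ij = (r_i - r_j)/c, where r_i is the source-to-microphone distance. This convention is an assumption, but it is the one under which the relation holds as written. The source position, c = 343 m/s and d = 0.2 m are hypothetical values.

```python
import numpy as np

c, d = 343.0, 0.2                          # assumed speed of sound (m/s) and spacing (m)
src = np.array([0.3, 0.8, 0.5])            # hypothetical source position (m)
mic2 = np.array([0.0, 0.0, 0.0])
mic3 = np.array([0.0, d, 0.0])

r2 = np.linalg.norm(src - mic2)            # distance to mic2 = sqrt(x^2 + y^2 + z^2)
r3 = np.linalg.norm(src - mic3)
tau23 = (r2 - r3) / c                      # assumed convention tau_ij = (r_i - r_j) / c

x, y, z = src
rhs = (c**2 * tau23**2 + 2 * d * y - d**2) / (2 * c * tau23)
```

The relation follows from r3² - r2² = d² - 2dy together with r3 = r2 - cτ̂_23. Eliminating r3 leaves an equation that is linear in r2.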

Step C-2: from the first and second groups of relations in Step C-1, the projection coordinates of the sound source onto the xoy plane are

\hat{x} = \frac{b_{2} - b_{1}}{a_{1} - a_{2}}

\hat{y} = \frac{a_{1} b_{2} - a_{2} b_{1}}{a_{1} - a_{2}}

\hat{z} = 0

where

b_2 = \left(\left(c^2 \hat{τ}_{45} \hat{τ}_{46} + d^2\right)\left(\hat{τ}_{45} - \hat{τ}_{46}\right) + 2dD\hat{τ}_{45}\right) / \left(2d\hat{τ}_{45}\right);
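The expressions for x̂ and ŷ are exactly the intersection point of the two lines y = a_1 x + b_1 and y = a_2 x + b_2. Setting a_1 x + b_1 = a_2 x + b_2 gives x̂, and substituting back gives ŷ. This text defines only b_2, so the sketch takes all four coefficients as inputs. The function name is hypothetical.

```python
def intersect_lines(a1, b1, a2, b2):
    """Intersection of y = a1*x + b1 and y = a2*x + b2; requires a1 != a2."""
    if a1 == a2:
        raise ValueError("parallel lines: a1 == a2, no unique intersection")
    x_hat = (b2 - b1) / (a1 - a2)
    y_hat = (a1 * b2 - a2 * b1) / (a1 - a2)
    return x_hat, y_hat

point = intersect_lines(2.0, 1.0, -1.0, 4.0)   # y = 2x + 1 meets y = -x + 4
```

When a_1 is close to a_2 the two lines are nearly parallel and the projection becomes sensitive to errors in the delay estimates. The explicit check only rejects the exactly parallel case.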

Step C-3: substitute the projection coordinates of the sound source on the xoy plane from Step C-2 into each of the four relations of the first and second groups in Step C-1. This yields four z-axis coordinate estimates of the sound source, ẑ_1, ẑ_2, ẑ_3 and ẑ_4, and their mean gives the estimated z coordinate ẑ of the source:

\hat{z} = \frac{{\hat{z}}_{1} + {\hat{z}}_{2} + {\hat{z}}_{3} + {\hat{z}}_{4}}{4} .
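One of the four height estimates, the one from the first-group relation for τ̂_23, can be obtained as follows: compute r_2 from the relation, then take z = sqrt(r_2² - x̂² - ŷ²). The four estimates are then averaged. The sketch assumes the source lies above the array plane (z ≥ 0), uses the sign convention τ_ij = (r_i - r_j)/c, and assumes c = 343 m/s. All function names are hypothetical.

```python
import numpy as np

def z_from_range(x_hat, y_hat, r):
    """Height from a range r = sqrt(x^2 + y^2 + z^2), assuming z >= 0.
    Small negative values of r^2 - x^2 - y^2 caused by estimation error are clipped to 0."""
    return np.sqrt(max(r**2 - x_hat**2 - y_hat**2, 0.0))

def z_from_tau23(x_hat, y_hat, tau23, d, c=343.0):
    """Invert the first-group relation for tau_23 to get one height estimate."""
    r2 = (c**2 * tau23**2 + 2 * d * y_hat - d**2) / (2 * c * tau23)
    return z_from_range(x_hat, y_hat, r2)

def average_z(z_estimates):
    """Final height estimate: mean of the individual estimates."""
    return float(np.mean(z_estimates))
```

Averaging the four estimates, each derived from a different microphone pair, reduces the effect of the error in any single time delay difference.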