CN106526541B - Sound localization method based on distribution matrix decision - Google Patents

Sound localization method based on distribution matrix decision

Info

Publication number
CN106526541B
CN106526541B (application CN201610893331.1A)
Authority
CN
China
Prior art keywords
positioning
frame
signal
distribution matrix
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610893331.1A
Other languages
Chinese (zh)
Other versions
CN106526541A (en)
Inventor
王建中
叶凯
曹九稳
薛安克
王天磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University
Priority to CN201610893331.1A
Publication of CN106526541A
Application granted
Publication of CN106526541B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a sound localization method based on distribution-matrix decision. The method comprises the following steps: 1. pre-process the multi-channel sound signals collected by an acoustic array, the pre-processing including framing; 2. run a sound recognition algorithm on the single-channel data, obtaining one recognition result per frame; 3. run wideband sound localization on the multi-channel data, obtaining one localization result per frame; 4. construct a distribution matrix whose rows and columns are indexed by the recognition and localization result sets obtained in the preceding steps; 5. after obtaining the distribution matrix, find the localization distribution peak of the target sound source; 6. select the peak interval and its two neighboring angular intervals and compute the statistical mean over these three intervals. The invention improves the accuracy of sound localization, especially when interference is strong and the background environment is complex. Moreover, the localization algorithm depends only weakly on the recognition result, so the method is broadly applicable.

Description

Sound localization method based on distribution matrix decision
Technical field
The invention belongs to the field of signal processing, and more particularly relates to a sound localization method based on distribution-matrix decision.
Background technique
Traditional sound localization algorithms suffer from the following problems:
1. Poor anti-interference capability. In quiet, noise-free indoor conditions the localization accuracy is high, but in complex outdoor environments the presence of noise or interference severely degrades the localization result.
2. In the field of acoustic signal processing, recognition and localization algorithms are closely related and complement each other, yet conventional localization algorithms fail to exploit this and neglect the advantages of information fusion.
Summary of the invention
In view of the above problems, the present invention provides a sound localization method based on distribution-matrix decision, illustrated here with a cross-shaped acoustic array.
To achieve the above goals, the technical solution adopted by the present invention comprises the following steps:
Step 1: pre-process the four-channel sound signal collected by the acoustic array; the pre-processing includes framing.
Step 2: perform sound recognition on the single-channel data.
Step 3: perform wideband sound localization on the multi-channel data.
Step 4: construct a distribution matrix from the recognition and localization result sets obtained in steps 2 and 3.
Step 5: after obtaining the distribution matrix, find the localization distribution peak of the target sound source.
Step 6: select the peak interval and its two neighboring angular intervals and compute the statistical mean over these three intervals as the final localization result.
Step 1: the field sound signal is acquired with a cross acoustic array; denote the sampling frequency by f_s. The four-channel sound signal is divided into frames; let the number of frames after framing be m. Each frame signal is then processed as follows.
Step 2: each single-channel frame after framing is recognized.
The algorithm used to recognize the single-channel signal is the LPCC+SVM algorithm.
Each frame yields one recognition result, forming a recognition result array C of length m:
C = [c(1) c(2) ... c(m)]
Step 3: the wideband localization algorithm is applied to each four-channel frame after framing.
The algorithm used for wideband localization of the four-channel signal is the wideband MUSIC algorithm.
3-1. The frequency band and the center frequency f_0 are chosen as needed; they must be selected according to the frequency characteristics of the actual target signal.
3-2. An FFT is applied to each four-channel frame; the model X(f_j) of each transformed frame is expressed as:
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J   (formula 1)
where A_θ(f_j) is the steering vector, and S(f_j) and N(f_j) are the source signal and the noise after the FFT, respectively. After the transform, the selected frequency band is decomposed into a combination of narrowband signals with frequencies f_j.
3-3. Using the focusing matrices T, the narrowband at each frequency f_j is focused onto the narrowband at the center frequency f_0; the transformation is:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)   (formula 2)
The autocorrelation matrix at the center frequency f_0, used for localization, is then obtained from formula 3:
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j)R(f_j)T(f_j)^H   (formula 3)
3-4. The narrowband at the center frequency f_0 is localized, yielding the localization result of this frame. Each frame produces one localization result, forming a localization result array A of length m:
A = [a(1) a(2) ... a(m)]
Step 4: from the recognition result array C and the localization result array A obtained in steps 2 and 3, the distribution matrix M is constructed.
With the values of the recognition result array C as the abscissa and the angular value range of the localization result array A as the ordinate, the result of every frame is traversed to build the matrix M, where M(C_i, A_j) denotes the number of frames whose recognition result is C_i and whose localization result is A_j.
Step 5: after the distribution matrix is obtained, the localization distribution peak A_top of the target sound source is found via its recognition result C_i.
Step 6: on the localization distribution of recognition result C_i, the peak A_top and its two neighboring values A_top-1 and A_top+1 are selected, and the statistical mean of the matrix cells at these three values is computed:
θ_final = ( Σ_{k=top-1}^{top+1} A_k · M(C_i, A_k) ) / ( Σ_{k=top-1}^{top+1} M(C_i, A_k) )
where A_k denotes the center angle of the k-th angular interval and P denotes the angular resolution of the matrix ordinate; for example, if the 360-degree circle is divided into 36 angular intervals, the resolution is P = 10.
The beneficial effects of the present invention are as follows:
The collected sound signal is processed simultaneously by the recognition and localization algorithms, a distribution matrix is constructed from the results, and the final result is obtained by a decision algorithm. The invention makes full use of all the recognition and localization information in a sound segment: under the premise that the target sound is the recognition result, the final localization result is obtained from the localization distribution over all frames. The advantage is that the influence of interference and noise in the sound signal is rejected to the greatest possible extent, and the dependence on the recognition algorithm is low, giving the method broad applicability.
Description of drawings
Fig. 1 is the overall algorithm flowchart of the present invention.
Fig. 2 is the flowchart of the localization part of the algorithm.
Fig. 3 is a schematic diagram of the distribution matrix.
Fig. 4 is the structure diagram of the 4-channel cross acoustic array in a rectangular coordinate system.
Specific embodiment
The present invention is elaborated below with reference to the accompanying drawings and a specific embodiment. The following description serves only as demonstration and explanation and is not intended to limit the present invention in any way.
As shown in Fig. 4, the 4-channel cross acoustic array is placed in a rectangular coordinate system, where d is the spacing of two adjacent microphones, r is the radius of the cross array, s(t) is the sound source and θ its direction, and A, B, C, D in the figure correspond to channels 1, 2, 3 and 4, respectively. The signals of the four channels are then acquired and denoted x_1(t), x_2(t), x_3(t), x_4(t).
The steering vector of the signal collected by the cross array can be expressed as:
a(θ, ω) = [e^{-iωτ_1(θ)}, e^{-iωτ_2(θ)}, e^{-iωτ_3(θ)}, e^{-iωτ_4(θ)}]^T
where ω = 2πf, f is the signal frequency, and τ_p(θ) (p = 1, 2, 3, 4) are the time delays between the signals. This steering vector is used in the localization algorithm below.
Fig. 1 shows the overall flowchart of the algorithm of the invention. Following the steps in Fig. 1, after the four channel signals are received by the four-channel acoustic array, they are pre-processed; the main pre-processing operation is framing. Each of the four channel signals is framed with a frame length of 1024 sampling points and a step of half the frame length. Suppose the signal is divided into m frames of 1024 sampling points; each frame is then processed as follows.
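As an illustrative sketch (not part of the patent), the framing described above — 1024-sample frames with a step of half the frame length — can be written as follows; the sample rate and the random test signal are assumptions:

```python
import numpy as np

def frame_signal(x, frame_len=1024, step=None):
    """Split a 1-D signal into overlapping frames (50% overlap by default)."""
    step = step or frame_len // 2
    n = (len(x) - frame_len) // step + 1          # number of full frames m
    return np.stack([x[i * step: i * step + frame_len] for i in range(n)])

fs = 44100                          # assumed sampling frequency f_s
x1 = np.random.randn(fs)            # stand-in for one second of channel-1 samples
frames = frame_signal(x1)           # shape (m, 1024); repeat for channels 2-4
```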
First, the recognition algorithm is applied to each single-channel frame.
Any sound recognition algorithm can be used; LPCC feature extraction with an SVM classification learning algorithm serves as the example here. 16th-order LPCC coefficients are used, the SVM kernel is the radial basis function (RBF), and the sound types to be recognized are assumed to be the five types C1, C2, C3, C4 and C5.
The 12th-order linear prediction coefficients (Linear Prediction Coefficients, LPC) of each frame are computed, where the LPC values can be solved with the Levinson-Durbin algorithm. The 16th-order LPCC values are then obtained from the correspondence between LPCC and LPC values.
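The LPC-to-LPCC correspondence mentioned above is commonly realized with the standard cepstral recursion; the following sketch assumes that recursion (sign conventions differ between texts, so treat it as an illustration rather than the patent's exact formula):

```python
import numpy as np

def lpc_to_lpcc(a, q):
    """Derive q LPCC coefficients from LPC coefficients a = [a_1 .. a_p]
    using the standard recursion c_n = a_n + sum_k (k/n) c_k a_{n-k}."""
    p = len(a)
    c = np.zeros(q)
    for n in range(1, q + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

lpcc16 = lpc_to_lpcc(np.full(12, 0.1), 16)   # 12th-order LPC -> 16 LPCC values
```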
The sound fingerprint library is built as follows:
The 16th-order LPCC values extracted from every frame are arranged by rows, and a label column is prepended, where label '0' represents C1, '1' represents C2, '2' represents C3, '3' represents C4 and '4' represents C5, forming 17-dimensional feature vectors.
The SVM algorithm is realized with the existing libsvm library, with RBF chosen as the classifier kernel. The RBF setup has two parameters, the penalty factor c and the parameter gamma, whose optimal values can be selected by the grid-search function opti_svm_coeff of libsvm.
The training process uses the svmtrain function of the libsvm library with four parameters: the feature vectors, i.e. the labelled LPCC values extracted above; the kernel type, for which the RBF kernel is selected; and the RBF kernel parameters c and gamma, determined by grid search. Calling svmtrain returns a variable named model, which stores the trained model information, i.e. the sound fingerprint library; this variable is saved for the recognition step.
Sound recognition is realized by the svmtest function of the libsvm library: the LPCC values obtained for every frame are classified with svmtest, which takes three parameters. The first is the label, used to test the recognition rate (it has no practical meaning when recognizing sounds of unknown type); the second is the feature vector, i.e. the variable storing the LPCC values; the third is the matching model, which is exactly the return value of the svmtrain function from the training step. The return value of svmtest is the classification result, i.e. the label, from which the type of device that produced the sound is determined.
In practical application, features are extracted from the signal and compared with the established sound fingerprint library to achieve recognition.
After this stage, m recognition results are obtained, forming the array C:
C = [c(1) c(2) ... c(m)]
Next, the localization algorithm is applied to the four-channel signal of each frame.
Fig. 2 shows the detailed flowchart of the localization part, including the FFT of the sub-frames, the pre-estimation of the angle for each narrowband, and the wideband localization algorithm, illustrated here with the MUSIC algorithm.
To compute the autocorrelation matrix of the signal, the four-channel frame is framed a second time with a sub-frame length of 256 and a step of half the sub-frame length, and an FFT is applied to each sub-frame:
X(f_j) = Σ_{l=0}^{L-1} x(l) e^{-i2πjl/L}
where L is the sub-frame length, i.e. 256.
After the FFT, the data of the n-th sub-frame can be written as X^(n)(f_j), n = 1, 2, ..., N, where N is the number of sub-frames after the secondary framing.
The resulting frequency-domain signal model can then be expressed as:
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J
where f_j = j·f_s/L and f_s is the sampling frequency of the signal. Since actual signals are mostly wideband, a suitable wideband frequency range and a center frequency point f_0 must be chosen.
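With the sub-frame length L = 256, the narrowband frequencies are simply the FFT bin centres f_j = j·f_s/L; the sample rate and the band edges below are assumptions for illustration:

```python
import numpy as np

fs, L = 44100, 256                       # assumed sample rate; sub-frame length
f = np.fft.rfftfreq(L, d=1.0 / fs)       # f_j = j * fs / L, j = 0 .. L/2
f_lo, f_hi = 500.0, 4000.0               # assumed band around the chosen f0
band = f[(f >= f_lo) & (f <= f_hi)]      # the narrowband frequencies f_j kept
```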
A wideband signal can be regarded as the combination of multiple narrowband signals. With the focusing matrices T_j, each narrowband can be focused onto the center frequency:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)
where A(f) is the steering vector used in the localization algorithm.
When computing the focusing matrices, a narrowband MUSIC localization is first run on each narrowband as the pre-estimate. The steps are as follows:
First compute the signal autocorrelation matrix R_f of each narrowband frequency and apply an eigenvalue decomposition to R_f:
R_f = U_S Λ_S U_S^H + U_N Λ_N U_N^H
where U_S is the subspace spanned by the eigenvectors of the large eigenvalues, namely the signal subspace, and U_N is the subspace spanned by the eigenvectors of the small eigenvalues, namely the noise subspace. The spatial spectrum function of the MUSIC algorithm is
P(θ) = 1 / (a^H(θ) U_N U_N^H a(θ))
where Θ denotes the observation sector. Let θ scan over the observation sector Θ and evaluate this function at every scan position; the direction at which the peak appears, denoted β_j, is the estimated bearing.
After the MUSIC pre-estimation of every narrowband, β = [β_1 β_2 ··· β_J] is obtained.
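A minimal narrowband MUSIC sketch corresponding to the pre-estimation step; the 4-element uniform-line steering vector below is a hypothetical stand-in for illustration only (the patent's cross array needs its own delays τ_p(θ)):

```python
import numpy as np

def music_spectrum(R, steering, angles, n_src=1):
    """Eigendecompose R, keep the noise subspace U_N, and scan P(theta)."""
    w, v = np.linalg.eigh(R)                  # eigenvalues in ascending order
    Un = v[:, : R.shape[0] - n_src]           # small-eigenvalue (noise) subspace
    return np.array([1.0 / np.abs(steering(t).conj() @ Un @ Un.conj().T
                                  @ steering(t)) for t in angles])

def ula_steering(theta, n=4, d=0.05, f=1000.0, c=343.0):
    """Hypothetical uniform-line steering vector, NOT the cross-array geometry."""
    return np.exp(-1j * 2 * np.pi * f / c * d * np.arange(n) * np.cos(theta))

angles = np.linspace(0.0, np.pi, 181)             # 1-degree scan over the sector
a0 = ula_steering(np.deg2rad(60.0))
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(4)    # one source at 60 deg + noise
beta = np.rad2deg(angles[np.argmax(music_spectrum(R, ula_steering, angles))])
```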
Next, the focusing matrices are constructed from the pre-estimation result:
T(f_j) = V(f_j)U(f_j)^H
where U(f_j) and V(f_j) are the left and right singular vectors of A(f_j, β)A^H(f_0, β), respectively. Applying this series of focusing matrices T(f_j) to the array data, the focused data autocorrelation matrix at the single frequency point is obtained:
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j)R(f_j)T(f_j)^H
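The T = VU^H construction above is the unitary (orthogonal Procrustes) solution that rotates A(f_j) toward A(f_0); a sketch under that assumption:

```python
import numpy as np

def focusing_matrix(A_fj, A_f0):
    """Unitary T minimising ||A(f0) - T A(fj)||_F (orthogonal Procrustes),
    equivalent to the T = V U^H construction in the text."""
    U, _, Vh = np.linalg.svd(A_f0 @ A_fj.conj().T)
    return U @ Vh
```

A quick sanity check of the design choice: because T is unitary, the focusing transform preserves signal power while aligning the steering matrices of all narrowbands at f_0.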
Likewise, once the focused autocorrelation matrix is obtained, the narrowband MUSIC algorithm is applied once more to the narrowband at the center frequency, yielding the final localization result.
After this stage, m localization results are obtained, forming the array A:
A = [a(1) a(2) a(3) a(4) ... a(m)]
As shown in Fig. 1, once the localization and recognition results are obtained, the distribution matrix M can be constructed. Fig. 3 shows a schematic diagram of the distribution matrix: the abscissa is the possible value range of the localization result A, the ordinate is the possible value range of the recognition result C, and M(C_i, A_j) denotes the total number of frames in this data segment whose recognition result is C_i and whose localization result is A_j.
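A sketch of the distribution-matrix construction, assuming class labels 0-4, 10-degree angular intervals (P = 10, so 36 bins), and hypothetical per-frame results:

```python
import numpy as np

def build_distribution_matrix(C, A, n_classes=5, P=10, span=360):
    """M[c, j] counts frames whose recognition result is class c and whose
    localisation result falls in the j-th angular interval of width P degrees."""
    n_bins = span // P
    M = np.zeros((n_classes, n_bins), dtype=int)
    for c, a in zip(C, A):
        M[c, int(a // P) % n_bins] += 1
    return M

C = [0, 0, 1, 0, 0, 0]                        # hypothetical recognition results
A = [62.0, 58.0, 175.0, 61.0, 300.0, 63.0]    # hypothetical DOA estimates (deg)
M = build_distribution_matrix(C, A)
```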
Once the distribution matrix statistics are obtained, the localization result of the target is derived from the localization distribution of the recognition result of the target sound source.
The row of the distribution matrix whose recognition result corresponds to the target sound source is selected, giving the localization distribution of the target source. The peak A_top is found, the peak and its two neighboring values A_top-1 and A_top+1 are taken, and the statistical mean of the matrix cells at these three values is computed as the final localization result. The formula can be expressed as:
θ_final = ( Σ_{k=top-1}^{top+1} A_k · M(C_i, A_k) ) / ( Σ_{k=top-1}^{top+1} M(C_i, A_k) )
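The decision step — the peak bin plus its two neighbours, weighted by frame counts — can be sketched as follows (using bin-centre angles and ignoring the 0/360-degree wrap-around are simplifying assumptions of this sketch):

```python
import numpy as np

def decide_angle(M, c, P=10):
    """Weighted mean of the peak angular bin and its two neighbours
    on row c of the distribution matrix (ignores 0/360 wrap-around)."""
    row = M[c]
    top = int(np.argmax(row))
    idx = [top - 1, top, top + 1]                      # A_top-1, A_top, A_top+1
    centres = np.array([(k + 0.5) * P for k in idx])   # bin-centre angles (deg)
    weights = row[idx]
    return float((centres * weights).sum() / weights.sum())
```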
Claims (7)

1. A sound localization method based on distribution-matrix decision, characterized by comprising the following steps:
step 1: pre-processing the four-channel sound signal collected by an acoustic array;
wherein the pre-processing divides the four-channel sound signal into frames;
step 2: performing sound recognition on each single-channel frame;
step 3: performing wideband sound localization on each four-channel frame;
step 4: constructing a distribution matrix M from the recognition and localization result sets obtained in steps 2 and 3, where M(C_i, A_j) denotes the number of frames whose recognition result is C_i and whose localization result is A_j;
step 5: after obtaining the distribution matrix, finding the localization distribution peak of the target sound source;
step 6: selecting the peak and its two neighboring angular intervals and computing the statistical mean over these three intervals as the final localization result.
2. The sound localization method based on distribution-matrix decision according to claim 1, characterized in that in step 1 the field sound signal is acquired with a cross acoustic array, the sampling frequency being denoted f_s; the four-channel sound signal is divided into frames, the number of frames after framing being m; each frame signal is then processed.
3. The sound localization method based on distribution-matrix decision according to claim 2, characterized in that the recognition algorithm applied to each single-channel frame in step 2 is the LPCC+SVM algorithm;
each frame yields one recognition result, forming a recognition result array C of length m:
C = [c(1) c(2) ... c(m)].
4. The sound localization method based on distribution-matrix decision according to claim 3, characterized in that the algorithm applied to each four-channel frame for wideband sound localization in step 3 is the wideband MUSIC algorithm, specifically:
3-1: choosing the frequency band and the center frequency f_0 as needed, the frequency band and center frequency f_0 being selected according to the frequency characteristics of the actual target signal;
3-2: applying an FFT to each four-channel frame, the model X(f_j) of each transformed frame being expressed as:
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J   (formula 1)
where A_θ(f_j) is the steering vector, and S(f_j) and N(f_j) are the source signal and the noise after the FFT, respectively; after the transform, the selected frequency band is decomposed into a combination of narrowband signals with frequencies f_j;
3-3: using the focusing matrices T, focusing the narrowband at each frequency f_j onto the narrowband at the center frequency f_0:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)   (formula 2)
where A(f) is the steering vector; and obtaining from formula 3 the autocorrelation matrix at the center frequency f_0, used for localization:
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j)R(f_j)T(f_j)^H   (formula 3)
3-4: localizing the narrowband at the center frequency f_0 to obtain the localization result of the frame; each frame yielding one localization result, forming a localization result array A of length m:
A = [a(1) a(2) ... a(m)].
5. The sound localization method based on distribution-matrix decision according to claim 4, characterized in that in step 4 the distribution matrix M is constructed from the recognition result array C and the localization result array A obtained in steps 2 and 3:
with the values of the recognition result array C as the abscissa and the angular value range of the localization result array A as the ordinate, the result of every frame is traversed to build the distribution matrix M, where M(C_i, A_j) denotes the number of frames whose recognition result is C_i and whose localization result is A_j.
6. The sound localization method based on distribution-matrix decision according to claim 5, characterized in that in step 5, after the distribution matrix is obtained, the localization distribution peak A_top of the target sound source is found via the recognition result C_i.
7. The sound localization method based on distribution-matrix decision according to claim 6, characterized in that in step 6, on the localization distribution of the recognition result C_i, the peak A_top and its two neighboring values A_top-1 and A_top+1 are selected, and the statistical mean of the matrix cells at these three values is computed as:
θ_final = ( Σ_{k=top-1}^{top+1} A_k · M(C_i, A_k) ) / ( Σ_{k=top-1}^{top+1} M(C_i, A_k) )
where A_k denotes the center angle of the k-th angular interval, and P denotes the angular resolution of the matrix ordinate.
CN201610893331.1A 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision Active CN106526541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610893331.1A CN106526541B (en) 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision


Publications (2)

Publication Number Publication Date
CN106526541A CN106526541A (en) 2017-03-22
CN106526541B true CN106526541B (en) 2019-01-18

Family

ID=58332047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610893331.1A Active CN106526541B (en) 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision

Country Status (1)

Country Link
CN (1) CN106526541B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107493106B (en) * 2017-08-09 2021-02-12 河海大学 Frequency and angle joint estimation method based on compressed sensing
CN112347984A (en) * 2020-11-27 2021-02-09 安徽大学 Olfactory stimulus-based EEG (electroencephalogram) acquisition and emotion recognition method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102438189A (en) * 2011-08-30 2012-05-02 东南大学 Dual-channel acoustic signal-based sound source localization method
CN103439688A (en) * 2013-08-27 2013-12-11 大连理工大学 Sound source positioning system and method used for distributed microphone arrays
CN105609113A (en) * 2015-12-15 2016-05-25 中国科学院自动化研究所 Bispectrum weighted spatial correlation matrix-based speech sound source localization method
CN106023996A (en) * 2016-06-12 2016-10-12 杭州电子科技大学 Sound identification method based on cross acoustic array broadband wave beam formation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Wideband DOA algorithm based on data-matrix focusing"; Liu Chunjing et al.; Journal of Projectiles, Rockets, Missiles and Guidance; 2010-02-28; Vol. 30, No. 1; pp. 190-192

Also Published As

Publication number Publication date
CN106526541A (en) 2017-03-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant