CN1545086A - Voice signal time delay estimating method based on ear hearing characteristics - Google Patents
- Publication number
- CN1545086A
- Authority
- CN
- China
- Prior art keywords
- time delay
- voice signal
- subband
- signal
- cross correlation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a speech signal time delay estimation method based on the auditory characteristics of the human ear, for estimating the time delay between two speech signals from the same source. The method comprises: (1) dividing each of the two speech signals into two subband signals; (2) cross-correlating the corresponding subband signals of the two speech signals, obtaining two subband cross-correlation functions in total; (3) adding the two subband cross-correlation functions according to their weights to obtain a summed cross-correlation function; (4) obtaining the time delay from the summed cross-correlation function. The method introduces the auditory characteristics of the human ear into time delay estimation and determines the weights according to the signal-to-noise ratio of each subband, which makes the algorithm more robust to noise.
Description
Technical field
The present invention relates to microphone array technology in computer applications, and more particularly to a method for estimating the time delay of speech signals in a microphone array.
Background art
Time delay refers to the time difference between copies of the same source signal received by different microphones or sensors in an array, caused by the different distances the signal travels. Time delay estimation (TDE) uses the theory and methods of parameter estimation and signal processing to estimate and measure this delay.
Time delay estimation is a basic problem in speech processing systems based on microphone arrays. For example, locating a talker with a microphone array rests on determining the target direction and distance from the estimated delays between the channel signals. In a microphone array speech enhancement system, estimating the delays between the speech channels and keeping the channels synchronized is a precondition for all subsequent processing: the delays in the system must be estimated accurately and compensated so that the pointing direction of the array matches the direction of the talker. This is the first problem that any microphone array speech enhancement method has to solve. Noise and interfering speech, reverberation or echo in some settings, and frequent movement of the talker all make time delay estimation in the array very difficult.
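As an illustration of how an estimated delay maps to a talker direction, the following minimal sketch converts a delay between two microphones into a far-field arrival angle. It is not part of the patent: the far-field (plane wave) model, the function name `delay_to_angle`, the sampling rate, the microphone spacing and the speed of sound of 343 m/s are all assumptions made for the example.

```python
import math

def delay_to_angle(delay_samples, fs, mic_spacing_m, speed_of_sound=343.0):
    """Far-field arrival angle in degrees, measured from broadside.

    Assumes a plane wave hitting two microphones mic_spacing_m apart;
    all parameter values are illustrative, not taken from the patent.
    """
    # Path-length difference divided by spacing gives sin(angle).
    ratio = speed_of_sound * delay_samples / fs / mic_spacing_m
    # Clamp, because a noisy delay estimate can push the ratio past +/-1.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero corresponds to a source directly in front of the pair (broadside); the largest physically possible delay corresponds to a source on the line through both microphones.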
The generalized cross-correlation (GCC) method is the most widely studied and used time delay estimation algorithm. Its formula is

$$R_{ij}(\tau) = \int_{-\infty}^{\infty} \psi_{ij}(f)\,\phi_{ij}(f)\,e^{j2\pi f\tau}\,df \qquad (1)$$

In essence, it is the result of applying a filter to the cross-correlation function, where $\phi_{ij}(f)$ is the cross-power spectrum between the signals $x_i(k)$ and $x_j(k)$ of microphones i and j in the array, and $\psi_{ij}(f)$ is a weighting function. To estimate the delay with GCC, the generalized cross-correlation function $R_{ij}(\tau)$ is computed from the weighting function $\psi_{ij}(f)$ and the cross-power spectrum $\phi_{ij}(f)$, and the position of its peak is located; the $\tau$ at the peak is the delay between the signals. In practice, different weighting functions $\psi_{ij}(f)$ can be chosen for different noise and reverberation conditions so that $R_{ij}(\tau)$ has a sharper peak. Depending on the weighting function, GCC methods are further divided into maximum likelihood (ML) weighting and phase transform (PHAT) weighting. There are also adaptive time delay estimation methods and methods based on the localization principle of the human ear. The former adaptively finds the transfer function between the two signals and derives the delay estimate from it. The latter relies on the precedence effect and applies generalized cross-correlation to the onset segment of the speech signal; when reflected echoes overlap the direct sound, the onset segment cannot be determined, and the performance of the algorithm drops markedly.
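The GCC procedure described above (weight the cross-power spectrum, transform back, pick the peak) can be sketched in discrete form as follows. This is a minimal NumPy illustration, not the patent's implementation: the function name `gcc_delay`, the zero-padding length and the small constant guarding the division are assumptions.

```python
import numpy as np

def gcc_delay(x_i, x_j, weighting="phat"):
    """Estimate the delay (in samples) of x_i relative to x_j using GCC."""
    n = len(x_i) + len(x_j)            # zero-pad to avoid circular wrap-around
    X_i = np.fft.rfft(x_i, n)
    X_j = np.fft.rfft(x_j, n)
    cross = X_i * np.conj(X_j)         # discrete cross-power spectrum
    if weighting == "phat":
        # PHAT weighting: keep only the phase of the cross-power spectrum.
        psi = 1.0 / np.maximum(np.abs(cross), 1e-12)
    else:
        psi = 1.0                      # plain cross-correlation
    r = np.fft.irfft(psi * cross, n)   # generalized cross-correlation
    # Reorder so that the array covers lags -max_lag .. +max_lag.
    max_lag = n // 2
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    return int(np.argmax(r)) - max_lag
```

With this sign convention a positive result means that `x_i` lags `x_j`; swapping the arguments flips the sign.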
Summary of the invention
The object of the invention is to introduce the auditory characteristics of the human ear into time delay estimation, and thereby to provide a speech signal time delay estimation method based on those characteristics.
To achieve this object, the invention provides a speech signal time delay estimation method based on the auditory characteristics of the human ear, which estimates the time delay between two speech signals from the same source and comprises the following steps:
(1) dividing each of the two speech signals into two subband signals by frequency;
(2) cross-correlating the corresponding subband signals of the two speech signals, obtaining two subband cross-correlation functions in total;
(3) adding the two subband cross-correlation functions according to their weights to obtain a summed cross-correlation function;
(4) obtaining the time delay between the two speech signals from the summed cross-correlation function.
In step (1), each of the two speech signals is divided into a high-frequency and a low-frequency subband signal, with 1 kHz as the dividing frequency.
In step (3), the two subband cross-correlation functions may have equal weights. Alternatively, their weights are determined from the signal-to-noise ratio (SNR) of the subbands, so that the cross-correlation function of the subband with the higher SNR receives the larger weight; the weights may be proportional to the subband SNR.
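The subband cross-correlation function in step (2) is:

$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}}{\left|\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}\right|^{\rho}} \right\}$$

where $x_i$, $x_j$ are the input signals of channels i and j, $R_{ij}(m)$ is the time-domain generalized cross-correlation function, $(\cdot)^{*}$ denotes the complex conjugate, DFT and IDFT denote the discrete Fourier transform and its inverse, and $0 \le \rho \le 1$.
Further, $0.5 \le \rho \le 0.75$, preferably $\rho = 0.6$.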
The speech signal time delay estimation method of the invention introduces the auditory characteristics of the human ear into time delay estimation and determines the weights used when summing the cross-correlation functions from the SNR of each subband, which makes the algorithm more robust to noise.
Brief description of the drawings
Fig. 1 is a block diagram of the speech signal time delay estimation method of the invention;
Fig. 2 compares the generalized cross-correlation results computed by different methods, where (a) is the PHAT result, (b) is the modified PHAT result, and (c) is the SCCF result of the invention;
Fig. 3 shows the time delay estimation results in a real environment, where the solid line is the SCCF result, the dashed line is the modified PHAT result, and the dotted line is the PHAT result.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
The two speech signals arriving at microphones $m_i$ and $m_j$ are denoted $x_i$ and $x_j$ respectively, and both are prewhitened by linear prediction. The auditory model of the human ear has a cochlear characteristic: the sensitivity of the ear to a signal varies by subband, and the ear is generally sensitive to low-frequency signals and less sensitive to high-frequency signals. The prewhitened signals are therefore divided into a high-frequency and a low-frequency subband signal according to the auditory characteristics of the human ear.
In the embodiment shown in Fig. 1, after prewhitening, signals $x_i$ and $x_j$ are each filtered by two filters, a high-pass filter and a low-pass filter. Filtering $x_i$ with the high-pass filter yields its high-frequency signal, and filtering it with the low-pass filter yields its low-frequency signal. In the preferred embodiment, the dividing frequency between the high-frequency and low-frequency signals is 1 kHz. Signal $x_j$ undergoes the same operations as $x_i$ and is likewise divided at 1 kHz into a high-frequency and a low-frequency subband signal. The high-frequency parts of $x_i$ and $x_j$ first pass through half-wave rectification and 1 kHz low-pass filtering and are then cross-correlated to obtain the high-frequency cross-correlation function (HCCF). The low-frequency parts are cross-correlated directly to obtain the low-frequency cross-correlation function (LCCF).
The high-pass and low-pass filters mentioned above are all 4th-order FIR filters.
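The subband split described above can be sketched with two 4th-order (5-tap) FIR filters. The patent only states the filter order and the 1 kHz dividing frequency; the windowed-sinc design with a Hamming window, the high-pass obtained by spectral inversion, the sampling rate `fs` and all function names below are assumptions made for illustration.

```python
import numpy as np

def fir_lowpass(cutoff_hz, fs, order=4):
    """Windowed-sinc low-pass FIR with order+1 taps and unit DC gain."""
    n = np.arange(order + 1) - order / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(order + 1)
    return h / h.sum()

def fir_highpass(cutoff_hz, fs, order=4):
    """High-pass by spectral inversion of the low-pass (even order only)."""
    h = -fir_lowpass(cutoff_hz, fs, order)
    h[order // 2] += 1.0
    return h

def split_subbands(x, fs, cutoff_hz=1000.0):
    """Return (low, high): low band, and rectified + smoothed high band."""
    lp = fir_lowpass(cutoff_hz, fs)
    hp = fir_highpass(cutoff_hz, fs)
    low = np.convolve(x, lp)[:len(x)]
    high = np.convolve(x, hp)[:len(x)]
    high = np.maximum(high, 0.0)              # half-wave rectification
    high = np.convolve(high, lp)[:len(x)]     # 1 kHz low-pass after rectification
    return low, high
```

A 5-tap filter has a very gradual transition band, so the two subbands overlap considerably; the rectification and low-pass step turns the high band into an envelope-like signal before it is cross-correlated.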
HCCF and LCCF are added with different weights according to their respective signal-to-noise ratios to obtain the summary cross-correlation function (SCCF), that is,

$$\mathrm{SCCF} = g_L \times \mathrm{LCCF} + g_H \times \mathrm{HCCF} \qquad (2)$$

where $g_L$ and $g_H$ are the weights of LCCF and HCCF respectively, which can be determined from the SNR of the subband each belongs to:

$$g_L = \frac{SNR_L}{SNR_L + SNR_H} \qquad (3)$$

$$g_H = \frac{SNR_H}{SNR_L + SNR_H} \qquad (4)$$

Here $SNR_L$ and $SNR_H$ are the signal-to-noise ratios of the low-frequency and high-frequency subbands, computed from the subband signals $x_L$, $x_H$ and the noise $n_L$, $n_H$ of the corresponding subbands; the noise can be estimated during pauses in the speech.
In a real system, if the SNR is not easy to obtain, $g_L$ and $g_H$ can both be set to 1.
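The weighting rule of formulas (3) and (4), together with the fallback to unit weights, can be written as a small helper. This is a sketch only: the function name `subband_weights`, the use of `None` to signal an unavailable SNR, the treatment of a non-positive SNR sum, and the assumption that the SNRs are on a linear (not decibel) scale are choices made for the example.

```python
def subband_weights(snr_low=None, snr_high=None):
    """Return (g_L, g_H) per formulas (3) and (4); fall back to (1, 1)."""
    if snr_low is None or snr_high is None:
        return 1.0, 1.0            # SNR not available: equal unit weights
    total = snr_low + snr_high
    if total <= 0:
        return 1.0, 1.0            # assumption: treat degenerate SNRs as unavailable
    return snr_low / total, snr_high / total
```

For example, a low band with three times the SNR of the high band receives a weight of 0.75, and the high band receives 0.25.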
It is easy to see that, like the ML generalized cross-correlation function, the method of the invention takes the SNR of the signal into account when computing the cross-correlation function, and gives a larger weight to the cross-correlation function of the subband with the higher SNR.
The cross-correlations HCCF and LCCF in formula (2) are computed as follows:

$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}}{\left|\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}\right|^{\rho}} \right\} \qquad (7)$$

where $x_i$, $x_j$ are the input signals of channels i and j, k is the signal vector index, $R_{ij}(m)$ is the time-domain generalized cross-correlation function, m is the index of the cross-correlation vector, and $(\cdot)^{*}$ denotes the complex conjugate. The high-frequency and low-frequency signals are each substituted into formula (7) to compute HCCF and LCCF respectively. Experiments show that a suitable value of $\rho$ in the formula lies between 0.5 and 0.75, preferably 0.6.
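A cross-correlation with a tunable exponent ρ can be sketched as below. The sketch assumes the common partially whitened form, in which the cross-spectrum is divided by its own magnitude raised to the power ρ, so that ρ = 0 gives the ordinary cross-correlation and ρ = 1 gives PHAT; the function name `rho_cross_correlation`, the zero-padding and the guard constant are illustrative assumptions.

```python
import numpy as np

def rho_cross_correlation(x_i, x_j, rho=0.6):
    """Cross-correlation with the cross-spectrum magnitude raised to -rho.

    Returns a circularly indexed array: lags 0, 1, 2, ... from the start,
    negative lags from the end.
    """
    n = len(x_i) + len(x_j)                  # zero-pad against wrap-around
    X_i = np.fft.fft(x_i, n)
    X_j = np.fft.fft(x_j, n)
    cross = X_i * np.conj(X_j)
    mag = np.maximum(np.abs(cross), 1e-12)   # guard against division by zero
    return np.real(np.fft.ifft(cross / mag ** rho))
```

Calling the function once on the two low-frequency subband signals and once on the two processed high-frequency subband signals would give the two subband correlation functions that are then weighted and summed.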
The HCCF and LCCF computed by formula (7) are then substituted into formula (2), and the final time delay estimate is obtained from the following formula:

$$\tau_{ij} = \arg\max_{m}\ \mathrm{SCCF}_{ij}(m) \qquad (8)$$

Formula (8) assigns the index of the cross-correlation peak to $\tau_{ij}$, where $\tau_{ij}$ is the time delay between the arrival of the signal at microphones $m_i$ and $m_j$, and $\mathrm{SCCF}_{ij}(m)$ is the summed generalized cross-correlation of signals i and j.
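The final two steps, the weighted sum of formula (2) and the peak search of formula (8), can be sketched as follows. The function name `sccf_delay` is hypothetical, and the mapping of indices in the upper half of the array to negative lags assumes that the correlation functions come from an inverse DFT and are therefore circularly indexed.

```python
import numpy as np

def sccf_delay(lccf, hccf, g_low=1.0, g_high=1.0):
    """Weighted sum of two subband correlations and location of its peak."""
    lccf = np.asarray(lccf, dtype=float)
    hccf = np.asarray(hccf, dtype=float)
    sccf = g_low * lccf + g_high * hccf      # formula (2)
    m = int(np.argmax(sccf))                 # formula (8): index of the peak
    n = len(sccf)
    # IDFT-based correlations store negative lags at the end of the array.
    return m - n if m > n // 2 else m
```

The returned value is a delay in samples; dividing by the sampling rate converts it to seconds.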
To analyze the method of the invention, Fig. 2 gives the cross-correlation results for one frame of noisy speech, where (a) is the PHAT (phase transform) result, (b) is the modified PHAT result, and (c) is the SCCF result of the invention. The dotted line in the figure marks the correct time delay. As can be seen, peak detection on the PHAT result gives a wrong estimate; the modified PHAT gives the correct result, but the peak produced by the SCCF method is sharper.
Fig. 3 shows the result statistics of experiments in a real office environment, where the solid line is the SCCF algorithm, the dashed line is the modified PHAT algorithm, and the dotted line is the PHAT algorithm. The room reverberation time of this office is about 0.8 s. In the statistics, an estimate within ±2 sample points of the correct delay is counted as a correct estimate, and any other estimate is counted as an error. As can be seen, in low-SNR environments the proposed method has higher accuracy and smaller error than both the PHAT and the modified PHAT algorithms. As the SNR improves, the performance of the algorithms converges. The algorithm of the invention is therefore markedly more robust to noise.
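The scoring rule used for these statistics, where an estimate counts as correct if it falls within ±2 sample points of the true delay, can be expressed in a few lines. The function name `correct_rate` and the sample values in the usage note are hypothetical and are not data from the patent.

```python
def correct_rate(estimates, true_delay, tolerance=2):
    """Fraction of delay estimates within +/-tolerance samples of the truth."""
    hits = sum(1 for d in estimates if abs(d - true_delay) <= tolerance)
    return hits / len(estimates)
```

For instance, with a true delay of 5 samples, the made-up estimates 5, 6, 8, 3 and 12 contain three values within the tolerance, giving a rate of 0.6.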
Claims (9)
1. A speech signal time delay estimation method based on the auditory characteristics of the human ear, for estimating the time delay between two speech signals from the same source, comprising the following steps:
(1) dividing each of the two speech signals into two subband signals by frequency;
(2) cross-correlating the corresponding subband signals of the two speech signals, obtaining two subband cross-correlation functions in total;
(3) adding the two subband cross-correlation functions according to their weights to obtain a summed cross-correlation function;
(4) obtaining the time delay between the two speech signals from the summed cross-correlation function.
2. The speech signal time delay estimation method based on the auditory characteristics of the human ear according to claim 1, characterized in that, in step (1), each of the two speech signals is divided into a high-frequency and a low-frequency subband signal.
3. The speech signal time delay estimation method based on the auditory characteristics of the human ear according to claim 2, characterized in that the division uses 1 kHz as the dividing frequency.
4. The speech signal time delay estimation method according to claim 1, characterized in that, in step (3), the two subband cross-correlation functions have equal weights.
5. The speech signal time delay estimation method according to claim 1, characterized in that, in step (3), the weights of the two subband cross-correlation functions are determined from the signal-to-noise ratio of the subbands, the cross-correlation function of the subband with the higher signal-to-noise ratio having the larger weight.
6. The speech signal time delay estimation method according to claim 5, characterized in that, in step (3), the weight of each of the two subband cross-correlation functions is proportional to the signal-to-noise ratio of its subband.
7. The speech signal time delay estimation method based on the auditory characteristics of the human ear according to claim 1, characterized in that the subband cross-correlation function in step (2) is:

$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}}{\left|\mathrm{DFT}(x_i(k))\,\mathrm{DFT}(x_j(k))^{*}\right|^{\rho}} \right\}$$

where $x_i$, $x_j$ are the input signals of channels i and j, $R_{ij}(m)$ is the time-domain generalized cross-correlation function, $(\cdot)^{*}$ denotes the complex conjugate, DFT and IDFT denote the discrete Fourier transform and its inverse, and $0 \le \rho \le 1$.
8. The speech signal time delay estimation method based on the auditory characteristics of the human ear according to claim 7, characterized in that $0.5 \le \rho \le 0.75$.
9. The speech signal time delay estimation method based on the auditory characteristics of the human ear according to claim 8, characterized in that $\rho = 0.6$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2003101134838A CN1212609C (en) | 2003-11-12 | 2003-11-12 | Voice signal time delay estimating method based on ear hearing characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1545086A true CN1545086A (en) | 2004-11-10 |
CN1212609C CN1212609C (en) | 2005-07-27 |
Family ID=34336877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2003101134838A Expired - Fee Related CN1212609C (en) | 2003-11-12 | 2003-11-12 | Voice signal time delay estimating method based on ear hearing characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1212609C (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105580076A (en) * | 2013-03-12 | 2016-05-11 | 谷歌技术控股有限责任公司 | Delivery of medical devices |
CN103630148A (en) * | 2013-11-01 | 2014-03-12 | 中国科学院物理研究所 | Signal sampling averaging device and signal sampling averaging method |
CN103630148B (en) * | 2013-11-01 | 2016-03-02 | 中国科学院物理研究所 | Sample of signal averaging device and sample of signal averaging method |
CN105474306A (en) * | 2014-06-26 | 2016-04-06 | 华为技术有限公司 | Noise reduction method and apparatus, and mobile terminal |
CN107680603A (en) * | 2016-08-02 | 2018-02-09 | 电信科学技术研究院 | A kind of reverberation time method of estimation and device |
CN107479030A (en) * | 2017-07-14 | 2017-12-15 | 重庆邮电大学 | Based on frequency dividing and improved broad sense cross-correlation ears delay time estimation method |
CN107785026A (en) * | 2017-10-18 | 2018-03-09 | 会听声学科技(北京)有限公司 | A kind of delay time estimation method eliminated for set top box indoor echo |
CN107785026B (en) * | 2017-10-18 | 2020-10-20 | 会听声学科技(北京)有限公司 | Time delay estimation method for indoor echo cancellation of set top box |
CN107966910A (en) * | 2017-11-30 | 2018-04-27 | 深圳Tcl新技术有限公司 | Method of speech processing, intelligent sound box and readable storage medium storing program for executing |
TWI743950B (en) * | 2020-08-18 | 2021-10-21 | 瑞昱半導體股份有限公司 | Method for delay estimation, method for echo cancellation and signal processing device utilizing the same |
Also Published As
Publication number | Publication date |
---|---|
CN1212609C (en) | 2005-07-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C19 | Lapse of patent right due to non-payment of the annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |