CN1545086A - Voice signal time delay estimating method based on ear hearing characteristics - Google Patents

Info

Publication number
CN1545086A
Authority
CN
China
Prior art keywords
time delay
voice signal
subband
signal
cross correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2003101134838A
Other languages
Chinese (zh)
Other versions
CN1212609C (en)
Inventor
杜利民
阎兆立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS
Priority to CNB2003101134838A (patent CN1212609C)
Publication of CN1545086A
Application granted
Publication of CN1212609C
Legal status: Expired - Fee Related

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice signal time delay estimation method based on the hearing characteristics of the human ear, for estimating the time delay between two voice signals from the same source. The method comprises: (1) dividing each of the two voice signals into two sub-band signals; (2) cross-correlating the corresponding sub-band signals of the two voice signals, giving two sub-band cross-correlation functions in total; (3) adding the two sub-band cross-correlation functions according to their weights to obtain a summed cross-correlation function; (4) obtaining the time delay from the sum. The method introduces the auditory characteristics of the human ear into time delay estimation and determines the weights from the signal-to-noise ratio of each sub-band, which makes the algorithm more robust to noise.

Description

Voice signal time delay estimation method based on human hearing characteristics
Technical field
The present invention relates to microphone array technology in computer applications and, more particularly, to a method for estimating the time delay of voice signals in a microphone array.
Background art
Time delay refers to the time difference between copies of the same source signal received by different microphones or sensors in an array, caused by the different distances the signal travels. Time delay estimation (TDE) uses the theory and methods of parameter estimation and signal processing to estimate and measure this delay.
Time delay estimation is a basic problem in speech processing systems based on microphone arrays. For example, when a microphone array is used to locate a talker, the basic idea is to determine the target direction and distance from the estimated time delays between the channel signals. In a microphone array speech enhancement system, estimating the time delays between the voice channels and keeping the channels synchronized is a precondition for all subsequent processing. Accurately estimating and compensating the delays in the system, so that the steering direction of the array matches the direction of the talker, is the first problem that any microphone array speech enhancement method must solve. Noise and interfering speech, reverberation or echo in some settings, and frequent movement of the talker all make time delay estimation in the array very difficult.
The generalized cross-correlation (GCC) time delay estimation method is the most widely studied and applied algorithm. Its formula is
$$R_{ij}(\tau) = \int_{-\infty}^{+\infty} \psi_{ij}(f)\,\phi_{ij}(f)\,e^{j 2\pi f \tau}\,df \qquad (1)$$
In essence, this applies a filter to the cross-correlation function. Here φ_ij(f) is the cross-power spectrum between the signals x_i(k) and x_j(k) of microphones i and j in the array, and ψ_ij(f) is a weighting function. To estimate the delay with GCC, the generalized cross-correlation function R_ij(τ) is computed from the weighting function ψ_ij(f) and the cross-power spectrum φ_ij(f), and its peak is located; the value of τ at the peak is the time delay between the signals. In practice, different weighting functions ψ_ij(f) can be chosen for different noise and reverberation conditions so that R_ij(τ) has a sharper peak. Depending on the weighting function, GCC methods include maximum likelihood (ML) weighting and phase transform (PHAT) weighting. Other approaches include adaptive time delay estimation and time delay estimation based on the localization principle of the human ear. The former adaptively identifies the transfer function between the two signals and derives the delay estimate from it. The latter, following the precedence effect, applies generalized cross-correlation to the onset segment of the voice signal; when reflected echoes overlap the direct sound, the onset cannot be determined and the performance of the algorithm degrades markedly.
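To make the role of the weighting function concrete, the block below writes out two standard special cases of formula (1). The patent text itself does not spell these out; they are the textbook definitions of unweighted cross-correlation and of PHAT weighting, given here only as a reference sketch.

```latex
% Unweighted case: psi = 1 reduces (1) to the ordinary cross-correlation.
\psi_{ij}(f) = 1
\quad\Longrightarrow\quad
R_{ij}(\tau) = \int_{-\infty}^{+\infty} \phi_{ij}(f)\, e^{j 2\pi f \tau}\, df

% PHAT weighting: the magnitude of the cross-power spectrum is divided out,
% so only its phase contributes to the peak.
\psi^{\mathrm{PHAT}}_{ij}(f) = \frac{1}{\lvert \phi_{ij}(f) \rvert}
\quad\Longrightarrow\quad
R^{\mathrm{PHAT}}_{ij}(\tau) = \int_{-\infty}^{+\infty}
  \frac{\phi_{ij}(f)}{\lvert \phi_{ij}(f) \rvert}\, e^{j 2\pi f \tau}\, df
```

For an ideal pure delay, the PHAT-weighted function is an impulse at the true delay, which is why PHAT tends to give a sharp peak when the SNR is high.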
Summary of the invention
The object of the invention is to introduce the auditory characteristics of the human ear into time delay estimation, thereby providing a voice signal time delay estimation method based on human hearing characteristics.
To achieve this object, the invention provides a voice signal time delay estimation method based on human hearing characteristics, which estimates the time delay between two voice signals from the same source and comprises the following steps:
(1) dividing each of the two voice signals into two sub-band signals by frequency;
(2) cross-correlating the corresponding sub-band signals of the two voice signals, giving two sub-band cross-correlation functions in total;
(3) adding the two sub-band cross-correlation functions according to their weights to obtain a summed cross-correlation function;
(4) obtaining the time delay between the two voice signals from the summed cross-correlation function.
In step (1), each of the two voice signals is divided into a high-frequency and a low-frequency sub-band signal, with the boundary between the bands at 1 kHz.
In step (3), the two sub-band cross-correlation functions may have identical weights. Alternatively, their weights are determined from the signal-to-noise ratios (SNRs) of the sub-bands, so that the cross-correlation function of the sub-band with the higher SNR receives the larger weight. In particular, the weight of each sub-band cross-correlation function is proportional to the SNR of its sub-band.
The sub-band cross-correlation function in step (2) is:
$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}\{x_i(k)\}\,\mathrm{DFT}\{x_j(k)\}^{*}}{\left(\lvert \mathrm{DFT}\{x_i(k)\}\rvert\,\lvert \mathrm{DFT}\{x_j(k)\}\rvert\right)^{\rho}} \right\}$$
Here x_i and x_j are the input signals of channels i and j, R_ij(m) is the time-domain generalized cross-correlation function, (·)* denotes the complex conjugate, DFT and IDFT denote the discrete Fourier transform and its inverse, and 0 ≤ ρ ≤ 1.
Preferably 0.5 ≤ ρ ≤ 0.75, and more preferably ρ = 0.6.
The voice signal time delay estimation method of the invention introduces the auditory characteristics of the human ear into time delay estimation and determines the weights used when adding the cross-correlation functions from the SNR of each sub-band, which makes the algorithm more robust to noise.
Description of drawings
Fig. 1 is a block diagram of the voice signal time delay estimation method of the invention.
Fig. 2 compares the generalized cross-correlation results computed by different methods: (a) PHAT, (b) modified PHAT, (c) the SCCF of the invention.
Fig. 3 shows time delay estimation results in a real environment: the solid line is the SCCF algorithm, the dashed line is the modified PHAT algorithm, and the dotted line is the PHAT algorithm.
Embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Let the two voice signals arriving at microphones m_i and m_j be denoted x_i and x_j. Both signals are first pre-whitened by linear prediction. The auditory model of the human ear reflects the behaviour of the cochlea: the sensitivity of the ear to a signal depends on the sub-band, and the ear is generally sensitive to low-frequency signals and less sensitive to high-frequency signals. The pre-whitened signals are therefore divided into a high-frequency and a low-frequency sub-band signal according to the auditory characteristics of the ear.
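The pre-whitening step can be sketched as follows. The patent only states that the signals are pre-whitened by linear prediction; the autocorrelation method with the Levinson-Durbin recursion, the default order of 10, and the function names `autocorr`, `lpc` and `prewhiten` are all assumptions made for this illustration.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a real sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns the prediction-error filter a[0..order] with a[0] = 1, so the
    whitened residual is e[n] = sum_j a[j] * x[n - j].
    """
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal fully predicted (or all zero): stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def prewhiten(x, order=10):
    """Filter x with its own prediction-error filter (order is an assumption)."""
    a = lpc(x, order)
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]
```

In this sketch each channel is whitened with its own predictor. Whether the original method uses per-channel or shared coefficients is not stated in the text.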
In the embodiment shown in Fig. 1, after pre-whitening, signals x_i and x_j are each filtered by two filters, a high-pass filter and a low-pass filter. High-pass filtering x_i yields its high-frequency signal, and low-pass filtering yields its low-frequency signal. In the preferred embodiment, the boundary between the high-frequency and low-frequency signals is 1 kHz. Signal x_j undergoes the same operations as x_i and is likewise divided at 1 kHz into a high-frequency and a low-frequency sub-band signal. The high-frequency parts of x_i and x_j first pass through half-wave rectification and 1 kHz low-pass filtering and are then cross-correlated, giving the high-frequency cross-correlation function (HCCF, High-frequency Cross-Correlation Function). The low-frequency parts are cross-correlated directly, giving the low-frequency cross-correlation function (LCCF, Low-frequency Cross-Correlation Function).
The high-pass and low-pass filters mentioned above are all 4th-order FIR filters.
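The sub-band split described above might be sketched like this. The text fixes only the 1 kHz boundary, the 4th-order FIR structure, and the half-wave rectification followed by 1 kHz low-pass filtering in the high band. The 8 kHz sampling rate, the Hamming-windowed sinc design, the spectral-inversion high-pass, and all function names are assumptions for illustration.

```python
import math


def lowpass_taps(cutoff_hz, fs_hz, order=4):
    """Hamming-windowed sinc low-pass FIR (order + 1 taps, unity DC gain)."""
    fc = cutoff_hz / fs_hz
    mid = order / 2
    taps = []
    for n in range(order + 1):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def highpass_taps(cutoff_hz, fs_hz, order=4):
    """Spectral inversion: unit impulse at the centre tap minus the low-pass."""
    hp = [-h for h in lowpass_taps(cutoff_hz, fs_hz, order)]
    hp[order // 2] += 1.0
    return hp


def fir(x, taps):
    """Causal FIR filtering with zero initial state."""
    return [sum(taps[i] * x[n - i] for i in range(len(taps)) if n - i >= 0)
            for n in range(len(x))]


def half_wave(x):
    """Half-wave rectification: keep positive samples, zero the rest."""
    return [max(v, 0.0) for v in x]


def split_subbands(x, fs_hz=8000.0, cutoff_hz=1000.0):
    """Return (low, high) sub-band signals for one pre-whitened channel."""
    lp = lowpass_taps(cutoff_hz, fs_hz)
    hp = highpass_taps(cutoff_hz, fs_hz)
    low = fir(x, lp)
    # High band: high-pass, half-wave rectify, then 1 kHz low-pass.
    high = fir(half_wave(fir(x, hp)), lp)
    return low, high
```

A 4th-order FIR filter has a very gentle roll-off, so the two bands overlap considerably around 1 kHz. A real design would likely choose the taps with a dedicated filter-design tool.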
Because the HCCF and the LCCF have different SNRs, they are added with different weights to obtain the summary cross-correlation function (SCCF, Summary Cross-Correlation Function):
$$\mathrm{SCCF} = g_L \times \mathrm{LCCF} + g_H \times \mathrm{HCCF} \qquad (2)$$
Here g_L and g_H are the weights of the LCCF and the HCCF respectively, and can be determined from the SNRs of the corresponding sub-bands:
$$g_L = \frac{\mathrm{SNR}_L}{\mathrm{SNR}_L + \mathrm{SNR}_H} \qquad (3)$$
$$g_H = \frac{\mathrm{SNR}_H}{\mathrm{SNR}_L + \mathrm{SNR}_H} \qquad (4)$$
where
$$\mathrm{SNR}_L = \frac{E[x_L^2] - E[n_L^2]}{E[n_L^2]} \qquad (5)$$
$$\mathrm{SNR}_H = \frac{E[x_H^2] - E[n_H^2]}{E[n_H^2]} \qquad (6)$$
In these formulas, SNR_L and SNR_H are the SNRs of the low-frequency and high-frequency sub-bands, x_L and x_H are the sub-band signals, and n_L and n_H are the noise in the corresponding sub-bands, which can be estimated during speech pauses.
In a real system, if the SNR is hard to obtain, g_L and g_H can both be set to 1.
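Formulas (3) to (6) and the fallback to unit weights can be sketched as below. The function names, the clamping of a negative SNR estimate to zero, and the rule of falling back to (1, 1) when both SNRs are zero are assumptions. The text only says that the weights can be set to 1 when the SNR is not easy to obtain.

```python
def band_snr(band, noise):
    """SNR = (E[x^2] - E[n^2]) / E[n^2], as in formulas (5) and (6).

    `band` is a sub-band signal frame and `noise` is a noise-only segment
    of the same sub-band, for example taken from a speech pause.
    """
    p_x = sum(v * v for v in band) / len(band)
    p_n = sum(v * v for v in noise) / len(noise)
    if p_n <= 0.0:
        raise ValueError("noise power must be positive")
    # Assumption: clamp at zero so a noisy estimate cannot yield a negative SNR.
    return max(p_x - p_n, 0.0) / p_n


def subband_weights(snr_low, snr_high):
    """Return (g_L, g_H) from formulas (3) and (4)."""
    total = snr_low + snr_high
    if total <= 0.0:
        # Assumed trigger for the fallback described in the text.
        return 1.0, 1.0
    return snr_low / total, snr_high / total
```

With these definitions the two weights always sum to 1 whenever at least one sub-band has a positive SNR estimate.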
It is easy to see that, like the ML generalized cross-correlation function, the method of the present invention takes the SNR of the signal into account when computing the cross-correlation function, giving a larger weight to the cross-correlation function of the sub-band with the higher SNR.
The HCCF and LCCF in formula (2) are computed as follows:
$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}\{x_i(k)\}\,\mathrm{DFT}\{x_j(k)\}^{*}}{\left(\lvert \mathrm{DFT}\{x_i(k)\}\rvert\,\lvert \mathrm{DFT}\{x_j(k)\}\rvert\right)^{\rho}} \right\}, \quad 0 \le \rho \le 1 \qquad (7)$$
Here x_i and x_j are the input signals of channels i and j, k is the sample index of the signal vector, R_ij(m) is the time-domain generalized cross-correlation function, m is the index into the cross-correlation vector, and (·)* denotes the complex conjugate. The high-frequency and low-frequency signals are substituted into formula (7) separately to compute the HCCF and the LCCF. Experiments show that a suitable value of ρ lies between 0.5 and 0.75, preferably 0.6.
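Formula (7) can be sketched in a few lines. To stay self-contained the block uses a naive O(N²) DFT from the standard library; a real implementation would use an FFT. The function names and the small constant `eps` that guards against division by zero are assumptions, not part of the patent.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * cmath.pi * f * m / n) for f in range(n)) / n
            for m in range(n)]


def subband_ccf(x_i, x_j, rho=0.6, eps=1e-12):
    """Circular generalized cross-correlation of formula (7).

    rho = 0 gives the plain cross-correlation and rho = 1 gives PHAT-style
    weighting; the text prefers rho = 0.6.
    """
    spec_i, spec_j = dft(x_i), dft(x_j)
    weighted = [a * b.conjugate() / ((abs(a) * abs(b)) ** rho + eps)
                for a, b in zip(spec_i, spec_j)]
    # For real inputs the result is real up to rounding error.
    return [v.real for v in idft(weighted)]
```

With this sign convention, if x_i is a copy of x_j delayed by d samples, the peak of the returned sequence sits at index d (circularly).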
The HCCF and LCCF computed with formula (7) are then substituted into formula (2), and the final time delay estimate is obtained from:
$$\tau_{ij} = \arg\max_{m}\ \mathrm{SCCF}_{ij}(m) \qquad (8)$$
Formula (8) assigns the index of the cross-correlation peak to τ_ij, where τ_ij is the time delay between the arrival of the signal at microphones m_i and m_j, and SCCF_ij(m) is the summary cross-correlation function of signals i and j.
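Formulas (2) and (8) together amount to a weighted sum followed by a peak search, roughly as sketched below. The function names are hypothetical. Mapping peak indices in the upper half of the vector to negative lags is an assumption that holds when the cross-correlation functions come from a circular, DFT-based computation.

```python
def summary_ccf(lccf, hccf, g_low=1.0, g_high=1.0):
    """Formula (2): SCCF = g_L * LCCF + g_H * HCCF, element by element."""
    if len(lccf) != len(hccf):
        raise ValueError("LCCF and HCCF must have the same length")
    return [g_low * l + g_high * h for l, h in zip(lccf, hccf)]


def estimate_delay(sccf):
    """Formula (8): lag (in samples) of the SCCF peak.

    Assumption: the SCCF is circular, so indices above n/2 are read as
    negative lags.
    """
    n = len(sccf)
    peak = max(range(n), key=lambda m: sccf[m])
    return peak if peak <= n // 2 else peak - n
```

The delay in seconds is the returned lag divided by the sampling rate, which the text does not specify.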
To analyse the method of the invention, Fig. 2 gives the cross-correlation results for one frame of noisy speech: (a) is the result of PHAT (phase transform), (b) is the result of the modified PHAT, and (c) is the SCCF result of the invention. The dashed line in the figure marks the correct time delay. As can be seen, peak detection on the PHAT result gives a wrong estimate. The modified PHAT gives the correct result, but the peak of the SCCF method is sharper.
Fig. 3 shows the statistics of experimental results in a real office environment: the solid line is the SCCF algorithm, the dashed line is the modified PHAT algorithm, and the dotted line is the PHAT algorithm. The reverberation time of this office is about 0.8 s. In the statistics, an estimate within ±2 samples of the correct delay is counted as correct, and all other estimates are counted as errors. As can be seen, at low SNR the proposed method has higher accuracy and smaller error than both PHAT and the modified PHAT. As the SNR improves, the performance of the algorithms converges. The robustness of the proposed algorithm to noise is therefore clearly improved.
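The scoring rule used for Fig. 3, where an estimate within ±2 samples of the correct delay counts as correct, can be expressed as a small helper. The function name and the example numbers in the test are made up for illustration and are not data from the patent.

```python
def correct_rate(estimates, true_delay, tolerance=2):
    """Fraction of per-frame delay estimates within +/- tolerance samples."""
    if not estimates:
        raise ValueError("no estimates given")
    hits = sum(1 for d in estimates if abs(d - true_delay) <= tolerance)
    return hits / len(estimates)
```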

Claims (9)

1. A voice signal time delay estimation method based on human hearing characteristics, for estimating the time delay between two voice signals from the same source, comprising the following steps:
(1) dividing each of the two voice signals into two sub-band signals by frequency;
(2) cross-correlating the corresponding sub-band signals of the two voice signals, giving two sub-band cross-correlation functions in total;
(3) adding the two sub-band cross-correlation functions according to their weights to obtain a summed cross-correlation function;
(4) obtaining the time delay between the two voice signals from the summed cross-correlation function.
2. The voice signal time delay estimation method based on human hearing characteristics according to claim 1, characterized in that, in step (1), each of the two voice signals is divided into a high-frequency and a low-frequency sub-band signal.
3. The voice signal time delay estimation method based on human hearing characteristics according to claim 2, characterized in that the boundary frequency of the division is 1 kHz.
4. The voice signal time delay estimation method according to claim 1, characterized in that, in step (3), the two sub-band cross-correlation functions have identical weights.
5. The voice signal time delay estimation method according to claim 1, characterized in that, in step (3), the weights of the two sub-band cross-correlation functions are determined from the signal-to-noise ratios (SNRs) of the sub-bands, the cross-correlation function of the sub-band with the higher SNR having the larger weight.
6. The voice signal time delay estimation method according to claim 5, characterized in that, in step (3), the weight of each of the two sub-band cross-correlation functions is proportional to the SNR of its sub-band.
7. The voice signal time delay estimation method based on human hearing characteristics according to claim 1, characterized in that the sub-band cross-correlation function in step (2) is:
$$R_{ij}(m) = \mathrm{IDFT}\left\{ \frac{\mathrm{DFT}\{x_i(k)\}\,\mathrm{DFT}\{x_j(k)\}^{*}}{\left(\lvert \mathrm{DFT}\{x_i(k)\}\rvert\,\lvert \mathrm{DFT}\{x_j(k)\}\rvert\right)^{\rho}} \right\}$$
where x_i and x_j are the input signals of channels i and j, R_ij(m) is the time-domain generalized cross-correlation function, (·)* denotes the complex conjugate, DFT and IDFT denote the discrete Fourier transform and its inverse, and 0 ≤ ρ ≤ 1.
8. The voice signal time delay estimation method based on human hearing characteristics according to claim 7, characterized in that 0.5 ≤ ρ ≤ 0.75.
9. The voice signal time delay estimation method based on human hearing characteristics according to claim 8, characterized in that ρ = 0.6.
CNB2003101134838A 2003-11-12 2003-11-12 Voice signal time delay estimating method based on ear hearing characteristics Expired - Fee Related CN1212609C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2003101134838A CN1212609C (en) 2003-11-12 2003-11-12 Voice signal time delay estimating method based on ear hearing characteristics

Publications (2)

Publication Number Publication Date
CN1545086A (en) 2004-11-10
CN1212609C (en) 2005-07-27

Family

ID=34336877

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101134838A Expired - Fee Related CN1212609C (en) 2003-11-12 2003-11-12 Voice signal time delay estimating method based on ear hearing characteristics

Country Status (1)

Country Link
CN (1) CN1212609C (en)

Also Published As

Publication number Publication date
CN1212609C (en) 2005-07-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee