CN1991980A

CN1991980A - Method for removing background noise in voice signal

Info

Publication number: CN1991980A
Application number: CNA2005101374510A
Authority: CN
Inventors: 黄泰惠
Original assignee: Industrial Technology Research Institute ITRI
Current assignee: Industrial Technology Research Institute ITRI
Priority date: 2005-12-30
Filing date: 2005-12-30
Publication date: 2007-07-04
Anticipated expiration: 2025-12-30
Also published as: CN100565672C

Abstract

A method for removing background noise from a voice signal includes the following steps. First, the attenuation coefficient of frequency band i is defined. The attenuation coefficients of adjacent frequency bands are then smoothing-filtered to calculate a forward attenuation coefficient and a backward attenuation coefficient. Next, the forward and backward attenuation coefficients are linearly combined to calculate the smoothed attenuation coefficient of frequency band i, and the voice spectrum estimate is calculated from the smoothed attenuation coefficient. Finally, the voice signal with the background noise removed is obtained by applying the inverse Fourier transform.

Description

Method for removing background noise in a voice signal

Technical field

The present invention relates to a method for removing background noise in a voice signal, and in particular to a voice signal processing method that applies smoothing filtering to the attenuation coefficient (attenuation factor) of each frequency band.

Background technology

According to user satisfaction surveys on hearing aids, hearing aid users frequently complain that "excessive amplification of environmental noise is tiring" and that they "can hear, but cannot hear clearly". Removing noise from the signal to improve wearing comfort has therefore become one of the important topics in current digital hearing aid research and development. Although some existing methods for removing background noise from voice signals clearly improve the signal-to-noise ratio (SNR), they do not noticeably improve speech intelligibility, and they may even introduce additional noise (commonly called musical noise) or damage the fluency of the speech.

Background noise interference is a superposition of waveforms in the time domain. The noisy voice signal received initially can be expressed as y[n] = x[n] + w[n], where x[n] is the undisturbed voice signal and w[n] is the background noise.

The traditional noise removal method can be expressed as

\hat{X} [i] = γ [i] Y [i],

where Y[i] is the portion of the noisy voice signal y[n] that falls in frequency band i after the fast Fourier transform (FFT), i ∈ [0, N-1], N is the number of frequency bands, |Y[i]| is the amplitude of the noisy voice signal y[n] in frequency band i, and γ[i] is the attenuation coefficient applied to that amplitude.

The traditional attenuation coefficient is calculated as

γ [i] = \frac{{| D [i] |}^{2}}{{| Y [i] |}^{2}},

where

{| D [i] |}^{2} = \begin{cases} {| Y [i] |}^{2} - α {| W [i] |}^{2}, & \text{if } {| Y [i] |}^{2} \geq \frac{α}{1 - β} {| W [i] |}^{2} \\ β {| Y [i] |}^{2}, & \text{otherwise} \end{cases},

|W[i]|² is the energy of the background noise in frequency band i, and α and β are preset coefficients. After calculating

\hat{X} [i] = γ [i] Y [i],

applying the inverse Fourier transform to \hat{X}[i] yields the voice signal with the background noise removed.
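The traditional attenuation rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name attenuation, the default values of alpha and beta, and the handling of a band with zero energy are assumptions made for the example, since the text only says that α and β are preset coefficients.

```python
def attenuation(y_energy, w_energy, alpha=1.0, beta=0.1):
    """Per-band attenuation gamma[i] = |D[i]|^2 / |Y[i]|^2.

    y_energy[i] is |Y[i]|^2 and w_energy[i] is |W[i]|^2.
    alpha and beta are illustrative values, not values from the source.
    """
    gains = []
    for y2, w2 in zip(y_energy, w_energy):
        if y2 <= 0.0:
            # Assumption: an empty band simply receives the floor gain beta.
            gains.append(beta)
            continue
        if y2 >= alpha / (1.0 - beta) * w2:
            d2 = y2 - alpha * w2   # subtract the scaled noise energy
        else:
            d2 = beta * y2         # spectral floor for noise-dominated bands
        gains.append(d2 / y2)
    return gains


# Band 0 is well above the noise energy, band 1 is not.
print(attenuation([4.0, 1.0], [1.0, 1.0]))
```

In this example the first band keeps three quarters of its energy, while the second band falls back to the floor β.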

Voice signals are correlated between adjacent frequency bands. As described above, however, the traditional method does not exploit this property: the traditional amplitude attenuation coefficient is calculated separately for each frequency band. The traditional method therefore leaves room for improvement.

Summary of the invention

The object of the present invention is to provide a method for removing background noise in a voice signal that improves the quality and intelligibility of the speech after the background noise is removed.

To achieve the above and other objects, the present invention proposes a method for removing background noise in a voice signal, comprising the following steps. First, the attenuation coefficient of frequency band i is defined as

γ [i] = \frac{{| D [i] |}^{2}}{{| Y [i] |}^{2}},

where

{| D [i] |}^{2} = \begin{cases} {| Y [i] |}^{2} - α {| W [i] |}^{2}, & \text{if } {| Y [i] |}^{2} \geq \frac{α}{1 - β} {| W [i] |}^{2} \\ β {| Y [i] |}^{2}, & \text{otherwise} \end{cases},

|Y[i]|² is the energy of the noisy voice signal in frequency band i, |W[i]|² is the energy of the background noise in frequency band i, i ∈ [0, N-1], N is the number of frequency bands, and α and β are preset coefficients. The forward attenuation coefficient of frequency band i is then calculated as

\bar{γ}_{f} [i] \equiv \bar{γ} [i] = λ_{f} γ [i] + (1 - λ_{f}) \bar{γ} [i - 1],

where λ_f is a preset coefficient. Next, the backward attenuation coefficient of frequency band i is calculated as

\bar{γ}_{b} [i] = λ_{b} γ_{b} [i] + (1 - λ_{b}) \bar{γ}_{b} [i - 1],

where γ_b[i] = γ[N-1-i] and λ_b is a preset coefficient. The smoothed attenuation coefficient of frequency band i is then calculated as

\hat{γ} [i] = λ_{c} \bar{γ}_{f} [i] + (1 - λ_{c}) \bar{γ}_{b} [N - 1 - i],

where λ_c is a preset coefficient. The voice spectrum estimate is then calculated from the smoothed attenuation coefficient:

\hat{X} [i] = \hat{γ} [i] Y [i] .

Finally, the inverse Fourier transform is applied to \hat{X}[i] to obtain the voice signal with the background noise removed.

In one embodiment of the above method for removing background noise in a voice signal, the initial conditions are

\bar{γ} [- 1] = γ [0]

and

\bar{γ}_{b} [- 1] = γ [N - 1] .

As described in the preferred embodiment of the present invention, the above method exploits the correlation of the voice signal between neighboring frequency bands by applying smoothing filtering to the attenuation coefficient, and uses the result in place of the traditional amplitude attenuation coefficient. Experimental results show that the method improves the quality and intelligibility of the speech after the background noise is removed.

To make the above and other objects, features, and advantages of the present invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Description of drawings

Fig. 1 is a flow chart of a method for removing background noise in a voice signal according to an embodiment of the present invention.

Fig. 2 compares the attenuation coefficients of the conventional art and of an embodiment of the present invention.

Description of main reference numerals

110 to 160: flow chart steps

Embodiment

In the traditional method, the noise-reduced voice spectrum is calculated independently for each frequency band. The method proposed by the present invention instead exploits the dependence between frequency bands to improve the intelligibility of the speech after noise removal.

The following description refers to Fig. 1, which is a flow chart of a method for removing background noise in a voice signal according to an embodiment of the present invention. First, in step 110, the attenuation coefficient of each frequency band is defined. Assuming that the number of frequency bands in this embodiment is N and i ∈ [0, N-1], the attenuation coefficient of frequency band i is defined as

γ [i] = \frac{{| D [i] |}^{2}}{{| Y [i] |}^{2}},

where

{| D [i] |}^{2} = \begin{cases} {| Y [i] |}^{2} - α {| W [i] |}^{2}, & \text{if } {| Y [i] |}^{2} \geq \frac{α}{1 - β} {| W [i] |}^{2} \\ β {| Y [i] |}^{2}, & \text{otherwise} \end{cases},

|Y[i]|² is the energy of the initially received noisy voice signal in frequency band i, |W[i]|² is the energy of the background noise in frequency band i, and α and β are preset coefficients.
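As a side note, the piecewise definition in step 110 can be rewritten in a compact form. The short derivation below is not part of the original text; it assumes 0 ≤ β < 1 and |Y[i]|² > 0, which the text does not state explicitly.

```latex
% Divide both branches of |D[i]|^2 by |Y[i]|^2:
\gamma[i] =
\begin{cases}
  1 - \alpha \dfrac{|W[i]|^{2}}{|Y[i]|^{2}}, & \text{if } |Y[i]|^{2} \geq \dfrac{\alpha}{1-\beta}\,|W[i]|^{2} \\[2ex]
  \beta, & \text{otherwise}
\end{cases}

% The branch condition is equivalent to the first expression being at least beta:
|Y[i]|^{2} \geq \frac{\alpha}{1-\beta}\,|W[i]|^{2}
\iff (1-\beta)\,|Y[i]|^{2} \geq \alpha\,|W[i]|^{2}
\iff 1 - \alpha \frac{|W[i]|^{2}}{|Y[i]|^{2}} \geq \beta

% Hence the attenuation coefficient is a floored spectral-subtraction gain:
\gamma[i] = \max\!\left( 1 - \alpha \frac{|W[i]|^{2}}{|Y[i]|^{2}},\; \beta \right)
```

Under these assumptions γ[i] is continuous at the branch point and never drops below β.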

After the attenuation coefficient is defined, step 120 applies a first-order IIR (infinite impulse response) filter q[n] = λ p[n] + (1-λ) q[n-1] to the attenuation coefficient γ[i] of frequency band i in order to calculate the forward attenuation coefficient \bar{γ}_f[i] of frequency band i. The formula used in this embodiment is \bar{γ}_f[i] ≡ \bar{γ}[i] = λ_f γ[i] + (1-λ_f) \bar{γ}[i-1], where λ_f is a preset coefficient. A simple calculation shows that the forward attenuation coefficient \bar{γ}_f[i] is determined by γ[0] through γ[i].
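The forward pass of step 120 can be sketched as a running first-order filter over the band index. The sketch below is illustrative only: the function name smooth_forward is hypothetical, and it assumes that the recursion starts from the value γ[0], that is, \bar{γ}[-1] = γ[0].

```python
def smooth_forward(gamma, lam_f=0.5):
    """First-order IIR smoothing of gamma in increasing band order.

    Implements bar_gamma[i] = lam_f * gamma[i] + (1 - lam_f) * bar_gamma[i-1],
    assuming the initial condition bar_gamma[-1] = gamma[0].
    """
    smoothed = []
    prev = gamma[0]  # assumed initial condition
    for g in gamma:
        prev = lam_f * g + (1.0 - lam_f) * prev
        smoothed.append(prev)
    return smoothed


# A dip in band 1 is partly filled in and also pulls down band 2.
print(smooth_forward([1.0, 0.0, 1.0]))  # [1.0, 0.5, 0.75]
```

Each output value depends only on the current band and the bands before it, which is why a second pass in the opposite direction is needed to bring in the bands on the other side.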

Next, in step 130, the same first-order IIR filter is applied to the attenuation coefficients γ[i] taken in reversed band order, in order to calculate the backward attenuation coefficient \bar{γ}_b[i] of frequency band i. The formula used in this embodiment is \bar{γ}_b[i] = λ_b γ_b[i] + (1-λ_b) \bar{γ}_b[i-1], where γ_b[i] = γ[N-1-i] and λ_b is a preset coefficient. A simple calculation shows that the backward attenuation coefficient \bar{γ}_b[i] is determined by γ[N-1-i] through γ[N-1].

In the above difference equations, the initial conditions are \bar{γ}[-1] = γ[0] and \bar{γ}_b[-1] = γ[N-1].
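The backward pass of step 130 is the same filter run over the reversed band order. The following sketch is illustrative only, with a hypothetical function name; it assumes the initial condition \bar{γ}_b[-1] = γ[N-1], and its result is indexed in reversed order, so entry k of the returned list corresponds to band N-1-k.

```python
def smooth_backward(gamma, lam_b=0.5):
    """First-order IIR smoothing of gamma in decreasing band order.

    gamma_b[i] = gamma[N-1-i] is the reversed sequence, and
    bar_gamma_b[i] = lam_b * gamma_b[i] + (1 - lam_b) * bar_gamma_b[i-1],
    assuming the initial condition bar_gamma_b[-1] = gamma[N-1].
    The returned list is indexed like gamma_b (reversed band order).
    """
    gamma_b = gamma[::-1]
    smoothed = []
    prev = gamma_b[0]  # equals gamma[N-1]
    for g in gamma_b:
        prev = lam_b * g + (1.0 - lam_b) * prev
        smoothed.append(prev)
    return smoothed


print(smooth_backward([0.0, 1.0, 1.0]))  # [1.0, 1.0, 0.5]
```

In the printed example the last entry, 0.5, belongs to band 0: its own coefficient of 0.0 has been raised by the larger coefficients of the bands above it.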

Next, in step 140, the forward and backward filtering results are linearly combined to calculate the smoothed attenuation coefficient \hat{γ}[i] of frequency band i. The formula used in this embodiment is

\hat{γ} [i] = λ_{c} \bar{γ}_{f} [i] + (1 - λ_{c}) \bar{γ}_{b} [N - 1 - i],

where λ_c is a preset coefficient. Then, in step 150, the voice spectrum estimate after smoothing is calculated as

\hat{X} [i] = \hat{γ} [i] Y [i] .

Finally, in step 160, the inverse Fourier transform is applied to \hat{X}[i] to obtain the voice signal with the background noise removed.
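Steps 120 to 150 can be put together in a short sketch. The code below is only an illustration under stated assumptions: all function names are hypothetical, the spectrum is a plain list of complex numbers, the initial conditions are taken as \bar{γ}[-1] = γ[0] and \bar{γ}_b[-1] = γ[N-1], and the inverse Fourier transform of step 160 is left to whatever FFT routine is available.

```python
def first_order_iir(values, lam):
    """q[n] = lam * p[n] + (1 - lam) * q[n-1], started from q[-1] = p[0]."""
    out = []
    prev = values[0]
    for v in values:
        prev = lam * v + (1.0 - lam) * prev
        out.append(prev)
    return out


def smoothed_attenuation(gamma, lam_f=0.5, lam_b=0.5, lam_c=0.5):
    """Combine the forward and backward passes into hat_gamma (steps 120-140)."""
    n = len(gamma)
    forward = first_order_iir(gamma, lam_f)         # bar_gamma_f[i]
    backward = first_order_iir(gamma[::-1], lam_b)  # bar_gamma_b[i], reversed order
    return [lam_c * forward[i] + (1.0 - lam_c) * backward[n - 1 - i]
            for i in range(n)]


def estimate_spectrum(spectrum, gamma_hat):
    """Step 150: hat_X[i] = hat_gamma[i] * Y[i]."""
    return [g * y for g, y in zip(gamma_hat, spectrum)]


gamma = [0.0, 1.0, 1.0]
gamma_hat = smoothed_attenuation(gamma)
print(gamma_hat)  # [0.25, 0.75, 0.875]
print(estimate_spectrum([2 + 0j, 1j, 4 + 0j], gamma_hat))
```

Because each band's result mixes a pass coming from below with a pass coming from above, the smoothed coefficient of a band reflects its neighbors on both sides.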

Fig. 2 compares the attenuation coefficients of the conventional art and of this embodiment. The horizontal axis is the frequency band index and the vertical axis is the attenuation coefficient value. Fig. 2 uses λ_f = λ_b = λ_c = 0.5. Apart from the curve labeled as the conventional art, all curves show data from this embodiment. Fig. 2 shows that merging the forward and backward results lets the attenuation coefficient of each frequency band be adjusted under the influence of the attenuation coefficients of the bands on either side of it, which achieves the goal of using the dependence between frequency bands to adjust the band attenuation coefficients.

The experimental results of this embodiment are described below. The first experiment concerns the syllable recognition rate. Chinese syllable-based HMMs were trained on a clean speech database in which 18 men and 11 women each read 120 Chinese names in a quiet room. For the background noise, operation room noise, white noise, human voice noise, and factory noise were each added to this clean speech database, with each kind of noise mixed in at the waveform level at signal-to-noise ratios of 20 dB, 15 dB, 10 dB, 5 dB, and 0 dB. Each speech file of this noisy speech database was processed with the noise removal method of this embodiment, and automatic syllable recognition was then performed with the clean speech models, giving the results below. Each figure reported below is the average over the 20 combinations of four noise types and five signal-to-noise ratios.
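The text does not spell out how the noise was mixed into the clean waveforms. One common way to synthesize a noisy waveform at a target signal-to-noise ratio is to scale the noise so that the ratio of average powers matches the target, as in the sketch below. The function name mix_at_snr and the power-based scaling are assumptions made for illustration, not the procedure actually used in the experiment.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean waveform at a target SNR in dB.

    Assumption: SNR is the ratio of average sample powers over the whole
    file, and both waveforms have the same length and nonzero power.
    """
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Choose the scale so that p_clean / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(clean, noise)]


clean = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
for snr_db in (20, 15, 10, 5, 0):
    print(snr_db, mix_at_snr(clean, noise, snr_db))
```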

Table 1. Syllable recognition results of this embodiment

λ value	1.0	0.7	0.6	0.55	0.5	0.45	0.4
Syllable accuracy (%)	41.8	44.8	45.6	45.8	46.1	46.2	45.9

In this experiment, λ_f = λ_b = λ. When λ = 1, the smoothed attenuation coefficient \hat{γ}[i] equals the traditional attenuation coefficient γ[i], so the 41.8% obtained at λ = 1 is the result of the traditional method. For comparison, the syllable accuracy with no noise removal at all is 32.9%. Table 1 shows that the method of this embodiment does improve the recognition rate after noise removal, reaching the highest rate of 46.2% at λ = 0.45.
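The statement that λ = 1 reproduces the traditional attenuation coefficient can be checked directly from the filter equations of this embodiment. The short derivation below is added for illustration and holds for any value of λ_c.

```latex
% Forward pass with lambda_f = 1:
\bar{\gamma}_{f}[i] = 1 \cdot \gamma[i] + (1 - 1)\,\bar{\gamma}[i-1] = \gamma[i]

% Backward pass with lambda_b = 1, using gamma_b[i] = gamma[N-1-i]:
\bar{\gamma}_{b}[i] = \gamma_{b}[i] = \gamma[N-1-i]
\quad\Longrightarrow\quad
\bar{\gamma}_{b}[N-1-i] = \gamma[i]

% Linear combination:
\hat{\gamma}[i] = \lambda_{c}\,\gamma[i] + (1 - \lambda_{c})\,\gamma[i] = \gamma[i]
```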

The second experiment compares the methods using the perceptual evaluation of speech quality (PESQ), a measure of sound quality. The PESQ score lies in the range [0, 4], where 4 is the highest score and indicates undistorted sound quality. The experimental results are shown in the table below.

Table 2. Speech quality measurement after background noise removal

λ value	1.0	0.5
PESQ score	2.44	2.45

As before, λ_f = λ_b = λ in this experiment, and the 2.44 obtained at λ = 1 is the PESQ score of the traditional method. For comparison, the score with no noise removal at all is 2.08. Table 2 shows that the method of this embodiment raises the PESQ score after background noise removal from 2.44 to 2.45.

Although the present invention was inspired by digital hearing aids, its application is not limited to them. It can also be applied in other fields, for example digital recording applications such as voice recorders.

In summary, the method for removing background noise in a voice signal proposed by the present invention exploits the correlation of the voice signal between neighboring frequency bands by applying smoothing filtering to the attenuation coefficient, and uses the result in place of the traditional amplitude attenuation coefficient. Experimental results show that the method improves the quality and intelligibility of the speech after the background noise is removed.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Any person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

Claims

1. A method for removing background noise in a voice signal, characterized by comprising the following steps:

defining the attenuation coefficient of frequency band i

γ [i] = \frac{{| D [i] |}^{2}}{{| Y [i] |}^{2}},

where

{| D [i] |}^{2} = \begin{cases} {| Y [i] |}^{2} - α {| W [i] |}^{2}, & \text{if } {| Y [i] |}^{2} \geq \frac{α}{1 - β} {| W [i] |}^{2} \\ β {| Y [i] |}^{2}, & \text{otherwise} \end{cases},

|Y[i]|² is the energy of the noisy voice signal in frequency band i, |W[i]|² is the energy of the background noise in frequency band i, i ∈ [0, N-1], N is the number of frequency bands, and α and β are preset coefficients;

calculating the forward attenuation coefficient \bar{γ}_f[i] of frequency band i from γ[0] through γ[i];

calculating the backward attenuation coefficient \bar{γ}_b[i] of frequency band i from γ[N-1-i] through γ[N-1];

calculating the smoothed attenuation coefficient \hat{γ}[i] of frequency band i from \bar{γ}_f[i] and \bar{γ}_b[i];

calculating the voice spectrum estimate

\hat{X} [i] = \hat{γ} [i] Y [i];

and

applying the inverse Fourier transform to \hat{X}[i] to obtain the voice signal with the background noise removed.

2. The method for removing background noise in a voice signal according to claim 1, characterized in that \bar{γ}_f[i] ≡ \bar{γ}[i] = λ_f γ[i] + (1-λ_f) \bar{γ}[i-1], where λ_f is a preset coefficient.

3. The method for removing background noise in a voice signal according to claim 2, characterized in that \bar{γ}[-1] = γ[0].

4. The method for removing background noise in a voice signal according to claim 2, characterized in that λ_f is 0.5.

5. The method for removing background noise in a voice signal according to claim 1, characterized in that \bar{γ}_b[i] = λ_b γ_b[i] + (1-λ_b) \bar{γ}_b[i-1], where γ_b[i] = γ[N-1-i] and λ_b is a preset coefficient.

6. The method for removing background noise in a voice signal according to claim 5, characterized in that \bar{γ}_b[-1] = γ[N-1].

7. The method for removing background noise in a voice signal according to claim 5, characterized in that λ_b is 0.5.

8. The method for removing background noise in a voice signal according to claim 1, characterized in that

\hat{γ} [i] = λ_{c} \bar{γ}_{f} [i] + (1 - λ_{c}) \bar{γ}_{b} [N - 1 - i],

where λ_c is a preset coefficient.

9. The method for removing background noise in a voice signal according to claim 8, characterized in that λ_c is 0.5.

10. A method for removing background noise in a voice signal, characterized by comprising the following steps:

defining the attenuation coefficient of frequency band i

γ [i] = \frac{{| D [i] |}^{2}}{{| Y [i] |}^{2}},

where

{| D [i] |}^{2} = \begin{cases} {| Y [i] |}^{2} - α {| W [i] |}^{2}, & \text{if } {| Y [i] |}^{2} \geq \frac{α}{1 - β} {| W [i] |}^{2} \\ β {| Y [i] |}^{2}, & \text{otherwise} \end{cases};

calculating the forward attenuation coefficient of frequency band i, \bar{γ}_f[i] ≡ \bar{γ}[i] = λ_f γ[i] + (1-λ_f) \bar{γ}[i-1], where λ_f is a preset coefficient;

calculating the backward attenuation coefficient of frequency band i, \bar{γ}_b[i] = λ_b γ_b[i] + (1-λ_b) \bar{γ}_b[i-1], where γ_b[i] = γ[N-1-i] and λ_b is a preset coefficient;

calculating the smoothed attenuation coefficient of frequency band i

\hat{γ} [i] = λ_{c} \bar{γ}_{f} [i] + (1 - λ_{c}) \bar{γ}_{b} [N - 1 - i],

where λ_c is a preset coefficient;

calculating the voice spectrum estimate

\hat{X} [i] = \hat{γ} [i] Y [i];

and

applying the inverse Fourier transform to \hat{X}[i] to obtain the voice signal with the background noise removed.

11. The method for removing background noise in a voice signal according to claim 10, characterized in that \bar{γ}[-1] = γ[0] and \bar{γ}_b[-1] = γ[N-1].

12. The method for removing background noise in a voice signal according to claim 10, characterized in that λ_f = λ_b = λ_c = 0.5.