CN105869649A - Perceptual filtering method and perceptual filter - Google Patents

Perceptual filtering method and perceptual filter

Info

Publication number
CN105869649A
CN105869649A (application CN201510031872.9A)
Authority
CN
China
Prior art keywords
frequency domain
background noise
noisy speech
power
speech
Prior art date
Legal status
Granted
Application number
CN201510031872.9A
Other languages
Chinese (zh)
Other versions
CN105869649B (en)
Inventor
张勇
刘轶
Current Assignee
PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Peking University Shenzhen Graduate School
Original Assignee
PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Peking University Shenzhen Graduate School
Priority date
Filing date
Publication date
Application filed by PKU-HKUST SHENZHEN-HONGKONG INSTITUTION, Peking University Shenzhen Graduate School filed Critical PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Priority to CN201510031872.9A priority Critical patent/CN105869649B/en
Publication of CN105869649A publication Critical patent/CN105869649A/en
Application granted granted Critical
Publication of CN105869649B publication Critical patent/CN105869649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a perceptual filtering method comprising the following steps: noisy speech is obtained, and the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech, which comprises frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. The method keeps the residual noise below the human auditory masking threshold and thereby improves the subjective perceptual quality of the speech. The invention also provides a perceptual filter.

Description

Perceptual filtering method and perceptual filter
Technical field
The present invention relates to the field of speech signal processing, and in particular to a perceptual filtering method and a perceptual filter.
Background art
In everyday life, speech signals are inevitably contaminated by background noise. Speech enhancement, as a signal processing technique, is an effective way to deal with this contamination and has long been a research focus in the field of speech signal processing. The goal of speech enhancement is to remove background noise as far as possible while preserving speech intelligibility, thereby improving the subjective listening quality of the speech.
Conventional speech enhancement algorithms include spectral subtraction, Wiener filtering, minimum mean-square error (MMSE) estimation, log-spectral amplitude MMSE estimation, and enhancement methods based on the DCT (Discrete Cosine Transform). Most of these methods build statistical models of the speech and noise components in the frequency domain and combine various estimation theories to design targeted noise suppression schemes. However, because the assumed models deviate from real conditions, the enhanced speech still contains substantial speech distortion and residual noise, which limits the enhancement performance.
Summary of the invention
In view of the above, it is necessary to provide a perceptual filtering method and a perceptual filter that reduce the noise level below the human auditory masking threshold and thereby improve the subjective perceptual quality of the enhanced speech.
A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error algorithm, expressing the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain;
from the speech signal distortion and the filtered background noise, constructing an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold is:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background noise power from the noise power using an approximation algorithm;
computing the a posteriori SNR from the frequency-domain background noise power;
computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
solving the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, before the step of transforming the noisy speech into the frequency domain to obtain the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech, wherein transforming the noisy speech into the frequency domain means transforming the enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain the enhanced speech means filtering the enhanced noisy speech to obtain the enhanced speech; the step of computing the frequency-domain background noise power from the noise power using the approximation algorithm is then:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
A perceptual filter, the perceptual filter comprising:
an acquisition module, configured to obtain noisy speech;
a noise power computation module, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the equation construction module constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the gain solving module comprises:
a solving preparation unit, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, the perceptual filter further comprises:
an enhancement module, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech;
the frequency-domain conversion module transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain;
the filtering module filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and
the solving preparation unit computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the solving preparation unit computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
With the above perceptual filtering method and perceptual filter, noisy speech is obtained; the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency-domain masking threshold, the speech signal distortion power remains small while the noise level stays below the human auditory masking threshold and is not heard, which improves the subjective perceptual quality of the enhanced speech.
Brief description of the drawings
Fig. 1 is a flowchart of a perceptual filtering method in one embodiment;
Fig. 2 is a flowchart of constructing the equation in the perceptual filter gain in one embodiment;
Fig. 3 is a flowchart of solving the equation to obtain the perceptual filter gain in one embodiment;
Fig. 4 is a block diagram of a speech enhancement system in one embodiment;
Fig. 5 is a block diagram of a perceptual filter in one embodiment;
Fig. 6 is a block diagram of the gain solving module in one embodiment;
Fig. 7 is a block diagram of a perceptual filter in another embodiment.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, a perceptual filtering method is provided, comprising the following steps:
Step S110: obtain noisy speech, and compute the noise power from the noisy speech according to a noise estimation algorithm.
In this embodiment, the acquired noisy speech is represented in the time domain as y(n) = s(n) + z(n), where s(n) is the clean speech signal and z(n) is the background noise in the original noisy speech. The noise estimation algorithm can be any existing algorithm; the frequency-domain noise power λ_d(k) is computed from the noisy speech y(n) = s(n) + z(n) according to the noise estimation algorithm, where k is the spectral bin index.
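For illustration only, the sketch below (Python) shows one possible noise power estimator of the kind this embodiment leaves open: a simple energy-based noise-only check followed by recursive smoothing. The smoothing factor, the threshold and the function itself are assumptions of the sketch, not part of the described method.

```python
import numpy as np

def estimate_noise_power(noisy_frames_fft, alpha=0.9, vad_threshold=2.0):
    """Recursive noise-power estimate lambda_d(k).

    noisy_frames_fft: (num_frames, num_bins) complex STFT of the noisy speech.
    A frame is treated as noise-only when its energy is below `vad_threshold`
    times the running noise energy; the estimate is then smoothed recursively.
    """
    power = np.abs(noisy_frames_fft) ** 2          # |Y(k, l)|^2
    lambda_d = power[0].copy()                      # initialise from the first frame
    for frame_power in power[1:]:
        if frame_power.sum() < vad_threshold * lambda_d.sum():
            lambda_d = alpha * lambda_d + (1 - alpha) * frame_power
    return lambda_d
```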
Step S120: compute a frequency-domain masking threshold from the noisy speech according to a masking model.
In this embodiment, the masking model can be an existing masking model, such as a psychoacoustic model; the frequency-domain masking threshold T(k) of the frequency-domain noisy speech Y(k) is computed according to the masking model.
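For illustration only, the sketch below stands in for the masking model: it groups the noisy power spectrum into Bark-like bands, applies a crude spreading step and places the threshold a fixed offset below the spread band energy. This is a rough placeholder rather than the psychoacoustic model the embodiment refers to; the band mapping, spreading weights and 10 dB offset are assumptions of the sketch.

```python
import numpy as np

def masking_threshold(power_spectrum, sample_rate, offset_db=10.0):
    """Crude per-bin masking threshold T(k) from a noisy power spectrum."""
    num_bins = len(power_spectrum)
    freqs = np.linspace(0, sample_rate / 2, num_bins)
    # Zwicker-style Hz-to-Bark mapping, then integer critical-band indices.
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    band_index = bark.astype(int)
    num_bands = band_index.max() + 1

    band_energy = np.zeros(num_bands)
    for b in range(num_bands):
        band_energy[b] = power_spectrum[band_index == b].sum()

    # Simple spreading into neighbouring Bark bands (illustrative weights).
    spread = band_energy.copy()
    spread[1:] += 0.3 * band_energy[:-1]
    spread[:-1] += 0.15 * band_energy[1:]

    threshold_band = spread * 10.0 ** (-offset_db / 10.0)
    return threshold_band[band_index]              # map band thresholds back to bins
```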
Step S130: transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, comprising frequency-domain clean speech and frequency-domain background noise.
In this embodiment, the noisy speech y(n) = s(n) + z(n) is transformed into the frequency domain by an FFT to obtain the frequency-domain noisy speech Y(k), expressed as Y(k) = S(k) + Z(k), where S(k) is the frequency-domain clean speech, Z(k) is the frequency-domain background noise, and k is the spectral bin index. It should be understood that the noisy speech may be speech that has already been processed by a speech enhancement algorithm, for example by a speech enhancement method based on short-time spectral amplitude estimation; in that case z(n) is the residual noise remaining in the speech after that processing.
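For illustration only, a minimal framing-and-FFT sketch that produces the per-frame frequency-domain noisy speech Y(k) is given below; the 512-sample frame, 50% overlap and Hann window are illustrative choices, not values specified by the embodiment.

```python
import numpy as np

def stft_frames(y, frame_len=512, hop=256):
    """Split y(n) into overlapping windowed frames and FFT each one.

    Returns Y with shape (num_frames, frame_len // 2 + 1), i.e. Y(k) per frame.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.fft.rfft(frames, axis=1)
```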
Step S140: based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain.
In this embodiment, the frequency-domain enhanced speech after denoising by the perceptual filter is Ŝ(k) = G(k) Y(k). From the speech estimation error E(k) = S(k) - Ŝ(k) we obtain E(k) = S(k) - G(k) Y(k), where E(k) is the speech estimation error, S(k) is the frequency-domain clean speech, G(k) is the perceptual filter gain and Y(k) is the frequency-domain noisy speech. Substituting Y(k) = S(k) + Z(k) gives E(k) = S(k) - G(k)(S(k) + Z(k)), where Z(k) is the frequency-domain background noise. The speech estimation error E(k) can thus be rewritten as E(k) = (1 - G(k)) S(k) - G(k) Z(k), giving the speech signal distortion ε_S(k) = (1 - G(k)) S(k) and the filtered background noise ε_Z(k) = |-G(k) Z(k)| = |G(k) Z(k)|.
Step S150: from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold.
In this embodiment, the speech signal distortion power is E_S(k) = E{ε_S^T(k) ε_S(k)} and the filtered background noise power is E_Z(k) = E{ε_Z^T(k) ε_Z(k)}, where E{·} denotes expectation and the superscript T denotes transposition. Taking the masking effect of the human ear into account, the optimum gain function G(k) should keep the speech distortion as small as possible while keeping the background noise below the human auditory masking threshold; if the speech distortion were too large, the distortion would be clearly audible and the subjective perceptual quality would suffer. This embodiment therefore requires that the sum of the speech signal distortion power E_S(k) and the filtered background noise power E_Z(k) be less than or equal to the frequency-domain masking threshold T(k), i.e. E_S(k) + E_Z(k) ≤ T(k). If required, an equation in G(k) can be constructed from any self-defined relation between E_S(k) + E_Z(k) and T(k) that satisfies E_S(k) + E_Z(k) ≤ T(k), for example E_S(k) + E_Z(k) = T(k)/2.
In one embodiment, as shown in Fig. 2, the equation in G(k) is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, and step S150 comprises the following steps:
Step S151: obtain the speech signal distortion power from the speech signal distortion.
Specifically, substituting the speech signal distortion ε_S(k) = (1 - G(k)) S(k) into E_S(k) = E{ε_S^T(k) ε_S(k)} gives the speech signal distortion power E_S(k) = (G(k) - 1)^2 P_S(k), where P_S(k) = E{S^T(k) S(k)} is the frequency-domain clean speech power.
Step S152: obtain the filtered background noise power from the filtered background noise.
Specifically, substituting the filtered background noise ε_Z(k) = |G(k) Z(k)| into E_Z(k) = E{ε_Z^T(k) ε_Z(k)} gives the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k), where P_Z(k) = E{Z^T(k) Z(k)} is the frequency-domain background noise power.
Step S153: based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtain the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold.
Specifically, substituting the speech signal distortion power E_S(k) = (G(k) - 1)^2 P_S(k) and the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k) into E_S(k) + E_Z(k) = T(k) yields (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k).
Step S160: solve the equation to obtain the perceptual filter gain.
In this embodiment, (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) is a quadratic equation in G(k). It can be solved by first computing the values of P_S(k) and P_Z(k) and then finding the roots of the quadratic, or by first transforming the equation and then solving it. Because a quadratic equation may have no solution, G(k) can be self-defined in that case.
In one embodiment, as shown in Fig. 3, step S160 comprises the following steps:
Step S161: compute the frequency-domain background noise power from the noise power using an approximation algorithm.
Specifically, the noise power λ_d(k) is taken as approximately equal to the frequency-domain background noise power, i.e. the frequency-domain background noise power is obtained from P_Z(k) = λ_d(k). It should be understood that if the acquired noisy speech has additionally been processed by a speech enhancement algorithm, a different, self-defined approximation algorithm may be used.
Step S162: compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm.
In this embodiment, the a posteriori SNR is defined as γ'(k) = |Y(k)|^2 / P_Z(k), where Y(k) is the frequency-domain noisy speech, |Y(k)| is its spectral amplitude and P_Z(k) is the frequency-domain background noise power. The decision-directed algorithm can be an existing algorithm and yields the a priori SNR ξ'(k).
Step S163: solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In this embodiment, if the equation is constructed from E_S(k) + E_Z(k) = T(k), the constructed equation is (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k). Taking the solution of this equation as an example, dividing both sides by P_Z(k) transforms it into (G(k) - 1)^2 ξ'(k) + (G(k))^2 = C(k), where ξ'(k) is the a priori SNR, C(k) = T(k) / P_Z(k), P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold. This is a quadratic equation in G(k) in which ξ'(k) and C(k) are known, so by the quadratic formula

G(k) = ( ξ'(k) + sqrt( ξ'(k) (C(k) - 1) + C(k) ) ) / ( ξ'(k) + 1 ),

which holds under the condition ξ'(k)/(ξ'(k) + 1) ≤ C(k) < 1; if 0 < C(k) < ξ'(k)/(ξ'(k) + 1) or C(k) ≥ 1 the equation has no valid solution. If 0 < C(k) < ξ'(k)/(ξ'(k) + 1), G(k) is defined as G(k) = ξ'(k)/(ξ'(k) + 1). If C(k) ≥ 1, then from C(k) = T(k)/P_Z(k) the frequency-domain background noise power P_Z(k) is already below the frequency-domain masking threshold T(k); the noise level is below the human auditory masking threshold, the frequency-domain noisy speech Y(k) does not need to be filtered, a good subjective listening result is obtained anyway, and G(k) is defined as G(k) = 1. From the above analysis, the perceptual filter gain G(k) is therefore:

G(k) = ξ'(k) / (ξ'(k) + 1),                                              if 0 < C(k) < ξ'(k)/(ξ'(k) + 1)
G(k) = ( ξ'(k) + sqrt( ξ'(k) (C(k) - 1) + C(k) ) ) / ( ξ'(k) + 1 ),      if ξ'(k)/(ξ'(k) + 1) ≤ C(k) < 1
G(k) = 1,                                                                 if C(k) ≥ 1

It should be understood that this embodiment is described by taking the solution of the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) as an example; the equation may be any other equation constructed from E_S(k) + E_Z(k) ≤ T(k).
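For illustration only, the piecewise gain rule derived above can be computed per bin as in the sketch below; the clamping of the discriminant and the small floor on P_Z(k) are numerical safeguards added in the sketch.

```python
import numpy as np

def perceptual_gain(xi_prior, noise_power, masking_threshold):
    """Per-bin perceptual filter gain G(k) from the piecewise rule above.

    xi_prior:          a priori SNR  xi'(k)
    noise_power:       frequency-domain background noise power  P_Z(k)
    masking_threshold: frequency-domain masking threshold  T(k)
    """
    C = masking_threshold / np.maximum(noise_power, 1e-12)   # C(k) = T(k) / P_Z(k)
    wiener = xi_prior / (xi_prior + 1.0)                      # xi'(k) / (xi'(k) + 1)

    # Middle branch: quadratic-formula solution, valid when wiener <= C(k) < 1.
    disc = np.maximum(xi_prior * (C - 1.0) + C, 0.0)
    solved = (xi_prior + np.sqrt(disc)) / (xi_prior + 1.0)

    # C >= 1: no filtering needed; C below the valid range: fall back to wiener.
    return np.where(C >= 1.0, 1.0, np.where(C >= wiener, solved, wiener))
```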
Step S170: filter the noisy speech according to the perceptual filter gain to obtain the enhanced speech.
In this embodiment, according to the perceptual filter gain G(k), the enhanced frequency-domain speech Ŝ(k) = G(k) Y(k) is obtained from the frequency-domain noisy speech Y(k) and then transformed back to the time domain to obtain the enhanced speech ŝ(n). Alternatively, the perceptual filter gain G(k) is first transformed into the time domain to obtain g(n), and the enhanced speech is then obtained from ŝ(n) = g(n) * y(n), where y(n) is the time-domain noisy speech and * denotes convolution.
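For illustration only, the sketch below applies the gain G(k) to the frequency-domain noisy speech and reconstructs the time-domain enhanced speech by inverse FFT and overlap-add; it assumes the frames were produced by the analysis sketch shown after step S130, with the same frame length and hop.

```python
import numpy as np

def apply_gain_and_reconstruct(Y, gain, frame_len=512, hop=256):
    """Apply G(k) per frame and rebuild the time-domain enhanced speech.

    Y:    (num_frames, frame_len // 2 + 1) STFT of the noisy speech.
    gain: array broadcastable to Y, the perceptual filter gain per bin.
    """
    S_hat = gain * Y                                   # S_hat(k) = G(k) * Y(k)
    frames = np.fft.irfft(S_hat, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame      # overlap-add synthesis
    return out
```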
In this embodiment, noisy speech is obtained; the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain the enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency-domain masking threshold, the speech signal distortion power remains small while the noise level stays below the human auditory masking threshold and is not heard, which improves the subjective perceptual quality of the enhanced speech.
In one embodiment, before step S130 the method further comprises: enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech; in step S130 the noisy speech is transformed into the frequency domain by transforming the enhanced noisy speech into the frequency domain; in step S170 the noisy speech is filtered to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and step S161 becomes:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index, and obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In this embodiment, because residual noise remains in the speech enhanced by the short-time spectral amplitude estimation based method, the perceptual filtering method of this embodiment can further improve the enhancement result. When the frequency-domain background noise power P_Z(k) is computed from the noise power λ_d(k) using the approximation algorithm, the frequency-domain gain function G_H(k) of the short-time spectral amplitude estimation based method is obtained first, and the frequency-domain background noise power P_Z(k) is then approximated according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where Y(k) is the frequency-domain noisy speech and |Y(k)| is its spectral amplitude.
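For illustration only, the residual noise power approximation P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2 can be computed as in the sketch below; the floor at zero is a safeguard added in the sketch and is not part of the described approximation.

```python
import numpy as np

def residual_noise_power(lambda_d, gain_pre, Y):
    """Residual background noise power after a short-time spectral amplitude
    pre-enhancement stage.

    lambda_d: estimated noise power lambda_d(k)
    gain_pre: gain G_H(k) of the pre-enhancement stage
    Y:        frequency-domain noisy speech Y(k)
    """
    P_Z = lambda_d - (1.0 - gain_pre) * np.abs(Y) ** 2
    return np.maximum(P_Z, 0.0)                    # keep the estimate non-negative
```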
In one embodiment, the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is: obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index and the current frame is frame l; obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
In this embodiment, the perceptual filter gain of the first frame G(k, 1) is defined as a preset value, preferably 1. The a posteriori SNRs of the second frame and the first frame, γ'(k, 2) and γ'(k, 1), are obtained, and the a priori SNR of the second frame is obtained according to ξ̂'(k, 2) = η G(k, 1) γ'(k, 1) + (1 - η) max{γ'(k, 2) - 1, 0}. The smoothing factor η can take any value between 0 and 1, preferably η = 0.92. After the a priori SNR of the second frame is obtained, the perceptual filter gain G(k, 2) of the second frame can be obtained by solving the equation in the subsequent steps; ξ̂'(k, 3) can then be computed from G(k, 2), and so on.
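For illustration only, the frame-by-frame recursion described above (decision-directed a priori SNR, then the perceptual gain of the current frame) can be sketched as below; it reuses the perceptual_gain sketch shown after step S163, and the first-frame gain of 1 and η = 0.92 follow the preferred values given in this embodiment.

```python
import numpy as np

def frame_recursion(Y, noise_power, masking_threshold, eta=0.92, first_gain=1.0):
    """Decision-directed a priori SNR and perceptual gain, frame by frame.

    Y, noise_power, masking_threshold: arrays of shape (num_frames, num_bins).
    xi'(k, l) = eta * G(k, l-1) * gamma'(k, l-1) + (1 - eta) * max(gamma'(k, l) - 1, 0)
    """
    num_frames, num_bins = Y.shape
    gamma = np.abs(Y) ** 2 / np.maximum(noise_power, 1e-12)   # a posteriori SNR gamma'(k, l)
    gains = np.empty((num_frames, num_bins))
    gains[0] = first_gain                                      # preset first-frame gain G(k, 1)
    for l in range(1, num_frames):
        xi = eta * gains[l - 1] * gamma[l - 1] + (1 - eta) * np.maximum(gamma[l] - 1.0, 0.0)
        gains[l] = perceptual_gain(xi, noise_power[l], masking_threshold[l])
    return gains
```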
The above embodiments can be applied in a speech enhancement system as shown in Fig. 4: noisy speech is input; masking threshold estimation 164 yields the frequency-domain masking threshold T(k); noise estimation 166 yields the noise power λ_d(k); T(k) and λ_d(k) are fed into the perceptual enhancement filtering stage 165, which constructs and solves the equation to obtain the perceptual filter gain and performs the filtering to obtain the enhanced speech.
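For illustration only, the sketch below assembles the earlier sketches into an end-to-end pipeline in the spirit of Fig. 4 (analysis, noise and masking-threshold estimation, perceptual gain, synthesis); all parameter choices and helper functions are assumptions of the sketches, not elements of the claimed system.

```python
import numpy as np

def enhance(y, sample_rate, frame_len=512, hop=256):
    """End-to-end sketch built from the earlier helpers in this document."""
    Y = stft_frames(y, frame_len, hop)                         # frequency-domain noisy speech Y(k)
    lambda_d = estimate_noise_power(Y)                         # noise power lambda_d(k)
    noise_power = np.broadcast_to(lambda_d, Y.shape)           # P_Z(k) approximated by lambda_d(k)
    T = np.stack([masking_threshold(np.abs(frame) ** 2, sample_rate)
                  for frame in Y])                             # masking threshold T(k) per frame
    gains = frame_recursion(Y, noise_power, T)                 # perceptual filter gain G(k, l)
    return apply_gain_and_reconstruct(Y, gains, frame_len, hop)
```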
In one embodiment, as shown in Fig. 5, a perceptual filter is provided, comprising:
an acquisition module 210, configured to obtain noisy speech;
a noise power computation module 220, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module 230, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module 240, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module 250, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module 260, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module 270, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the equation construction module 250 constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by: obtaining the speech signal distortion power from the speech signal distortion; obtaining the filtered background noise power from the filtered background noise; and, based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold.
In one embodiment, as shown in Fig. 6, the gain solving module 260 comprises:
a solving preparation unit 261, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit 262, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, as shown in Fig. 7, on the basis of the above embodiments the perceptual filter further comprises:
an enhancement module 280, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech.
The frequency-domain conversion module 240 transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain, and the filtering module 270 filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech.
The solving preparation unit 261 computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by: obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index, and obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the solving preparation unit 261 computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by: obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index and the current frame is frame l; obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
The embodiments described above express only several embodiments of the present invention and are described in relatively specific detail, but they should not therefore be construed as limiting the scope of the claims of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error algorithm, expressing the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain;
from the speech signal distortion and the filtered background noise, constructing an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
2. The method according to claim 1, characterized in that the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold is:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
3. The method according to claim 1, characterized in that the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background noise power from the noise power using an approximation algorithm;
computing the a posteriori SNR from the frequency-domain background noise power;
computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
solving the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
4. The method according to claim 3, characterized in that, before the step of transforming the noisy speech into the frequency domain to obtain the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech, wherein transforming the noisy speech into the frequency domain means transforming the enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain the enhanced speech means filtering the enhanced noisy speech to obtain the enhanced speech; and the step of computing the frequency-domain background noise power from the noise power using the approximation algorithm is:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
5. The method according to claim 3, characterized in that the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
6. A perceptual filter, characterized in that the perceptual filter comprises:
an acquisition module, configured to obtain noisy speech;
a noise power computation module, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
7. The perceptual filter according to claim 6, characterized in that the equation construction module constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
8. The perceptual filter according to claim 6, characterized in that the gain solving module comprises:
a solving preparation unit, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
9. The perceptual filter according to claim 8, characterized in that the perceptual filter further comprises:
an enhancement module, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech;
the frequency-domain conversion module transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain;
the filtering module filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and
the solving preparation unit computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
10. The perceptual filter according to claim 8, characterized in that the solving preparation unit computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
CN201510031872.9A 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter Active CN105869649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510031872.9A CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510031872.9A CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Publications (2)

Publication Number Publication Date
CN105869649A true CN105869649A (en) 2016-08-17
CN105869649B CN105869649B (en) 2020-02-21

Family

ID=56623456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510031872.9A Active CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Country Status (1)

Country Link
CN (1) CN105869649B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003514264A (en) * 1999-11-15 2003-04-15 ノキア コーポレイション Noise suppression device
CN1684143A (en) * 2004-04-14 2005-10-19 华为技术有限公司 Method for strengthening sound
JP2014232331A (en) * 2007-07-06 2014-12-11 オーディエンス,インコーポレイテッド System and method for adaptive intelligent noise suppression
CN103824562A (en) * 2014-02-10 2014-05-28 太原理工大学 Psychological acoustic model-based voice post-perception filter

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张勇等: "结合人耳听觉感知的两级语音增强算法", 《信号处理》 (Zhang Yong et al., "Two-stage speech enhancement algorithm incorporating human auditory perception", Journal of Signal Processing) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448696A (en) * 2016-12-20 2017-02-22 成都启英泰伦科技有限公司 Adaptive high-pass filtering speech noise reduction method based on background noise estimation
CN109979478A (en) * 2019-04-08 2019-07-05 网易(杭州)网络有限公司 Voice de-noising method and device, storage medium and electronic equipment
US20220027436A1 (en) * 2020-07-22 2022-01-27 Mitsubishi Heavy Industries, Ltd. Anomaly factor estimation method, anomaly factor estimating device, and program
CN112951262A (en) * 2021-02-24 2021-06-11 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium
CN112951262B (en) * 2021-02-24 2023-03-10 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105869649B (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN101976566B (en) Voice enhancement method and device using same
Valin et al. A perceptually-motivated approach for low-complexity, real-time enhancement of fullband speech
US20200265857A1 (en) Speech enhancement method and apparatus, device and storage mediem
Sim et al. A parametric formulation of the generalized spectral subtraction method
CN103531204B (en) Sound enhancement method
CN102074246B (en) Dual-microphone based speech enhancement device and method
Soon et al. Speech enhancement using 2-D Fourier transform
CN108735225A (en) It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method
CN109643554A (en) Adaptive voice Enhancement Method and electronic equipment
CN105679330B (en) Based on the digital deaf-aid noise-reduction method for improving subband signal-to-noise ratio (SNR) estimation
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
CN105869649A (en) Perceptual filtering method and perceptual filter
Islam et al. Speech enhancement based on student $ t $ modeling of Teager energy operated perceptual wavelet packet coefficients and a custom thresholding function
CN106328155A (en) Speech enhancement method of correcting priori signal-to-noise ratio overestimation
CN106653004B (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
Yang et al. Spectral contrast enhancement: Algorithms and comparisons
CN103971697B (en) Sound enhancement method based on non-local mean filtering
CN108962275A (en) A kind of music noise suppressing method and device
CN105869652A (en) Psychological acoustic model calculation method and device
CN107045874A (en) A kind of Non-linear Speech Enhancement Method based on correlation
US20170323656A1 (en) Signal processor
Rao et al. Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration
CN102568491B (en) Noise suppression method and equipment
Trawicki et al. Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors
Upadhyay An improved multi-band speech enhancement utilizing masking properties of human hearing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant