CN105869649A - Perceptual filtering method and perceptual filter - Google Patents
- Publication number
- CN105869649A CN105869649A CN201510031872.9A CN201510031872A CN105869649A CN 105869649 A CN105869649 A CN 105869649A CN 201510031872 A CN201510031872 A CN 201510031872A CN 105869649 A CN105869649 A CN 105869649A
- Authority
- CN
- China
- Prior art keywords
- frequency domain
- background noise
- noisy speech
- power
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Telephone Function (AREA)
Abstract
The invention provides a perceptual filtering method comprising the following steps: obtaining noisy speech and computing the noise power from the noisy speech according to a noise estimation algorithm; computing a frequency-domain masking threshold from the noisy speech according to a masking model; transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, which comprises frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error analysis, expressing the speech signal distortion as an expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as an expression in the frequency-domain background noise and the perceptual filter gain; constructing an equation in the perceptual filter gain from the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold; solving the equation to obtain the perceptual filter gain; and filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech. The method improves the subjective perceptual quality of the enhanced speech. The invention also provides a perceptual filter.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a perceptual filtering method and a perceptual filter.
Background technology
In everyday life, speech signals are inevitably corrupted by background noise. Speech enhancement, a class of signal processing methods, is an effective way to combat such corruption, and it has long been a research focus in the field of speech signal processing. The goal of speech enhancement is to remove as much background noise as possible while preserving speech intelligibility, thereby improving the subjective listening quality of the speech.
Traditional speech enhancement algorithms include spectral subtraction, Wiener filtering, minimum mean-square error (MMSE) estimation, log-spectral amplitude MMSE estimation, and enhancement methods based on the DCT (Discrete Cosine Transform). Most of these methods rely on statistical models of the speech and noise components in the frequency domain, combined with various estimation theories, to design targeted noise suppression techniques. However, because the assumed models deviate from real conditions, the enhanced speech still contains considerable speech distortion and residual noise, which degrades the enhancement result.
Summary of the invention
In view of the above, it is necessary to provide a perceptual filtering method and a perceptual filter that push the noise level below the masking threshold of the human auditory system, thereby improving the subjective perceptual quality of the enhanced speech.
A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error analysis, expressing the speech signal distortion as an expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as an expression in the frequency-domain background noise and the perceptual filter gain;
constructing, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
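A minimal per-frame sketch of the claimed steps in numpy; the crude prior-SNR stand-in and all input values are illustrative assumptions, not part of the claim:

```python
import numpy as np

def perceptual_filter(noisy, noise_power, masking_threshold):
    """Sketch of the claimed steps for one time-domain frame.  noise_power
    and masking_threshold are assumed per-bin estimates lam_d(k) and T(k)."""
    Y = np.fft.rfft(noisy)                    # frequency-domain noisy speech Y(k)
    Pz = noise_power                          # approximate P_Z(k) by lam_d(k)
    gamma = np.abs(Y) ** 2 / Pz               # posterior SNR
    xi = np.maximum(gamma - 1.0, 1e-6)        # crude prior-SNR stand-in
    C = masking_threshold / Pz                # normalized masking threshold
    # Positive root of (G-1)^2*xi + G^2 = C; clipping the discriminant at 0
    # makes the no-real-root case fall back to the Wiener gain xi/(1+xi)
    disc = np.maximum(C * (1.0 + xi) - xi, 0.0)
    G = (xi + np.sqrt(disc)) / (1.0 + xi)
    G = np.where(C >= 1.0, 1.0, G)            # noise already masked: no filtering
    return np.fft.irfft(G * Y, n=len(noisy))  # enhanced time-domain frame
```

Calling it on a 256-sample frame with per-bin estimates of the noise power and masking threshold returns an enhanced frame of the same length.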
In one embodiment, the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold comprises:
obtaining the speech-signal-distortion power from the speech signal distortion;
obtaining the filtered-background-noise power from the filtered background noise; and
setting the sum of the speech-signal-distortion power and the filtered-background-noise power equal to the frequency-domain masking threshold, yielding the equation (G(k)−1)²·P_S(k) + G(k)²·P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean-speech power, P_Z(k) is the frequency-domain background-noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background-noise power from the noise power using an approximation algorithm;
computing the posterior SNR from the frequency-domain background-noise power;
computing the prior SNR from the posterior SNR based on the decision-directed algorithm; and
solving the equation using the frequency-domain background-noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, before the step of transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a short-time-amplitude spectrum estimation method to obtain pre-enhanced noisy speech, wherein transforming the noisy speech into the frequency domain means transforming the pre-enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain enhanced speech means filtering the pre-enhanced noisy speech to obtain the enhanced speech; and the step of computing the frequency-domain background-noise power from the noise power using an approximation algorithm is:
obtaining the frequency-domain gain function G_H(k) of the short-time-amplitude spectrum estimation method, where k is the spectral bin index; and
obtaining the frequency-domain background-noise power P_Z(k) from P_Z(k) = λ_d(k) − (1 − G_H(k))·|Y(k)|², where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the step of computing the prior SNR from the posterior SNR based on the decision-directed algorithm is:
obtaining the posterior SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l−1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the previous-frame perceptual filter gain G(k, l−1), the previous-frame perceptual filter gain being a preset value if the previous frame is the first frame; and
obtaining the current-frame prior SNR from the posterior SNRs and the perceptual filter gain by the formula ξ'(k, l) = η·G(k, l−1)²·γ'(k, l−1) + (1 − η)·max{γ'(k, l) − 1, 0}, where η is a smoothing factor and 0 < η < 1.
A perceptual filter, the perceptual filter comprising:
an acquisition module configured to obtain noisy speech;
a noise-power computation module configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking-threshold computation module configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module configured to, based on a speech estimation error analysis, express the speech signal distortion as an expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as an expression in the frequency-domain background noise and the perceptual filter gain, and construct, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module configured to solve the equation to obtain the perceptual filter gain; and
a filtering module configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the equation construction module constructs the equation in the perceptual filter gain specifically by:
obtaining the speech-signal-distortion power from the speech signal distortion;
obtaining the filtered-background-noise power from the filtered background noise; and
setting the sum of the speech-signal-distortion power and the filtered-background-noise power equal to the frequency-domain masking threshold, yielding the equation (G(k)−1)²·P_S(k) + G(k)²·P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean-speech power, P_Z(k) is the frequency-domain background-noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the gain solving module comprises:
a preparation unit configured to compute the frequency-domain background-noise power from the noise power using an approximation algorithm, compute the posterior SNR from the frequency-domain background-noise power, and compute the prior SNR from the posterior SNR based on the decision-directed algorithm; and
a solving unit configured to solve the equation using the frequency-domain background-noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, the perceptual filter further comprises:
an enhancement module configured to enhance the noisy speech using a short-time-amplitude spectrum estimation method to obtain pre-enhanced noisy speech;
the frequency-domain conversion module transforms the noisy speech into the frequency domain by transforming the pre-enhanced noisy speech into the frequency domain;
the filtering module filters the noisy speech to obtain enhanced speech by filtering the pre-enhanced noisy speech to obtain the enhanced speech; and
the preparation unit computes the frequency-domain background-noise power from the noise power using an approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) of the short-time-amplitude spectrum estimation method, where k is the spectral bin index; and
obtaining the frequency-domain background-noise power P_Z(k) from P_Z(k) = λ_d(k) − (1 − G_H(k))·|Y(k)|², where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the preparation unit computes the prior SNR from the posterior SNR based on the decision-directed algorithm specifically by:
obtaining the posterior SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l−1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the previous-frame perceptual filter gain G(k, l−1), the previous-frame perceptual filter gain being a preset value if the previous frame is the first frame; and
obtaining the current-frame prior SNR from the posterior SNRs and the perceptual filter gain by the formula ξ'(k, l) = η·G(k, l−1)²·γ'(k, l−1) + (1 − η)·max{γ'(k, l) − 1, 0}, where η is a smoothing factor and 0 < η < 1.
With the above perceptual filtering method and perceptual filter, noisy speech is obtained; the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error analysis, the speech signal distortion is expressed in terms of the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed in terms of the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. Because the sum of the speech-signal-distortion power and the filtered-background-noise power does not exceed the frequency-domain masking threshold, the noise level is kept below the masking threshold of the human auditory system, and thus inaudible, while the speech signal distortion is kept small, thereby improving the subjective perceptual quality of the enhanced speech.
Brief description of the drawings
Fig. 1 is a flowchart of the perceptual filtering method in one embodiment;
Fig. 2 is a flowchart of constructing the equation in the perceptual filter gain in one embodiment;
Fig. 3 is a flowchart of solving the equation to obtain the perceptual filter gain in one embodiment;
Fig. 4 is a structural block diagram of a speech enhancement system in one embodiment;
Fig. 5 is a structural block diagram of the perceptual filter in one embodiment;
Fig. 6 is a structural block diagram of the gain solving module in one embodiment;
Fig. 7 is a structural block diagram of the perceptual filter in another embodiment.
Detailed description of the invention
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, a perceptual filtering method is provided, comprising the following steps:
Step S110: obtain noisy speech, and compute the noise power from the noisy speech according to a noise estimation algorithm.
In this embodiment, the obtained noisy speech is represented in the time domain as y(n) = s(n) + z(n), where s(n) is the clean speech signal and z(n) is the background noise in the original noisy speech. The noise estimation algorithm may be an existing algorithm; the frequency-domain noise power λ_d(k) is computed from the noisy speech y(n) = s(n) + z(n) according to the noise estimation algorithm, where k is the spectral bin index.
Step S120: compute the frequency-domain masking threshold from the noisy speech according to a masking model.
In this embodiment, the masking model may be an existing model, such as a psychoacoustic model, from which the frequency-domain masking threshold T(k) of the frequency-domain noisy speech Y(k) is computed.
Step S130: transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise.
In this embodiment, the noisy speech y(n) = s(n) + z(n) is transformed into the frequency domain by an FFT, yielding the frequency-domain noisy speech Y(k), expressed as Y(k) = S(k) + Z(k), where S(k) is the frequency-domain clean speech, Z(k) is the frequency-domain background noise, and k is the spectral bin index. It should be understood that the noisy speech may itself be the output of a speech enhancement algorithm, such as speech processed by a short-time-amplitude-spectrum-estimation enhancement method; in that case z(n) is the residual noise remaining in the speech after that enhancement.
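The additive model survives the transform because the FFT is linear, which is what lets Y(k) be split into S(k) + Z(k); a quick numerical check with stand-in signals (the sinusoid and noise are illustrative, not from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(256)
s = np.sin(2 * np.pi * 0.05 * n)          # stand-in for clean speech s(n)
z = 0.1 * rng.standard_normal(256)        # stand-in for background noise z(n)
y = s + z                                 # noisy speech y(n) = s(n) + z(n)

# Linearity of the FFT carries the additive model into the frequency domain:
Y, S, Z = np.fft.rfft(y), np.fft.rfft(s), np.fft.rfft(z)
```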
Step S140: based on the speech estimation error, express the speech signal distortion in terms of the frequency-domain clean speech and the perceptual filter gain, and express the filtered background noise in terms of the frequency-domain background noise and the perceptual filter gain.
In this embodiment, the frequency-domain enhanced speech after perceptual filtering is Ŝ(k) = G(k)·Y(k). From the speech estimation error E(k) = S(k) − Ŝ(k), we obtain E(k) = S(k) − G(k)·Y(k), where E(k) is the speech estimation error, S(k) is the frequency-domain clean speech, G(k) is the perceptual filter gain, and Y(k) is the frequency-domain noisy speech. Substituting Y(k) = S(k) + Z(k) gives E(k) = S(k) − G(k)·(S(k) + Z(k)), where Z(k) is the frequency-domain background noise. Rewriting the speech estimation error as E(k) = (1 − G(k))·S(k) − G(k)·Z(k) yields the speech signal distortion ε_S(k) = (1 − G(k))·S(k) and the filtered background noise ε_Z(k) = |−G(k)·Z(k)| = |G(k)·Z(k)|.
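The decomposition of the estimation error into a speech-distortion term and a filtered-noise term can be verified numerically with arbitrary spectra (all values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # clean spectrum S(k)
Z = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # noise spectrum Z(k)
G = rng.uniform(0.0, 1.0, 8)                              # per-bin gain G(k)
Y = S + Z                                                 # noisy spectrum Y(k)

E = S - G * Y              # estimation error E(k) = S(k) - G(k)Y(k)
eps_s = (1.0 - G) * S      # speech-distortion term (1 - G(k))S(k)
eps_z = -G * Z             # filtered background-noise term -G(k)Z(k)
```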
Step S150: construct, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold.
In this embodiment, the speech-signal-distortion power is E_S(k) = E{ε_S^T(k)·ε_S(k)} and the filtered-background-noise power is E_Z(k) = E{ε_Z^T(k)·ε_Z(k)}, where E{·} denotes expectation and ^T denotes matrix transposition. Taking the masking effect of the human ear into account, the optimal gain function G(k) should keep the speech distortion small while keeping the background noise below the masking threshold of the human ear; if the speech distortion is too large, audible distortion appears and the subjective perceptual quality suffers. This embodiment therefore requires the sum of the speech-signal-distortion power E_S(k) and the filtered-background-noise power E_Z(k) to be less than or equal to the frequency-domain masking threshold T(k), i.e. E_S(k) + E_Z(k) ≤ T(k). Under the condition E_S(k) + E_Z(k) ≤ T(k), a custom relation between E_S(k) + E_Z(k) and T(k) may be used to construct the equation in G(k), for example E_S(k) + E_Z(k) = T(k)/2.
In one embodiment, as shown in Fig. 2, the equation in G(k) is constructed by setting the sum of the speech-signal-distortion power and the filtered-background-noise power equal to the frequency-domain masking threshold; step S150 comprises the following steps:
Step S151: obtain the speech-signal-distortion power from the speech signal distortion.
Specifically, substituting the speech signal distortion ε_S(k) = (1 − G(k))·S(k) into E_S(k) = E{ε_S^T(k)·ε_S(k)} yields the speech-signal-distortion power E_S(k) = (G(k) − 1)²·P_S(k), where P_S(k) = E{S^T(k)·S(k)} is the frequency-domain clean-speech power.
Step S152: obtain the filtered-background-noise power from the filtered background noise.
Specifically, substituting the filtered background noise ε_Z(k) = |G(k)·Z(k)| into E_Z(k) = E{ε_Z^T(k)·ε_Z(k)} yields the filtered-background-noise power E_Z(k) = G(k)²·P_Z(k), where P_Z(k) = E{Z^T(k)·Z(k)} is the frequency-domain background-noise power.
Step S153: set the sum of the speech-signal-distortion power and the filtered-background-noise power equal to the frequency-domain masking threshold to obtain the equation (G(k) − 1)²·P_S(k) + G(k)²·P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean-speech power, P_Z(k) is the frequency-domain background-noise power, and T(k) is the frequency-domain masking threshold.
Specifically, substituting the speech-signal-distortion power E_S(k) = (G(k) − 1)²·P_S(k) and the filtered-background-noise power E_Z(k) = G(k)²·P_Z(k) into E_S(k) + E_Z(k) = T(k) yields (G(k) − 1)²·P_S(k) + G(k)²·P_Z(k) = T(k).
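The two substitutions can be checked numerically: for zero-mean samples in a single bin, the sample averages of the distortion and filtered-noise powers match (G−1)²·P_S and G²·P_Z (a Monte Carlo sketch with invented values):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
Ps, Pz, G = 4.0, 1.0, 0.6                  # invented per-bin powers and gain

S = np.sqrt(Ps) * rng.standard_normal(N)   # clean-speech samples in one bin
Z = np.sqrt(Pz) * rng.standard_normal(N)   # independent noise samples

Es = np.mean(((1.0 - G) * S) ** 2)         # sample speech-distortion power
Ez = np.mean((G * Z) ** 2)                 # sample filtered-noise power
```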
Step S160: solve the equation to obtain the perceptual filter gain.
In this embodiment, (G(k) − 1)²·P_S(k) + G(k)²·P_Z(k) = T(k) is a quadratic equation in the single unknown G(k). It can be solved by first computing the values of P_S(k) and P_Z(k) and then taking the roots of the quadratic, or by first transforming the equation and then solving it. Since a quadratic equation may have no real solution, G(k) may also be defined directly in that case.
In one embodiment, as shown in Fig. 3, step S160 comprises the following steps:
Step S161: compute the frequency-domain background-noise power from the noise power using an approximation algorithm.
Specifically, the noise power λ_d(k) is taken as approximately equal to the frequency-domain background-noise power P_Z(k), i.e. P_Z(k) = λ_d(k). It should be understood that if the obtained noisy speech has additionally been processed by a speech enhancement algorithm, a different, custom approximation algorithm may be used.
Step S162: compute the posterior SNR from the frequency-domain background-noise power, and compute the prior SNR from the posterior SNR based on the decision-directed algorithm.
In this embodiment, the posterior SNR γ'(k) is defined as γ'(k) = |Y(k)|² / P_Z(k), where Y(k) is the frequency-domain noisy speech, |Y(k)| is its spectral amplitude, and P_Z(k) is the frequency-domain background-noise power. The decision-directed algorithm may be an existing algorithm, from which the prior SNR ξ'(k) is computed.
Step S163: solve the equation using the frequency-domain background-noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
In this embodiment, if the equation is constructed from E_S(k) + E_Z(k) = T(k), the constructed equation is (G(k) − 1)²·P_S(k) + G(k)²·P_Z(k) = T(k). Taking the solution of this equation as an example, dividing both sides by P_Z(k) transforms it into (G(k) − 1)²·ξ'(k) + G(k)² = C(k), where ξ'(k) = P_S(k)/P_Z(k) is the prior SNR, C(k) = T(k)/P_Z(k), P_Z(k) is the frequency-domain background-noise power, and T(k) is the frequency-domain masking threshold. This is a quadratic equation in the single unknown G(k) with ξ'(k) and C(k) known; by the quadratic root formula,
G(k) = (ξ'(k) + √(C(k)·(1 + ξ'(k)) − ξ'(k))) / (1 + ξ'(k)),
which holds when the condition C(k)·(1 + ξ'(k)) ≥ ξ'(k) is satisfied. If C(k)·(1 + ξ'(k)) < ξ'(k), the equation has no real solution, and G(k) may be defined as the Wiener gain ξ'(k)/(1 + ξ'(k)). If C(k) ≥ 1, then by C(k) = T(k)/P_Z(k) the frequency-domain background-noise power P_Z(k) is already below the frequency-domain masking threshold T(k); the noise level is then below the masking threshold of the human auditory system, the frequency-domain noisy speech Y(k) need not be filtered, and good subjective listening quality is obtained by defining G(k) = 1. In summary, the perceptual filter gain G(k) is:
G(k) = 1, if C(k) ≥ 1;
G(k) = (ξ'(k) + √(C(k)·(1 + ξ'(k)) − ξ'(k))) / (1 + ξ'(k)), if ξ'(k)/(1 + ξ'(k)) ≤ C(k) < 1;
G(k) = ξ'(k)/(1 + ξ'(k)), if C(k) < ξ'(k)/(1 + ξ'(k)).
It should be understood that this embodiment is described taking the solution of the equation (G(k) − 1)²·P_S(k) + G(k)²·P_Z(k) = T(k) as an example; other equations constructed from E_S(k) + E_Z(k) ≤ T(k) may be solved instead.
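A minimal per-bin implementation of this gain, including the C(k) ≥ 1 branch and a no-real-root branch (the Wiener-gain fallback is one possible self-defined choice, since the text leaves the no-solution case open):

```python
import numpy as np

def gain(xi, c):
    """Per-bin perceptual filter gain G(k) for prior SNR xi = P_S/P_Z and
    normalized threshold c = T/P_Z.  Positive root of (G-1)^2*xi + G^2 = c;
    G = 1 when the noise is already masked (c >= 1); Wiener-gain fallback
    xi/(1+xi) when the quadratic has no real root (an assumed choice)."""
    if c >= 1.0:
        return 1.0
    disc = c * (1.0 + xi) - xi          # quarter-discriminant of the quadratic
    if disc < 0.0:
        return xi / (1.0 + xi)
    return (xi + np.sqrt(disc)) / (1.0 + xi)
```

On the middle branch the constraint holds with equality, and the gain is continuous across the branch boundaries: it equals 1 at c = 1 and the Wiener gain at c = xi/(1+xi).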
Step S170: filter the noisy speech according to the perceptual filter gain to obtain the enhanced speech.
In this embodiment, according to the perceptual filter gain G(k), the enhanced frequency-domain speech is obtained from the frequency-domain noisy speech as Ŝ(k) = G(k)·Y(k) and then transformed back to the time domain to obtain the enhanced speech ŝ(n). Alternatively, the perceptual filter gain G(k) may first be transformed into the time domain to obtain g(n), and the enhanced speech then obtained as ŝ(n) = g(n) * y(n), where y(n) is the time-domain noisy speech and * denotes convolution.
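The equivalence of the two routes (multiply by G(k) in the frequency domain, or convolve with g(n) in the time domain) follows from the convolution theorem; a numerical check, assuming per-frame processing so that the time-domain operation is a circular convolution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
y = rng.standard_normal(n)               # time-domain noisy frame y(n)
G = rng.uniform(0.0, 1.0, n // 2 + 1)    # real per-bin gain G(k) on rfft bins

# Route 1: multiply in the frequency domain, S_hat(k) = G(k)Y(k)
s_hat_freq = np.fft.irfft(G * np.fft.rfft(y), n)

# Route 2: g(n) = inverse FFT of G(k), then circular convolution g (*) y
g = np.fft.irfft(G, n)
s_hat_time = np.array([np.sum(g * y[(k - np.arange(n)) % n]) for k in range(n)])
```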
In this embodiment, noisy speech is obtained; the noise power is computed from it according to a noise estimation algorithm; a frequency-domain masking threshold is computed from it according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on the speech estimation error, the speech signal distortion is expressed in terms of the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed in terms of the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the constraint that the sum of the speech-signal-distortion power and the filtered-background-noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the gain to obtain the enhanced speech. Because the sum of the speech-signal-distortion power and the filtered-background-noise power does not exceed the frequency-domain masking threshold, the noise level stays below the masking threshold of the human auditory system and is not heard, while the speech signal distortion remains small, thereby improving the subjective perceptual quality of the enhanced speech.
In one embodiment, before step S130 the method further comprises: enhancing the noisy speech using a short-time-amplitude spectrum estimation method to obtain pre-enhanced noisy speech. Transforming the noisy speech into the frequency domain in step S130 then means transforming the pre-enhanced noisy speech into the frequency domain, and filtering to obtain the enhanced speech in step S170 means filtering the pre-enhanced noisy speech to obtain the enhanced speech. Step S161 becomes:
obtain the frequency-domain gain function G_H(k) of the short-time-amplitude spectrum estimation method, where k is the spectral bin index; obtain the frequency-domain background-noise power P_Z(k) from P_Z(k) = λ_d(k) − (1 − G_H(k))·|Y(k)|², where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In this embodiment, because residual noise remains in speech enhanced by the short-time-amplitude spectrum estimation method, the perceptual filtering method of this embodiment can further improve the enhancement result. When computing the frequency-domain background-noise power P_Z(k) from the noise power λ_d(k) with the approximation algorithm, the frequency-domain gain function G_H(k) of the short-time-amplitude spectrum estimation method is obtained first, and the frequency-domain background-noise power P_Z(k) is then approximated as P_Z(k) = λ_d(k) − (1 − G_H(k))·|Y(k)|², where Y(k) is the noisy speech and |Y(k)| its spectral amplitude.
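Applied numerically, the approximation of this paragraph reads as follows; the per-bin values are invented for illustration, and the floor at a small positive value is an added practical safeguard (the subtraction can otherwise go negative), not part of the disclosure:

```python
import numpy as np

lam_d = np.array([1.0, 0.8, 0.5])     # noise power lam_d(k) (illustrative)
G_H = np.array([0.9, 0.8, 0.8])       # pre-enhancement gain G_H(k)
Y_mag2 = np.array([4.0, 2.0, 1.0])    # |Y(k)|^2 of the pre-enhanced speech

# P_Z(k) = lam_d(k) - (1 - G_H(k)) * |Y(k)|^2, floored to stay non-negative
Pz = np.maximum(lam_d - (1.0 - G_H) * Y_mag2, 1e-8)
```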
In one embodiment, the step of computing the prior SNR from the posterior SNR based on the decision-directed algorithm is: obtain the posterior SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l−1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l; obtain the previous-frame perceptual filter gain G(k, l−1), the previous-frame gain being a preset value if the previous frame is the first frame; and obtain the current-frame prior SNR from the posterior SNRs and the perceptual filter gain by the formula ξ'(k, l) = η·G(k, l−1)²·γ'(k, l−1) + (1 − η)·max{γ'(k, l) − 1, 0}, where η is a smoothing factor and 0 < η < 1.
In this embodiment, the first-frame perceptual filter gain G(k, 1) is defined as a preset value, preferably 1. The posterior SNRs of the second and first frames, γ'(k, 2) and γ'(k, 1), are obtained, and the second-frame prior SNR is computed as ξ'(k, 2) = η·G(k, 1)²·γ'(k, 1) + (1 − η)·max{γ'(k, 2) − 1, 0}. The smoothing factor η may take any value between 0 and 1, preferably η = 0.92. After the second-frame prior SNR is obtained, the second-frame perceptual filter gain G(k, 2) can be obtained by solving the equation in the subsequent steps; from G(k, 2), the third-frame prior SNR ξ'(k, 3) can then be computed, and so on.
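The frame recursion of this paragraph can be sketched as follows, assuming the standard decision-directed form for the formula (the per-bin posterior SNRs are invented for illustration):

```python
import numpy as np

def prior_snr(gamma_cur, gamma_prev, G_prev, eta=0.92):
    """Decision-directed prior SNR, assuming the standard form
    xi'(k,l) = eta*G(k,l-1)^2*gamma'(k,l-1) + (1-eta)*max(gamma'(k,l)-1, 0)."""
    return eta * G_prev ** 2 * gamma_prev \
        + (1.0 - eta) * np.maximum(gamma_cur - 1.0, 0.0)

# Frame 2, with the first-frame gain preset to G(k,1) = 1 (invented SNRs):
gamma1 = np.array([2.0, 0.5])            # posterior SNR gamma'(k,1)
gamma2 = np.array([4.0, 0.8])            # posterior SNR gamma'(k,2)
xi2 = prior_snr(gamma2, gamma1, G_prev=np.ones(2))
```

Once the equation has been solved for G(k, 2), the same call with γ'(k, 3), γ'(k, 2), and G(k, 2) yields ξ'(k, 3), and so on frame by frame.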
The above embodiments can be applied in the speech enhancement system shown in Fig. 4: the noisy speech is input; the frequency-domain masking threshold T(k) is obtained by masking-threshold estimation 164, and the noise power λ_d(k) by noise estimation 166; T(k) and λ_d(k) are fed into the perceptual enhancement filter 165, which constructs and solves the equation to obtain the perceptual filter gain and performs the filtering to obtain the enhanced speech.
In one embodiment, as shown in Figure 5, a perceptual filter is provided, including:
An acquisition module 210, used to obtain the noisy speech.
A noise power calculation module 220, used to compute the noise power from the noisy speech according to a noise estimation algorithm.
A masking threshold computation module 230, used to compute the frequency-domain masking threshold from the noisy speech according to a masking model.
A frequency-domain conversion module 240, used to transform the noisy speech into the frequency domain to obtain the frequency-domain noisy speech, which includes the frequency-domain clean speech and the frequency-domain background noise.
An equation construction module 250, used to express, based on a speech estimation error algorithm, the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and to construct, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold.
A gain solving module 260, used to solve the equation to obtain the perceptual filter gain.
A filtering processing module 270, used to filter the noisy speech according to the perceptual filter gain to obtain the enhanced speech.
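As a concrete illustration of what the frequency-domain conversion and filtering processing modules do together, a minimal per-frame sketch (function names and the use of NumPy's real FFT are assumptions, not from the source): the frame is transformed to the frequency domain, the per-bin gain G(k) is applied, and the result is transformed back.

```python
import numpy as np

def apply_perceptual_gain(frame, gain):
    """Filter one windowed time-domain frame with a per-bin gain G(k)."""
    Y = np.fft.rfft(frame)                    # frequency-domain noisy speech Y(k)
    S_hat = gain * Y                          # enhanced spectrum G(k) * Y(k)
    return np.fft.irfft(S_hat, n=len(frame))  # back to the time domain
```

A real system would process overlapping windowed frames and reconstruct the signal by overlap-add; this sketch shows only the per-frame gain application.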
In one embodiment, the equation construction module 250 constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold, specifically as follows: the speech signal distortion power is obtained from the speech signal distortion; the filtered background noise power is obtained from the filtered background noise; and, taking the sum at its upper bound, i.e. equal to the frequency-domain masking threshold, the equation
(G(k)-1)²P_S(k) + (G(k))²P_Z(k) = T(k)
is obtained, where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
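A sketch of solving this per-bin quadratic for G(k). Expanding gives (P_S + P_Z)·G² - 2·P_S·G + (P_S - T) = 0; the choice of the larger root, the clip to [0, 1], and the Wiener-gain fallback when no real root exists are assumptions of this sketch, not specified in the text.

```python
import numpy as np

def solve_perceptual_gain(Ps, Pz, T):
    """Solve (G(k)-1)^2 * Ps(k) + G(k)^2 * Pz(k) = T(k) for G(k), per bin.

    Ps : frequency-domain clean speech power P_S(k)
    Pz : frequency-domain background noise power P_Z(k)
    T  : frequency-domain masking threshold T(k)
    """
    Ps = np.asarray(Ps, dtype=float)
    Pz = np.asarray(Pz, dtype=float)
    T = np.asarray(T, dtype=float)
    # Quadratic (Ps+Pz)*G^2 - 2*Ps*G + (Ps - T) = 0; a quarter of its
    # discriminant is T*(Ps+Pz) - Ps*Pz.
    disc = T * (Ps + Pz) - Ps * Pz
    wiener = Ps / (Ps + Pz)  # fallback when the threshold admits no real root
    gain = np.where(disc >= 0.0,
                    (Ps + np.sqrt(np.maximum(disc, 0.0))) / (Ps + Pz),
                    wiener)
    return np.clip(gain, 0.0, 1.0)
```

Note that at the smallest threshold admitting a real root, T = P_S·P_Z/(P_S + P_Z), the solution reduces to the Wiener gain P_S/(P_S + P_Z).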
In one embodiment, as shown in Figure 6, the gain solving module 260 includes:
A solving preparation unit 261, used to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the posterior SNR from the frequency-domain background noise power, and compute the prior SNR from the posterior SNR based on a decision-directed algorithm.
A solving unit 262, used to solve the equation from the frequency-domain background noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, as shown in Figure 7, on the basis of the above embodiments, the perceptual filter further includes:
An enhancement module 280, used to enhance the noisy speech using a short-time spectral amplitude estimation method to obtain the enhanced noisy speech.
In this case, the frequency-domain conversion module 240 transforms the enhanced noisy speech, rather than the original noisy speech, into the frequency domain, and the filtering processing module 270 filters the enhanced noisy speech to obtain the enhanced speech.
The solving preparation unit 261 computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically as follows: the frequency-domain gain function G_H(k) is obtained based on the short-time spectral amplitude estimation method, where k is the spectral bin index, and the frequency-domain background noise power P_Z(k) is obtained according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|², where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
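This approximation transcribes directly; the sketch below adds a clip at zero as a safeguard against negative power estimates, which is not part of the formula in the text.

```python
import numpy as np

def freq_domain_noise_power(lam_d, G_H, Y):
    """P_Z(k) = lambda_d(k) - (1 - G_H(k)) * |Y(k)|^2.

    lam_d : estimated noise power lambda_d(k)
    G_H   : frequency-domain gain function G_H(k) of the short-time
            spectral amplitude estimator
    Y     : frequency-domain noisy speech Y(k), complex-valued
    """
    Pz = lam_d - (1.0 - G_H) * np.abs(Y) ** 2
    return np.maximum(Pz, 0.0)  # safeguard: clip negative estimates to zero
```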
In one embodiment, the solving preparation unit 261 computes the prior SNR from the posterior SNR based on the decision-directed algorithm specifically as follows: the posterior SNRs of the current frame and the previous frame, γ′(k, l) and γ′(k, l-1) respectively, are obtained, where k is the spectral bin index, l is the frame index, and the current frame is the l-th frame; the previous-frame perceptual filter gain G(k, l-1) is obtained, and if the previous frame is the first frame, the previous-frame perceptual filter gain is a preset value; the current-frame prior SNR is then obtained from the posterior SNR and the perceptual filter gain by the formula ξ(k, l) = η·G²(k, l-1)·γ′(k, l-1) + (1-η)·max(γ′(k, l)-1, 0), where η is a smoothing factor and 0 < η < 1.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims of the present invention. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error algorithm, expressing the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain;
constructing, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
2. The method according to claim 1, characterised in that the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold comprises:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the speech signal distortion power plus the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k)-1)²P_S(k) + (G(k))²P_Z(k) = T(k), wherein G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
3. The method according to claim 1, characterised in that the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background noise power from the noise power using an approximation algorithm;
computing the posterior SNR from the frequency-domain background noise power;
computing the prior SNR from the posterior SNR based on a decision-directed algorithm; and
solving the equation from the frequency-domain background noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
4. The method according to claim 3, characterised in that, before the step of transforming the noisy speech into the frequency domain to obtain the frequency-domain noisy speech comprising the frequency-domain clean speech and the frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a short-time spectral amplitude estimation method to obtain enhanced noisy speech, wherein transforming the noisy speech into the frequency domain is transforming the enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain the enhanced speech is filtering the enhanced noisy speech to obtain the enhanced speech;
and wherein the step of computing the frequency-domain background noise power from the noise power using the approximation algorithm comprises:
obtaining the frequency-domain gain function G_H(k) based on the short-time spectral amplitude estimation method, wherein k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|², wherein λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
5. The method according to claim 3, characterised in that the step of computing the prior SNR from the posterior SNR based on the decision-directed algorithm comprises:
obtaining the posterior SNRs of the current frame and the previous frame, γ′(k, l) and γ′(k, l-1) respectively, wherein k is the spectral bin index, l is the frame index, and the current frame is the l-th frame;
obtaining the previous-frame perceptual filter gain G(k, l-1), wherein if the previous frame is the first frame, the previous-frame perceptual filter gain is a preset value; and
obtaining the current-frame prior SNR from the posterior SNR and the perceptual filter gain by the formula ξ(k, l) = η·G²(k, l-1)·γ′(k, l-1) + (1-η)·max(γ′(k, l)-1, 0), wherein η is a smoothing factor and 0 < η < 1.
6. A perceptual filter, characterised in that the perceptual filter comprises:
an acquisition module, used to obtain noisy speech;
a noise power calculation module, used to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module, used to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module, used to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module, used to express, based on a speech estimation error algorithm, the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and to construct, from the speech signal distortion and the filtered background noise, an equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module, used to solve the equation to obtain the perceptual filter gain; and
a filtering processing module, used to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
7. The perceptual filter according to claim 6, characterised in that the equation construction module constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the speech signal distortion power plus the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the speech signal distortion power plus the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k)-1)²P_S(k) + (G(k))²P_Z(k) = T(k), wherein G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
8. The perceptual filter according to claim 6, characterised in that the gain solving module comprises:
a solving preparation unit, used to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the posterior SNR from the frequency-domain background noise power, and compute the prior SNR from the posterior SNR based on a decision-directed algorithm; and
a solving unit, used to solve the equation from the frequency-domain background noise power, the prior SNR, and the frequency-domain masking threshold to obtain the perceptual filter gain.
9. The perceptual filter according to claim 8, characterised in that the perceptual filter further comprises:
an enhancement module, used to enhance the noisy speech using a short-time spectral amplitude estimation method to obtain enhanced noisy speech;
wherein the frequency-domain conversion module transforms the enhanced noisy speech, rather than the original noisy speech, into the frequency domain;
wherein the filtering processing module filters the enhanced noisy speech to obtain the enhanced speech; and
wherein the solving preparation unit computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) based on the short-time spectral amplitude estimation method, wherein k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|², wherein λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
10. The perceptual filter according to claim 8, characterised in that the solving preparation unit computes the prior SNR from the posterior SNR based on the decision-directed algorithm specifically by:
obtaining the posterior SNRs of the current frame and the previous frame, γ′(k, l) and γ′(k, l-1) respectively, wherein k is the spectral bin index, l is the frame index, and the current frame is the l-th frame;
obtaining the previous-frame perceptual filter gain G(k, l-1), wherein if the previous frame is the first frame, the previous-frame perceptual filter gain is a preset value; and
obtaining the current-frame prior SNR from the posterior SNR and the perceptual filter gain by the formula ξ(k, l) = η·G²(k, l-1)·γ′(k, l-1) + (1-η)·max(γ′(k, l)-1, 0), wherein η is a smoothing factor and 0 < η < 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510031872.9A CN105869649B (en) | 2015-01-21 | 2015-01-21 | Perceptual filtering method and perceptual filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105869649A true CN105869649A (en) | 2016-08-17 |
CN105869649B CN105869649B (en) | 2020-02-21 |
Family
ID=56623456
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510031872.9A Active CN105869649B (en) | 2015-01-21 | 2015-01-21 | Perceptual filtering method and perceptual filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105869649B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003514264A (en) * | 1999-11-15 | 2003-04-15 | ノキア コーポレイション | Noise suppression device |
CN1684143A (en) * | 2004-04-14 | 2005-10-19 | 华为技术有限公司 | Method for strengthening sound |
CN103824562A (en) * | 2014-02-10 | 2014-05-28 | 太原理工大学 | Psychological acoustic model-based voice post-perception filter |
JP2014232331A (en) * | 2007-07-06 | 2014-12-11 | オーディエンス,インコーポレイテッド | System and method for adaptive intelligent noise suppression |
Non-Patent Citations (1)
Title |
---|
Zhang Yong et al.: "结合人耳听觉感知的两级语音增强算法" (A two-stage speech enhancement algorithm incorporating human auditory perception), 《信号处理》 (Journal of Signal Processing) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106448696A (en) * | 2016-12-20 | 2017-02-22 | 成都启英泰伦科技有限公司 | Adaptive high-pass filtering speech noise reduction method based on background noise estimation |
CN109979478A (en) * | 2019-04-08 | 2019-07-05 | 网易(杭州)网络有限公司 | Voice de-noising method and device, storage medium and electronic equipment |
US20220027436A1 (en) * | 2020-07-22 | 2022-01-27 | Mitsubishi Heavy Industries, Ltd. | Anomaly factor estimation method, anomaly factor estimating device, and program |
CN112951262A (en) * | 2021-02-24 | 2021-06-11 | 北京小米松果电子有限公司 | Audio recording method and device, electronic equipment and storage medium |
CN112951262B (en) * | 2021-02-24 | 2023-03-10 | 北京小米松果电子有限公司 | Audio recording method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101976566B (en) | Voice enhancement method and device using same | |
Valin et al. | A perceptually-motivated approach for low-complexity, real-time enhancement of fullband speech | |
US20200265857A1 (en) | Speech enhancement method and apparatus, device and storage mediem | |
Sim et al. | A parametric formulation of the generalized spectral subtraction method | |
CN103531204B (en) | Sound enhancement method | |
CN102074246B (en) | Dual-microphone based speech enhancement device and method | |
Soon et al. | Speech enhancement using 2-D Fourier transform | |
CN108735225A (en) | It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method | |
CN109643554A (en) | Adaptive voice Enhancement Method and electronic equipment | |
CN105679330B (en) | Based on the digital deaf-aid noise-reduction method for improving subband signal-to-noise ratio (SNR) estimation | |
CN105489226A (en) | Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup | |
CN105869649A (en) | Perceptual filtering method and perceptual filter | |
Islam et al. | Speech enhancement based on student $ t $ modeling of Teager energy operated perceptual wavelet packet coefficients and a custom thresholding function | |
CN106328155A (en) | Speech enhancement method of correcting priori signal-to-noise ratio overestimation | |
CN106653004B (en) | Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient | |
Yang et al. | Spectral contrast enhancement: Algorithms and comparisons | |
CN103971697B (en) | Sound enhancement method based on non-local mean filtering | |
CN108962275A (en) | A kind of music noise suppressing method and device | |
CN105869652A (en) | Psychological acoustic model calculation method and device | |
CN107045874A (en) | A kind of Non-linear Speech Enhancement Method based on correlation | |
US20170323656A1 (en) | Signal processor | |
Rao et al. | Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration | |
CN102568491B (en) | Noise suppression method and equipment | |
Trawicki et al. | Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors | |
Upadhyay | An improved multi-band speech enhancement utilizing masking properties of human hearing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |