CN105869649A - Perceptual filtering method and perceptual filter - Google Patents

Perceptual filtering method and perceptual filter

Info

Publication number
CN105869649A
CN105869649A (application CN201510031872.9A)
Authority
CN
China
Prior art keywords
frequency domain
background noise
noisy speech
power
speech
Prior art date
Legal status
Granted
Application number
CN201510031872.9A
Other languages
Chinese (zh)
Other versions
CN105869649B (en)
Inventor
张勇
刘轶
Current Assignee
PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Peking University Shenzhen Graduate School
Original Assignee
PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Peking University Shenzhen Graduate School
Priority date
Filing date
Publication date
Application filed by PKU-HKUST SHENZHEN-HONGKONG INSTITUTION, Peking University Shenzhen Graduate School filed Critical PKU-HKUST SHENZHEN-HONGKONG INSTITUTION
Priority to CN201510031872.9A priority Critical patent/CN105869649B/en
Publication of CN105869649A publication Critical patent/CN105869649A/en
Application granted granted Critical
Publication of CN105869649B publication Critical patent/CN105869649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a perceptual filtering method comprising the following steps: noisy speech is obtained, and the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech, which comprises frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. The method keeps the residual noise below the human auditory masking threshold and thereby improves the subjective perceptual quality of the speech. The invention also provides a perceptual filter.

Description

Perceptual filtering method and perceptual filter
Technical field
The present invention relates to the field of speech signal processing, and in particular to a perceptual filtering method and a perceptual filter.
Background art
In everyday life, speech signals are inevitably contaminated by background noise. Speech enhancement, as a signal processing technique, is an effective way to deal with this contamination and has long been a research focus in the field of speech signal processing. The goal of speech enhancement is to remove background noise as far as possible while preserving speech intelligibility, thereby improving the subjective listening quality of the speech.
Conventional speech enhancement algorithms include spectral subtraction, Wiener filtering, minimum mean-square error (MMSE) estimation, log-spectral amplitude MMSE estimation, and enhancement methods based on the DCT (Discrete Cosine Transform). Most of these methods build statistical models of the speech and noise components in the frequency domain and combine various estimation theories to design targeted noise suppression schemes. However, because the assumed models deviate from real conditions, the enhanced speech still contains substantial speech distortion and residual noise, which limits the enhancement performance.
Summary of the invention
In view of the above, it is necessary to provide a perceptual filtering method and a perceptual filter that reduce the noise level below the human auditory masking threshold and thereby improve the subjective perceptual quality of the enhanced speech.
A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error algorithm, expressing the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain;
from the speech signal distortion and the filtered background noise, constructing an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold is:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background noise power from the noise power using an approximation algorithm;
computing the a posteriori SNR from the frequency-domain background noise power;
computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
solving the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, before the step of transforming the noisy speech into the frequency domain to obtain the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech, wherein transforming the noisy speech into the frequency domain means transforming the enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain the enhanced speech means filtering the enhanced noisy speech to obtain the enhanced speech; the step of computing the frequency-domain background noise power from the noise power using the approximation algorithm is then:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
A perceptual filter, the perceptual filter comprising:
an acquisition module, configured to obtain noisy speech;
a noise power computation module, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the equation construction module constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
In one embodiment, the gain solving module comprises:
a solving preparation unit, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, the perceptual filter further comprises:
an enhancement module, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech;
the frequency-domain conversion module transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain;
the filtering module filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and
the solving preparation unit computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the solving preparation unit computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
With the above perceptual filtering method and perceptual filter, noisy speech is obtained; the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency-domain masking threshold, the speech signal distortion power remains small while the noise level stays below the human auditory masking threshold and is not heard, which improves the subjective perceptual quality of the enhanced speech.
Brief description of the drawings
Fig. 1 is a flowchart of a perceptual filtering method in one embodiment;
Fig. 2 is a flowchart of constructing the equation in the perceptual filter gain in one embodiment;
Fig. 3 is a flowchart of solving the equation to obtain the perceptual filter gain in one embodiment;
Fig. 4 is a block diagram of a speech enhancement system in one embodiment;
Fig. 5 is a block diagram of a perceptual filter in one embodiment;
Fig. 6 is a block diagram of the gain solving module in one embodiment;
Fig. 7 is a block diagram of a perceptual filter in another embodiment.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, a perceptual filtering method is provided, comprising the following steps:
Step S110: obtain noisy speech, and compute the noise power from the noisy speech according to a noise estimation algorithm.
In this embodiment, the acquired noisy speech is represented in the time domain as y(n) = s(n) + z(n), where s(n) is the clean speech signal and z(n) is the background noise in the original noisy speech. The noise estimation algorithm can be any existing algorithm; the frequency-domain noise power λ_d(k) is computed from the noisy speech y(n) = s(n) + z(n) according to the noise estimation algorithm, where k is the spectral bin index.
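For illustration only, the sketch below (Python) shows one possible noise power estimator of the kind this embodiment leaves open: a simple energy-based noise-only check followed by recursive smoothing. The smoothing factor, the threshold and the function itself are assumptions of the sketch, not part of the described method.

```python
import numpy as np

def estimate_noise_power(noisy_frames_fft, alpha=0.9, vad_threshold=2.0):
    """Recursive noise-power estimate lambda_d(k).

    noisy_frames_fft: (num_frames, num_bins) complex STFT of the noisy speech.
    A frame is treated as noise-only when its energy is below `vad_threshold`
    times the running noise energy; the estimate is then smoothed recursively.
    """
    power = np.abs(noisy_frames_fft) ** 2          # |Y(k, l)|^2
    lambda_d = power[0].copy()                      # initialise from the first frame
    for frame_power in power[1:]:
        if frame_power.sum() < vad_threshold * lambda_d.sum():
            lambda_d = alpha * lambda_d + (1 - alpha) * frame_power
    return lambda_d
```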
Step S120: compute a frequency-domain masking threshold from the noisy speech according to a masking model.
In this embodiment, the masking model can be an existing masking model, such as a psychoacoustic model; the frequency-domain masking threshold T(k) of the frequency-domain noisy speech Y(k) is computed according to the masking model.
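For illustration only, the sketch below stands in for the masking model: it groups the noisy power spectrum into Bark-like bands, applies a crude spreading step and places the threshold a fixed offset below the spread band energy. This is a rough placeholder rather than the psychoacoustic model the embodiment refers to; the band mapping, spreading weights and 10 dB offset are assumptions of the sketch.

```python
import numpy as np

def masking_threshold(power_spectrum, sample_rate, offset_db=10.0):
    """Crude per-bin masking threshold T(k) from a noisy power spectrum."""
    num_bins = len(power_spectrum)
    freqs = np.linspace(0, sample_rate / 2, num_bins)
    # Zwicker-style Hz-to-Bark mapping, then integer critical-band indices.
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    band_index = bark.astype(int)
    num_bands = band_index.max() + 1

    band_energy = np.zeros(num_bands)
    for b in range(num_bands):
        band_energy[b] = power_spectrum[band_index == b].sum()

    # Simple spreading into neighbouring Bark bands (illustrative weights).
    spread = band_energy.copy()
    spread[1:] += 0.3 * band_energy[:-1]
    spread[:-1] += 0.15 * band_energy[1:]

    threshold_band = spread * 10.0 ** (-offset_db / 10.0)
    return threshold_band[band_index]              # map band thresholds back to bins
```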
Step S130: transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, comprising frequency-domain clean speech and frequency-domain background noise.
In this embodiment, the noisy speech y(n) = s(n) + z(n) is transformed into the frequency domain by an FFT to obtain the frequency-domain noisy speech Y(k), expressed as Y(k) = S(k) + Z(k), where S(k) is the frequency-domain clean speech, Z(k) is the frequency-domain background noise, and k is the spectral bin index. It should be understood that the noisy speech may be speech that has already been processed by a speech enhancement algorithm, for example by a speech enhancement method based on short-time spectral amplitude estimation; in that case z(n) is the residual noise remaining in the speech after that processing.
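For illustration only, a minimal framing-and-FFT sketch that produces the per-frame frequency-domain noisy speech Y(k) is given below; the 512-sample frame, 50% overlap and Hann window are illustrative choices, not values specified by the embodiment.

```python
import numpy as np

def stft_frames(y, frame_len=512, hop=256):
    """Split y(n) into overlapping windowed frames and FFT each one.

    Returns Y with shape (num_frames, frame_len // 2 + 1), i.e. Y(k) per frame.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.fft.rfft(frames, axis=1)
```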
Step S140: based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain.
In this embodiment, the frequency-domain enhanced speech after denoising by the perceptual filter is Ŝ(k) = G(k) Y(k). From the speech estimation error E(k) = S(k) - Ŝ(k) we obtain E(k) = S(k) - G(k) Y(k), where E(k) is the speech estimation error, S(k) is the frequency-domain clean speech, G(k) is the perceptual filter gain and Y(k) is the frequency-domain noisy speech. Substituting Y(k) = S(k) + Z(k) gives E(k) = S(k) - G(k)(S(k) + Z(k)), where Z(k) is the frequency-domain background noise. The speech estimation error E(k) can thus be rewritten as E(k) = (1 - G(k)) S(k) - G(k) Z(k), giving the speech signal distortion ε_S(k) = (1 - G(k)) S(k) and the filtered background noise ε_Z(k) = |-G(k) Z(k)| = |G(k) Z(k)|.
Step S150: from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold.
In this embodiment, the speech signal distortion power is E_S(k) = E{ε_S^T(k) ε_S(k)} and the filtered background noise power is E_Z(k) = E{ε_Z^T(k) ε_Z(k)}, where E{·} denotes expectation and the superscript T denotes transposition. Taking the masking effect of the human ear into account, the optimum gain function G(k) should keep the speech distortion as small as possible while keeping the background noise below the human auditory masking threshold; if the speech distortion were too large, the distortion would be clearly audible and the subjective perceptual quality would suffer. This embodiment therefore requires that the sum of the speech signal distortion power E_S(k) and the filtered background noise power E_Z(k) be less than or equal to the frequency-domain masking threshold T(k), i.e. E_S(k) + E_Z(k) ≤ T(k). If required, an equation in G(k) can be constructed from any self-defined relation between E_S(k) + E_Z(k) and T(k) that satisfies E_S(k) + E_Z(k) ≤ T(k), for example E_S(k) + E_Z(k) = T(k)/2.
In one embodiment, as shown in Fig. 2, the equation in G(k) is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, and step S150 comprises the following steps:
Step S151: obtain the speech signal distortion power from the speech signal distortion.
Specifically, substituting the speech signal distortion ε_S(k) = (1 - G(k)) S(k) into E_S(k) = E{ε_S^T(k) ε_S(k)} gives the speech signal distortion power E_S(k) = (G(k) - 1)^2 P_S(k), where P_S(k) = E{S^T(k) S(k)} is the frequency-domain clean speech power.
Step S152: obtain the filtered background noise power from the filtered background noise.
Specifically, substituting the filtered background noise ε_Z(k) = |G(k) Z(k)| into E_Z(k) = E{ε_Z^T(k) ε_Z(k)} gives the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k), where P_Z(k) = E{Z^T(k) Z(k)} is the frequency-domain background noise power.
Step S153: based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtain the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold.
Specifically, substituting the speech signal distortion power E_S(k) = (G(k) - 1)^2 P_S(k) and the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k) into E_S(k) + E_Z(k) = T(k) yields (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k).
Step S160: solve the equation to obtain the perceptual filter gain.
In this embodiment, (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) is a quadratic equation in G(k). It can be solved by first computing the values of P_S(k) and P_Z(k) and then finding the roots of the quadratic, or by first transforming the equation and then solving it. Because a quadratic equation may have no solution, G(k) can be self-defined in that case.
In one embodiment, as shown in Fig. 3, step S160 comprises the following steps:
Step S161: compute the frequency-domain background noise power from the noise power using an approximation algorithm.
Specifically, the noise power λ_d(k) is taken as approximately equal to the frequency-domain background noise power, i.e. the frequency-domain background noise power is obtained from P_Z(k) = λ_d(k). It should be understood that if the acquired noisy speech has additionally been processed by a speech enhancement algorithm, a different, self-defined approximation algorithm may be used.
Step S162: compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm.
In this embodiment, the a posteriori SNR is defined as γ'(k) = |Y(k)|^2 / P_Z(k), where Y(k) is the frequency-domain noisy speech, |Y(k)| is its spectral amplitude and P_Z(k) is the frequency-domain background noise power. The decision-directed algorithm can be an existing algorithm and yields the a priori SNR ξ'(k).
Step S163: solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In this embodiment, if the equation is constructed from E_S(k) + E_Z(k) = T(k), the constructed equation is (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k). Taking the solution of this equation as an example, dividing both sides by P_Z(k) transforms it into (G(k) - 1)^2 ξ'(k) + (G(k))^2 = C(k), where ξ'(k) is the a priori SNR, C(k) = T(k) / P_Z(k), P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold. This is a quadratic equation in G(k) in which ξ'(k) and C(k) are known, so by the quadratic formula

G(k) = ( ξ'(k) + sqrt( ξ'(k) (C(k) - 1) + C(k) ) ) / ( ξ'(k) + 1 ),

which holds under the condition ξ'(k)/(ξ'(k) + 1) ≤ C(k) < 1; if 0 < C(k) < ξ'(k)/(ξ'(k) + 1) or C(k) ≥ 1 the equation has no valid solution. If 0 < C(k) < ξ'(k)/(ξ'(k) + 1), G(k) is defined as G(k) = ξ'(k)/(ξ'(k) + 1). If C(k) ≥ 1, then from C(k) = T(k)/P_Z(k) the frequency-domain background noise power P_Z(k) is already below the frequency-domain masking threshold T(k); the noise level is below the human auditory masking threshold, the frequency-domain noisy speech Y(k) does not need to be filtered, a good subjective listening result is obtained anyway, and G(k) is defined as G(k) = 1. From the above analysis, the perceptual filter gain G(k) is therefore:

G(k) = ξ'(k) / (ξ'(k) + 1),                                              if 0 < C(k) < ξ'(k)/(ξ'(k) + 1)
G(k) = ( ξ'(k) + sqrt( ξ'(k) (C(k) - 1) + C(k) ) ) / ( ξ'(k) + 1 ),      if ξ'(k)/(ξ'(k) + 1) ≤ C(k) < 1
G(k) = 1,                                                                 if C(k) ≥ 1

It should be understood that this embodiment is described by taking the solution of the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) as an example; the equation may be any other equation constructed from E_S(k) + E_Z(k) ≤ T(k).
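For illustration only, the piecewise gain rule derived above can be computed per bin as in the sketch below; the clamping of the discriminant and the small floor on P_Z(k) are numerical safeguards added in the sketch.

```python
import numpy as np

def perceptual_gain(xi_prior, noise_power, masking_threshold):
    """Per-bin perceptual filter gain G(k) from the piecewise rule above.

    xi_prior:          a priori SNR  xi'(k)
    noise_power:       frequency-domain background noise power  P_Z(k)
    masking_threshold: frequency-domain masking threshold  T(k)
    """
    C = masking_threshold / np.maximum(noise_power, 1e-12)   # C(k) = T(k) / P_Z(k)
    wiener = xi_prior / (xi_prior + 1.0)                      # xi'(k) / (xi'(k) + 1)

    # Middle branch: quadratic-formula solution, valid when wiener <= C(k) < 1.
    disc = np.maximum(xi_prior * (C - 1.0) + C, 0.0)
    solved = (xi_prior + np.sqrt(disc)) / (xi_prior + 1.0)

    # C >= 1: no filtering needed; C below the valid range: fall back to wiener.
    return np.where(C >= 1.0, 1.0, np.where(C >= wiener, solved, wiener))
```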
Step S170: filter the noisy speech according to the perceptual filter gain to obtain the enhanced speech.
In this embodiment, according to the perceptual filter gain G(k), the enhanced frequency-domain speech Ŝ(k) = G(k) Y(k) is obtained from the frequency-domain noisy speech Y(k) and then transformed back to the time domain to obtain the enhanced speech ŝ(n). Alternatively, the perceptual filter gain G(k) is first transformed into the time domain to obtain g(n), and the enhanced speech is then obtained from ŝ(n) = g(n) * y(n), where y(n) is the time-domain noisy speech and * denotes convolution.
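For illustration only, the sketch below applies the gain G(k) to the frequency-domain noisy speech and reconstructs the time-domain enhanced speech by inverse FFT and overlap-add; it assumes the frames were produced by the analysis sketch shown after step S130, with the same frame length and hop.

```python
import numpy as np

def apply_gain_and_reconstruct(Y, gain, frame_len=512, hop=256):
    """Apply G(k) per frame and rebuild the time-domain enhanced speech.

    Y:    (num_frames, frame_len // 2 + 1) STFT of the noisy speech.
    gain: array broadcastable to Y, the perceptual filter gain per bin.
    """
    S_hat = gain * Y                                   # S_hat(k) = G(k) * Y(k)
    frames = np.fft.irfft(S_hat, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame      # overlap-add synthesis
    return out
```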
In this embodiment, noisy speech is obtained; the noise power is computed from the noisy speech according to a noise estimation algorithm; a frequency-domain masking threshold is computed from the noisy speech according to a masking model; the noisy speech is transformed into the frequency domain to obtain frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and the filtered background noise is expressed as a relational expression in the frequency-domain background noise and the perceptual filter gain; an equation in the perceptual filter gain is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain the enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency-domain masking threshold, the speech signal distortion power remains small while the noise level stays below the human auditory masking threshold and is not heard, which improves the subjective perceptual quality of the enhanced speech.
In one embodiment, before step S130 the method further comprises: enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech; in step S130 the noisy speech is transformed into the frequency domain by transforming the enhanced noisy speech into the frequency domain; in step S170 the noisy speech is filtered to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and step S161 becomes:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index, and obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In this embodiment, because residual noise remains in the speech enhanced by the short-time spectral amplitude estimation based method, the perceptual filtering method of this embodiment can further improve the enhancement result. When the frequency-domain background noise power P_Z(k) is computed from the noise power λ_d(k) using the approximation algorithm, the frequency-domain gain function G_H(k) of the short-time spectral amplitude estimation based method is obtained first, and the frequency-domain background noise power P_Z(k) is then approximated according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where Y(k) is the frequency-domain noisy speech and |Y(k)| is its spectral amplitude.
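For illustration only, the residual noise power approximation P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2 can be computed as in the sketch below; the floor at zero is a safeguard added in the sketch and is not part of the described approximation.

```python
import numpy as np

def residual_noise_power(lambda_d, gain_pre, Y):
    """Residual background noise power after a short-time spectral amplitude
    pre-enhancement stage.

    lambda_d: estimated noise power lambda_d(k)
    gain_pre: gain G_H(k) of the pre-enhancement stage
    Y:        frequency-domain noisy speech Y(k)
    """
    P_Z = lambda_d - (1.0 - gain_pre) * np.abs(Y) ** 2
    return np.maximum(P_Z, 0.0)                    # keep the estimate non-negative
```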
In one embodiment, the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is: obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index and the current frame is frame l; obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
In this embodiment, the perceptual filter gain of the first frame G(k, 1) is defined as a preset value, preferably 1. The a posteriori SNRs of the second frame and the first frame, γ'(k, 2) and γ'(k, 1), are obtained, and the a priori SNR of the second frame is obtained according to ξ̂'(k, 2) = η G(k, 1) γ'(k, 1) + (1 - η) max{γ'(k, 2) - 1, 0}. The smoothing factor η can take any value between 0 and 1, preferably η = 0.92. After the a priori SNR of the second frame is obtained, the perceptual filter gain G(k, 2) of the second frame can be obtained by solving the equation in the subsequent steps; ξ̂'(k, 3) can then be computed from G(k, 2), and so on.
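For illustration only, the frame-by-frame recursion described above (decision-directed a priori SNR, then the perceptual gain of the current frame) can be sketched as below; it reuses the perceptual_gain sketch shown after step S163, and the first-frame gain of 1 and η = 0.92 follow the preferred values given in this embodiment.

```python
import numpy as np

def frame_recursion(Y, noise_power, masking_threshold, eta=0.92, first_gain=1.0):
    """Decision-directed a priori SNR and perceptual gain, frame by frame.

    Y, noise_power, masking_threshold: arrays of shape (num_frames, num_bins).
    xi'(k, l) = eta * G(k, l-1) * gamma'(k, l-1) + (1 - eta) * max(gamma'(k, l) - 1, 0)
    """
    num_frames, num_bins = Y.shape
    gamma = np.abs(Y) ** 2 / np.maximum(noise_power, 1e-12)   # a posteriori SNR gamma'(k, l)
    gains = np.empty((num_frames, num_bins))
    gains[0] = first_gain                                      # preset first-frame gain G(k, 1)
    for l in range(1, num_frames):
        xi = eta * gains[l - 1] * gamma[l - 1] + (1 - eta) * np.maximum(gamma[l] - 1.0, 0.0)
        gains[l] = perceptual_gain(xi, noise_power[l], masking_threshold[l])
    return gains
```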
The above embodiments can be applied in a speech enhancement system as shown in Fig. 4: noisy speech is input; masking threshold estimation 164 yields the frequency-domain masking threshold T(k); noise estimation 166 yields the noise power λ_d(k); T(k) and λ_d(k) are fed into the perceptual enhancement filtering stage 165, which constructs and solves the equation to obtain the perceptual filter gain and performs the filtering to obtain the enhanced speech.
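For illustration only, the sketch below assembles the earlier sketches into an end-to-end pipeline in the spirit of Fig. 4 (analysis, noise and masking-threshold estimation, perceptual gain, synthesis); all parameter choices and helper functions are assumptions of the sketches, not elements of the claimed system.

```python
import numpy as np

def enhance(y, sample_rate, frame_len=512, hop=256):
    """End-to-end sketch built from the earlier helpers in this document."""
    Y = stft_frames(y, frame_len, hop)                         # frequency-domain noisy speech Y(k)
    lambda_d = estimate_noise_power(Y)                         # noise power lambda_d(k)
    noise_power = np.broadcast_to(lambda_d, Y.shape)           # P_Z(k) approximated by lambda_d(k)
    T = np.stack([masking_threshold(np.abs(frame) ** 2, sample_rate)
                  for frame in Y])                             # masking threshold T(k) per frame
    gains = frame_recursion(Y, noise_power, T)                 # perceptual filter gain G(k, l)
    return apply_gain_and_reconstruct(Y, gains, frame_len, hop)
```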
In one embodiment, as shown in Fig. 5, a perceptual filter is provided, comprising:
an acquisition module 210, configured to obtain noisy speech;
a noise power computation module 220, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module 230, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module 240, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module 250, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module 260, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module 270, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
In one embodiment, the equation construction module 250 constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by: obtaining the speech signal distortion power from the speech signal distortion; obtaining the filtered background noise power from the filtered background noise; and, based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power and T(k) is the frequency-domain masking threshold.
In one embodiment, as shown in Fig. 6, the gain solving module 260 comprises:
a solving preparation unit 261, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit 262, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
In one embodiment, as shown in Fig. 7, on the basis of the above embodiments the perceptual filter further comprises:
an enhancement module 280, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech.
The frequency-domain conversion module 240 transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain, and the filtering module 270 filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech.
The solving preparation unit 261 computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by: obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index, and obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
In one embodiment, the solving preparation unit 261 computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by: obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index and the current frame is frame l; obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
The embodiments described above express only several embodiments of the present invention and are described in relatively specific detail, but they should not therefore be construed as limiting the scope of the claims of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A perceptual filtering method, the method comprising:
obtaining noisy speech, and computing the noise power from the noisy speech according to a noise estimation algorithm;
computing a frequency-domain masking threshold from the noisy speech according to a masking model;
transforming the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
based on a speech estimation error algorithm, expressing the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, and expressing the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain;
from the speech signal distortion and the filtered background noise, constructing an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
solving the equation to obtain the perceptual filter gain; and
filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech.
2. The method according to claim 1, characterized in that the step of constructing, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold is:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
3. The method according to claim 1, characterized in that the step of solving the equation to obtain the perceptual filter gain comprises:
computing the frequency-domain background noise power from the noise power using an approximation algorithm;
computing the a posteriori SNR from the frequency-domain background noise power;
computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
solving the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
4. The method according to claim 3, characterized in that, before the step of transforming the noisy speech into the frequency domain to obtain the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, the method further comprises:
enhancing the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech, wherein transforming the noisy speech into the frequency domain means transforming the enhanced noisy speech into the frequency domain, and filtering the noisy speech to obtain the enhanced speech means filtering the enhanced noisy speech to obtain the enhanced speech; and the step of computing the frequency-domain background noise power from the noise power using the approximation algorithm is:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
5. The method according to claim 3, characterized in that the step of computing the a priori SNR from the a posteriori SNR based on the decision-directed algorithm is:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
6. A perceptual filter, characterized in that the perceptual filter comprises:
an acquisition module, configured to obtain noisy speech;
a noise power computation module, configured to compute the noise power from the noisy speech according to a noise estimation algorithm;
a masking threshold computation module, configured to compute a frequency-domain masking threshold from the noisy speech according to a masking model;
a frequency-domain conversion module, configured to transform the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise;
an equation construction module, configured to, based on a speech estimation error algorithm, express the speech signal distortion as a relational expression in the frequency-domain clean speech and the perceptual filter gain, express the filtered background noise as a relational expression in the frequency-domain background noise and the perceptual filter gain, and, from the speech signal distortion and the filtered background noise, construct an equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold;
a gain solving module, configured to solve the equation to obtain the perceptual filter gain; and
a filtering module, configured to filter the noisy speech according to the perceptual filter gain to obtain enhanced speech.
7. The perceptual filter according to claim 6, characterized in that the equation construction module constructs, from the speech signal distortion and the filtered background noise, the equation in the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency-domain masking threshold specifically by:
obtaining the speech signal distortion power from the speech signal distortion;
obtaining the filtered background noise power from the filtered background noise; and
based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency-domain masking threshold, obtaining the equation (G(k) - 1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectral bin index, P_S(k) is the frequency-domain clean speech power, P_Z(k) is the frequency-domain background noise power, and T(k) is the frequency-domain masking threshold.
8. The perceptual filter according to claim 6, characterized in that the gain solving module comprises:
a solving preparation unit, configured to compute the frequency-domain background noise power from the noise power using an approximation algorithm, compute the a posteriori SNR from the frequency-domain background noise power, and compute the a priori SNR from the a posteriori SNR based on the decision-directed algorithm; and
a solving unit, configured to solve the equation from the frequency-domain background noise power, the a priori SNR and the frequency-domain masking threshold to obtain the perceptual filter gain.
9. The perceptual filter according to claim 8, characterized in that the perceptual filter further comprises:
an enhancement module, configured to enhance the noisy speech using a method based on short-time spectral amplitude estimation to obtain enhanced noisy speech;
the frequency-domain conversion module transforms the noisy speech into the frequency domain by transforming the enhanced noisy speech into the frequency domain;
the filtering module filters the noisy speech to obtain the enhanced speech by filtering the enhanced noisy speech to obtain the enhanced speech; and
the solving preparation unit computes the frequency-domain background noise power from the noise power using the approximation algorithm specifically by:
obtaining the frequency-domain gain function G_H(k) of the method based on short-time spectral amplitude estimation, where k is the spectral bin index; and
obtaining the frequency-domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k)) |Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency-domain noisy speech.
10. The perceptual filter according to claim 8, characterized in that the solving preparation unit computes the a priori SNR from the a posteriori SNR based on the decision-directed algorithm specifically by:
obtaining the a posteriori SNRs of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectral bin index, l is the frame index, and the current frame is frame l;
obtaining the perceptual filter gain G(k, l-1) of the previous frame, wherein if the previous frame is the first frame, the perceptual filter gain of the previous frame is a preset value; and
obtaining the a priori SNR of the current frame from the a posteriori SNRs and the perceptual filter gain according to the formula ξ̂'(k, l) = η G(k, l-1) γ'(k, l-1) + (1 - η) max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
CN201510031872.9A 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter Active CN105869649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510031872.9A CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510031872.9A CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Publications (2)

Publication Number Publication Date
CN105869649A true CN105869649A (en) 2016-08-17
CN105869649B CN105869649B (en) 2020-02-21

Family

ID=56623456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510031872.9A Active CN105869649B (en) 2015-01-21 2015-01-21 Perceptual filtering method and perceptual filter

Country Status (1)

Country Link
CN (1) CN105869649B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003514264A (en) * 1999-11-15 2003-04-15 ノキア コーポレイション Noise suppression device
CN1684143A (en) * 2004-04-14 2005-10-19 华为技术有限公司 Method for strengthening sound
JP2014232331A (en) * 2007-07-06 2014-12-11 オーディエンス,インコーポレイテッド System and method for adaptive intelligent noise suppression
CN103824562A (en) * 2014-02-10 2014-05-28 太原理工大学 Psychological acoustic model-based voice post-perception filter

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张勇等: "结合人耳听觉感知的两级语音增强算法", 《信号处理》 (Zhang Yong et al., "Two-stage speech enhancement algorithm incorporating human auditory perception", Journal of Signal Processing) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448696A (en) * 2016-12-20 2017-02-22 成都启英泰伦科技有限公司 Adaptive high-pass filtering speech noise reduction method based on background noise estimation
CN109979478A (en) * 2019-04-08 2019-07-05 网易(杭州)网络有限公司 Voice de-noising method and device, storage medium and electronic equipment
US20220027436A1 (en) * 2020-07-22 2022-01-27 Mitsubishi Heavy Industries, Ltd. Anomaly factor estimation method, anomaly factor estimating device, and program
CN112951262A (en) * 2021-02-24 2021-06-11 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium
CN112951262B (en) * 2021-02-24 2023-03-10 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105869649B (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN101976566B (en) Voice enhancement method and device using same
Valin et al. A perceptually-motivated approach for low-complexity, real-time enhancement of fullband speech
US20200265857A1 (en) Speech enhancement method and apparatus, device and storage mediem
Sim et al. A parametric formulation of the generalized spectral subtraction method
CN103531204B (en) Sound enhancement method
CN102074246B (en) Dual-microphone based speech enhancement device and method
Soon et al. Speech enhancement using 2-D Fourier transform
CN108735225A (en) It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method
CN109643554A (en) Adaptive voice Enhancement Method and electronic equipment
CN105679330B (en) Based on the digital deaf-aid noise-reduction method for improving subband signal-to-noise ratio (SNR) estimation
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
CN105869649A (en) Perceptual filtering method and perceptual filter
Islam et al. Speech enhancement based on student $ t $ modeling of Teager energy operated perceptual wavelet packet coefficients and a custom thresholding function
CN106328155A (en) Speech enhancement method of correcting priori signal-to-noise ratio overestimation
CN106653004B (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
Yang et al. Spectral contrast enhancement: Algorithms and comparisons
CN103971697B (en) Sound enhancement method based on non-local mean filtering
CN108962275A (en) A kind of music noise suppressing method and device
CN105869652A (en) Psychological acoustic model calculation method and device
CN107045874A (en) A kind of Non-linear Speech Enhancement Method based on correlation
US20170323656A1 (en) Signal processor
Rao et al. Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration
CN102568491B (en) Noise suppression method and equipment
Trawicki et al. Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors
Upadhyay An improved multi-band speech enhancement utilizing masking properties of human hearing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant