CN102054482A

CN102054482A - Method and device for enhancing voice signal

Info

Publication number: CN102054482A
Application number: CN2009102369170A
Authority: CN
Inventors: 刘霖; 田康
Original assignee: China Mobile Communications Group Co Ltd
Current assignee: China Mobile Communications Group Co Ltd
Priority date: 2009-10-27
Filing date: 2009-10-27
Publication date: 2011-05-11
Anticipated expiration: 2029-10-27
Also published as: CN102054482B

Abstract

The embodiments of the invention disclose a method and a device for enhancing a speech signal. The method comprises: obtaining a noisy speech signal and applying perceptual weighting filtering to it; transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on it in the frequency domain, and transforming the resulting speech signal into the time domain; and applying inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain an enhanced speech signal. Because the noisy speech signal is perceptually weighted before spectral subtraction, the noise in the noisy speech signal is suppressed effectively, and the enhanced speech signal matches the characteristics of human hearing.

Description

Method and apparatus for speech signal enhancement

Technical field

The present invention relates to the field of communication technology, and in particular to a method and apparatus for speech signal enhancement.

Background technology

With the development of 3G (3rd Generation mobile communication systems), video telephony services have come into wide use. In addition to basic voice communication, a video call lets both parties see each other's surroundings, which improves the user experience. During a video call, however, each party must keep some distance from the handset microphone so that the camera can capture the scene in real time. As a result, a large amount of noise is mixed into the speech signal picked up by the microphone. This noise lowers the signal-to-noise ratio (SNR) of the call signal and degrades the speech quality of the video call.

In the prior art, to reduce the effect of noise on speech quality, the noisy speech signal is transformed into the frequency domain by a Fourier transform and spectral subtraction is applied there: the magnitude spectrum of the noise is subtracted from the magnitude spectrum of the noisy speech to obtain the magnitude spectrum of the clean speech. The principle is as follows.

The noisy speech model is:

y(n) = s(n) + d(n)    Formula (1)

where y(n) denotes the noisy speech, s(n) the clean speech, and d(n) the noise mixed into it.

Taking the Fourier transform of both sides of Formula (1) gives:

Y(k) = S(k) + D(k)    Formula (2)

where Y(k), S(k), and D(k) denote the Fourier coefficients of the noisy speech, the clean speech, and the noise, respectively.

Ignoring the phase difference between the noisy speech and the clean speech gives:

|Y(k)| = |S(k)| + |D(k)|    Formula (3)

Because the human ear is insensitive to phase information, the magnitude spectrum of the noise can be subtracted directly from the magnitude spectrum of the noisy speech to obtain the magnitude spectrum of the clean speech, which is taken as the magnitude spectrum of the enhanced speech. This gives the basic expression:

|\hat{S}(k)| = |Y(k)| - |D(k)|    Formula (4)

In practice, an improved form of spectral subtraction is more commonly used. It is given by Formula (5):

|\hat{S}(k)| = \left[ |Y(k)|^{α} - β |D(k)|^{α} \right]^{1/α}    Formula (5)

The improved form differs from ordinary spectral subtraction in that it introduces the two parameters α and β, which give the algorithm considerable flexibility. The system principle of applying spectral subtraction to noisy speech is shown in Fig. 1.
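As a minimal sketch of Formula (5), the function below applies the subtraction to arrays of magnitudes. The function name and the sample values are illustrative only. Formula (5) does not say what to do when the difference is negative; clamping it to zero before taking the root is an assumption made here.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=1.0):
    """Generalized magnitude spectral subtraction following Formula (5)."""
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    diff = noisy_mag ** alpha - beta * noise_mag ** alpha
    # Assumption: negative differences are clamped to zero before the root.
    diff = np.maximum(diff, 0.0)
    return diff ** (1.0 / alpha)

# alpha = beta = 1 reduces to the basic magnitude subtraction.
basic = spectral_subtract([3.0, 1.0], [1.0, 2.0])
# alpha = 2 subtracts in the power domain and returns a magnitude.
power = spectral_subtract([3.0, 1.0], [1.0, 2.0], alpha=2.0)
```

Larger β removes more of the estimated noise; α selects the domain (magnitude, power, and so on) in which the subtraction takes place.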

However, suppressing noise with spectral subtraction as in the prior art has the following technical defect: during spectral subtraction, the noise spectrum and the speech spectrum cannot be distinguished accurately, so the algorithm also attenuates the speech spectrum considerably while suppressing the noise, which impairs the listener's perception of the normal speech spectrum.

To limit the attenuation of the speech signal that occurs when spectral subtraction is used for speech enhancement, many improvements to the existing algorithm have been proposed. They optimize enhancement performance by adjusting the strength of noise suppression in the subtraction.

Scheme 1: average over the probability characteristics of the noisy speech spectrum and of the noise spectrum, and use the result to control the strength with which the noise magnitude is reduced.

Scheme 2: change the parameters α = 1, β = 2 of the traditional spectral subtraction algorithm to α = 2, β = 5 to obtain an improved spectral subtraction method, and use coefficients obtained by training to control the noise reduction strength.

In implementing Schemes 1 and 2, the applicant found the following. Scheme 1 is complex to implement, and because its control of noise reduction is based on probability distributions and does not take the characteristics of human hearing into account, the audible result is not very satisfactory. In Scheme 2, a pair of α and β values that works well is obtained through extensive experiments, but because the application environment changes constantly, values that work well in one environment may fail to give satisfactory noise reduction when the environment changes substantially. The main problem with these schemes, which optimize enhancement performance by adjusting the noise suppression strength in spectral subtraction, is therefore as follows: the noise spectrum can only be estimated, and the estimate is not accurate enough, so poorly controlled suppression strength may cut too much of the speech spectrum while reducing the noise, and the enhanced speech signal then fails to meet the needs of human hearing.

Summary of the invention

The embodiments of the invention provide a method and apparatus for speech signal enhancement that yield an enhanced speech signal matched to the characteristics of human hearing.

An embodiment of the invention provides a method for speech signal enhancement, comprising:

obtaining a noisy speech signal and applying perceptual weighting filtering to the noisy speech signal;

transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the speech signal obtained after the spectral subtraction and phase synthesis into the time domain;

applying inverse perceptual weighting filtering to the speech signal obtained after the spectral subtraction and phase synthesis to obtain an enhanced speech signal.

Preferably, applying perceptual weighting filtering to the noisy speech signal comprises:

raising the signal amplitude in frequency bands of the noisy speech signal that the human ear perceives easily, and lowering the signal amplitude in frequency bands of the noisy speech signal that the human ear does not perceive easily.

Preferably, the transfer function used for the perceptual weighting filtering of the noisy speech signal is:

W(z) = \frac{A(z/γ_{1})}{A(z/γ_{2})} = \frac{1 - Σ_{i=1}^{p} α_{i} γ_{1}^{i} z^{-i}}{1 - Σ_{i=1}^{p} α_{i} γ_{2}^{i} z^{-i}},

0 < γ_{2} < γ_{1} ≤ 1

where γ_1 and γ_2 are weighting factors, α_i are the LP (linear prediction) coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Preferably, when perceptual weighting filtering is applied to the noisy speech signal, the time-domain expression of the noisy speech signal is:

y(k) - Σ_{i=1}^{p} α_{i} γ_{1}^{i} y(k-i) = x(k) - Σ_{i=1}^{p} α_{i} γ_{2}^{i} x(k-i)

where x(k) denotes the speech signal before perceptual weighting, y(k) denotes the speech signal after perceptual weighting, γ_1 and γ_2 are weighting factors, α_i are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Preferably, performing spectral subtraction and phase synthesis on the noisy speech signal comprises:

applying a low noise suppression strength in frequency bands of the noisy speech signal that the human ear perceives easily, and a high noise suppression strength in frequency bands of the noisy speech signal that the human ear does not perceive easily.

Preferably, applying inverse perceptual weighting filtering to the speech signal obtained after the spectral subtraction and phase synthesis comprises:

lowering the signal amplitude in frequency bands of that speech signal that the human ear perceives easily, and raising the signal amplitude in frequency bands that the human ear does not perceive easily.

An embodiment of the invention provides an apparatus for speech signal enhancement, comprising:

a perceptual weighting filtering module, configured to apply perceptual weighting filtering to a noisy speech signal after the noisy speech signal is obtained;

a spectral subtraction module, configured to transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the speech signal obtained after the spectral subtraction and phase synthesis into the time domain;

an inverse perceptual weighting filtering module, configured to apply inverse perceptual weighting filtering to the speech signal obtained after the spectral subtraction and phase synthesis to obtain an enhanced speech signal.

Preferably, the perceptual weighting filtering module is specifically configured to raise the signal amplitude in frequency bands of the noisy speech signal that the human ear perceives easily, and to lower the signal amplitude in frequency bands that the human ear does not perceive easily.

Preferably, the transfer function used by the perceptual weighting filtering module for the perceptual weighting filtering of the noisy speech signal is:

W(z) = \frac{A(z/γ_{1})}{A(z/γ_{2})} = \frac{1 - Σ_{i=1}^{p} α_{i} γ_{1}^{i} z^{-i}}{1 - Σ_{i=1}^{p} α_{i} γ_{2}^{i} z^{-i}},

0 < γ_{2} < γ_{1} ≤ 1

Preferably, when the perceptual weighting filtering module applies perceptual weighting filtering to the noisy speech signal, the time-domain expression of the noisy speech signal is:

y(k) - Σ_{i=1}^{p} α_{i} γ_{1}^{i} y(k-i) = x(k) - Σ_{i=1}^{p} α_{i} γ_{2}^{i} x(k-i)

Preferably, the spectral subtraction module is specifically configured to apply a low noise suppression strength in frequency bands of the noisy speech signal that the human ear perceives easily, and a high noise suppression strength in frequency bands that the human ear does not perceive easily.

Preferably, the inverse perceptual weighting filtering module is specifically configured to lower the signal amplitude in frequency bands of the speech signal that the human ear perceives easily, and to raise the signal amplitude in frequency bands that the human ear does not perceive easily.

Compared with the prior art, the embodiments of the invention have the following advantages. Perceptual weighting filtering is applied to the noisy speech signal in the time domain, which raises the signal amplitude in frequency bands that the human ear perceives easily and lowers it in bands that the ear does not perceive easily. Spectral subtraction and phase synthesis are then performed on the weighted signal in the frequency domain, so that noise suppression is weaker in the easily perceived bands and stronger in the others, and the enhanced speech signal matches the characteristics of human hearing.

Description of drawings

To illustrate the technical solutions of the embodiments of the invention and of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the system principle of applying spectral subtraction to noisy speech in the prior art;

Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the invention;

Fig. 3 is a schematic diagram of the system principle of speech signal enhancement provided by an embodiment of the invention;

Fig. 4 is a schematic flowchart of a method for speech signal enhancement in an application scenario of the invention;

Fig. 5 is a schematic structural diagram of an apparatus for speech signal enhancement provided by an embodiment of the invention.

Embodiment

In the embodiments of the invention, perceptual weighting filtering is applied to the noisy speech signal so that the signal amplitude is raised in frequency bands that the human ear perceives easily and lowered in bands that it does not. The weighted signal is then transformed into the frequency domain by a Fourier transform, and spectral subtraction is applied there; noise suppression is weaker in the easily perceived bands and stronger in the others. After spectral subtraction, the speech signal is transformed back into the time domain by an inverse Fourier transform and passed through inverse perceptual weighting filtering to obtain the enhanced speech signal.

Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the invention. The method comprises the following steps:

Step 201: obtain a noisy speech signal and apply perceptual weighting filtering to it.

The noisy speech signal is obtained after the call starts and is perceptually weighted in the time domain. The weighting raises the signal amplitude in the bands of the noisy speech signal that the human ear perceives easily and lowers it in the bands that the ear does not perceive easily.

Step 202: transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on it in the frequency domain, and transform the resulting speech signal into the time domain.

After perceptual weighting, the noisy speech signal is transformed into the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed there, and the resulting speech signal is transformed back into the time domain by an inverse Fourier transform.

Specifically, the spectral subtraction and phase synthesis apply a low noise suppression strength in the bands of the noisy speech signal that the human ear perceives easily and a high noise suppression strength in the bands that it does not.

Step 203: apply inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain the enhanced speech signal.

The speech signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the bands that the human ear perceives easily and raises it in the bands that the ear does not perceive easily, yielding the enhanced speech signal. In the enhanced speech signal, noise suppression is weaker in the easily perceived bands and stronger in the others, so the enhancement is clearly audible and matches the characteristics of human hearing.

With the method provided by the embodiments of the invention, the bands of the noisy speech signal that the human ear perceives easily undergo weaker noise suppression at their raised amplitude, and the bands that the ear does not perceive easily undergo stronger noise suppression at their lowered amplitude, yielding the enhanced speech signal.

For a clearer understanding of the technical solution of the invention, the perceptual weighting filtering of the noisy speech signal is described in detail below.

In implementing the invention, the noisy speech signal is perceptually weighted in the time domain before spectral subtraction. The weighted signal is transformed into the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed there, the result is transformed back into the time domain by an inverse Fourier transform, and inverse perceptual weighting filtering then yields the enhanced speech signal.

Specifically, the perceptual weighting filtering is performed by constructing a perceptual weighting filter and filtering the noisy speech signal with it, which transforms the speech signal into the perceptually weighted domain.

In the embodiments of the invention, the transfer function of the perceptual weighting filter is:

W(z) = \frac{A(z/γ_{1})}{A(z/γ_{2})} = \frac{1 - Σ_{i=1}^{p} α_{i} γ_{1}^{i} z^{-i}}{1 - Σ_{i=1}^{p} α_{i} γ_{2}^{i} z^{-i}},

0 < γ_{2} < γ_{1} ≤ 1    Formula (6)

where γ_1 and γ_2 are two weighting factors whose optimal values can be obtained through extensive training; α_i are the LP coefficients of the speech signal, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read from the speech decoder bitstream, which reduces the complexity of implementing the perceptual weighting.
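The numerator and denominator of Formula (6) are the same LP polynomial evaluated at z/γ, which scales the i-th coefficient by γ^i. The following sketch builds both coefficient vectors. The helper name, the second-order LP coefficients, and the values 0.9 and 0.6 for γ_1 and γ_2 are hypothetical; the text only states that the weighting factors are obtained by training.

```python
import numpy as np

def bandwidth_expanded(lp_coeffs, gamma):
    """Coefficients of A(z/gamma) = 1 - sum_i alpha_i * gamma**i * z**(-i)."""
    lp = np.asarray(lp_coeffs, dtype=float)
    i = np.arange(1, len(lp) + 1)
    return np.concatenate(([1.0], -lp * gamma ** i))

lp = [0.5, -0.25]                           # hypothetical LP coefficients, p = 2
numerator = bandwidth_expanded(lp, 0.9)     # A(z/gamma_1), assumed gamma_1 = 0.9
denominator = bandwidth_expanded(lp, 0.6)   # A(z/gamma_2), assumed gamma_2 = 0.6
```

Each vector lists the coefficients of z^0, z^-1, ..., z^-p, which is the layout most digital filter routines expect.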

After perceptual weighting filtering, the time-domain expression of the noisy speech signal is:

y(k) - Σ_{i=1}^{p} α_{i} γ_{1}^{i} y(k-i) = x(k) - Σ_{i=1}^{p} α_{i} γ_{2}^{i} x(k-i)    Formula (7)

After this perceptual weighting filtering, the signal amplitude is raised in the bands of the noisy speech signal that the human ear perceives easily and lowered in the bands that the ear does not perceive easily. In the subsequent spectral subtraction and phase synthesis, noise suppression is therefore weaker in the easily perceived bands and stronger in the bands that are not easily perceived.
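Solving Formula (7) for y(k) gives a recursion that can be evaluated sample by sample. The sketch below does this with plain Python lists, assuming a zero initial state (samples before k = 0 are taken as zero); the function name and the test values are illustrative. It follows Formula (7) as printed. Reading Formula (6) as Y(z) = W(z)X(z) would instead place γ_1 on the input terms and γ_2 on the output terms, which corresponds to calling the function with the two factors swapped.

```python
def weight_time_domain(x, lp_coeffs, gamma_1, gamma_2):
    """Evaluate Formula (7) for y(k), assuming zero initial state."""
    p = len(lp_coeffs)
    y = []
    for k in range(len(x)):
        acc = x[k]
        for i in range(1, min(p, k) + 1):
            a = lp_coeffs[i - 1]
            acc -= a * gamma_2 ** i * x[k - i]   # input terms on the right-hand side
            acc += a * gamma_1 ** i * y[k - i]   # output terms moved from the left-hand side
        y.append(acc)
    return y

# Impulse response for a hypothetical first-order predictor.
impulse_response = weight_time_domain([1.0, 0.0, 0.0], [0.5], 0.9, 0.6)
```

For the impulse input, the first sample passes through unchanged and the later samples show the combined effect of the feed-forward and feedback terms.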

Specifically, during spectral subtraction the noise suppression in the bands that the human ear perceives easily is weaker: these bands undergo the subtraction at their raised amplitude, which reduces the effect of noise suppression on the speech signal in those bands. Conversely, the noise suppression in the bands that the ear does not perceive easily is stronger: these bands undergo the subtraction at their lowered amplitude, which increases the noise suppression there and matches the characteristics of human hearing.
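A small numeric illustration may help make this concrete. It assumes, as a simplification not stated in the text, that the weighting acts as a per-band gain on the magnitude and that the same fixed noise estimate is subtracted in every band of the weighted domain; the gains and magnitudes are made-up values.

```python
import numpy as np

gain = np.array([2.0, 0.5])        # hypothetical weighting gain: raised band, lowered band
noisy_mag = np.array([4.0, 4.0])   # noisy magnitude before weighting
noise_estimate = 1.0               # assumed fixed noise estimate in the weighted domain

weighted = gain * noisy_mag                               # perceptual weighting
subtracted = np.maximum(weighted - noise_estimate, 0.0)   # spectral subtraction
restored = subtracted / gain                              # inverse weighting

# Amount actually removed from each band, seen in the original domain.
effective_cut = noisy_mag - restored
```

Under these assumptions the raised band loses 0.5 and the lowered band loses 2.0 from the same starting magnitude, which is the weaker and stronger suppression described above.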

After spectral subtraction of the noisy speech, inverse perceptual weighting filtering is applied to the speech signal obtained after spectral subtraction and phase synthesis. It lowers the signal amplitude in the bands that the human ear perceives easily and raises it in the bands that the ear does not perceive easily, which completes the enhancement of the noisy speech signal. Inverse perceptual weighting filtering is the inverse of perceptual weighting filtering; the specific algorithm is not repeated here.
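Because the weighting filter is a ratio of two polynomials, one way to realize its inverse is to swap numerator and denominator. The sketch below checks this round trip numerically, taking the filter in the form of Formula (6). The helper names, LP coefficients, and weighting factors are hypothetical, and zero initial filter state is assumed.

```python
import numpy as np

def pole_zero_filter(num, den, x):
    """Direct-form filter with den[0] == 1 and zero initial state."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(num[j] * x[k - j] for j in range(len(num)) if k - j >= 0)
        acc -= sum(den[j] * y[k - j] for j in range(1, len(den)) if k - j >= 0)
        y[k] = acc
    return y

def expanded(lp, gamma):
    """Coefficients of A(z/gamma)."""
    return np.concatenate(([1.0], [-a * gamma ** (i + 1) for i, a in enumerate(lp)]))

lp = [0.5, -0.25]                                   # hypothetical LP coefficients
a_g1, a_g2 = expanded(lp, 0.9), expanded(lp, 0.6)   # assumed gamma_1, gamma_2

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
weighted = pole_zero_filter(a_g1, a_g2, x)          # A(z/gamma_1) / A(z/gamma_2)
restored = pole_zero_filter(a_g2, a_g1, weighted)   # swapped ratio undoes the weighting
```

With γ_1 and γ_2 no greater than 1 and a stable LP synthesis filter, both the weighting filter and its inverse have their poles inside the unit circle, so the swapped filter is stable as well.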

Based on the perceptual weighting principle described above, the invention provides a system for enhancing a noisy speech signal, shown schematically in Fig. 3. As Fig. 3 shows, when the call starts, the noisy speech signal is obtained and perceptually weighted, which transforms it into the perceptually weighted domain. The weighted signal is transformed into the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed there, the result is transformed back into the time domain by an inverse Fourier transform, and inverse perceptual weighting filtering then yields the enhanced speech signal. The inverse perceptual weighting applied to the signal obtained after spectral subtraction and phase synthesis is the inverse of the perceptual weighting applied to the noisy speech.
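The processing chain of Fig. 3 can be sketched end to end as follows. This is a simplified illustration under several assumptions: the whole signal is processed as a single frame, the weighting filter is taken in the form of Formula (6), basic magnitude subtraction is used, the phase of the weighted noisy signal is reused for phase synthesis, and the function names, LP coefficients, and weighting factors are hypothetical.

```python
import numpy as np

def expanded(lp, gamma):
    """Coefficients of A(z/gamma)."""
    i = np.arange(1, len(lp) + 1)
    return np.concatenate(([1.0], -np.asarray(lp, dtype=float) * gamma ** i))

def pole_zero_filter(num, den, x):
    """Direct-form filter with den[0] == 1 and zero initial state."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(num[j] * x[k - j] for j in range(min(len(num), k + 1)))
        acc -= sum(den[j] * y[k - j] for j in range(1, min(len(den), k + 1)))
        y[k] = acc
    return y

def enhance(noisy, lp, noise_mag, gamma_1=0.9, gamma_2=0.6):
    num, den = expanded(lp, gamma_1), expanded(lp, gamma_2)
    weighted = pole_zero_filter(num, den, noisy)                 # perceptual weighting
    spectrum = np.fft.rfft(weighted)                             # to the frequency domain
    magnitude = np.maximum(np.abs(spectrum) - noise_mag, 0.0)    # spectral subtraction
    phase = np.angle(spectrum)                                   # noisy phase is reused
    cleaned = np.fft.irfft(magnitude * np.exp(1j * phase), n=len(noisy))
    return pole_zero_filter(den, num, cleaned)                   # inverse weighting

rng = np.random.default_rng(1)
signal = rng.standard_normal(128)
unchanged = enhance(signal, [0.5, -0.25], noise_mag=0.0)
```

With a zero noise estimate every stage is undone by the next, so the output reproduces the input; this is a convenient sanity check before supplying a real noise magnitude spectrum.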

It should be noted that, in the embodiments of the invention, the noise spectrum used in the spectral subtraction of the noisy speech signal is an empirical estimate. An optimal spectral range for the noise spectrum is obtained through extensive training and can be adjusted moderately according to the spectral intensity of the noisy speech signal, so that the noise in the noisy speech signal is subtracted effectively and an enhanced speech signal is obtained. The variation of the noise spectrum across different environments is not described further here.

The technical solution of the invention is described in detail below with reference to a specific application scenario. Fig. 4 is a schematic flowchart of a method for speech signal enhancement in an application scenario of the invention. The method comprises the following steps:

Step 401: the call starts and a noisy speech signal is obtained.

When a video call starts, both parties can see video of each other's surroundings. Because each party keeps some distance from the handset microphone so that the video can be captured, the signal picked up by the microphone contains a large amount of noise in addition to the parties' speech. In the embodiments of the invention, this speech signal with noise mixed in is called the noisy speech signal. The noise lowers the SNR of the call signal and degrades the speech quality of the video call, so noise reduction is applied to the obtained noisy speech signal to improve the SNR. The noisy speech model in the embodiments of the invention is given by Formula (1).

The noisy speech model is:

y(n) = s(n) + d(n)    Formula (1)

It should be noted that reducing the noise in the speech signal by perceptual weighting filtering and spectral subtraction, as in the embodiments of the invention, is not limited to video calls. It can also be used for noise reduction in other communication modes, such as ordinary telephone calls, video telephone calls, and so on. Such variations in the application scenario are not described further here.

Step 402: obtain the LP coefficients from the decoded stream and construct the perceptual weighting filter.

Before spectral subtraction, the noisy speech signal is perceptually weighted. The weighting is implemented by constructing a perceptual weighting filter whose transfer function, in the invention, is given by Formula (6):

W(z) = \frac{A(z/γ_{1})}{A(z/γ_{2})} = \frac{1 - Σ_{i=1}^{p} α_{i} γ_{1}^{i} z^{-i}}{1 - Σ_{i=1}^{p} α_{i} γ_{2}^{i} z^{-i}},

0 < γ_{2} < γ_{1} ≤ 1    Formula (6)

Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read directly from the decoded stream. The perceptual weighting filter is constructed from the weighting factors γ_1 and γ_2 and the LP coefficients obtained from the decoded stream.

Step 403: apply perceptual weighting filtering to the noisy speech signal.

The noisy speech signal y(n) = s(n) + d(n) is filtered by the perceptual weighting filter. The input to the filter is the time-domain noisy speech signal, and the output of the filter is also in the time domain. Its time-domain expression is:

y(k) - Σ_{i=1}^{p} α_{i} γ_{1}^{i} y(k-i) = x(k) - Σ_{i=1}^{p} α_{i} γ_{2}^{i} x(k-i)    Formula (7)

where x(k) denotes the noisy speech signal before perceptual weighting, y(k) denotes the noisy speech signal after perceptual weighting, γ_1 and γ_2 are weighting factors, α_i are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Perceptual weighting filtering of the noisy speech signal raises the signal amplitude in the bands that the human ear perceives easily and lowers it in the bands that the ear does not perceive easily.

Step 404: apply a Fourier transform to the noisy speech signal to transform it into the frequency domain.

After perceptual weighting filtering, the Fourier transform converts the time-domain speech signal into a frequency-domain speech signal, and the spectral subtraction of the noisy speech signal is then performed in the frequency domain. The details of transforming the noisy speech signal into the frequency domain by a Fourier transform are not repeated here.
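The text leaves the transform details open. One common way to carry out this step, shown here purely as an assumption, is to split the signal into overlapping windowed frames and take a real FFT of each frame. The frame length of 256 samples, the hop of 128 samples, the Hann window, and the 8 kHz sampling rate are illustrative choices, not values from the text.

```python
import numpy as np

def frames_to_spectra(signal, frame_len=256, hop=128):
    """Split a signal into overlapping Hann-windowed frames and return their spectra."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    return np.fft.rfft(frames, axis=1)   # one row of frequency bins per frame

t = np.arange(1024) / 8000.0             # assumed 8 kHz sampling rate
tone = np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone
spectra = frames_to_spectra(tone)
peak_bin = int(np.argmax(np.abs(spectra[0])))
```

Each row of `spectra` holds the complex Fourier coefficients of one frame; their magnitudes feed the spectral subtraction and their phases are kept for phase synthesis.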

Step 405: perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain.

After perceptual weighting filtering and the Fourier transform, spectral subtraction and phase synthesis are performed on the noisy speech in the frequency domain. At this point, the signal amplitude in the bands that the human ear perceives easily has been raised, and the amplitude in the bands that the ear does not perceive easily has been lowered.

During spectral subtraction and phase synthesis, a lower noise suppression strength is used in the bands that the human ear perceives easily and a higher one in the bands that it does not. Specifically, the easily perceived bands undergo the subtraction at their raised amplitude, which reduces the effect of noise suppression on the speech signal in those bands; the bands that are not easily perceived undergo the subtraction at their lowered amplitude, which increases the noise suppression there.

Step 406: transform the speech signal obtained after spectral subtraction into the time domain by an inverse Fourier transform.

After spectral subtraction, the speech signal is transformed into the time domain by an inverse Fourier transform, which is the inverse operation of the Fourier transform. The details of the transform are not repeated here.

Step 407: apply inverse perceptual weighting filtering to the time-domain signal to obtain the enhanced speech signal.

The speech signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the bands that the human ear perceives easily and raises it in the bands that the ear does not perceive easily, yielding the enhanced speech signal. The inverse perceptual weighting filtering of the speech signal is the inverse of the perceptual weighting filtering of the noisy speech signal; the specific algorithm is not repeated here.

As shown in Figure 5, a kind of voice signal enhanced device 500 structural representations for the embodiment of the invention provides comprise:

Perception weighted filtering module 510 is used for after obtaining Noisy Speech Signal, Noisy Speech Signal is carried out perception weighted filtering handle;

A spectral subtraction module 520, connected to the perceptual weighted filtering module 510 and configured to convert the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and convert the voice signal resulting from the spectral subtraction and phase synthesis into the time domain;

An inverse perceptual weighted filtering module 530, connected to the spectral subtraction module 520 and configured to perform inverse perceptual weighted filtering on the voice signal resulting from the spectral subtraction and phase synthesis to obtain the enhanced voice signal.

The perceptual weighted filtering module 510 is specifically configured to raise the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily and to lower the signal amplitude in the frequency bands of the noisy speech signal that the human ear does not perceive easily.

The transfer function used by the perceptual weighted filtering module 510 to perform perceptual weighted filtering on the noisy speech signal is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1

When the perceptual weighted filtering module 510 performs perceptual weighted filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)
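The difference equation above can be evaluated sample by sample. The sketch below is a minimal illustration, assuming that y (k) is the filter input and x (k) the filter output (the reading under which the time-domain expression matches W (z)), that the coefficients α_{i} are already available (for example from linear prediction, which this passage does not describe), and that all samples before k = 0 are zero. The function names are hypothetical. The inverse filter is taken to be the reciprocal of W (z), which is likewise an assumption.

```python
def bandwidth_expand(alpha, gamma):
    """Return the scaled coefficients alpha_i * gamma**i for i = 1..p."""
    return [a * gamma ** (i + 1) for i, a in enumerate(alpha)]


def pole_zero_filter(signal, num, den):
    """Apply H(z) = (1 - sum num_i z^-i) / (1 - sum den_i z^-i).

    out(k) = in(k) - sum_i num_i * in(k - i) + sum_i den_i * out(k - i),
    with zero initial conditions.
    """
    out = []
    for k in range(len(signal)):
        acc = signal[k]
        for i, c in enumerate(num, start=1):
            if k - i >= 0:
                acc -= c * signal[k - i]
        for i, c in enumerate(den, start=1):
            if k - i >= 0:
                acc += c * out[k - i]
        out.append(acc)
    return out


def perceptual_weight(signal, alpha, gamma1, gamma2):
    """Filter with W(z) = A(z / gamma1) / A(z / gamma2)."""
    return pole_zero_filter(
        signal, bandwidth_expand(alpha, gamma1), bandwidth_expand(alpha, gamma2)
    )


def inverse_perceptual_weight(signal, alpha, gamma1, gamma2):
    """Filter with the reciprocal A(z / gamma2) / A(z / gamma1) (assumed inverse)."""
    return pole_zero_filter(
        signal, bandwidth_expand(alpha, gamma2), bandwidth_expand(alpha, gamma1)
    )
```

Applying `inverse_perceptual_weight` to the output of `perceptual_weight` with the same `alpha`, `gamma1` and `gamma2` recovers the original samples up to rounding error, since the two filters cancel when both start from zero state.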

The spectral subtraction module 520 is specifically configured to apply a low noise-reduction strength to the frequency bands of the noisy speech signal that the human ear perceives easily and a high noise-reduction strength to the frequency bands of the noisy speech signal that the human ear does not perceive easily.

The inverse perceptual weighted filtering module 530 is specifically configured to lower the signal amplitude in the frequency bands of the voice signal that the human ear perceives easily and to raise the signal amplitude in the frequency bands that the human ear does not perceive easily.
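The chaining of the three modules of device 500 can be sketched as a simple composition of stages. This is an illustrative structure only: the class name, field names and type aliases are hypothetical, and each stage is passed in as a callable so that any concrete filter or spectral subtraction routine can be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]
Stage = Callable[[Signal], Signal]


@dataclass
class SpeechEnhancer:
    """Hypothetical container mirroring modules 510, 520 and 530."""

    perceptual_weighting: Stage          # role of module 510
    spectral_subtraction: Stage          # role of module 520 (frequency-domain processing inside)
    inverse_perceptual_weighting: Stage  # role of module 530

    def enhance(self, noisy: Signal) -> Signal:
        # Each stage consumes the output of the stage it is connected to.
        weighted = self.perceptual_weighting(noisy)
        denoised = self.spectral_subtraction(weighted)
        return self.inverse_perceptual_weighting(denoised)
```

Keeping the stages as separate callables matches the later remark that modules may be merged into one or split into submodules without changing the overall signal path.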

With the method and device for enhancing a voice signal provided by the embodiments of the invention, perceptual weighted filtering is performed on the noisy speech signal in the time domain, so that the signal amplitude in the frequency bands that the human ear perceives easily is raised and the signal amplitude in the frequency bands that the human ear does not perceive easily is lowered. Noise reduction in the easily perceived bands is then carried out at the raised amplitude and is weaker, while noise reduction in the bands that are not easily perceived is carried out at the lowered amplitude and is stronger, yielding the enhanced voice signal.

From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the invention may be embodied in the form of a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and which includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the invention.

Those skilled in the art will understand that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the embodiments of the invention.

Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the present embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules.

The serial numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.

The above discloses only several specific embodiments of the invention; however, the embodiments of the invention are not limited thereto, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the embodiments of the invention.

Claims

1. A method for enhancing a voice signal, characterized by comprising:

2. the method for claim 1 is characterized in that, described Noisy Speech Signal is carried out perception weighted filtering handle, and comprising:

3. The method of claim 1 or 2, characterized in that the transfer function used to perform perceptual weighted filtering on the noisy speech signal is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1

4. The method of claim 1 or 2, characterized in that, when perceptual weighted filtering is performed on the noisy speech signal, the time-domain expression of the noisy speech signal is:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

5. the method for claim 1 is characterized in that, it is synthetic that described Noisy Speech Signal is carried out spectral substraction and phase place, comprising:

6. the method for claim 1 is characterized in that, the voice signal to described spectral substraction and phase place after synthetic carries out contrary perception weighted filtering to be handled, and comprising:

7. A device for enhancing a voice signal, characterized by comprising:

8. The device of claim 7, characterized in that the perceptual weighted filtering module is specifically configured to:

9. The device of claim 7 or 8, characterized in that the transfer function used by the perceptual weighted filtering module to perform perceptual weighted filtering on the noisy speech signal is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1

10. The device of claim 7 or 8, characterized in that, when the perceptual weighted filtering module performs perceptual weighted filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

11. The device of claim 7, characterized in that the spectral subtraction module is specifically configured to:

12. The device of claim 7, characterized in that the inverse perceptual weighted filtering module is specifically configured to: