CN102054482A - Method and device for enhancing voice signal - Google Patents

Info

Publication number
CN102054482A
Authority
CN
China
Prior art keywords
perception
noisy speech
speech signal
voice signal
weighted filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2009102369170A
Other languages
Chinese (zh)
Other versions
CN102054482B (en
Inventor
刘霖
田康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN2009102369170A priority Critical patent/CN102054482B/en
Publication of CN102054482A publication Critical patent/CN102054482A/en
Application granted granted Critical
Publication of CN102054482B publication Critical patent/CN102054482B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

An embodiment of the invention discloses a method and a device for enhancing a speech signal. The method comprises: obtaining a noisy speech signal and performing perceptual weighting filtering on it; transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on it in the frequency domain, and transforming the resulting speech signal back into the time domain; and performing inverse perceptual weighting filtering on the resulting speech signal to obtain an enhanced speech signal. Because the noisy speech signal is perceptually weighted before spectral subtraction, the interference of noise with the speech signal is effectively suppressed, and the enhanced speech signal matches the characteristics of human hearing.

Description

Method and apparatus for speech signal enhancement
Technical field
The present invention relates to the field of communication technology, and in particular to a method and apparatus for speech signal enhancement.
Background
With the development of 3G (3rd Generation mobile communication systems), video telephony has come into wide use. In addition to basic voice communication, a video call lets each party see the other party's surroundings, which improves the user experience. During a video call, each party must keep a certain distance from the handset microphone so that the camera can capture the scene in real time. As a result, a large amount of noise is mixed into the speech signal picked up by the microphone. This noise lowers the signal-to-noise ratio (SNR) of the call signal and degrades the speech quality of the video call.
In the prior art, to reduce the effect of noise on speech quality, the noisy speech signal is transformed into the frequency domain by the Fourier transform and spectral subtraction is performed there: the magnitude spectrum of the noise is subtracted from the magnitude spectrum of the noisy speech to obtain the magnitude spectrum of the clean speech. The principle is as follows.
The noisy speech model is:
$$y(n) = s(n) + d(n) \tag{1}$$
where y(n) is the noisy speech, s(n) is the clean speech, and d(n) is the noise mixed in.
Taking the Fourier transform of both sides of Formula (1) gives:
$$Y(k) = S(k) + D(k) \tag{2}$$
where Y(k), S(k) and D(k) are the Fourier coefficients of the noisy speech, the clean speech and the noise, respectively.
Ignoring the phase difference between the noisy speech and the clean speech gives:
$$|Y(k)| = |S(k)| + |D(k)| \tag{3}$$
Because the human ear is insensitive to phase information, the magnitude spectrum of the noise can be subtracted directly from the magnitude spectrum of the noisy speech, and the result is used as the magnitude spectrum of the enhanced speech. This gives the basic expression:
$$|\hat{S}(k)| = |Y(k)| - |D(k)| \tag{4}$$
In practice, an improved form of spectral subtraction is more commonly used, given by Formula (5):
$$|\hat{S}(k)| = \left[\,|Y(k)|^{\alpha} - \beta\,|D(k)|^{\alpha}\,\right]^{1/\alpha} \tag{5}$$
The improved algorithm differs from basic spectral subtraction in that it introduces two parameters, α and β, which make the algorithm much more flexible. The system principle of spectral subtraction on noisy speech is shown in Fig. 1.
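As a minimal sketch of Formulas (4) and (5), the following Python function applies the subtraction bin by bin to magnitude spectra given as plain lists. The function name, the sample values, and the clamping of negative differences to zero are assumptions made for illustration; the text does not say how negative results are handled.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=1.0):
    """Apply Formula (5) per frequency bin; alpha = beta = 1 gives Formula (4)."""
    enhanced = []
    for y, d in zip(noisy_mag, noise_mag):
        diff = y ** alpha - beta * d ** alpha
        # Assumption: clamp negative differences to zero, because a negative
        # value has no real-valued 1/alpha root and a magnitude cannot be negative.
        enhanced.append(max(diff, 0.0) ** (1.0 / alpha))
    return enhanced


noisy = [4.0, 3.0, 1.0]   # |Y(k)|, illustrative values
noise = [1.0, 1.0, 2.0]   # |D(k)|, illustrative values

basic = spectral_subtract(noisy, noise)              # magnitude subtraction
power = spectral_subtract(noisy, noise, alpha=2.0)   # power-domain subtraction
```

With α = 1 the subtraction acts on magnitudes, and with α = 2 it acts on power spectra; β scales how much of the noise estimate is removed.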
However, noise reduction by spectral subtraction in the prior art has the following technical defect: when spectral subtraction is performed on noisy speech, the noise spectrum and the speech spectrum cannot be distinguished accurately. While suppressing the noise, the algorithm therefore also attenuates the speech spectrum considerably, which impairs the listener's perception of the normal speech spectrum.
To counter this attenuation of the speech signal during enhancement, many improvements to the existing spectral subtraction algorithm have been proposed. They optimize enhancement performance by adjusting the strength of noise suppression in the spectral subtraction:
Scheme 1: compute an average according to the probabilistic characteristics of the noisy speech spectrum and of the noise spectrum, and use it to control the strength of the noise magnitude suppression;
Scheme 2: change α=1, β=2 of the traditional spectral subtraction algorithm to α=2, β=5 to obtain an improved spectral subtraction method, and use the coefficients obtained by training to control the strength of noise reduction.
While implementing Scheme 1 and Scheme 2, the applicant found the following. Scheme 1 is complex to implement, and its control of noise reduction is based on probability distributions rather than on the characteristics of human hearing, so the perceived result is not fully satisfactory. In Scheme 2, a pair of α and β values that work well is obtained through extensive experiments, but the application environment changes constantly: values that give good results in one environment may still give unsatisfactory noise reduction when the environment changes considerably. The main problem with optimizing enhancement performance by adjusting the suppression strength is therefore as follows. When spectral subtraction is performed on noisy speech, the noise spectrum can only be obtained by estimation, and the estimate is not accurate enough. If the suppression strength is poorly controlled, the spectral subtraction may remove too much of the speech spectrum along with the noise, so that the enhanced speech signal does not meet the needs of human hearing.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for speech signal enhancement that obtain an enhanced speech signal while matching the characteristics of human hearing.
An embodiment of the present invention provides a method for speech signal enhancement, comprising:
obtaining a noisy speech signal, and performing perceptual weighting filtering on the noisy speech signal;
transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the speech signal obtained after spectral subtraction and phase synthesis into the time domain;
performing inverse perceptual weighting filtering on the speech signal obtained after spectral subtraction and phase synthesis, to obtain an enhanced speech signal.
Preferably, performing perceptual weighting filtering on the noisy speech signal comprises:
raising the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lowering the signal amplitude in the frequency bands that are not easily perceived by the human ear.
Preferably, the transfer function used for the perceptual weighting filtering of the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \qquad 0 < \gamma_2 < \gamma_1 \le 1$$
where $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the linear prediction (LP) coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Preferably, after the perceptual weighting filtering, the noisy speech signal is expressed in the time domain as:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the speech signal before perceptual weighting, y(k) is the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Preferably, performing spectral subtraction and phase synthesis on the noisy speech signal comprises:
applying weaker noise suppression in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and stronger noise suppression in the frequency bands that are not easily perceived by the human ear.
Preferably, performing inverse perceptual weighting filtering on the speech signal obtained after spectral subtraction and phase synthesis comprises:
lowering the signal amplitude in the frequency bands of that speech signal that are easily perceived by the human ear, and raising the signal amplitude in the frequency bands that are not easily perceived by the human ear.
An embodiment of the present invention provides an apparatus for speech signal enhancement, comprising:
a perceptual weighting filtering module, configured to perform perceptual weighting filtering on a noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module, configured to transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the speech signal obtained after spectral subtraction and phase synthesis into the time domain;
an inverse perceptual weighting filtering module, configured to perform inverse perceptual weighting filtering on the speech signal obtained after spectral subtraction and phase synthesis, to obtain an enhanced speech signal.
Preferably, the perceptual weighting filtering module is specifically configured to:
raise the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lower the signal amplitude in the frequency bands that are not easily perceived by the human ear.
Preferably, the transfer function used by the perceptual weighting filtering module for the perceptual weighting filtering of the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \qquad 0 < \gamma_2 < \gamma_1 \le 1$$
where $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Preferably, after the perceptual weighting filtering module has filtered the noisy speech signal, the noisy speech signal is expressed in the time domain as:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the speech signal before perceptual weighting, y(k) is the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Preferably, the spectral subtraction module is specifically configured to:
apply weaker noise suppression in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and stronger noise suppression in the frequency bands that are not easily perceived by the human ear.
Preferably, the inverse perceptual weighting filtering module is specifically configured to:
lower the signal amplitude in the frequency bands, easily perceived by the human ear, of the speech signal obtained after spectral subtraction and phase synthesis, and raise the signal amplitude in the frequency bands that are not easily perceived by the human ear.
Compared with the prior art, the embodiments of the present invention have the following advantages. Perceptual weighting filtering is applied to the noisy speech signal in the time domain, raising the signal amplitude in the frequency bands that are easily perceived by the human ear and lowering it in the bands that are not easily perceived. Spectral subtraction and phase synthesis are then performed in the frequency domain, so that noise suppression is weaker in the easily perceived bands and stronger in the bands that are not easily perceived. The resulting enhanced speech signal matches the characteristics of human hearing.
Description of drawings
To describe the technical solutions of the embodiments of the present invention and of the prior art more clearly, the drawings used in the description of the embodiments and of the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system principle of spectral subtraction on noisy speech in the prior art;
Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system principle of speech signal enhancement provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for speech signal enhancement in an application scenario of the present invention;
Fig. 5 is a schematic structural diagram of an apparatus for speech signal enhancement provided by an embodiment of the present invention.
Embodiments
In the embodiments of the present invention, perceptual weighting filtering is applied to the noisy speech signal so that the signal amplitude is raised in the frequency bands that are easily perceived by the human ear and lowered in the bands that are not easily perceived. The filtered noisy speech signal is then transformed into the frequency domain by the Fourier transform, and spectral subtraction is performed there, with weaker noise reduction in the easily perceived bands and stronger noise reduction in the bands that are not easily perceived. After spectral subtraction, the speech signal is transformed back into the time domain by the inverse Fourier transform and then passed through inverse perceptual weighting filtering to obtain the enhanced speech signal.
Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the present invention. The method comprises the following steps:
Step 201: obtain a noisy speech signal and perform perceptual weighting filtering on it.
The noisy speech signal is obtained after the call begins and is perceptually weighted in the time domain. The perceptual weighting filtering raises the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear and lowers it in the bands that are not easily perceived.
Step 202: transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on it in the frequency domain, and transform the resulting speech signal into the time domain.
After perceptual weighting filtering, the noisy speech signal is transformed into the frequency domain by the Fourier transform. Spectral subtraction and phase synthesis are performed in the frequency domain, and the resulting speech signal is transformed into the time domain by the inverse Fourier transform.
Specifically, the spectral subtraction and phase synthesis suppress noise weakly in the frequency bands of the noisy speech signal that are easily perceived by the human ear and strongly in the bands that are not easily perceived.
Step 203: perform inverse perceptual weighting filtering on the speech signal obtained after spectral subtraction and phase synthesis, to obtain the enhanced speech signal.
The inverse perceptual weighting filtering lowers the signal amplitude in the easily perceived bands of the speech signal obtained after spectral subtraction and phase synthesis and raises it in the bands that are not easily perceived, yielding the enhanced speech signal. In the enhanced speech signal, noise suppression is weaker in the bands that are easily perceived by the human ear and stronger in the bands that are not easily perceived, so the enhancement effect is pronounced and matches the characteristics of human hearing.
With the speech enhancement method provided by the embodiment of the present invention, the easily perceived bands of the noisy speech signal undergo weaker noise suppression at their raised signal amplitude, and the bands that are not easily perceived undergo stronger noise suppression at their lowered signal amplitude, which yields the enhanced speech signal.
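The three steps can be sketched end to end in Python as follows. This is an illustration under stated assumptions, not the patented implementation: `num` and `den` are hypothetical coefficient lists standing in for the perceptual weighting filter, the transform is a plain DFT over a single frame, the subtraction is the basic magnitude form with negative results clamped to zero, and the inverse filter is assumed to be the weighting filter with numerator and denominator exchanged.

```python
import cmath


def iir(x, num, den):
    """Filter x with numerator num and denominator den (den[0] must be 1)."""
    y = []
    for k in range(len(x)):
        acc = sum(num[i] * x[k - i] for i in range(len(num)) if k - i >= 0)
        acc -= sum(den[i] * y[k - i] for i in range(1, len(den)) if k - i >= 0)
        y.append(acc)
    return y


def dft(x, inverse=False):
    """Discrete Fourier transform of x, or its inverse when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def enhance(frame, noise_mag, num, den):
    weighted = iir(frame, num, den)            # step 201: perceptual weighting
    spectrum = dft(weighted)                   # step 202: to the frequency domain
    cleaned = []
    for Y, d in zip(spectrum, noise_mag):
        mag = max(abs(Y) - d, 0.0)             # basic magnitude subtraction
        cleaned.append(cmath.rect(mag, cmath.phase(Y)))  # keep the noisy phase
    time_signal = [v.real for v in dft(cleaned, inverse=True)]
    return iir(time_signal, den, num)          # step 203: inverse weighting


# Illustrative values only.
frame = [1.0, -0.5, 0.25, 2.0, 0.0, -1.0, 0.5, 0.75]
num, den = [1.0, -0.3], [1.0, -0.45]
unchanged = enhance(frame, [0.0] * len(frame), num, den)
```

With a zero noise estimate the chain reduces to weighting followed by its inverse, so `unchanged` reproduces `frame` up to rounding error. This is a quick way to check that the two filters really are inverses of each other.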
So that the technical solution of the present invention can be understood more clearly, the perceptual weighting filtering of the noisy speech signal is described in detail below.
In implementing the present invention, the noisy speech signal is perceptually weighted in the time domain before spectral subtraction. The perceptually weighted signal is transformed into the frequency domain by the Fourier transform, where spectral subtraction and phase synthesis are performed. The resulting speech signal is transformed into the time domain by the inverse Fourier transform and then passed through inverse perceptual weighting filtering to obtain the enhanced speech signal.
Specifically, the perceptual weighting filtering is performed by constructing a perceptual weighting filter and filtering the noisy speech signal with it, which transforms the speech signal into the perceptually weighted domain.
In the embodiments of the present invention, the transfer function of the perceptual weighting filter is as follows:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \qquad 0 < \gamma_2 < \gamma_1 \le 1 \tag{6}$$
where $\gamma_1$ and $\gamma_2$ are two weighting factors whose optimal values can be obtained through extensive training; $\alpha_i$ are the LP coefficients of the speech signal, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read from the speech decoder bitstream, which reduces the complexity of implementing the perceptual weighting.
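The polynomial coefficients of Formula (6) follow directly from the LP coefficients: the coefficient of $z^{-i}$ is $-\alpha_i\gamma^{i}$ in both the numerator and the denominator. The Python sketch below builds the two coefficient lists. The function names and the sample LP values are assumptions for illustration; in practice the $\alpha_i$ would come from the decoder bitstream.

```python
def weighted_coefficients(lp, gamma):
    """Return [alpha_i * gamma**i for i = 1..p] for LP coefficients alpha_1..alpha_p."""
    return [a * gamma ** i for i, a in enumerate(lp, start=1)]


def perceptual_weighting_filter(lp, gamma1, gamma2):
    """Return (numerator, denominator) coefficient lists of W(z) in powers of z^-1."""
    if not 0.0 < gamma2 < gamma1 <= 1.0:
        raise ValueError("Formula (6) requires 0 < gamma2 < gamma1 <= 1")
    numerator = [1.0] + [-c for c in weighted_coefficients(lp, gamma1)]
    denominator = [1.0] + [-c for c in weighted_coefficients(lp, gamma2)]
    return numerator, denominator


lp = [0.5, -0.25]   # hypothetical LP coefficients alpha_1, alpha_2
num, den = perceptual_weighting_filter(lp, gamma1=1.0, gamma2=0.5)
```

Scaling $\alpha_i$ by $\gamma^{i}$ is the same as evaluating $A(z)$ at $z/\gamma$, which is why the filter needs no quantities beyond the LP coefficients and the two weighting factors.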
After the perceptual weighting filtering, the noisy speech signal is expressed in the time domain as:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i) \tag{7}$$
where x(k) is the speech signal before perceptual weighting, y(k) is the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
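Formula (7) can be solved sample by sample for y(k). The Python sketch below transcribes the formula as written, with each weighting factor placed exactly where Formula (7) places it. It assumes that x(k) and y(k) are zero for k < 0; the function name and the sample values are illustrative.

```python
def perceptual_weight(x, lp, gamma1, gamma2):
    """Solve Formula (7) for y(k), assuming x(k) = y(k) = 0 for k < 0."""
    y = []
    for k in range(len(x)):
        acc = x[k]
        for i, a in enumerate(lp, start=1):
            if k - i >= 0:
                acc -= a * gamma2 ** i * x[k - i]   # terms on the right-hand side
                acc += a * gamma1 ** i * y[k - i]   # left-hand terms moved across
        y.append(acc)
    return y


# Response to a unit impulse with one hypothetical LP coefficient.
impulse_response = perceptual_weight([1.0, 0.0, 0.0], lp=[0.5], gamma1=1.0, gamma2=0.5)
```

Because the recursion feeds earlier outputs back into later ones, the response to a single impulse extends over many samples instead of ending after p samples.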
After the perceptual weighting filtering described above, the signal amplitude is raised in the frequency bands of the noisy speech signal that are easily perceived by the human ear and lowered in the bands that are not easily perceived. In the subsequent spectral subtraction and phase synthesis, noise suppression is therefore weaker in the easily perceived bands and stronger in the bands that are not easily perceived.
Specifically, during spectral subtraction the easily perceived bands undergo weaker noise suppression at their raised signal amplitude, which reduces the effect of noise suppression on the speech signal in those bands. The bands that are not easily perceived undergo stronger noise suppression at their lowered signal amplitude, which increases the noise reduction in those bands and matches the characteristics of human hearing.
After the spectral subtraction of the noisy speech, inverse perceptual weighting filtering lowers the signal amplitude in the easily perceived bands of the speech signal obtained after spectral subtraction and phase synthesis and raises it in the bands that are not easily perceived, which completes the enhancement of the noisy speech signal. The inverse perceptual weighting filtering is the inverse of the perceptual weighting filtering; the specific algorithm is not repeated here.
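Since the text does not spell out the inverse filter, the Python sketch below assumes that it is the exact inverse of Formula (7): the same recursion with the roles of input and output exchanged, so that $\gamma_1$ now scales the input terms and $\gamma_2$ the output terms. Function names and sample values are illustrative.

```python
def _recursion(signal, lp, g_out, g_in):
    """out(k) = signal(k) - sum(a_i*g_in^i*signal(k-i)) + sum(a_i*g_out^i*out(k-i))."""
    out = []
    for k in range(len(signal)):
        acc = signal[k]
        for i, a in enumerate(lp, start=1):
            if k - i >= 0:
                acc += a * g_out ** i * out[k - i] - a * g_in ** i * signal[k - i]
        out.append(acc)
    return out


def perceptual_weight(x, lp, gamma1, gamma2):
    """Forward filtering, as written in Formula (7)."""
    return _recursion(x, lp, gamma1, gamma2)


def inverse_perceptual_weight(y, lp, gamma1, gamma2):
    """Assumed inverse: solve Formula (7) for x(k) given y(k)."""
    return _recursion(y, lp, gamma2, gamma1)


x = [1.0, -2.0, 0.5, 3.0, -0.75]   # illustrative samples
lp = [0.8, -0.3]                   # hypothetical LP coefficients
weighted = perceptual_weight(x, lp, gamma1=0.9, gamma2=0.6)
restored = inverse_perceptual_weight(weighted, lp, gamma1=0.9, gamma2=0.6)
```

When no processing is done between the two filters, `restored` matches `x` up to rounding error. In the full method, the spectral subtraction between them is what changes the signal.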
Based on the perceptual weighting filtering principle described above, the present invention provides a system for enhancing a noisy speech signal, whose principle is shown schematically in Fig. 3. As Fig. 3 shows, the noisy speech signal is obtained when the call begins and is perceptually weighted, which transforms it into the perceptually weighted domain. It is then transformed into the frequency domain by the Fourier transform, where spectral subtraction and phase synthesis are performed. The resulting speech signal is transformed into the time domain by the inverse Fourier transform and then passed through inverse perceptual weighting filtering to obtain the enhanced speech signal. The inverse perceptual weighting filtering of the speech signal obtained after spectral subtraction and phase synthesis is the inverse of the perceptual weighting filtering of the noisy speech.
It should be noted that, in the embodiments of the present invention, the noise spectrum used in the spectral subtraction of the noisy speech signal is an empirical estimate. The optimal spectral range of the noise spectrum is obtained through extensive training and can be adjusted moderately according to the spectral intensity of the noisy speech signal, so that the spectral subtraction effectively removes the noise from the noisy speech signal and yields the enhanced speech signal. How the noise spectrum varies across environments is not described further here.
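The text does not say how the empirical noise estimate is computed. One common possibility, shown here purely as an assumption, is to average the magnitude spectra of frames known to contain only noise. The function name and the sample values in this Python sketch are hypothetical.

```python
def average_noise_magnitude(noise_frames_mag):
    """Average the per-bin magnitude over frames assumed to contain only noise."""
    n_frames = len(noise_frames_mag)
    n_bins = len(noise_frames_mag[0])
    return [sum(frame[k] for frame in noise_frames_mag) / n_frames
            for k in range(n_bins)]


# Two illustrative noise-only frames with three frequency bins each.
noise_estimate = average_noise_magnitude([[1.0, 2.0, 0.5],
                                          [3.0, 4.0, 1.5]])
```

The resulting list plays the role of |D(k)| in the spectral subtraction.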
The technical solution of the present invention is described in detail below with reference to a specific application scenario. Fig. 4 is a schematic flowchart of a method for speech signal enhancement in this application scenario. The method comprises the following steps:
Step 401: the call begins and the noisy speech signal is obtained.
When a video call begins, each party can see video of the other party's surroundings, and each party keeps a certain distance from the handset microphone so that the video can be captured. The signal picked up by the microphone therefore contains a large amount of noise in addition to the parties' speech. In the embodiments of the present invention, this speech signal mixed with a large amount of noise is called the noisy speech signal. The noise lowers the SNR of the call signal and degrades the speech quality of the video call, so noise reduction is applied to the obtained noisy speech signal to improve the SNR. The noisy speech model in the embodiments of the present invention is given by Formula (1):
$$y(n) = s(n) + d(n) \tag{1}$$
where y(n) is the noisy speech, s(n) is the clean speech, and d(n) is the noise mixed in.
It should be noted that suppressing noise in a speech signal by perceptual weighting filtering and spectral subtraction, as in the embodiments of the present invention, is not limited to video calls. It can also be used for noise reduction in other forms of communication, such as ordinary telephony and video telephony; such variations of the application scenario are not described further here.
Step 402: obtain the LP coefficients from the decoded bitstream and construct the perceptual weighting filter.
Before spectral subtraction, the noisy speech signal is perceptually weighted, which is done by constructing a perceptual weighting filter. In the present invention, the transfer function of the perceptual weighting filter is given by Formula (6):
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \qquad 0 < \gamma_2 < \gamma_1 \le 1 \tag{6}$$
where $\gamma_1$ and $\gamma_2$ are two weighting factors whose optimal values can be obtained through extensive training; $\alpha_i$ are the LP coefficients of the speech signal, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read directly from the decoded bitstream. The perceptual weighting filter is constructed from the weighting factors $\gamma_1$ and $\gamma_2$ and the LP coefficients obtained from the decoded bitstream.
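Once the filter has been constructed, it can be useful to check which frequencies it raises and which it lowers. The Python sketch below evaluates Formula (6) on the unit circle, $z = e^{j\omega}$. The function name, the single LP coefficient and the weighting factors are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import cmath


def weighting_response(lp, gamma1, gamma2, omega):
    """Evaluate W(z) of Formula (6) at z = exp(j*omega), omega in radians/sample."""
    z_inv = cmath.exp(-1j * omega)
    num = 1.0 - sum(a * (gamma1 * z_inv) ** i for i, a in enumerate(lp, start=1))
    den = 1.0 - sum(a * (gamma2 * z_inv) ** i for i, a in enumerate(lp, start=1))
    return num / den


lp = [0.5]   # hypothetical first-order LP coefficient
gain_low = abs(weighting_response(lp, 1.0, 0.5, 0.0))         # at omega = 0
gain_high = abs(weighting_response(lp, 1.0, 0.5, cmath.pi))   # at omega = pi
```

For these illustrative values the gain is 0.5/0.75 at ω = 0 and 1.5/1.25 at ω = π, so the filter attenuates one end of the band and amplifies the other. Real LP coefficients produce a response shaped by the speech spectrum of the current frame.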
Step 403: perform perceptual weighting filtering on the noisy speech signal.
The noisy speech signal y(n) = s(n) + d(n) is filtered by the perceptual weighting filter. The signal is in the time domain both when it enters the filter and after filtering; its time-domain expression after filtering is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i) \tag{7}$$
where x(k) is the noisy speech signal before perceptual weighting, y(k) is the noisy speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and $1 \le i \le p$.
The perceptual weighting filtering raises the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear and lowers it in the bands that are not easily perceived.
Step 404: apply a Fourier transform to the noisy speech signal to transform it into the frequency domain.
After perceptual weighting filtering, the noisy speech signal is converted by Fourier transform from a time-domain function into a frequency-domain function, and the spectral subtraction algorithm is then applied to it in the frequency domain. The details of transforming the noisy speech signal into the frequency domain by Fourier transform are not repeated here.
Step 405: perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain.
After perceptual weighting filtering and the Fourier transform, spectral subtraction and phase synthesis are performed on the noisy speech in the frequency domain. At this point, the amplitude of the frequency bands easily perceived by the human ear has been raised and the amplitude of the frequency bands not easily perceived by the human ear has been lowered.
When spectral subtraction and phase synthesis are performed on the noisy speech, a smaller noise reduction strength is used in the frequency bands easily perceived by the human ear and a larger noise reduction strength is used in the frequency bands not easily perceived by the human ear. Specifically, the easily perceived bands receive a smaller noise reduction strength at their raised signal amplitude, which reduces the effect of noise reduction on the voice signal in those bands; the bands not easily perceived receive a larger noise reduction strength at their lowered signal amplitude, which increases the noise reduction in those bands.
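This passage does not restate the subtraction rule, so the following is only a sketch of a common magnitude-domain spectral subtraction in which the phase of the noisy spectrum is reused for synthesis. It is not necessarily the exact rule of the embodiment. The per-bin noise magnitude estimate `noise_mag`, the over-subtraction factor `beta`, and the spectral floor `floor` are assumed parameters.

```python
import cmath
from typing import List, Sequence


def spectral_subtract(spectrum: Sequence[complex], noise_mag: Sequence[float],
                      beta: float = 1.0, floor: float = 0.0) -> List[complex]:
    """Subtract an estimated noise magnitude per bin and reuse the noisy phase."""
    out: List[complex] = []
    for bin_value, noise in zip(spectrum, noise_mag):
        mag, phase = cmath.polar(bin_value)
        # Never let the magnitude drop below a fraction of the noisy magnitude.
        clean_mag = max(mag - beta * noise, floor * mag)
        out.append(cmath.rect(clean_mag, phase))
    return out


# Two hypothetical bins: one well above the noise estimate, one below it.
print(spectral_subtract([3 + 4j, 1 + 0j], [2.0, 2.0], floor=0.1))
```

Under this reading, the same subtraction applied to every bin removes a smaller fraction of the raised (easily perceived) bands and a larger fraction of the lowered bands, which matches the behaviour described above.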
Step 406: transform the voice signal resulting from the spectral subtraction algorithm into the time domain by inverse Fourier transform.
After the spectral subtraction algorithm has been applied, the signal is transformed into the time domain by inverse Fourier transform. The inverse Fourier transform is the inverse operation of the Fourier transform, and the details of the conversion are not repeated here.
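The transform pair used in steps 404 and 406 can be illustrated with a direct O(N²) discrete Fourier transform written with the standard library only. This is a sketch for a single short frame. Framing, windowing, and overlap-add are not specified in this passage and are omitted, and a practical implementation would use an FFT routine instead.

```python
import cmath
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Forward DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; returns the real part, which suffices for real frames."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


# Round trip on a hypothetical four-sample frame.
frame = [1.0, 2.0, 0.0, -1.0]
restored = idft(dft(frame))
print(max(abs(a - b) for a, b in zip(frame, restored)))  # rounding-level error
```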
Step 407: perform inverse perceptual weighting filtering on the signal transformed into the time domain to obtain the enhanced voice signal.
The voice signal after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the amplitude of the frequency bands easily perceived by the human ear and raises the amplitude of the frequency bands not easily perceived by the human ear, yielding the enhanced voice signal. Inverse perceptual weighting filtering is the inverse of the perceptual weighting filtering applied to the noisy speech signal; the specific algorithm is not repeated here.
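Since the algorithm is not spelled out here, the sketch below assumes that the inverse filter is obtained by exchanging the roles of γ1 and γ2 in formula (7). With zero initial conditions this undoes the forward filter up to rounding error, which the example checks on a short signal. The helper names, LP coefficients, and weighting factors are hypothetical.

```python
from typing import List, Sequence


def _pole_zero(signal: Sequence[float], alpha: Sequence[float],
               g_in: float, g_out: float) -> List[float]:
    """out(k) = in(k) - sum_i a_i*g_in**i*in(k-i) + sum_i a_i*g_out**i*out(k-i)."""
    out: List[float] = []
    for k in range(len(signal)):
        acc = signal[k]
        for i, a in enumerate(alpha, start=1):
            if k - i >= 0:
                acc -= a * g_in ** i * signal[k - i]
                acc += a * g_out ** i * out[k - i]
        out.append(acc)
    return out


def perceptual_weight(x, alpha, gamma1, gamma2):
    """Forward filter: formula (7) as printed (gamma2 on input, gamma1 on output)."""
    return _pole_zero(x, alpha, g_in=gamma2, g_out=gamma1)


def inverse_perceptual_weight(y, alpha, gamma1, gamma2):
    """Assumed inverse: the same recursion with gamma1 and gamma2 exchanged."""
    return _pole_zero(y, alpha, g_in=gamma1, g_out=gamma2)


# Hypothetical second-order predictor and short signal (not from the patent).
alpha = [0.5, -0.2]
x = [1.0, -0.5, 0.25, 0.0]
weighted = perceptual_weight(x, alpha, 0.9, 0.6)
x_rec = inverse_perceptual_weight(weighted, alpha, 0.9, 0.6)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # rounding-level error
```

In the full method the spectral subtraction step sits between the two filters, so the inverse filter restores the original spectral balance of the speech while the noise removed in the weighted domain stays removed.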
With the voice signal enhancement method provided by the embodiment of the invention, the frequency bands of the noisy speech signal that are easily perceived by the human ear receive a smaller noise reduction strength at their raised signal amplitude, and the frequency bands not easily perceived by the human ear receive a larger noise reduction strength at their lowered signal amplitude, yielding the enhanced voice signal.
Figure 5 is a schematic structural diagram of a voice signal enhancement device 500 provided by an embodiment of the invention. The device comprises:
a perceptual weighting filtering module 510, configured to perform perceptual weighting filtering on the noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module 520, connected to the perceptual weighting filtering module 510 and configured to transform the noisy speech signal after perceptual weighting filtering into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the voice signal after spectral subtraction and phase synthesis into the time domain;
an inverse perceptual weighting filtering module 530, connected to the spectral subtraction module 520 and configured to perform inverse perceptual weighting filtering on the voice signal after spectral subtraction and phase synthesis to obtain the enhanced voice signal.
The perceptual weighting filtering module 510 is specifically configured to raise the amplitude of the frequency bands of the noisy speech signal that are easily perceived by the human ear and to lower the amplitude of the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
The transfer function used by the perceptual weighting filtering module 510 to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
Here, γ1 and γ2 are the weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
When the perceptual weighting filtering module 510 performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
Here, x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ1 and γ2 are the weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
The spectral subtraction module 520 is specifically configured to apply a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
The inverse perceptual weighting filtering module 530 is specifically configured to lower the amplitude of the frequency bands of the voice signal that are easily perceived by the human ear and to raise the amplitude of the frequency bands that are not easily perceived by the human ear.
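The way the three modules chain together can be sketched for a single frame as follows. This is an illustration under several assumptions, not the device itself: the filter recursion follows formula (7) as printed with zero initial conditions, the inverse filter is assumed to exchange γ1 and γ2, the transform is a direct DFT, the subtraction is a plain magnitude subtraction that keeps the noisy phase, and all names and values are hypothetical.

```python
import cmath
from typing import List, Sequence


def _pole_zero(sig: Sequence[float], alpha: Sequence[float],
               g_in: float, g_out: float) -> List[float]:
    """out(k) = in(k) - sum_i a_i*g_in**i*in(k-i) + sum_i a_i*g_out**i*out(k-i)."""
    out: List[float] = []
    for k in range(len(sig)):
        acc = sig[k]
        for i, a in enumerate(alpha, start=1):
            if k - i >= 0:
                acc += a * g_out ** i * out[k - i] - a * g_in ** i * sig[k - i]
        out.append(acc)
    return out


def _dft(x: Sequence[complex], sign: int) -> List[complex]:
    """Unnormalised DFT (sign=-1) or inverse-direction DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def enhance(frame: Sequence[float], alpha: Sequence[float], gamma1: float,
            gamma2: float, noise_mag: Sequence[float]) -> List[float]:
    # Module 510: perceptual weighting filtering in the time domain.
    weighted = _pole_zero(frame, alpha, g_in=gamma2, g_out=gamma1)
    # Module 520: to the frequency domain, subtract, reuse phase, back to time.
    spec = _dft(weighted, sign=-1)
    cleaned = [cmath.rect(max(abs(b) - n, 0.0), cmath.phase(b))
               for b, n in zip(spec, noise_mag)]
    restored = [v.real / len(cleaned) for v in _dft(cleaned, sign=+1)]
    # Module 530: inverse perceptual weighting filtering.
    return _pole_zero(restored, alpha, g_in=gamma1, g_out=gamma2)


# With a zero noise estimate the chain should return the input frame.
frame = [0.5, -0.25, 1.0, 0.0]
print(enhance(frame, [0.5, -0.2], 0.9, 0.6, [0.0] * 4))
```

Running the chain with an all-zero noise estimate is a convenient sanity check, because every stage then cancels against its inverse.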
With the voice signal enhancement method and device provided by the embodiments of the invention, perceptual weighting filtering is applied to the noisy speech signal in the time domain, which raises the amplitude of the frequency bands easily perceived by the human ear and lowers the amplitude of the frequency bands not easily perceived by the human ear. The easily perceived bands then receive a smaller noise reduction strength at their raised signal amplitude, and the bands not easily perceived receive a larger noise reduction strength at their lowered signal amplitude, yielding the enhanced voice signal.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the invention can be embodied as a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to carry out the method described in each embodiment of the invention.
Those skilled in the art will understand that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the embodiments of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the present embodiment. The modules of the above embodiments may be merged into one module or further split into multiple submodules.
The serial numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
The above discloses only several specific embodiments of the invention. The embodiments of the invention are not limited to these, however, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the embodiments of the invention.

Claims (12)

1. A method for enhancing a voice signal, characterized in that it comprises:
obtaining a noisy speech signal and performing perceptual weighting filtering on the noisy speech signal;
transforming the noisy speech signal after the perceptual weighting filtering into the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the voice signal after the spectral subtraction and phase synthesis into the time domain;
performing inverse perceptual weighting filtering on the voice signal after the spectral subtraction and phase synthesis to obtain an enhanced voice signal.
2. The method of claim 1, characterized in that performing perceptual weighting filtering on the noisy speech signal comprises:
raising the amplitude of the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lowering the amplitude of the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
3. The method of claim 1 or 2, characterized in that the transfer function used to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where γ1 and γ2 are weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
4. The method of claim 1 or 2, characterized in that, when perceptual weighting filtering is performed on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ1 and γ2 are weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
5. The method of claim 1, characterized in that performing spectral subtraction and phase synthesis on the noisy speech signal comprises:
applying a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
6. The method of claim 1, characterized in that performing inverse perceptual weighting filtering on the voice signal after the spectral subtraction and phase synthesis comprises:
lowering the amplitude of the frequency bands easily perceived by the human ear in the voice signal after the spectral subtraction and phase synthesis, and raising the amplitude of the frequency bands not easily perceived by the human ear.
7. A device for enhancing a voice signal, characterized in that it comprises:
a perceptual weighting filtering module, configured to perform perceptual weighting filtering on a noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module, configured to transform the noisy speech signal after the perceptual weighting filtering into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the voice signal after the spectral subtraction and phase synthesis into the time domain;
an inverse perceptual weighting filtering module, configured to perform inverse perceptual weighting filtering on the voice signal after the spectral subtraction and phase synthesis to obtain an enhanced voice signal.
8. The device of claim 7, characterized in that the perceptual weighting filtering module is specifically configured to:
raise the amplitude of the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lower the amplitude of the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
9. The device of claim 7 or 8, characterized in that the transfer function used by the perceptual weighting filtering module to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where γ1 and γ2 are weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
10. The device of claim 7 or 8, characterized in that, when the perceptual weighting filtering module performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ1 and γ2 are weighting factors, α_i is the i-th LP coefficient of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, and p is the order of the LP predictor, 1 ≤ i ≤ p.
11. The device of claim 7, characterized in that the spectral subtraction module is specifically configured to:
apply a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
12. The device of claim 7, characterized in that the inverse perceptual weighting filtering module is specifically configured to:
lower the amplitude of the frequency bands easily perceived by the human ear in the voice signal after the spectral subtraction and phase synthesis, and raise the amplitude of the frequency bands not easily perceived by the human ear.
CN2009102369170A 2009-10-27 2009-10-27 Method and device for enhancing voice signal Expired - Fee Related CN102054482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102369170A CN102054482B (en) 2009-10-27 2009-10-27 Method and device for enhancing voice signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102369170A CN102054482B (en) 2009-10-27 2009-10-27 Method and device for enhancing voice signal

Publications (2)

Publication Number Publication Date
CN102054482A true CN102054482A (en) 2011-05-11
CN102054482B CN102054482B (en) 2012-11-28

Family

ID=43958736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102369170A Expired - Fee Related CN102054482B (en) 2009-10-27 2009-10-27 Method and device for enhancing voice signal

Country Status (1)

Country Link
CN (1) CN102054482B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104867497A (en) * 2014-02-26 2015-08-26 北京信威通信技术股份有限公司 Voice noise-reducing method
CN113506577A (en) * 2021-06-25 2021-10-15 贵州电网有限责任公司 Method for perfecting voiceprint library based on incremental acquisition of telephone recording

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3454190B2 (en) * 1999-06-09 2003-10-06 三菱電機株式会社 Noise suppression apparatus and method
CN100487789C (en) * 2006-09-06 2009-05-13 华为技术有限公司 Perception weighting filtering wave method and perception weighting filter thererof

Also Published As

Publication number Publication date
CN102054482B (en) 2012-11-28

Similar Documents

Publication Publication Date Title
CN102652336B (en) Speech signal restoration device and speech signal restoration method
EP0698877B1 (en) Postfilter and method of postfiltering
US9020813B2 (en) Speech enhancement system and method
CN102341852B (en) Filtering speech
JP3881946B2 (en) Acoustic encoding apparatus and acoustic encoding method
JP2004101720A (en) Device and method for acoustic encoding
KR20070085532A (en) Stereo encoding apparatus, stereo decoding apparatus, and their methods
US7330813B2 (en) Speech processing apparatus and mobile communication terminal
JP2010055024A (en) Signal correction device
CN104981870B (en) Sound enhancing devices
CN116030823B (en) Voice signal processing method and device, computer equipment and storage medium
CN107274887A (en) Speaker's Further Feature Extraction method based on fusion feature MGFCC
CN115171709B (en) Speech coding, decoding method, device, computer equipment and storage medium
CN114007157A (en) Intelligent noise reduction communication earphone
US6424942B1 (en) Methods and arrangements in a telecommunications system
CN111899750B (en) Speech enhancement algorithm combining cochlear speech features and hopping deep neural network
CN1416561A (en) Speech decoder and method for decoding speech
CN102054482B (en) Method and device for enhancing voice signal
CN101960514A (en) Signal analysis/control system and method, signal control device and method, and program
CN113936680B (en) Single-channel voice enhancement method based on multi-scale information perception convolutional neural network
CN109215635B (en) Broadband voice frequency spectrum gradient characteristic parameter reconstruction method for voice definition enhancement
CN101582263B (en) Method and device for noise enhancement post-processing in speech decoding
CN113411456B (en) Voice quality assessment method and device based on voice recognition
CN101826327B (en) Method and system for judging transient state based on time domain masking
CN101533639B (en) Voice signal processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121128

Termination date: 20211027