CN103440872B - Denoising method for transient noise - Google Patents

Denoising method for transient noise

Info

Publication number: CN103440872B (grant publication of CN103440872A)
Application number: CN201310357211.6A
Authority: CN (China)
Priority / filing date: 2013-08-15
Grant publication date: 2016-06-01
Inventors: 陈喆, 殷福亮, 周文颖
Assignee: Dalian University of Technology
Legal status: Active (granted)
Prior art keywords: frame, buf, data, pitch, signal


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The present invention discloses a denoising method for transient noise, belonging to the field of signal processing. The method first computes the Mel-frequency cepstral coefficients (MFCC) of the current frame and, at the same time, predicts the frame's pitch period; the MFCC are then used to detect whether the frame contains transient noise, and if noise is present, the frame is reconstructed using the predicted pitch period.

Description

Denoising method for transient noise
Technical field
The present invention relates to a denoising method for transient noise and belongs to the field of signal processing.
Background art
Transient additive noise in an audio signal is also called transient noise or impulse noise. Transient noise is typically discontinuous, intermittent and pulse-like in the time domain; its energy is concentrated in a short time interval, and within that interval the energy of the transient noise is much larger than the energy of the clean signal. Typical transient noises include desk knocks, door slams, clattering, keyboard keystrokes, mouse clicks and hammer blows, and they occur in many application scenarios such as hearing aids, mobile phones and video-conferencing equipment. The presence of transient noise seriously degrades audio quality, so measures must be taken to suppress it and enhance the audio. Most current noise-suppression algorithms target stationary or continuous noise; speech enhancement is usually performed with the methods described in the document "Research on Speech Enhancement and Related Techniques", such as spectral subtraction and adaptive filtering. These algorithms, however, are of little use against the transient noise described above and provide essentially no suppression.
Summary of the invention
In view of the above problems, the present invention provides a denoising method for transient noise.
The technical scheme adopted by the present invention is as follows: first compute the Mel-frequency cepstral coefficients (MFCC) of the current frame and, at the same time, predict the frame's pitch period; then use the MFCC to detect whether the frame contains noise, i.e. perform noise detection; if noise is present, reconstruct the waveform using the predicted pitch period.
Beneficial effects of the present invention: tests were run with 20 clean speech recordings (adult male, adult female and child voices) and 4 types of noise: mouse clicks, knocking, metronome ticks and keyboard strokes. The durations of the four noises are 10 ms for mouse clicks, 20 ms for knocking and metronome ticks, and 30 ms for keyboard strokes. Each clean recording was mixed with each of the 4 noise types, giving 80 noisy recordings; 30 noise bursts, equally spaced, were added to each recording. All audio is sampled at fs = 48 kHz with a frame length of N = 480. In the MFCC computation stage an NFFT = 1024-point FFT is used, the Mel filter bank contains M = 24 filters, and L = 12 MFCC are computed. In the transient-noise detection stage the adaptive threshold is set to Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, and ener is the energy of each input frame with a minimum value of 60.0. When the threshold is updated, the forgetting factor b is set to 0.4. In the pitch period estimation stage the pitch period is searched over (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage the cross-fade lengths N_1 and N_2 are both 32 samples, and the buffer buf(n) has length 2240. After denoising noisy speech with the present invention, speech intelligibility improves substantially and listener fatigue is reduced. The denoising performance was assessed with two measures, the segmental signal-to-noise ratio SNR_seg and PEAQ; the results are shown in Fig. 12 and Fig. 13 described in the brief description of the drawings.
Brief description of the drawings
Fig. 1: relation between Mel frequency and linear frequency.
Fig. 2: flow of the technical scheme of prior art 1.
Fig. 3: flow of the technical scheme of prior art 2.
Fig. 4: block diagram of the present technical scheme.
Fig. 5: block diagram of MFCC feature extraction.
Fig. 6: Mel-frequency filter bank.
Fig. 7: block diagram of pitch period estimation.
Fig. 8: linear interpolation between two points.
Fig. 9(a): signal before the current frame is repaired.
Fig. 9(b): new pitch-period waveform pw^{(p)}(n).
Fig. 9(c): signal after the current frame is repaired.
Fig. 10(a): signal before the current frame is repaired.
Fig. 10(b): current frame signal.
Fig. 10(c): signal after repair.
Fig. 11(a): signal before denoising.
Fig. 11(b): signal after denoising.
Fig. 12: denoising performance evaluation (SNR).
Fig. 13: denoising performance evaluation (PEAQ).
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings:
Mel-frequency cepstral coefficients:
Research on the human auditory mechanism has found that the ear has different sensitivities to sound waves of different frequencies, and that speech intelligibility is affected most by components between 200 Hz and 5 kHz. The ear also exhibits masking: high-energy components mask weaker ones to some extent. In general, a low-frequency sound masks a higher-frequency sound more easily than the reverse; in other words, the critical bandwidth of masking is narrower at low frequencies. Accordingly, a group of band-pass filters is arranged, from dense to sparse, according to the size of the critical bandwidth, and the input signal is filtered by them. If the output energy of each band-pass filter is taken as a basic feature of the signal, this feature, after further processing, can serve as a speech feature; this is the Mel-frequency cepstral coefficient (MFCC). This kind of feature does not depend on the nature of the signal, i.e. no assumptions or restrictions are placed on the input, while it still exploits the auditory perception properties of the ear. Compared with linear prediction cepstral coefficients (LPCC), which are based on a vocal-tract model, MFCC are therefore more robust and retain good speech recognition performance at low signal-to-noise ratios.
MFCC are cepstral parameters extracted in the Mel-scale frequency domain. The Mel scale describes the nonlinear character of the ear's frequency perception, and its relation to linear frequency can be approximated as

f_{mel} = 2595\,\log_{10}\!\left(1 + \frac{f_{linear}}{700}\right)    (18)

where f_{linear} is the linear frequency in Hz and f_{mel} is the corresponding Mel frequency. Fig. 1 shows the relation between Mel frequency and linear frequency: as f_{linear} increases linearly, f_{mel} increases logarithmically.
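For reference, the mapping of equation (18) and its inverse can be written as the short numerical sketch below; the function names are illustrative only and are not part of the patented method.

```python
import numpy as np

def hz_to_mel(f_linear):
    # Equation (18): Mel value of a linear frequency given in Hz
    return 2595.0 * np.log10(1.0 + f_linear / 700.0)

def mel_to_hz(f_mel):
    # Inverse mapping, used later when placing the filter centre frequencies
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

# Below about 1 kHz the Mel value grows almost linearly with frequency,
# above it roughly logarithmically (cf. Fig. 1).
print(hz_to_mel(np.array([200.0, 1000.0, 5000.0, 24000.0])))
```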
Packet loss concealment:
In voice communication systems based on the IP protocol, such as Voice over IP (VoIP), network congestion or delay jitter during transmission can cause packet loss, i.e. some packets do not arrive at the receiver in time, seriously degrading the speech quality at the receiving end. Measures must therefore be taken at the receiver to reduce the speech distortion caused by packet loss. Such techniques are usually called packet loss concealment (PLC) algorithms.
PLC algorithms fall into two classes: sender-based and receiver-based. Sender-based PLC requires the sending and receiving ends to cooperate; receiver-based PLC recovers the original speech as well as possible using only the packets received normally, the number of lost packets, and prior knowledge of the coding scheme. Because receiver-based PLC needs no data from the sender, it adds neither network traffic nor delay. Common receiver-based PLC methods include silence substitution, repetition of the previous packet, template matching, pitch waveform replication and linear prediction.
The pitch waveform replication (PWR) method used in this document is a receiver-based PLC method.
Prior art 1 related to the present invention
Technical scheme of prior art 1
The paper "Speech enhancement based on Kalman filtering in an impulsive noise environment" proposes a speech enhancement method for transient noise conditions. The flow of the method is shown in Fig. 2: it first finds the frequency band in which the ratio of the transient-noise sample energy to the noisy-speech sample energy is largest, then uses the energy distribution in that band to decide, frame by frame, whether the speech signal is corrupted by transient noise. On this basis, the method applies a Kalman filtering algorithm to denoise the frames corrupted by transient noise; in addition, it improves the estimation of the autoregressive (AR) model parameters.
Shortcomings of prior art 1
(1) For noise with a long tail, the trailing portion may not be detected.
(2) The Kalman filter used for denoising is suited to stationary noise, not to non-stationary transient noise, so the denoising effect is limited, considerable noise residue remains, and speech quality suffers.
Prior art 2 related to the present invention
Technical scheme of prior art 2
Hetherington et al., in the invention patent "Repetitive transient noise removal", proposed a transient noise suppression method. The flow of the Hetherington method is shown in Fig. 3. The method first models the noise behaviour, then uses the correlation coefficient between the modelled signal and the signal under test to decide whether the data under test contain noise; if noise is present, the noise component is removed from the signal under test according to the model.
Shortcomings of prior art 2
The Hetherington method can effectively remove repetitive noise, but because transient noise is highly varied, the model becomes inaccurate when several different types of transient noise occur within a short time, and the denoising performance of the Hetherington method then degrades.
Detailed description of the technical scheme of the present invention
Technical problem to be solved by the present invention
To perform speech enhancement on audio corrupted by transient noise, suppress the transient noise, improve speech quality, and increase the intelligibility of the audio.
Complete technical scheme provided by the invention:
The block diagram of the technical scheme is shown in Fig. 4: MFCC parameters are extracted from the input audio signal; the MFCC parameters are then used to detect whether the audio signal contains noise; if noise is detected, the noisy frame data are replaced using the PWR method and the waveform is reconstructed; if no noise is detected, the audio signal is output unchanged. A per-frame sketch of this flow is given below.
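As an informal illustration of the flow in Fig. 4, the per-frame control logic might look as follows. The routine names passed in (compute_mfcc, detect_noise, reconstruct_frame) are placeholders for the operations of sections (1) to (4) below and are assumptions of this sketch, not part of the patented method; detect_noise is assumed to return a noise flag together with the updated feature of equation (27).

```python
import numpy as np

FS = 48000   # sampling rate (Hz)
N = 480      # frame length (10 ms)

def denoise(x, compute_mfcc, detect_noise, reconstruct_frame):
    """Frame-by-frame transient-noise removal following Fig. 4 (sketch)."""
    out = []
    prev_mfcc = None
    for p in range(len(x) // N):
        frame = x[p * N:(p + 1) * N]
        mfcc = compute_mfcc(frame)
        if prev_mfcc is None:
            is_noise, prev_mfcc = False, mfcc           # first frame: no reference yet
        else:
            is_noise, prev_mfcc = detect_noise(mfcc, prev_mfcc, frame)
        # noisy frame: rebuild with PWR; clean frame: pass through unchanged
        out.append(reconstruct_frame(frame) if is_noise else frame)
    return np.concatenate(out) if out else x[:0]
```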
Implementation steps of the technical scheme of the present invention:
The input monophonic audio signal is sampled at fs = 48 kHz. The noisy input signal x(n) can be written as x(n) = s(n) + d(n), where s(n) is the clean speech signal and d(n) is the transient noise signal.
(1) Extraction of the MFCC features of the audio signal
The MFCC extraction process is shown in Fig. 5 (a grey-scale figure provided to illustrate the technical effect of the present invention). First the time-domain audio signal is transformed to the frequency domain and its energy spectrum is computed; the energy spectrum is then multiplied by the Mel-scale triangular filter bank, and a discrete cosine transform (DCT) is applied to the logarithm of the resulting energies. The first L coefficients obtained in this way are the MFCC. The concrete steps for computing the MFCC are as follows (a code sketch covering steps 1) to 6) is given after step 6)):
1) Frame the input signal. The frame length is set to 10 ms; since the sampling frequency is fs = 48 kHz, one frame contains N = 480 samples. The data are then normalised: for a quantisation depth of 16 bit, the samples are divided by 2^15, which maps the data into the range (-1, 1). If the current frame is the p-th frame, then

x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1    (19)
2) Pre-processing. The current frame is pre-emphasised and windowed:

y^{(p)}(n) = x^{(p)}(n) - \beta\, x^{(p)}(n-1)    (20)

y_w^{(p)}(n) = y^{(p)}(n)\, w(n)    (21)

where the pre-emphasis factor is \beta = 0.938 and w(n) is a Hamming window, w(n) = 0.54 - 0.46\cos(2\pi n / N).
3) An NFFT = 1024-point FFT of the pre-processed signal gives the frequency-domain signal Y^{(p)}(k).
4) The energy spectrum |Y^{(p)}(k)|^2 of the frequency-domain signal Y^{(p)}(k) is computed.
5) The energy spectrum of the frequency-domain signal is passed through a bank H of Mel-scale triangular filters for frequency-domain filtering.
The bank contains M filters; each filter is triangular and the filters overlap one another, as shown in Fig. 6. The centre frequency of each filter is f(m), m = 1, 2, ..., M, and the present invention uses M = 24. Filter design method: the upper band-edge frequency of the input signal, fs/2 = 24 kHz, is converted to the Mel-scale frequency domain by formula (18), giving Fs_mel; the interval (0, Fs_mel) is divided into 25 equal parts, the two end points 0 and Fs_mel are discarded, and the remaining 24 division points are used as the centre frequencies of the 24 filters. The division points f(m) are thus equally spaced on the Mel scale and are then converted back to the linear frequency scale (the inverse of formula (18)). After the conversion, the spacing between the f(m) shrinks as m decreases and widens as m increases. From the division points f(m), the frequency response of the triangular filter bank H(m, k) is

H(m,k) = \begin{cases} 0, & f(k) < f(m-1) \\ \dfrac{2\,[f(k)-f(m-1)]}{[f(m+1)-f(m-1)]\,[f(m)-f(m-1)]}, & f(m-1) \le f(k) < f(m) \\ \dfrac{2\,[f(m+1)-f(k)]}{[f(m+1)-f(m-1)]\,[f(m+1)-f(m)]}, & f(m) \le f(k) \le f(m+1) \\ 0, & f(k) > f(m+1) \end{cases}    (22)
6) The energy output by each filter H(m, k) and its logarithm are computed, giving E(m):

E(m) = \log_{10}\!\left[\sum_k H(m,k)\,|Y^{(p)}(k)|^2\right], \quad m = 1, 2, \ldots, M    (23)

A discrete cosine transform of E(m) then yields the L = 12 MFCC, denoted C(l):

C^{(p)}(0) = \sqrt{2/L}\,\sum_{m=0}^{M-1} E(m), \quad l = 0
C^{(p)}(l) = \sqrt{2/L}\,\sum_{m=0}^{M-1} E(m)\cos\!\left(\frac{\pi l (2m+1)}{2M}\right), \quad 1 \le l \le L-1    (24)
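A minimal per-frame sketch of steps 1) to 6) is given below (Python/NumPy). It assumes a 16-bit input frame of N = 480 samples whose pre-emphasis uses the last sample of the previous frame; the triangular filters are normalised to unit peak rather than with the area factor of equation (22), and all names are illustrative.

```python
import numpy as np

FS, N, NFFT, M, L, BETA = 48000, 480, 1024, 24, 12, 0.938

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank():
    # Step 5): split (0, Fs_mel) into 25 equal parts, drop the end points,
    # and use the 24 interior points as centre frequencies f(m).
    pts_mel = np.linspace(0.0, hz_to_mel(FS / 2.0), M + 2)
    pts_hz = mel_to_hz(pts_mel)
    bins = np.floor((NFFT // 2) * pts_hz / (FS / 2.0)).astype(int)
    H = np.zeros((M, NFFT // 2 + 1))
    for m in range(1, M + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)   # rising edge
        H[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)   # falling edge
    return H

def mfcc_frame(frame, H, prev_last=0.0):
    frame = frame / 2.0 ** 15                                        # step 1): normalise
    y = frame - BETA * np.concatenate(([prev_last], frame[:-1]))     # step 2): pre-emphasis
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * np.arange(N) / N)         # Hamming window
    Y = np.fft.rfft(y * w, NFFT)                                     # step 3): 1024-point FFT
    E = np.log10(np.maximum(H @ np.abs(Y) ** 2, 1e-12))              # steps 4)-6): log filter energies
    C = np.array([np.sqrt(2.0 / L) *
                  np.sum(E * np.cos(np.pi * l * (2 * np.arange(M) + 1) / (2 * M)))
                  for l in range(L)])                                # DCT, equation (24)
    return C
```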
(2) Noise detection:
The Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame is computed:

dist = \sqrt{\sum_{l=0}^{L}\left[C^{(p)}(l) - C^{(p-1)}(l)\right]^2}    (25)
Whether the current frame contains noise is judged by comparing the distance with a threshold Thres, which is determined adaptively by

Thres = 10 \cdot ener    (26)

where ener is the energy of the normalised frame signal, with a minimum value of 60.0.
After detection, the stored MFCC feature of the current frame is updated:

C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p)}(l)    (27)

where the forgetting factor is b = 0.4. When the frame following a noise frame is a speech frame, this update prevents false detection.
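The detection and feature-update step of section (2) can be sketched as follows. Reading equation (27) as a blend of the previous (smoothed) feature and the current one is an assumption of this sketch, and the constants simply mirror the values given above.

```python
import numpy as np

CONST = 10.0      # threshold constant in Thres = const * ener
ENER_MIN = 60.0   # floor on the frame energy used in the threshold
B = 0.4           # forgetting factor b

def detect_transient(mfcc_cur, mfcc_prev, frame):
    """Return (is_noise, updated_feature) for one frame, per section (2)."""
    dist = np.sqrt(np.sum((mfcc_cur - mfcc_prev) ** 2))   # equation (25)
    ener = max(np.sum(frame ** 2), ENER_MIN)              # energy of the normalised frame
    is_noise = dist > CONST * ener                        # equation (26)
    # Equation (27): smooth the stored feature so that the clean frame right
    # after a noise frame is not flagged again.
    mfcc_updated = B * mfcc_prev + (1.0 - B) * mfcc_cur
    return is_noise, mfcc_updated
```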
(3) Pitch period prediction:
The pitch period is estimated for every speech frame. If the current frame is a noise frame, its pitch period is predicted from the pitch periods of the two preceding frames. The pitch estimation block diagram is shown in Fig. 7. For different speakers the pitch period generally lies between 2 ms and 12 ms, so the pitch period is searched within 2-12 ms here. Let PMAX be the number of samples corresponding to 12 ms, i.e. PMAX = 576, and PMIN the number of samples corresponding to 2 ms, i.e. PMIN = 96. A buffer buf(n) of length 3·PMAX + N = 2208 is used to estimate the pitch period; buf(n) stores the data that have already been output.
The pitch period is estimated as follows:
1) Low-pass filter buf(n) to obtain buf_d(n); the cut-off frequency of the low-pass filter (LPF) is 900 Hz.
2) Centre-clip buf_d(n) to obtain buf_c(n):

buf_c(n) = \begin{cases} buf_d(n) - C_L, & buf_d(n) > C_L \\ buf_d(n) + C_L, & buf_d(n) < -C_L \\ 0, & |buf_d(n)| \le C_L \end{cases}    (28)

where the clipping level C_L is usually set to 68 % of the maximum of the normalised data.
3) Compute the autocorrelation of buf_c(n) and search for the position of its maximum within the range (96, 576); that position is taken as the pitch period estimate Pitch:

r_{buf_c}(n) = \sum_{m=0}^{2\,PMAX-1} buf_c(m)\, buf_c(m+n), \quad PMIN \le n \le PMAX    (29)

Pitch = \arg\max_{PMIN \le n \le PMAX} r_{buf_c}(n)    (30)
4) To prevent pitch doubling, the pitch estimates Pitch^{(p-1)} and Pitch^{(p-2)} of the two preceding frames are first smoothed, and the pitch period Pitch^{(p)} of the current frame is then predicted from the two smoothed values:

Pitch^{(p)} = Pitch^{(p-1)} + \left(Pitch^{(p-1)} - Pitch^{(p-2)}\right)    (32)
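A sketch of the pitch search and prediction of section (3) is given below. The buffer buf is assumed to hold at least the 3·PMAX + N most recent output samples; the 4th-order Butterworth filter merely stands in for the unspecified 900 Hz LPF, and the smoothing step of the two previous estimates is omitted here because its formula is not reproduced above.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 48000
PMIN, PMAX = 96, 576      # 2 ms .. 12 ms at 48 kHz
CLIP = 0.68               # centre-clipping level: 68 % of the peak

def estimate_pitch(buf):
    """Autocorrelation pitch estimate over buf(n), steps 1)-3)."""
    b, a = butter(4, 900.0 / (FS / 2.0))          # step 1): 900 Hz low-pass (assumed order)
    buf_d = lfilter(b, a, buf)
    cl = CLIP * np.max(np.abs(buf_d))             # step 2): centre clipping, equation (28)
    buf_c = np.where(buf_d > cl, buf_d - cl,
                     np.where(buf_d < -cl, buf_d + cl, 0.0))
    # step 3): autocorrelation over lags PMIN..PMAX, equations (29)-(30)
    r = [np.dot(buf_c[:2 * PMAX], buf_c[n:n + 2 * PMAX])
         for n in range(PMIN, PMAX + 1)]
    return PMIN + int(np.argmax(r))

def predict_pitch(pitch_prev, pitch_prev2):
    """Equation (32): linear extrapolation for a frame flagged as noise."""
    return pitch_prev + (pitch_prev - pitch_prev2)
```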
(4) Waveform reconstruction:
The last pitch-period waveform of the previous frame is extracted and linearly interpolated to obtain a new pitch-period waveform.
1) Since buf(n) stores the output frame data, the pitch-period waveform of the previous frame can be extracted from buf(n), namely the last Pitch^{(p-1)} samples of the previous output frame; this waveform is denoted pw^{(p-1)}(n). Linear interpolation of pw^{(p-1)}(n) yields a new waveform of length Pitch^{(p)}, denoted pw^{(p)}(n). The linear interpolation between two points is illustrated in Fig. 8; the interpolation formula is

pw^{(p)}(n') = \left(\frac{Pitch^{(p-1)}}{Pitch^{(p)}}\, n' - n + 1\right)\left[pw^{(p-1)}(n) - pw^{(p-1)}(n-1)\right] + pw^{(p-1)}(n-1), \quad n-1 \le \frac{Pitch^{(p-1)}}{Pitch^{(p)}}\, n' < n    (33)
2) The new waveform is used for pitch-waveform replication:
a. The principle of waveform replication is shown in Fig. 9(a) to Fig. 9(c). If the current frame is a noise frame (regardless of whether the previous frame is a noise frame or a clean speech frame), the processing is: according to formula (34), the AB segment of buf(n) is overlap-added with the CD segment and cross-faded, so that the data on both sides of point D are continuous:

buf_{CD}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{AB}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{CD}(n - Pitch), \quad 0 \le n < N_1    (34)

\alpha = \frac{N_1 - i}{N_1}, \quad i = 0, 1, \ldots, N_1 - 1    (35)

where \alpha is a decay factor that falls linearly from 1 to 0, and the AB and CD segments both have length N_1 = 32.
b. The new waveform pw^{(p)}(n) is replicated repeatedly into the DF region with period Pitch^{(p)}. The DE segment is the repaired current frame; the EF segment, of length N_2 = 32, is used for cross-fading when the next frame is a speech frame, ensuring continuity on both sides of point E and between frames.
c. One frame of data starting at point C in buf(n) is output. The method therefore introduces an output delay equal to the length of the CD segment. All data in buf(n) are then shifted forward by N samples (one frame length).
Fig. 10(a) to Fig. 10(c) illustrate the case in which the current frame is a frame to be repaired: Fig. 10(a) shows the signal with the current frame discarded; Fig. 10(b) shows the current frame reconstructed with the method of this patent; Fig. 10(c) shows the signal after repair. If the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. The DG segment of buf(n) now holds the EF segment of the previous frame. The DG segment is fused with the first N_2 samples of the current input frame (the computation is analogous to formula (34)) and the result is stored back in DG.
b. The remaining samples of the current frame are copied unchanged into buf(n) after point G.
c. One frame of data starting at point C is output, and all data in buf(n) are shifted forward by the length of one frame, i.e. N samples.
If both the current frame and the previous frame are clean speech frames, the current input frame is copied unchanged into the region of buf(n) to be repaired, i.e. the DE region in Fig. 8, and one frame of data starting at point C is output.
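The repair of a noise frame can be sketched as below. The bookkeeping of points A to F follows Fig. 8 only loosely: d_start marks point D (the first sample to rebuild), the previous pitch period is taken from the samples just before D, and the exact segment geometry of the patented method may differ; this is an illustrative sketch, not a definitive implementation.

```python
import numpy as np

N = 480    # frame length
N1 = 32    # AB/CD cross-fade length
N2 = 32    # EF fade length kept after the repaired frame

def stretch_pitch_waveform(pw_prev, pitch_new):
    """Equation (33): linearly interpolate the previous pitch waveform
    to the predicted length Pitch^(p)."""
    pitch_old = len(pw_prev)
    pos = np.arange(pitch_new) * pitch_old / pitch_new   # fractional source positions
    idx = np.minimum(pos.astype(int), pitch_old - 2)
    frac = pos - idx
    return (1.0 - frac) * pw_prev[idx] + frac * pw_prev[idx + 1]

def repair_noise_frame(buf, d_start, pitch_prev, pitch_new):
    """Replace the noisy frame in buf by replicated pitch waveforms (sketch)."""
    # a.: cross-fade the CD segment (the N1 samples before D) with the AB
    #     segment one pitch period earlier, equations (34)-(35)
    alpha = (N1 - np.arange(N1)) / N1
    cd = np.arange(d_start - N1, d_start)
    buf[cd] = alpha * buf[cd] + (1.0 - alpha) * buf[cd - pitch_prev]
    # b.: stretch the previous pitch waveform and replicate it over the
    #     frame plus the N2-sample EF fade region
    pw_prev = buf[d_start - pitch_prev:d_start]
    pw_new = stretch_pitch_waveform(pw_prev, pitch_new)
    need = N + N2
    reps = int(np.ceil(need / pitch_new))
    buf[d_start:d_start + need] = np.tile(pw_new, reps)[:need]
    # c.: the caller then outputs one frame starting at point C (= D - N1)
    #     and shifts buf forward by N samples
    return buf
```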
Beneficial effects of the technical scheme of the present invention:
Tests were run with 20 clean speech recordings (adult male, adult female and child voices) and 4 types of noise: mouse clicks, knocking, metronome ticks and keyboard strokes. The durations of the four noises are 10 ms for mouse clicks, 20 ms for knocking and metronome ticks, and 30 ms for keyboard strokes. Each clean recording was mixed with each of the 4 noise types, giving 80 noisy recordings; 30 noise bursts, equally spaced, were added to each recording.
All audio is sampled at fs = 48 kHz with a frame length of N = 480. In the MFCC computation stage an NFFT = 1024-point FFT is used, the Mel filter bank contains M = 24 filters, and L = 12 MFCC are computed. In the transient-noise detection stage the adaptive threshold is set to Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, and ener is the energy of each input frame with a minimum value of 60.0. When the threshold is updated, the forgetting factor b is set to 0.4. In the pitch period estimation stage the pitch period is searched over (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage the cross-fade lengths N_1 and N_2 are both 32 samples, and the buffer buf(n) has length 2240.
After denoising noisy speech with the present invention, speech intelligibility improves substantially and listener fatigue is reduced. The denoising performance is assessed with two measures, the segmental signal-to-noise ratio SNR_seg and PEAQ, where the segmental SNR is computed as

SNR_{seg}^{in} = \frac{1}{R}\sum_{i=1}^{R} 10\log_{10}\frac{\sum_{n \in frame_i}|s(n)|^2}{\sum_{n \in frame_i}|x(n)-s(n)|^2}    (36)

SNR_{seg}^{out} = \frac{1}{R}\sum_{i=1}^{R} 10\log_{10}\frac{\sum_{n \in frame_i}|s(n)|^2}{\sum_{n \in frame_i}|\hat{s}(n)-s(n)|^2}    (37)
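The segmental SNR of equations (36) and (37) can be computed with a short routine such as the one below; the small floors guarding against division by zero are an implementation assumption.

```python
import numpy as np

def snr_seg(clean, test, frame_len=480):
    """Average per-frame SNR in dB; `test` is x(n) for equation (36)
    or the denoised estimate for equation (37)."""
    n_frames = len(clean) // frame_len
    vals = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = test[i * frame_len:(i + 1) * frame_len] - s
        vals.append(10.0 * np.log10(max(np.sum(s ** 2), 1e-12) /
                                    max(np.sum(e ** 2), 1e-12)))
    return float(np.mean(vals))
```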
The denoising performance assessed with these two measures is shown in Fig. 12 and Fig. 13: Fig. 12 compares the objective audio quality of the noisy signal before and after denoising in terms of SNR; Fig. 13 compares it in terms of PEAQ.
The spectrograms of the noisy signal and of the signal denoised with this scheme are shown in Fig. 11(a) and Fig. 11(b) (grey-scale figures provided to illustrate the technical effect of the present invention). Fig. 11(a) is the spectrogram of audio corrupted by mouse-click noise; Fig. 11(b) is the spectrogram of the noisy audio of Fig. 11(a) after denoising.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent replacement or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical scheme and inventive concept of the present invention, shall fall within the scope of protection of the present invention.
Abbreviations and key terms used in the present invention
AR: Autoregressive Model.
DCT: Discrete Cosine Transform.
FFT: Fast Fourier Transform.
LPF: Low-Pass Filter.
LPCC: Linear Prediction Cepstral Coefficient.
MFCC: Mel-Frequency Cepstral Coefficient.
VoIP: Voice over IP.
PLC: Packet Loss Concealment.
PWR: Pitch Waveform Replication.
SNR: Signal-to-Noise Ratio.
PEAQ: Perceptual Evaluation of Audio Quality, an objective standard for perceived audio quality recommended in ITU-R BS.1387.

Claims (3)

1. A denoising method for transient noise, characterised in that: the Mel-frequency cepstral coefficients (MFCC) of the current frame are first computed and, at the same time, the pitch period of the frame is predicted; the MFCC are then used to detect whether the frame contains noise, i.e. noise detection is performed; if noise is present, the waveform is reconstructed using the predicted pitch period;
The pitch period prediction method is as follows:
1) low-pass filter buf(n) to obtain buf_d(n), the cut-off frequency of the low-pass filter (LPF) being 900 Hz;
2) centre-clip buf_d(n) to obtain buf_c(n), namely

buf_c(n) = \begin{cases} buf_d(n) - C_L, & buf_d(n) > C_L \\ buf_d(n) + C_L, & buf_d(n) < -C_L \\ 0, & |buf_d(n)| \le C_L \end{cases}    (1)

where the clipping level C_L is usually set to 68 % of the maximum of the normalised data;
3) compute the autocorrelation of buf_c(n) and search for the position of its maximum within the range (96, 576), taking it as the pitch period estimate Pitch;

r_{buf_c}(n) = \sum_{m=0}^{2\,PMAX-1} buf_c(m)\, buf_c(m+n), \quad PMIN \le n \le PMAX    (2)

Pitch = \arg\max_{PMIN \le n \le PMAX} r_{buf_c}(n)    (3)

4) to prevent pitch doubling, smooth the pitch estimates Pitch^{(p-1)} and Pitch^{(p-2)} of the two preceding frames by formula (4), and predict the pitch period Pitch^{(p)} of the current frame from the two smoothed values, namely

Pitch^{(p)} = Pitch^{(p-1)} + \left(Pitch^{(p-1)} - Pitch^{(p-2)}\right)    (5)
The waveform reconstruction method is:
1) since buf(n) stores the output frame data, the pitch-period waveform of the previous frame can be extracted from buf(n), namely the last Pitch^{(p-1)} samples of the previous output frame, denoted pw^{(p-1)}(n); linear interpolation of pw^{(p-1)}(n) yields a new waveform of length Pitch^{(p)}, denoted pw^{(p)}(n); the interpolation formula is

pw^{(p)}(n') = \left(\frac{Pitch^{(p-1)}}{Pitch^{(p)}}\, n' - n + 1\right)\left[pw^{(p-1)}(n) - pw^{(p-1)}(n-1)\right] + pw^{(p-1)}(n-1), \quad n-1 \le \frac{Pitch^{(p-1)}}{Pitch^{(p)}}\, n' < n    (6)

2) the new waveform is used for pitch-waveform replication; the replication method is as follows:
a. if the current frame is a noise frame (regardless of whether the previous frame is a noise frame or a clean speech frame), the processing is: according to formula (7), the AB segment of buf(n) is overlap-added with the CD segment and cross-faded so that the data on both sides of point D are continuous, namely

buf_{CD}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{AB}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{CD}(n - Pitch), \quad 0 \le n < N_1    (7)

\alpha = \frac{N_1 - i}{N_1}, \quad i = 0, 1, \ldots, N_1 - 1    (8)

where \alpha is a decay factor that falls linearly from 1 to 0, and the AB and CD segments both have length N_1 = 32;
b. the new waveform pw^{(p)}(n) is replicated repeatedly into the DF region with period Pitch^{(p)}; the DE segment is the repaired current frame; the EF segment, of length N_2 = 32, is used for cross-fading when the next frame is a speech frame, ensuring continuity on both sides of point E and between frames;
c. one frame of data starting at point C in buf(n) is output; this introduces an output delay equal to the length of the CD segment; all data in buf(n) are then shifted forward by N samples;
if the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. the DG segment of buf(n) now holds the EF segment of the previous frame; the DG segment is fused with the first N_2 samples of the current input frame, the computation being the same as the overlap-addition of the AB and CD segments in formula (7), and the result is stored back in DG;
b. the remaining samples of the current frame are copied unchanged into buf(n) after point G;
c. one frame of data starting at point C is output, and all data in buf(n) are shifted forward by N samples, i.e. one frame length; if both the current frame and the previous frame are clean speech frames, the current input frame is copied unchanged into the region of buf(n) to be repaired, and one frame of data starting at point C is output.
2. The denoising method for transient noise according to claim 1, characterised in that the Mel-frequency cepstral coefficients are computed as follows:
1) frame the input signal with a frame length of N = 480 samples, i.e. 10 ms of data, and normalise the data; if the current frame is the p-th frame, then

x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1;    (9)

2) pre-process the current frame by pre-emphasis and windowing, namely

y^{(p)}(n) = x^{(p)}(n) - \beta\, x^{(p)}(n-1);    (10)

where the pre-emphasis factor is \beta = 0.938 and w(n) is a Hamming window, w(n) = 0.54 - 0.46\cos(2\pi n / N);
3) take an NFFT = 1024-point FFT of the pre-processed signal, obtaining the frequency-domain signal Y^{(p)}(k);
4) compute the energy spectrum |Y^{(p)}(k)|^2 of the frequency-domain signal Y^{(p)}(k);
5) pass the energy spectrum of the frequency-domain signal through a bank H of Mel-scale triangular filters for frequency-domain filtering;
the bank contains M filters; each filter is triangular, the filters overlap one another, and the centre frequency of each filter is f(m), m = 1, 2, ..., M, with M = 24;
filter design method: the upper band-edge frequency of the input signal, fs/2 = 24 kHz, is converted by the formula

f_{mel} = 2595\,\log_{10}\!\left(1 + \frac{f_{linear}}{700}\right),    (12)

where f_{linear} is the linear frequency in Hz, into the Mel-scale frequency domain, giving Fs_mel; the interval (0, Fs_mel) is divided into 25 equal parts, the two end points 0 and Fs_mel are discarded, and the remaining 24 division points are used as the centre frequencies of the 24 filters; the division points f(m) are equally spaced on the Mel scale and are then converted back to the linear frequency scale by formula (12); after the conversion, the spacing between the f(m) shrinks as m decreases and widens as m increases; from the division points f(m), the frequency response of the triangular filter bank H(m, k) is

H(m,k) = \begin{cases} 0, & f(k) < f(m-1) \\ \dfrac{2\,[f(k)-f(m-1)]}{[f(m+1)-f(m-1)]\,[f(m)-f(m-1)]}, & f(m-1) \le f(k) < f(m) \\ \dfrac{2\,[f(m+1)-f(k)]}{[f(m+1)-f(m-1)]\,[f(m+1)-f(m)]}, & f(m) \le f(k) \le f(m+1) \\ 0, & f(k) > f(m+1) \end{cases}    (13)

6) compute the energy output by each filter H(m, k) and its logarithm, obtaining E(m), namely

E(m) = \log_{10}\!\left[\sum_k H(m,k)\,|Y^{(p)}(k)|^2\right], \quad m = 1, 2, \ldots, M    (14)

a discrete cosine transform of E(m) yields the L = 12 MFCC, denoted C(l):

C^{(p)}(0) = \sqrt{2/L}\,\sum_{m=0}^{M-1} E(m), \quad l = 0; \qquad C^{(p)}(l) = \sqrt{2/L}\,\sum_{m=0}^{M-1} E(m)\cos\!\left(\frac{\pi l (2m+1)}{2M}\right), \quad 1 \le l \le L-1.    (15)
3. The denoising method for transient noise according to claim 1, characterised in that the noise detection process is as follows:
compute the Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame,

dist = \sqrt{\sum_{l=0}^{L}\left[C^{(p)}(l) - C^{(p-1)}(l)\right]^2},    (16)

judge whether the current frame contains noise by comparing the distance with a threshold Thres, which is determined adaptively by

Thres = 10 \cdot ener,    (17)

where ener is the energy of the normalised frame signal, with a minimum value of 60.0; after detection, update the MFCC feature of the current frame, namely

C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p)}(l),    (18)

where the forgetting factor is b = 0.4; when the frame following a noise frame is a speech frame, this update prevents false detection.
CN201310357211.6A (filed 2013-08-15, priority 2013-08-15): Denoising method for transient noise. Status: Active. Granted as CN103440872B.

Priority Applications (1)

Application Number: CN201310357211.6A
Priority Date / Filing Date: 2013-08-15
Title: Denoising method for transient noise

Publications (2)

CN103440872A, published 2013-12-11
CN103440872B, granted 2016-06-01

Family ID: 49694563


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1956058A (en) * 2005-10-17 2007-05-02 哈曼贝克自动***-威美科公司 Minimization of transient noises in a voice signal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4448464B2 (en) * 2005-03-07 2010-04-07 日本電信電話株式会社 Noise reduction method, apparatus, program, and recording medium
US7869994B2 (en) * 2007-01-30 2011-01-11 Qnx Software Systems Co. Transient noise removal system using wavelets


Non-Patent Citations (2)

基于MFCC的语音情感识别 (Speech emotion recognition based on MFCC); 韩丨 et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); October 2008; vol. 20, no. 5; pp. 597-602 *
语音中瞬态噪声抑制算法研究 (Research on transient noise suppression algorithms in speech); 张兆伟; CNKI (China National Knowledge Infrastructure); 2013-05-01; pp. 14-19, 31-37 *

Also Published As

CN103440872A, published 2013-12-11


Legal Events

  • PB01 / C06: Publication
  • SE01 / C10: Entry into force of request for substantive examination
  • GR01 / C14: Patent grant