EP1021805B1 - Method and device for enhancing a digital speech signal - Google Patents
Method and device for enhancing a digital speech signal
- Publication number
- EP1021805B1 (application EP98943997A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- frequency
- speech signal
- frame
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
Definitions
- The present invention relates to digital speech signal processing techniques.
- A commonly used method is based on linear prediction, by which a prediction delay inversely proportional to the tone frequency is evaluated. This delay may be expressed as an integer or fractional number of sample periods of the digital signal.
- Other methods directly detect breaks in the signal attributable to closures of the speaker's glottis, the time intervals between these breaks being inversely proportional to the tone frequency.
- The discrete frequencies considered are those of the form (a/N)·F_e, where F_e is the sampling frequency, N the number of samples of the blocks used in the discrete Fourier transform, and a an integer ranging from 0 to N/2−1. These frequencies do not necessarily include the estimated tone frequency and/or its harmonics. This results in an imprecision in the operations carried out in connection with the estimated tone frequency, which can cause distortions of the processed signal by affecting its harmonic character.
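The bin mismatch described above can be illustrated with a short sketch; all numeric values (sampling frequency, block size, pitch) are example assumptions, not taken from the patent.

```python
# The discrete frequencies of an N-point DFT at sampling frequency Fe are
# a*Fe/N; an arbitrary tone frequency generally falls between two bins.
Fe = 8000.0   # sampling frequency in Hz (assumed example)
N = 256       # DFT block size (assumed example)

def bin_frequencies(Fe, N):
    """Discrete frequencies (a/N)*Fe for a = 0 .. N/2 - 1."""
    return [a * Fe / N for a in range(N // 2)]

def nearest_bin(f, Fe, N):
    """Index of the DFT bin closest to frequency f."""
    return min(range(N // 2), key=lambda a: abs(a * Fe / N - f))

freqs = bin_frequencies(Fe, N)
f_p = 147.0                      # example estimated tone frequency in Hz
a = nearest_bin(f_p, Fe, N)
offset = abs(freqs[a] - f_p)     # mismatch between f_p and its closest bin
```

With these assumed values the bin spacing is 31.25 Hz, so the estimated tone frequency can sit up to about 15.6 Hz away from any discrete frequency.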
- A main object of the present invention is to propose a way of conditioning the speech signal that makes it less sensitive to the above drawbacks.
- The invention therefore proposes a method as set out in claim 1 and a device as claimed in claim 9.
- The invention thus provides a method of conditioning a digital speech signal processed in successive frames, in which a harmonic analysis of the speech signal is carried out to estimate a tone frequency of the speech signal on each frame where it presents voice activity. After estimating the tone frequency of the speech signal on a frame, the speech signal of the frame is conditioned by oversampling it at an oversampling frequency that is an integer multiple of the estimated tone frequency.
- A further refinement is that, after processing each frame, among the samples of the denoised speech signal provided by this processing, a number of samples equal to an integer multiple of the ratio between the sampling frequency and the estimated tone frequency is retained. This avoids distortion problems caused by phase discontinuities between frames, which are generally not fully corrected by classic overlap-add techniques.
- Having conditioned the signal by this oversampling technique makes it possible to obtain a good measure of the degree of voicing of the speech signal on the frame, from a calculation of the entropy of the autocorrelation of the spectral components computed on the basis of the conditioned signal.
- The conditioning of the speech signal accentuates the irregular aspect of the spectrum and therefore the variations in entropy, so that the latter is a measure of good sensitivity.
- The denoising system shown in FIG. 1 processes a digital speech signal s.
- The signal frame is transformed into the frequency domain by a module 11 applying a conventional fast Fourier transform algorithm (TFR, from the French transformée de Fourier rapide) to calculate the modulus of the signal spectrum.
- The frequency resolution available at the output of the fast Fourier transform is not used as such; a lower resolution is used, determined by a number I of frequency bands covering the band [0, F_e/2] of the signal.
- A module 12 calculates the respective averages of the spectral components S_{n,f} of the speech signal in the bands, for example with a uniform weighting such that:
- This averaging reduces the fluctuations between the bands by averaging the noise contributions in these bands, which decreases the variance of the noise estimator. In addition, it allows a large reduction in the complexity of the system.
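The uniform band averaging of module 12 can be sketched as follows; the function name, the toy magnitudes and the equal-width band layout are illustrative assumptions, not taken from the patent.

```python
# Average the magnitude spectrum S_f over I bands covering [0, Fe/2],
# with uniform weighting inside each band.
def band_average(S_f, I):
    """Return the I per-band averages of the spectral values S_f."""
    L = len(S_f)
    bands = []
    for i in range(I):
        lo = i * L // I
        hi = (i + 1) * L // I
        chunk = S_f[lo:hi]
        bands.append(sum(chunk) / len(chunk))
    return bands

S_f = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 2.0]  # toy spectral magnitudes
S_i = band_average(S_f, 4)
```

Averaging over bands both smooths the noise contributions and reduces the number of quantities the later modules have to track from N/2 to I.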
- The averaged spectral components S_{n,i} are supplied to a voice activity detection module 15 and to a noise estimation module 16. These two modules 15, 16 operate jointly, in the sense that the degrees of vocal activity γ_{n,i} measured for the different bands by module 15 are used by module 16 to estimate the long-term energy of the noise in the different bands, while these long-term estimates B̂_{n,i} are used by module 15 to carry out an a priori denoising of the speech signal in the different bands in order to determine the degrees of vocal activity γ_{n,i}.
- Modules 15 and 16 can correspond to the flowcharts represented in Figures 2 and 3.
- Module 15 first carries out an a priori denoising of the speech signal in the different bands i for the signal frame n.
- This a priori denoising is carried out according to a conventional nonlinear spectral subtraction process, from noise estimates obtained during one or more previous frames.
- The spectral components Êp_{n,i} are calculated according to: where βp_i is a floor coefficient close to 0, conventionally used to prevent the spectrum of the denoised signal from taking negative or too low values, which would cause musical noise.
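The a priori subtraction with a floor can be sketched as follows; since the patent's formula (3) is not reproduced in this text, the overestimation factor and the shape of the floor below are assumptions.

```python
# Nonlinear spectral subtraction: subtract an overestimated noise level
# alpha*B from each spectral component S, never going below the floor
# beta*B (which prevents negative or near-zero values, i.e. musical noise).
def spectral_subtract(S, B, alpha=2.0, beta=0.1):
    """Return the a priori denoised components per band."""
    return [max(s - alpha * b, beta * b) for s, b in zip(S, B)]

S = [10.0, 3.0, 1.0]   # toy per-band signal magnitudes
B = [2.0, 2.0, 2.0]    # toy per-band noise estimates
E_p = spectral_subtract(S, B)
```

The floor is what makes the subtraction "nonlinear": low-energy bands are clamped instead of being driven negative.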
- Steps 17 to 20 therefore essentially consist in subtracting from the signal spectrum an estimate of the noise spectrum obtained a priori, increased by the coefficient α′_{n−1,i}.
- Module 15 then calculates, for each band i (0 ≤ i < I), a quantity ΔE_{n,i} representing the short-term variation of the energy of the denoised signal in band i, as well as a long-term value Ē_{n,i} of the energy of the denoised signal in band i.
- In step 25, the quantity ΔE_{n,i} is compared with a threshold ε1. If the threshold ε1 is not reached, the counter b_i is incremented by one in step 26.
- In step 27, the long-term estimator ba_i is compared with the value of the smoothed energy Ē_{n,i}. If ba_i exceeds Ē_{n,i}, the estimator ba_i is set equal to the smoothed value Ē_{n,i} in step 28, and the counter b_i is reset to zero.
- The quantity ρ_i, which is taken equal to the ratio ba_i/Ē_{n,i} (step 36), is then equal to 1.
- If step 27 shows that ba_i does not exceed Ē_{n,i}, the counter b_i is compared with a limit value bmax in step 29. If b_i > bmax, the signal is considered too stationary to support vocal activity.
- Bm represents an update coefficient between 0.90 and 1. Its value differs depending on the state of a voice activity detection automaton (steps 30 to 32). This state δ_{n−1} is the one determined during the processing of the previous frame.
- In the presence of speech, the coefficient Bm takes a value Bmp very close to 1, so that the noise estimator is only very slightly updated. Otherwise, the coefficient Bm takes a lower value Bms, to allow a more significant update of the noise estimator during silence.
- The difference ba_i − bi_i between the long-term estimator and the internal noise estimator is compared with a threshold ε2. If the threshold ε2 is not reached, the long-term estimator ba_i is updated with the value of the internal estimator bi_i in step 35. Otherwise, the long-term estimator ba_i remains unchanged. This prevents sudden variations due to a speech signal from leading to an update of the noise estimator.
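The core of steps 25 to 29 can be sketched as a minimum-tracking update; the threshold values, the direction of the comparison in step 27 and the simplifications below are assumptions made for illustration only.

```python
# Hedged sketch of the long-term estimator ba_i: it snaps down to the
# smoothed energy whenever that energy falls below it, and a counter b_i
# tracks how long the signal has stayed stationary.
def update_long_term(ba, b, E, dE, eps1=0.5, bmax=5):
    """One update of (ba, b, too_stationary) from the smoothed energy E
    and the short-term energy variation dE."""
    if dE < eps1:          # little short-term variation: count it
        b += 1
    if ba > E:             # estimator above current energy: track the minimum
        ba = E
        b = 0
    too_stationary = b > bmax
    return ba, b, too_stationary

ba, b, flag = update_long_term(ba=4.0, b=2, E=3.0, dE=0.1)
ba2, b2, flag2 = update_long_term(ba=2.0, b=6, E=3.0, dE=0.1)
```

In the first call the estimator drops to the smoothed energy and the counter resets; in the second the counter keeps building up until the signal is declared too stationary to support vocal activity.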
- After having obtained the quantities ρ_i, module 15 makes the voice activity decisions in step 37.
- Module 15 first updates the state of the detection automaton according to the quantity ρ_0 calculated for the whole of the signal band.
- The new state δ_n of the automaton depends on the previous state δ_{n−1} and on ρ_0, as shown in Figure 4.
- Module 15 also calculates the degrees of vocal activity γ_{n,i} in each band i ≥ 1.
- This function has, for example, the shape shown in FIG. 5.
- Module 16 calculates the noise estimates per band, which will be used in the denoising process, using the successive values of the components S_{n,i} and the degrees of voice activity γ_{n,i}. This corresponds to steps 40 to 42 of FIG. 3.
- In step 40, it is determined whether the voice activity detection automaton has just gone from the rising state to the speech state. If so, the last two estimates B̂_{n−1,i} and B̂_{n−2,i} previously calculated for each band i ≥ 1 are corrected in accordance with the value of the preceding estimate B̂_{n−3,i}.
- In step 42, module 16 updates the noise estimates per band according to the formulas: where λ_B denotes a forgetting factor such that 0 < λ_B < 1.
- Formula (6) shows how the non-binary degree of vocal activity γ_{n,i} is taken into account.
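Since formula (6) itself is not reproduced above, the exponential update below is only a plausible form consistent with the description: a forgetting factor λ_B, with the degree of vocal activity γ freezing the update where speech is present.

```python
# Hedged sketch of a noise-estimate update weighted by the degree of
# vocal activity gamma in one band.
def update_noise(B_prev, S, gamma, lam=0.9):
    """gamma = 0 (no voice activity) gives a full exponential update;
    gamma = 1 (certain voice activity) leaves the estimate unchanged."""
    return (lam + (1.0 - lam) * gamma) * B_prev + (1.0 - lam) * (1.0 - gamma) * S

B0 = 2.0
B_silence = update_noise(B0, S=4.0, gamma=0.0)   # moves toward S
B_speech = update_noise(B0, S=40.0, gamma=1.0)   # frozen at B0
```

The non-binary γ_{n,i} lets partially voiced bands update the noise estimate partially, instead of an all-or-nothing decision.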
- The long-term noise estimates B̂_{n,i} are overestimated by a module 45 (FIG. 1) before proceeding to denoising by nonlinear spectral subtraction.
- Module 45 calculates the overestimation coefficient α′_{n,i} previously mentioned, as well as an increased estimate B̂′_{n,i} which essentially corresponds to α′_{n,i}·B̂_{n,i}.
- the organization of the overestimation module 45 is shown in FIG. 6.
- The enhanced estimate B̂′_{n,i} is obtained by combining the long-term estimate B̂_{n,i} and a measure ΔB^max_{n,i} of the variability of the noise component in band i around its long-term estimate.
- This combination is essentially a simple sum performed by an adder 46. It could also be a weighted sum.
- The measure ΔB^max_{n,i} of the noise variability reflects the variance of the noise estimator. It is obtained as a function of the values of S_{n,i} and of B̂_{n,i} calculated for a certain number of previous frames over which the speech signal does not present vocal activity in band i. It is a function of the deviations |S_{n,i} − B̂_{n,i}|.
- The degree of vocal activity γ_{n,i} is compared with a threshold (block 51) to decide whether the deviation should be taken into account.
- The measure of variability ΔB^max_{n,i} can, as a variant, be obtained as a function of the values S_{n,f} (and not S_{n,i}) and B̂_{n,i}.
- FIFO 54 does not contain
- The enhanced estimator B̂′_{n,i} gives the denoising process excellent robustness to musical noise.
- A first phase of the spectral subtraction is carried out by the module 55 shown in FIG. 1.
- This phase provides, with the resolution of the bands i (1 ≤ i ≤ I), the frequency response H¹_{n,i} of a first denoising filter, as a function of the components S_{n,i} and B̂_{n,i} and the overestimation coefficients α′_{n,i}.
- The coefficient β¹_i represents, like the coefficient βp_i of formula (3), a floor conventionally used to avoid negative or too low values of the denoised signal.
- The overestimation coefficient α′_{n,i} could be replaced in formula (7) by another coefficient equal to a function of α′_{n,i} and of an estimate of the signal-to-noise ratio (for example S_{n,i}/B̂_{n,i}), this function being decreasing in the estimated value of the signal-to-noise ratio.
- This function is then equal to α′_{n,i} for the lowest values of the signal-to-noise ratio. Indeed, when the signal is very noisy, there is a priori no point in reducing the overestimation factor.
- This function decreases towards zero for the highest values of the signal-to-noise ratio. This protects the most energetic areas of the spectrum, where the speech signal is most significant, the quantity subtracted from the signal then tending towards zero.
- This strategy can be refined by applying it selectively to the harmonics of the tone frequency of the speech signal when it presents voice activity.
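The SNR-dependent variant of the overestimation coefficient can be sketched as follows; the linear ramp and the two SNR knees are illustrative assumptions, only the endpoint behaviour (α′ at low SNR, zero at high SNR) comes from the description above.

```python
# Effective overestimation: a decreasing function of the estimated
# signal-to-noise ratio.
def effective_overestimation(alpha_prime, snr, snr_lo=1.0, snr_hi=10.0):
    """Return alpha' for snr <= snr_lo, 0 for snr >= snr_hi,
    and a linear interpolation in between (assumed shape)."""
    if snr <= snr_lo:
        return alpha_prime
    if snr >= snr_hi:
        return 0.0
    return alpha_prime * (snr_hi - snr) / (snr_hi - snr_lo)

low = effective_overestimation(2.0, 0.5)    # very noisy band: full value
high = effective_overestimation(2.0, 50.0)  # energetic speech area: zero
mid = effective_overestimation(2.0, 5.5)    # halfway down the ramp
```

At high SNR the subtracted quantity tends towards zero, which is exactly what protects the most energetic areas of the spectrum.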
- a second denoising phase is carried out by a module 56 for protecting harmonics.
- Module 57 can apply any known method of analysis of the speech signal of the frame to determine the period T_p, expressed as an integer or fractional number of samples, for example a linear prediction method.
- The protection provided by module 56 may consist in carrying out, for each frequency f belonging to a band i: if f is close to a harmonic of the estimated tone frequency f_p (condition (9)), take H²_{n,f} = 1.
- In that case, the quantity subtracted from the component S_{n,f} is zero.
- The floor coefficients β²_i express the fact that certain harmonics of the tone frequency f_p can be masked by noise, so that it is not worth protecting them.
- This protection strategy is preferably applied for each of the frequencies closest to the harmonics of f_p, that is to say for any integer rank η.
- The difference between the η-th harmonic of the real tone frequency and its estimate η·f_p (condition (9)) can be as large as η·δf_p/2, δf_p denoting the precision of the tone frequency estimate. For high values of η, this difference can be greater than the spectral half-resolution Δf/2 of the Fourier transform.
- The corrected frequency response H²_{n,f} can be equal to 1 as indicated above, which corresponds to the subtraction of a zero quantity in the context of spectral subtraction, that is to say full protection of the frequency in question. More generally, this corrected frequency response H²_{n,f} could be taken equal to a value between 1 and H¹_{n,f} depending on the degree of protection desired, which corresponds to the subtraction of a quantity smaller than the one that would be subtracted if the frequency in question were not protected.
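The full-protection case of module 56 can be sketched as a mask over the denoising filter response; the toy parameter values and the rounding rule for the nearest bin are illustrative assumptions.

```python
# Force the denoising filter response to 1 at the DFT bins closest to the
# harmonics eta * f_p, so that nothing is subtracted there.
def protect_harmonics(H1, f_p, Fe, N):
    """Return H2: a copy of H1 with the bins nearest each harmonic of f_p
    set to 1 (full protection)."""
    H2 = list(H1)
    eta = 1
    while eta * f_p < Fe / 2:
        a = int(round(eta * f_p * N / Fe))   # bin closest to the harmonic
        if a < len(H2):
            H2[a] = 1.0
        eta += 1
    return H2

H1 = [0.5] * 8                 # first denoising response on 8 bins (toy)
H2 = protect_harmonics(H1, f_p=2.0, Fe=16.0, N=16)
```

With these toy values the harmonics at 2, 4 and 6 Hz fall on bins 2, 4 and 6, which are left untouched by the subtraction.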
- This signal S²_{n,f} is supplied to a module 60 which calculates, for each frame n, a masking curve by applying a psychoacoustic model of auditory perception by the human ear.
- The masking phenomenon is a known principle of the functioning of the human ear. When two frequencies are heard simultaneously, it is possible that one of the two is no longer audible; it is then said to be masked.
- The masking curve is seen as the convolution of the spectral spreading function of the basilar membrane, in the Bark domain, with the exciting signal, constituted in the present application by the signal S²_{n,f}.
- the spectral spreading function can be modeled as shown in Figure 7.
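A masking curve of this kind can be sketched as a convolution in the Bark domain; since Figure 7 is not reproduced here, the two-sided slopes below (+25 dB/Bark rising, −10 dB/Bark falling, a classic textbook model) are assumptions.

```python
# Convolve the per-Bark-band energies with a two-sided exponential
# spreading function and return the masking level per band.
def masking_curve(E_bark):
    up = 10 ** (-25.0 / 10.0)    # attenuation per Bark toward lower bands
    down = 10 ** (-10.0 / 10.0)  # attenuation per Bark toward higher bands
    Q = len(E_bark)
    M = [0.0] * Q
    for q in range(Q):
        for p in range(Q):
            d = q - p
            spread = down ** d if d >= 0 else up ** (-d)
            M[q] += E_bark[p] * spread
    return M

M = masking_curve([0.0, 1.0, 0.0, 0.0])
```

A single masker in band 1 spills mostly toward the higher bands, reflecting the asymmetry of the spreading function of the basilar membrane.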
- R q depends on the more or less voiced character of the signal.
- χ designates a degree of voicing of the speech signal, varying between 0 (no voicing) and 1 (strongly voiced signal).
- The denoising system also includes a module 62 which corrects the frequency response of the denoising filter as a function of the masking curve M_{n,q} calculated by module 60 and of the increased estimates B̂′_{n,i} calculated by module 45.
- the module 62 decides the level of denoising which must really be reached.
- The new response H³_{n,f}, for a frequency f belonging to the band i defined by module 12 and to the Bark band q, thus depends on the relative difference between the increased estimate B̂′_{n,i} of the corresponding spectral component of the noise and the masking curve M_{n,q}, as follows:
- The quantity subtracted from a spectral component S_{n,f} in the spectral subtraction process having the frequency response H³_{n,f} is substantially equal to the minimum between, on the one hand, the quantity subtracted from this spectral component in the spectral subtraction process having the frequency response H²_{n,f} and, on the other hand, the fraction of the increased estimate B̂′_{n,i} of the corresponding spectral component of the noise which, if applicable, exceeds the masking curve M_{n,q}.
- FIG. 8 illustrates the principle of the correction applied by module 62. It schematically shows an example of a masking curve M_{n,q} calculated on the basis of the spectral components S²_{n,f} of the denoised signal, as well as the increased estimate B̂′_{n,i} of the noise spectrum.
- The quantity finally subtracted from the components S_{n,f} is that represented by the hatched areas, that is to say limited to the fraction of the increased estimate B̂′_{n,i} of the spectral components of the noise which exceeds the masking curve.
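The correction of module 62 can be sketched on a single spectral component; the variable names and numeric values are illustrative assumptions, only the min rule comes from the description above.

```python
# Subtract min(amount implied by the protected response H2, fraction of
# the overestimated noise that emerges above the masking level M).
def masked_subtraction(S, H2, B_over, M):
    """Return the denoised component after the masking correction."""
    wanted = S * (1.0 - H2)               # quantity H2 would subtract
    audible_noise = max(B_over - M, 0.0)  # noise emerging above the mask
    subtracted = min(wanted, audible_noise)
    return S - subtracted

# noise fully below the masking curve: nothing needs to be subtracted
s_masked = masked_subtraction(S=10.0, H2=0.5, B_over=2.0, M=3.0)
# noise emerging above the mask: subtraction limited to that fraction
s_emerge = masked_subtraction(S=10.0, H2=0.5, B_over=6.0, M=3.0)
```

Noise that the ear would mask anyway is simply left in place, which is what gives the process its robustness against over-subtraction.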
- This subtraction is carried out by multiplying the frequency response H³_{n,f} of the denoising filter by the spectral components S_{n,f} of the speech signal (multiplier 64).
- TFRI: inverse fast Fourier transform (transformée de Fourier rapide inverse)
- FIG. 9 shows a preferred embodiment of a denoising system implementing the invention.
- This system comprises a certain number of elements similar to corresponding elements of the system of FIG. 1, for which the same reference numbers have been used.
- Modules 10, 11, 12, 15, 16, 45 and 55 provide in particular the quantities S_{n,i}, B̂_{n,i}, α′_{n,i}, B̂′_{n,i} and H¹_{n,f} needed to perform the selective denoising.
- the frequency resolution of the fast Fourier transform 11 is a limitation of the system of FIG. 1.
- The frequency subject to protection by module 56 is not necessarily the precise tone frequency f_p, but the frequency closest to it in the discrete spectrum. In some cases, frequencies relatively far from the true harmonics of the tone frequency may then end up being protected.
- the system of FIG. 9 overcomes this drawback thanks to an appropriate conditioning of the speech signal.
- The sampling frequency of the signal is modified so that the period 1/f_p covers exactly an integer number of sample periods of the conditioned signal.
- This size N is usually a power of 2 for putting implementation of the TFR. It is 256 in the example considered.
- This choice is made by a module 70 according to the value of the delay T p supplied by the harmonic analysis module 57.
- Module 70 supplies the ratio K between the sampling frequencies to three frequency-change modules 71, 72, 73.
- Module 71 is used to transform the values S_{n,i}, B̂_{n,i}, α′_{n,i}, B̂′_{n,i} and H¹_{n,f}, relating to the bands i defined by module 12, to the modified frequency scale (sampling frequency f_e). This transformation simply consists in dilating the bands i by the factor K. The values thus transformed are supplied to the harmonic protection module 56.
- the module 72 performs the oversampling of the frame of N samples provided by the windowing module 10.
- the conditioned signal frame supplied by the module 72 includes KN samples at the frequency f e . These samples are sent to a module 75 which calculates their Fourier transform.
- the two blocks therefore have an overlap of (2-K) x100%.
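The choice made by module 70 can be sketched as follows, using claims 2 and 3 (the ratio p divides N, and N is a power of 2) and the (2−K)×100 % overlap noted above; the selection rule itself is an assumption standing in for Table I, which is not reproduced here.

```python
# Pick the smallest power-of-2 divisor p of N that is at least the pitch
# period T_p, so that the conditioned period is exactly p samples and the
# oversampling ratio K = p / T_p lies in [1, 2).
def choose_p_and_K(T_p, N=256):
    """Return (p, K) for a pitch period of T_p samples."""
    p = 1
    while p < T_p:
        p *= 2
    if p > N:
        raise ValueError("pitch period longer than the frame block")
    return p, p / T_p

p, K = choose_p_and_K(T_p=54.4)   # e.g. Fe = 8 kHz, f_p ~ 147 Hz (assumed)
```

Here p = 64 samples per conditioned period and K ≈ 1.18, so the two blocks of the conditioned frame overlap by about 82 %.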
- the autocorrelations A (k) are calculated by a module 76, for example according to the formula:
- A module 77 then calculates the normalized entropy H and supplies it to module 60 for the calculation of the masking curve (see S.A. McClellan et al.: "Spectral Entropy: an Alternative Indicator for Rate Allocation?", Proc. ICASSP'94, pages 201-204):
- The normalized entropy H constitutes a measure of voicing that is very robust to noise and to variations in the tone frequency.
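The voicing measure of modules 76 and 77 can be sketched as the normalized entropy of a normalized autocorrelation, following the cited McClellan et al. idea; the exact normalizations of the patent's formulas are not reproduced here, so the sketch below only illustrates the behaviour.

```python
# A flat (noise-like) autocorrelation gives H near 1; a peaked (voiced)
# one gives H near 0.
import math

def normalized_entropy(A):
    """Entropy of the distribution A(k)/sum(A), normalized to [0, 1]."""
    total = sum(A)
    p = [a / total for a in A]
    H = -sum(x * math.log(x) for x in p if x > 0.0)
    return H / math.log(len(A))

H_flat = normalized_entropy([1.0, 1.0, 1.0, 1.0])    # unvoiced-like
H_peaky = normalized_entropy([1.0, 0.0, 0.0, 0.0])   # voiced-like
```

Because the conditioning makes the spectrum of voiced frames strongly peaked, the entropy separates voiced from unvoiced frames with good sensitivity.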
- The correction module 62 operates in the same way as in the system of FIG. 1, taking into account the overestimated noise B̂′_{n,i} rescaled by the frequency-change module 71. It provides the frequency response H³_{n,f} of the final denoising filter, which is multiplied by the spectral components S_{n,f} of the conditioned signal by the multiplier 64. The resulting components S³_{n,f} are brought back into the time domain by the TFRI module 65. At the output of this TFRI, a module 80 combines, for each frame, the two signal blocks resulting from the processing of the two overlapping blocks delivered by the TFR 75. This combination can consist of a Hamming-weighted sum of the samples, to form a denoised conditioned signal frame of KN samples.
- The management module 82 controls the windowing module 10 so that the overlap between the current frame and the next one corresponds to N−M samples. This overlap of N−M samples is the one required in the overlap-add performed by module 66 during the processing of the next frame.
- The tone frequency is estimated as an average over the frame.
- The tone frequency can nevertheless vary slightly over this period. These variations can be taken into account in the context of the present invention, by conditioning the signal so as to artificially obtain a constant tone frequency in the frame.
- The harmonic analysis module 57 supplies the time intervals between the consecutive breaks in the speech signal due to the closures of the speaker's glottis occurring during the duration of the frame.
- Usable methods for detecting such micro-breaks are well known in the field of harmonic analysis of speech signals.
- The principle of these methods is to perform a statistical test between two models, one short-term and the other long-term. Both are adaptive linear prediction models.
- The value of this statistical test, w_m, is the cumulative sum of the a posteriori likelihood ratio of the two distributions, corrected by the Kullback divergence. For a distribution of residuals having Gaussian statistics, this value w_m is given by: where e⁰_m and σ²_0 represent the residual computed at sample m of the frame and the variance of the long-term model, e¹_m and σ²_1 likewise representing the residual and the variance of the short-term model. The closer the two models are to each other, the closer the value w_m of the statistical test is to 0. On the other hand, when the two models are distant from each other, this value w_m becomes negative, which indicates a break R of the signal.
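Since the exact expression of w_m is not reproduced above, the classical cumulative Gaussian log-likelihood comparison below is only a hedged sketch of the behaviour: near 0 while the two residual models agree, strongly negative at a break.

```python
# Compare the residuals of a long-term model (e0, sigma0) with those of a
# short-term model (e1, sigma1) via a cumulative log-likelihood ratio.
import math

def divergence_statistic(e0, sigma0, e1, sigma1):
    """Cumulative log-likelihood ratio of the two residual models."""
    w = 0.0
    for r0, r1 in zip(e0, e1):
        ll_long = -0.5 * math.log(2 * math.pi * sigma0 ** 2) - r0 ** 2 / (2 * sigma0 ** 2)
        ll_short = -0.5 * math.log(2 * math.pi * sigma1 ** 2) - r1 ** 2 / (2 * sigma1 ** 2)
        w += ll_long - ll_short
    return w

# models agree: the statistic stays at zero
w_same = divergence_statistic([0.1, -0.1], 1.0, [0.1, -0.1], 1.0)
# long-term model fits much worse than the short-term one: w goes negative
w_break = divergence_statistic([3.0, -3.0], 1.0, [0.1, -0.1], 1.0)
```

A glottal closure makes the long-term model suddenly fit much worse than the short-term one, driving the statistic negative at each break R.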
- FIG. 10 thus shows a possible example of evolution of the value w m , showing the breaks R of the speech signal.
- FIG. 11 shows the means used to calculate the conditioning of the signal in the latter case.
- The harmonic analysis module 57 is designed to implement the above analysis method and to supply the intervals t_r relating to the signal frame produced by module 10.
- These oversampling ratios K_r are supplied to the frequency-change modules 72 and 73, so that the interpolations are carried out with the sampling ratio K_r over the corresponding time interval t_r.
- The largest T_p of the time intervals t_r supplied by module 57 for a frame is selected by module 70 (block 91 in FIG. 11) to obtain a pair of parameters as indicated in Table I.
- This embodiment of the invention also involves an adaptation of the window management module 82.
- The number M of samples of the denoised signal to be retained for the current frame here corresponds to an integer number of consecutive time intervals t_r between two glottal breaks (see FIG. 10). This arrangement avoids phase discontinuity problems between frames, while taking into account the possible variations of the time intervals t_r over a frame.
Claims (9)
- Verfahren zur Aufbereitung eines in aufeinanderfolgenden Gruppen ("trames") behandelten digitalen Sprachsignals (s), dadurch gekennzeichnet, dass man eine Oberschwingungsanalyse des Sprachsignals vornimmt, um eine Tonfrequenz (fp) des Sprachsignals auf jeder Gruppe zu schätzen, auf der es eine Stimmaktivität aufweist, und dass man nach Schätzung der Tonfrequenz des Sprachsignals auf einer Gruppe das Sprachsignal der Gruppe aufbereitet, indem man es mit einer Überabtastfrequenz ("fréquence de suréchantillonnage") (fe) überabtastet ("suréchantillonne"), die ein ganzzahliges Vielfaches der geschätzten Tonfrequenz ist.
- Verfahren nach Anspruch 1, bei dem man Spektralkomponenten (Sn,f) des Sprachsignals errechnet, indem man das aufbereitete Signal (s') in Blöcken von N Abtastungen abgibt, die einer Transformation im Frequenzbereich unterzogen wurden, wobei N eine vorbestimmte ganze Zahl ist, und bei dem das Verhältnis (p) zwischen der Überabtastfrequenz (fe) und der geschätzten Tonfrequenz ein Teiler der Zahl N ist.
- Verfahren nach Anspruch 2, bei dem die Zahl N eine Potenz von 2 ist.
- Verfahren nach Anspruch 2 oder 3, bei dem man einen Voisementgrad ("degré de voisement") (χ) des Sprachsignals auf der Gruppe ausgehend von einer Berechnung der Entropie (H) der Autokorrelation von Spektralkomponenten (S 2 / n,f), die auf der Basis des aufbereiteten Signals (s') errechnet wurden, schätzt.
- Verfahren nach Anspruch 4, bei dem der Voisementgrad (χ) ausgehend von einer standardisierten entropie H der Formel gemessen wird,
worin A(k) die standardisierte Autokorrelation ist, die definiert ist durch: worin S 2 / n,f diese auf der Basis des überabgetasteten Signals berechnete spektrale Komponente der Ordnung f bezeichnet. - Verfahren nach einem der vorhergehenden Ansprüche, bei dem man nach Behandlung jeder Gruppe aufbereiteten Signals von den durch diese Behandlung gelieferten Signalabtastungen eine Anzahl von Abtastungen (M) gleich einem ganzzahligen Vielfachen des Verhältnisses (Tp) zwischen der Abtastfrequenz (Fe) und der geschätzten Tonfrequenz (fp) beibehält.
- Verfahren nach einem der Ansprüche 1 bis 5, bei dem die Schätzung der Tonfrequenz des Sprachsignals auf einer Gruppe die folgenden Schritte umfasst:man schätzt Zeitintervalle (tr) zwischen zwei aufeinanderfolgenden Unterbrechungen (R) des Signals, die während der Dauer der Gruppe auftretenden Schließungen der Stimmritze des Sprechers zuschreibbar sind, wobei die geschätzte Tonfrequenz umgekehrt proportional zu diesen Zeitintervallen ist;man interpoliert das Sprachsignal in diesen Zeitintervallen, damit das aus dieser Interpolation resultierende aufbereitete Signal (s') ein konstantes Zeitintervall zwischen zwei aufeinanderfolgenden Unterbrechungen aufweist.
- Method according to claim 7, in which, after the processing of each frame, a number of samples (M) corresponding to an integer number of estimated time intervals (tr) is retained from the samples of the speech signal delivered by this processing.
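The interpolation step of the two claims above can be sketched as follows (Python; the break-detection itself is out of scope, and the function name and linear interpolation are assumptions for illustration). Each interval between two consecutive detected breaks is resampled to the same fixed length, so the conditioned signal s' has a constant interval between breaks, and the output naturally covers an integer number of estimated intervals:

```python
import numpy as np

def condition_to_constant_period(s, breaks, target_len):
    """Hedged sketch: 'breaks' are sample indices of detected signal
    discontinuities (attributed to glottal closures).  Each interval
    between consecutive breaks is linearly resampled to exactly
    'target_len' samples, yielding a conditioned signal with a constant
    time interval between breaks."""
    pieces = []
    for a, b in zip(breaks[:-1], breaks[1:]):
        src = np.arange(a, b)                        # original sample positions
        dst = np.linspace(a, b, target_len, endpoint=False)
        pieces.append(np.interp(dst, src, s[a:b]))   # linear interpolation
    return np.concatenate(pieces)
```

With three intervals and target_len samples per interval, the result has exactly 3 × target_len samples, i.e. an integer number of (now equal) pitch periods.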
- Device for conditioning a digital speech signal (s), comprising processing means arranged to implement a conditioning method according to any one of the preceding claims.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9711641 | 1997-09-18 | ||
FR9711641A FR2768545B1 (fr) | 1997-09-18 | 1997-09-18 | Procede de conditionnement d'un signal de parole numerique |
PCT/FR1998/001978 WO1999014744A1 (fr) | 1997-09-18 | 1998-09-16 | Procede de conditionnement d'un signal de parole numerique |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1021805A1 EP1021805A1 (de) | 2000-07-26 |
EP1021805B1 true EP1021805B1 (de) | 2001-11-07 |
Family
ID=9511228
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP98943997A Expired - Lifetime EP1021805B1 (de) | 1997-09-18 | 1998-09-16 | Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals |
Country Status (7)
Country | Link |
---|---|
US (1) | US6775650B1 (de) |
EP (1) | EP1021805B1 (de) |
AU (1) | AU9168798A (de) |
CA (1) | CA2304013A1 (de) |
DE (1) | DE69802431T2 (de) |
FR (1) | FR2768545B1 (de) |
WO (1) | WO1999014744A1 (de) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1278185A3 (de) * | 2001-07-13 | 2005-02-09 | Alcatel | Verfahren zur Verbesserung von Geräuschunterdrückung bei der Sprachübertragung |
US7103539B2 (en) * | 2001-11-08 | 2006-09-05 | Global Ip Sound Europe Ab | Enhanced coded speech |
WO2004042722A1 (en) * | 2002-11-07 | 2004-05-21 | Samsung Electronics Co., Ltd. | Mpeg audio encoding method and apparatus |
CN101790756B (zh) * | 2007-08-27 | 2012-09-05 | 爱立信电话股份有限公司 | 瞬态检测器以及用于支持音频信号的编码的方法 |
WO2009059300A2 (en) * | 2007-11-02 | 2009-05-07 | Melodis Corporation | Pitch selection, voicing detection and vibrato detection modules in a system for automatic transcription of sung or hummed melodies |
US8924200B2 (en) * | 2010-10-15 | 2014-12-30 | Motorola Mobility Llc | Audio signal bandwidth extension in CELP-based speech coder |
US9384729B2 (en) * | 2011-07-20 | 2016-07-05 | Tata Consultancy Services Limited | Method and system for detecting boundary of coarticulated units from isolated speech |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3785189T2 (de) * | 1987-04-22 | 1993-10-07 | Ibm | Verfahren und Einrichtung zur Veränderung von Sprachgeschwindigkeit. |
US5384891A (en) * | 1988-09-28 | 1995-01-24 | Hitachi, Ltd. | Vector quantizing apparatus and speech analysis-synthesis system using the apparatus |
AU633673B2 (en) | 1990-01-18 | 1993-02-04 | Matsushita Electric Industrial Co., Ltd. | Signal processing device |
EP0459362B1 (de) | 1990-05-28 | 1997-01-08 | Matsushita Electric Industrial Co., Ltd. | Sprachsignalverarbeitungsvorrichtung |
US5400434A (en) * | 1990-09-04 | 1995-03-21 | Matsushita Electric Industrial Co., Ltd. | Voice source for synthetic speech system |
US5226084A (en) * | 1990-12-05 | 1993-07-06 | Digital Voice Systems, Inc. | Methods for speech quantization and error correction |
FR2679689B1 (fr) * | 1991-07-26 | 1994-02-25 | Etat Francais | Procede de synthese de sons. |
US5469087A (en) | 1992-06-25 | 1995-11-21 | Noise Cancellation Technologies, Inc. | Control system using harmonic filters |
US5787398A (en) * | 1994-03-18 | 1998-07-28 | British Telecommunications Plc | Apparatus for synthesizing speech by varying pitch |
JP3528258B2 (ja) * | 1994-08-23 | 2004-05-17 | ソニー株式会社 | 符号化音声信号の復号化方法及び装置 |
US5641927A (en) * | 1995-04-18 | 1997-06-24 | Texas Instruments Incorporated | Autokeying for musical accompaniment playing apparatus |
US5555190A (en) | 1995-07-12 | 1996-09-10 | Micro Motion, Inc. | Method and apparatus for adaptive line enhancement in Coriolis mass flow meter measurement |
BE1010336A3 (fr) * | 1996-06-10 | 1998-06-02 | Faculte Polytechnique De Mons | Procede de synthese de son. |
JP3266819B2 (ja) * | 1996-07-30 | 2002-03-18 | 株式会社エイ・ティ・アール人間情報通信研究所 | 周期信号変換方法、音変換方法および信号分析方法 |
WO1999010719A1 (en) * | 1997-08-29 | 1999-03-04 | The Regents Of The University Of California | Method and apparatus for hybrid coding of speech at 4kbps |
US6064955A (en) * | 1998-04-13 | 2000-05-16 | Motorola | Low complexity MBE synthesizer for very low bit rate voice messaging |
- 1997
- 1997-09-18 FR FR9711641A patent/FR2768545B1/fr not_active Expired - Fee Related
- 1998
- 1998-09-16 CA CA002304013A patent/CA2304013A1/fr not_active Abandoned
- 1998-09-16 DE DE69802431T patent/DE69802431T2/de not_active Expired - Fee Related
- 1998-09-16 WO PCT/FR1998/001978 patent/WO1999014744A1/fr active IP Right Grant
- 1998-09-16 US US09/509,146 patent/US6775650B1/en not_active Expired - Lifetime
- 1998-09-16 EP EP98943997A patent/EP1021805B1/de not_active Expired - Lifetime
- 1998-09-16 AU AU91687/98A patent/AU9168798A/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
AU9168798A (en) | 1999-04-05 |
FR2768545A1 (fr) | 1999-03-19 |
WO1999014744A1 (fr) | 1999-03-25 |
CA2304013A1 (fr) | 1999-03-25 |
DE69802431T2 (de) | 2002-07-18 |
US6775650B1 (en) | 2004-08-10 |
EP1021805A1 (de) | 2000-07-26 |
DE69802431D1 (de) | 2001-12-13 |
FR2768545B1 (fr) | 2000-07-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1016072B1 (de) | Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals | |
EP1789956B1 (de) | Verfahren zum verarbeiten eines rauschbehafteten tonsignals und einrichtung zur implementierung des verfahrens | |
EP1356461B1 (de) | Rauschverminderungsverfahren und -einrichtung | |
EP2002428B1 (de) | Verfahren zur trainierten diskrimination und dämpfung von echos eines digitalsignals in einem decoder und entsprechende einrichtung | |
EP1016071B1 (de) | Verfahren und vorrichtung zur sprachdetektion | |
EP0490740A1 (de) | Verfahren und Einrichtung zum Bestimmen der Sprachgrundfrequenz in Vocodern mit sehr niedriger Datenrate | |
EP1016073B1 (de) | Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals | |
EP1021805B1 (de) | Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals | |
EP3192073B1 (de) | Unterscheidung und dämpfung von vorechos in einem digitalen audiosignal | |
EP1429316A1 (de) | Verfahren und Vorrichtung zur multi-referenz Korrektur der durch ein Kommunikationsnetzwerk verursachten spektralen Sprachverzerrungen | |
EP2515300B1 (de) | Verfahren und System für die Geräuschunterdrückung | |
FR2797343A1 (fr) | Procede et dispositif de detection d'activite vocale | |
EP4287648A1 (de) | Elektronische vorrichtung und verarbeitungsverfahren, akustische vorrichtung und computerprogramm dafür | |
FR3051958A1 (fr) | Procede et dispositif pour estimer un signal dereverbere | |
WO1999027523A1 (fr) | Procede de reconstruction, apres debruitage, de signaux sonores | |
WO2006117453A1 (fr) | Procede d’attenuation des pre- et post-echos d’un signal numerique audio et dispositif correspondant | |
FR2664446A1 (fr) | Codeur differentiel a filtre predicteur auto-adaptatif a adaptation rapide de gain et decodeur correspondant. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20000316 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): DE FR GB |
|
RIC1 | Information provided on ipc code assigned before grant |
Free format text: 7G 10L 11/04 A, 7G 10L 21/02 B |
|
RTI1 | Title (correction) |
Free format text: METHOD AND APPARATUS FOR CONDITIONING A DIGITAL SPEECH SIGNAL |
|
GRAG | Despatch of communication of intention to grant |
Free format text: ORIGINAL CODE: EPIDOS AGRA |
|
17Q | First examination report despatched |
Effective date: 20001123 |
|
GRAG | Despatch of communication of intention to grant |
Free format text: ORIGINAL CODE: EPIDOS AGRA |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
REF | Corresponds to: |
Ref document number: 69802431 Country of ref document: DE Date of ref document: 20011213 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: IF02 |
|
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) |
Effective date: 20020130 |
|
RAP2 | Party data changed (patent owner data changed or rights of a patent transferred) |
Owner name: NORTEL NETWORKS FRANCE |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed | ||
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20050817 Year of fee payment: 8 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20050902 Year of fee payment: 8 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20050930 Year of fee payment: 8 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20070403 |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20060916 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20070531 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060916 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20061002 |