EP2515300B1

EP2515300B1 - Method and system for noise reduction

Info

Publication number: EP2515300B1
Application number: EP20120163679
Authority: EP
Inventors: Pascal Saguin; Karim Maouche
Original assignee: Adeunis RF SA
Current assignee: Adeunis RF SA
Priority date: 2011-04-22
Filing date: 2012-04-11
Publication date: 2014-05-14
Anticipated expiration: 2032-04-11
Also published as: EP2515300A1; FR2974443B1; PT2515300E; ES2492698T3; FR2974443A1

Description

The present invention relates to a noise reduction method and an associated noise reduction system.

It relates more particularly to a method and a system for reducing noise in a noisy acoustic signal y(t) from a microphone operating in a noisy environment.

The present invention finds particular application in full-duplex, wireless, single-sensor (in other words, single-microphone) audio communication systems, which make it possible to establish audio communication between several users autonomously (that is, without connection to a transmission base or to a network) and which are simple to use (that is, requiring no intervention by a technician to establish the communication).

Such communication systems are generally used in a noisy environment, such as a marine environment or a performance hall, or even an extremely noisy one, such as a public works site or a hall or stadium hosting a sporting event.

Thus, it is necessary, indeed essential, to provide a noise reduction (or speech enhancement) method and/or system in the signal processing chain in order to improve the audio quality so that the conversation is audible between the users of the communication systems, in other words so that the communication is intelligible.

It is known to estimate the noise component contained in the noisy signal with an algorithm that estimates the power spectral density of the noise component according to the Minima Controlled Recursive Averaging method, known as "MCRA".

This so-called "MCRA" noise estimation method is known from the scientific literature, notably from the following articles:

"Speech enhancement for non-stationary noise environments", by I. Cohen and B. Berdugo, Signal Processing, 2001, vol. 81, pp. 2403-2418;
"Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement", by I. Cohen and B. Berdugo, IEEE Signal Processing Letters, January 2002, vol. 9, No. 1, pp. 12-15;
"A Modified Spectral Subtraction Method For Speech Enhancement Based on Masking Property of Human Auditory System", by X. Bing-Yin et al., Wireless Communications & Signal Processing, International Conference on IEEE, 13 November 2009, pp. 1-5, XP031594664.

The so-called "MCRA" noise estimation method is thus particularly well suited to environments where the noise signal is highly non-stationary and changes relatively quickly over time.

The state of the art can also be illustrated by the teaching of the article "Multi-band Spectral Subtraction for Enhancing Speech Corrupted by Colored Noise", 2002 IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings (Cat. No.02CH37334), vol. 4, 2002. This article discloses a multi-band spectral subtraction method.

Further prior art, published in "A Multi-Band Spectral Subtraction Method for Enhancing Speech Corrupted by Colored Noise", by S.D. Kamath and P.C. Loizou, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002, vol. 4, describes a spectral subtraction method with multi-band decomposition that takes into account the effects of colored noise at different frequencies.

The object of the present invention is to propose a method according to claim 1 and a system according to claim 10 for noise reduction suited to highly noisy environments in which the noise component is highly non-stationary and changes relatively quickly over time.

By way of example, a method for reducing noise in a noisy acoustic signal y(t) from a microphone, preferably a single microphone, operating in a noisy environment, comprises the following successive steps:

a) converting the noisy acoustic signal y(t) in the time domain into a noisy signal Y_k,l in the frequency domain, by dividing the noisy acoustic signal y(t) in time into sampled signals y_l in successive time frames l, windowing the sampled signals y_l by applying a weighting window, and applying a discrete Fourier transform, with extraction of the squared modulus |Y_k,l|², and optionally of the phase θ_k,l, of the noisy signal Y_k,l;
b) estimating a noise component D̂_k,l contained in the noisy signal Y_k,l from the squared modulus |Y_k,l|², by an algorithm for estimating the power spectral density of the noise component according to the Minima Controlled Recursive Averaging method known as "MCRA";
said method being remarkable in that it further comprises, after step b), the following successive steps:
c) dividing the frequency band into several frequency sub-bands SB_i = [e_i, b_i], followed by a multi-band decomposition of the squared modulus |Y_k,l|² and of the noise component D̂_k,l, which consists in decomposing the squared modulus |Y_k,l|² and the noise component D̂_k,l into, respectively, several sub-band squared moduli |Y_k,l,i|² and several sub-band noise components D̂_k,l,i specific to each of the sub-bands SB_i;
d) estimating, for each of the sub-bands SB_i, the squared modulus |X̂_k,l,i|² of a sub-band denoised component X̂_k,l,i, specific to each sub-band SB_i, of a denoised signal X̂_k,l, by a multi-band spectral subtraction algorithm known as "SSMB", from the sub-band squared moduli |Y_k,l,i|² and the sub-band noise components D̂_k,l,i;
e) determining an output denoised signal SD_k,l from the squared moduli |X̂_k,l,i|² obtained in step d), and optionally from the phases θ_k,l extracted during step a);
f) converting the output denoised signal SD_k,l into an output denoised speech signal sd(t) in the time domain, by a step f.1) of calculating a sampled output signal Sd_l specific to each time frame l by applying an inverse Fourier transform to the output denoised signal SD_k,l, followed by a step f.2) of temporal reconstruction of the output denoised speech signal sd(t) from the sampled output signals Sd_l.
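Step a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name the weighting window, the frame length or the overlap, so the Hann window and the `frame_len` and `hop` parameters below are hypothetical choices, and the naive DFT stands in for whatever fast transform a real implementation would use.

```python
import cmath
import math

def hann(n):
    """Hann weighting window of length n (assumed window; the text does not name one)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def analyze(y, frame_len, hop):
    """Step a): frame, window and transform y; return |Y_k,l|^2 and theta_k,l per frame."""
    w = hann(frame_len)
    power, phase = [], []
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = [y[start + t] * w[t] for t in range(frame_len)]  # windowed y_l
        spectrum = dft(frame)                                     # Y_k,l
        power.append([abs(c) ** 2 for c in spectrum])             # squared modulus
        phase.append([cmath.phase(c) for c in spectrum])          # optional phase
    return power, phase

# Usage: a cosine sitting exactly on bin 2 of an 8-point frame.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
power, phase = analyze(signal, frame_len=8, hop=4)
```

With a 50 % overlap (`hop = frame_len // 2`), the 16-sample signal yields three frames; the squared moduli feed steps b) to d), while the phases are kept only if step e) needs them.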

Thus, this example proposes implementing a multi-band spectral subtraction algorithm, "SSMB", which consists in dividing the entire spectral band into sub-bands and adapting, in each sub-band, the subtraction between the noisy signal Y_k,l and the noise component D̂_k,l in order to extract an output denoised signal SD_k,l; applying the spectral subtraction operation in each sub-band improves the sensitivity of the noise reduction method.

Simulations of this method, which combines the "MCRA" and "SSMB" algorithms, show a clear improvement in the audio quality of the processed noisy signal. For a wide range of noise types, the result proves far superior to conventional methods.

During step a), extraction of the phase θ_k,l of the noisy signal Y_k,l is optional. It is indeed conceivable, during step e), to determine the output denoised signal SD_k,l from the squared moduli |X̂_k,l,i|² and the noisy signal Y_k,l without having to calculate its phase beforehand.

According to one feature, the algorithm for estimating the power spectral density of the noise component according to the Minima Controlled Recursive Averaging method known as "MCRA" in step b) implements the following calculation phases:

b.1) calculating a filtered noisy component S_k,l satisfying the equation: $S_{k,l} = \alpha_s S_{k,l-1} + (1 - \alpha_s) |Y_{k,l}|^2$

where α_s is a predetermined constant characteristic of a low-pass filter;
b.2) calculating a speech presence probability density p̃_k,l by carrying out the following step-by-step calculation:
1. (i) calculating a spectral minimum component Smin_k,l with:
  - if rem(k, l) = 0
    then Smin_k,l = min(Smin_k,l-1 ; S_k,l) and Stmp_k,l = S_k,l
  - if rem(k, l) ≠ 0
    then Smin_k,l = min(Stmp_k,l-1 ; S_k,l) and
    Stmp_k,l = min(Stmp_k,l-1 ; S_k,l)
    where rem(k, l) is the remainder of the integer division of k by l, then
2. (ii) calculating a spectral ratio Sr_k,l satisfying the equation: $Sr_{k,l} = \frac{S_{k,l}}{Smin_{k,l}}$
3. (iii) calculating an indicator variable I_k,l with:
  - if Sr_k,l > δ_TH then I_k,l = 1
  - if Sr_k,l ≤ δ_TH then I_k,l = 0
    where δ_TH is a predetermined fixed speech detection threshold;
4. (iv) calculating the speech presence probability density p̃_k,l with: $\tilde{p}_{k,l} = \alpha_p \tilde{p}_{k,l-1} + (1 - \alpha_p) I_{k,l}$

  where α_p is a predetermined constant;
b.3) calculating a coefficient α̃_k,l satisfying the following equation: $\tilde{\alpha}_{k,l} = \alpha + (1 - \alpha) \tilde{p}_{k,l},$

where α is a predetermined constant;
b.4) calculating the noise component D̂_k,l satisfying the following equation: $\hat{D}_{k,l} = \tilde{\alpha}_{k,l} \hat{D}_{k,l-1} + (1 - \tilde{\alpha}_{k,l}) |Y_{k,l}|^2 .$
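Phases b.1) to b.4) amount to a per-bin recursive update, which the following Python sketch follows line by line. Several points are assumptions rather than statements of the text: the `rem` test is taken here as the frame index modulo a hypothetical minimum-search window length `window` (the convention used in the Cohen and Berdugo articles cited earlier), the state is initialised from the first frame, and all default parameter values are purely illustrative.

```python
def mcra_init(power0):
    """Assumed initialisation: start every tracker from the first frame's |Y_k,0|^2."""
    return {"S": list(power0), "Smin": list(power0), "Stmp": list(power0),
            "p": [0.0] * len(power0), "D": list(power0)}

def mcra_update(state, power, l, alpha_s=0.8, alpha_p=0.2, alpha=0.95,
                delta_th=5.0, window=50):
    """Update the noise estimate D_k,l in place for frame l and return it."""
    for k, y2 in enumerate(power):
        # b.1) first-order low-pass filtering of the noisy power spectrum
        s = alpha_s * state["S"][k] + (1 - alpha_s) * y2
        # b.2)(i) minimum tracking, following the two cases as written in the text
        if l % window == 0:  # assumption: rem taken on the frame index
            smin = min(state["Smin"][k], s)
            stmp = s
        else:
            smin = min(state["Stmp"][k], s)
            stmp = min(state["Stmp"][k], s)
        # b.2)(ii) ratio of smoothed power to tracked minimum (floored to avoid 0/0)
        sr = s / max(smin, 1e-12)
        # b.2)(iii) hard speech indicator
        indicator = 1.0 if sr > delta_th else 0.0
        # b.2)(iv) smoothed speech presence probability
        p = alpha_p * state["p"][k] + (1 - alpha_p) * indicator
        # b.3) time-varying smoothing coefficient
        a = alpha + (1 - alpha) * p
        # b.4) noise power estimate: updated slowly when speech is likely
        d = a * state["D"][k] + (1 - a) * y2
        state["S"][k], state["Smin"][k], state["Stmp"][k] = s, smin, stmp
        state["p"][k], state["D"][k] = p, d
    return state["D"]

# Usage: three frames of stationary noise, then one loud (speech-like) frame.
state = mcra_init([1.0])
for l in range(1, 4):
    mcra_update(state, [1.0], l)
noise_after_burst = mcra_update(state, [100.0], 4)
```

In this toy run the loud frame raises the speech probability, so the smoothing coefficient moves close to 1 and the noise estimate barely follows the burst, which is the behaviour the method relies on.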

The so-called "MCRA" method relies on the following principle: over a finite-horizon window and for a given frequency, the minimum of the power spectral density (PSD) of the noisy signal Y_k,l corresponds to the value of the power spectral density (PSD) of the noise component D̂_k,l. Thus, estimating the successive minimum values of the spectrum over a sliding window yields an estimate of the power spectral density (PSD) of the noise component D̂_k,l.

According to another feature, the multi-band spectral subtraction algorithm known as "SSMB" in step d) implements the following calculation phases, for each of the sub-bands SB_i:

d.1) calculating a signal-to-noise ratio SNR_k,l,i specific to each sub-band SB_i satisfying the equation: $\mathrm{SNR}_{k,l,i} = 10 \log_{10} \left( \frac{\sum_{k=e_i}^{b_i} |Y_{k,l,i}|^2}{\sum_{k=e_i}^{b_i} |\hat{D}_{k,l,i}|} \right)$
d.2) calculating the squared modulus |X̂_k,l,i|² of the sub-band denoised component X̂_k,l,i specific to each sub-band SB_i, according to the equation: $|\hat{X}_{k,l,i}|^2 = \begin{cases} |Y_{k,l,i}|^2 - \alpha_i \delta_i |\hat{D}_{k,l,i}| & \text{if } |Y_{k,l,i}|^2 > \alpha_i \delta_i |\hat{D}_{k,l,i}| \\ \beta |Y_{k,l,i}|^2 & \text{if } |Y_{k,l,i}|^2 \leq \alpha_i \delta_i |\hat{D}_{k,l,i}| \end{cases}$

where - δ_i is a parameter that varies with the corresponding sub-band SB_i, taking distinct values from one sub-band to another;
- α_i is a variable parameter that depends on the value of the signal-to-noise ratio SNR_k,l,i calculated in the corresponding sub-band SB_i; and
- β is a constant.
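Phases d.1) and d.2) can be sketched as two small Python functions operating on the bins of a single sub-band. This is an illustrative sketch, not the patented implementation: the value of `beta`, the `eps` floor that guards the logarithm and the example values of `alpha_i` and `delta_i` are all assumptions, since the text only states that β is a constant and that δ_i differs between sub-bands.

```python
import math

def band_snr_db(y2, d, eps=1e-12):
    """d.1) sub-band SNR in dB from the bins' |Y|^2 and noise estimates D."""
    signal = max(sum(y2), eps)                 # eps floor is an added safeguard
    noise = max(sum(abs(v) for v in d), eps)
    return 10.0 * math.log10(signal / noise)

def ssmb_band(y2, d, alpha_i, delta_i, beta=0.002):
    """d.2) spectral subtraction in one sub-band, with a spectral floor beta*|Y|^2."""
    out = []
    for y, n in zip(y2, d):
        subtracted = alpha_i * delta_i * abs(n)
        if y > subtracted:
            out.append(y - subtracted)   # enough energy: subtract the scaled noise
        else:
            out.append(beta * y)         # otherwise keep a small fraction of |Y|^2
    return out

# Usage on a two-bin sub-band (alpha_i and delta_i chosen arbitrarily here).
y2 = [10.0, 1.0]
d = [1.0, 1.0]
snr = band_snr_db(y2, d)
x2 = ssmb_band(y2, d, alpha_i=2.0, delta_i=1.5)
```

The floor in the second branch keeps the estimated power strictly positive wherever the noisy power is, instead of clipping it to zero.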

Such an algorithm is particularly advantageous because it takes into account the value of the signal-to-noise ratio SNR_k,l,i specific to each sub-band SB_i when performing the spectral subtraction in each sub-band.

Since the noise component is not distributed uniformly across frequencies, implementing this multi-band spectral subtraction algorithm "SSMB" makes it possible to apply processing with different parameters depending on the sub-band concerned. As a result, the noise component can be reduced more strongly in a sub-band where it is more dominant, unlike a full-band subtraction algorithm, which reduces the noise equally over all sub-bands of the spectrum and is therefore less precise and less effective.

In a particular embodiment, the parameters α_i satisfy the following equations: $\alpha_i = \begin{cases} \alpha_{c1} & \text{if } \mathrm{SNR}_{k,l,i} < \mathrm{SNR}_1 \\ \alpha_{c2} + \alpha_{c3} \, \mathrm{SNR}_{k,l,i} & \text{if } \mathrm{SNR}_1 \leq \mathrm{SNR}_{k,l,i} \leq \mathrm{SNR}_2 \\ \alpha_{c4} & \text{if } \mathrm{SNR}_{k,l,i} > \mathrm{SNR}_2 \end{cases}$

where α_c1, α_c2, α_c3 and α_c4 are predetermined constants, and SNR₁ and SNR₂ are predetermined thresholds.

The values α_c1, α_c2, α_c3 and α_c4 and SNR₁ and SNR₂ are chosen experimentally, in particular by numerical simulation. By way of non-limiting example, the following values have given good results in numerical simulation: $\mathrm{SNR}_1 = -5\ \mathrm{dB}, \ \mathrm{SNR}_2 = 20\ \mathrm{dB};$

and

$\alpha_{c1} = 5, \ \alpha_{c2} = 4, \ \alpha_{c3} = -0.15, \ \alpha_{c4} = 1$
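The piecewise definition of α_i, with the example values quoted above as defaults, can be written as a short Python function. The function and argument names are hypothetical; only the constants and thresholds come from the text.

```python
def oversubtraction_factor(snr_db, snr1=-5.0, snr2=20.0,
                           c1=5.0, c2=4.0, c3=-0.15, c4=1.0):
    """Return alpha_i for a sub-band SNR given in dB.

    Defaults are the non-limiting example values: SNR1 = -5 dB, SNR2 = 20 dB,
    alpha_c1 = 5, alpha_c2 = 4, alpha_c3 = -0.15, alpha_c4 = 1.
    """
    if snr_db < snr1:
        return c1                 # very noisy sub-band: strongest subtraction
    if snr_db > snr2:
        return c4                 # clean sub-band: plain subtraction
    return c2 + c3 * snr_db       # linear decrease between the two thresholds

# Usage: alpha_i for a few sub-band SNR values, from very noisy to clean.
factors = [oversubtraction_factor(s) for s in (-10.0, -5.0, 0.0, 20.0, 25.0)]
```

With these values the linear segment reaches 1 at SNR₂, so α_i is continuous there, whereas at SNR₁ it steps from 4.75 on the linear segment up to 5 below the threshold.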

According to the exemplary method, step e) consists in determining the denoised signal X̂_k,l from the squared moduli |X̂_k,l,i|² of the sub-band denoised components X̂_k,l,i, and optionally from the phases θ_k,l extracted during step a), so that the output denoised signal SD_k,l corresponds to the denoised signal X̂_k,l, that is, SD_k,l = X̂_k,l.

Thus, the output denoised signal SD_k,l is taken to be the denoised signal X̂_k,l, whose squared moduli |X̂_k,l,i|² come directly from step d), which implements the multi-band spectral subtraction algorithm "SSMB".

According to the invention, as a variant of the example, step e) consists in:

determining, for each of the sub-bands SB_i, the squared modulus $|\check{X}_{k,l,i}|^2$ of a combined sub-band denoised component $\check{X}_{k,l,i}$, specific to each sub-band SB_i, of a combined denoised signal $\check{X}_{k,l}$, satisfying the equation: $|\check{X}_{k,l,i}|^2 = |\hat{X}_{k,l,i}|^2 + \gamma |Y_{k,l,i}|^2$
where γ is a predetermined amplification coefficient, preferably between 0.01 and 0.1;
determining a combined denoised signal $\check{X}_{k,l}$ from the squared moduli $|\check{X}_{k,l,i}|^2$ of the combined sub-band denoised components $\check{X}_{k,l,i}$, and optionally from the phases θ_k,l extracted during step a), so that the output denoised signal SD_k,l corresponds to the combined denoised signal $\check{X}_{k,l}$, that is, $SD_{k,l} = \check{X}_{k,l}$.

The processed signal, that is, the denoised signal X̂_k,l, may suffer from distortion that degrades its quality and intelligibility. This distortion, which generally originates in the noise reduction processing itself, depends on the parameters of the noise reduction algorithms (the "MCRA" and "SSMB" algorithms) and also on the level and type of the noise component to be reduced.

Thus, in the context of the invention, step d), which implements the multi-band spectral subtraction algorithm "SSMB", is followed by a step of reinjecting the noisy signal Y_k,l from the microphone into the denoised signal X̂_k,l; this reinjection is controlled by the amplification coefficient (also called the attenuation parameter), which is chosen to be very small, of the order of a few percent. The reinjected signal corresponds to the signal γ |Y_k,l,i|² and is free of distortion, since it has not been processed by the "MCRA" and "SSMB" algorithms.

This principle of reinjecting a very small part of the noisy signal Y_k,l from the microphone makes it possible to remedy this distortion problem at least in part.
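The reinjection step is a one-line operation per bin, sketched below in Python together with one possible way of forming the complex output spectrum. The reinjection formula follows the text; reusing the noisy phase θ_k,l with the square root of the combined squared modulus is an assumption (the text says only that the phases may optionally be used), and the function names and the default `gamma` are hypothetical.

```python
import cmath
import math

def reinject(x2, y2, gamma=0.05):
    """Combined squared modulus: |X_hat|^2 + gamma * |Y|^2, bin by bin."""
    return [x + gamma * y for x, y in zip(x2, y2)]

def to_spectrum(x2_combined, theta):
    """Assumed step e): magnitude sqrt(|X|^2) paired with the noisy phase theta."""
    return [cmath.rect(math.sqrt(p), th) for p, th in zip(x2_combined, theta)]

# Usage: denoised powers from the SSMB stage, noisy powers and noisy phases.
x2_hat = [7.0, 0.002]
y2 = [10.0, 1.0]
theta = [0.0, math.pi / 2]
combined = reinject(x2_hat, y2, gamma=0.05)
sd = to_spectrum(combined, theta)
```

A `gamma` of 0.05 sits inside the preferred range of 0.01 to 0.1 given above; larger values bring back more of the unprocessed signal and therefore more of its noise.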

In a first embodiment, step f.2) consists in reconstructing the output denoised speech signal sd(t) solely from the output signals Sd_l obtained in step f.1), these output signals Sd_l corresponding to the inverse Fourier transforms of the output denoised signal SD_k,l specific to each time frame l.

Thus, in this first embodiment, the output denoised speech signal sd(t) is reconstructed from the output signals Sd_l of step f.1) alone.

In a second embodiment, as a variant of the first embodiment, step f.2) consists in, for each time frame l:

g) calculating an average signal-to-noise ratio r_l specific to the time frame l from the squared modulus |Y_k,l|² and the noise component D̂_k,l;
h) comparing the average signal-to-noise ratio r_l with a predetermined threshold Ψ_TH;
i) reconstructing the output denoised speech signal sd(t) by considering that:
- if the average signal-to-noise ratio r_l is lower than said threshold Ψ_TH for the time frame l, then the signal considered before temporal reconstruction for this time frame l corresponds to the output signal Sd_l obtained in step f.1);
- if the average signal-to-noise ratio r_l is greater than said threshold Ψ_TH for the time frame l, then the signal considered before temporal reconstruction for this time frame l corresponds to the sampled signal y_l obtained from the framing of step a).

Thus, this second embodiment implements:

in step g), noise detection by calculating an average signal-to-noise ratio r_l;
in step h), a comparison of this average signal-to-noise ratio r_l with a threshold Ψ_TH to establish whether noise is present (r_l < Ψ_TH) or whether noise is absent or at least extremely weak (r_l > Ψ_TH); and
in step i), a selection made, for each frame l, such that:
- if noise is present (r_l < Ψ_TH), the digitally processed signal, i.e. the output signal Sd_l, is used for the time reconstruction;
- if noise is absent or extremely weak (r_l > Ψ_TH), the output signal Sd_l is not used and the sampled signal y_l is used directly for the time reconstruction, which amounts to bypassing the noise reduction processing (MCRA, SSMB) for this frame l, with the advantage of avoiding unnecessary distortion when the noise level is such that noise reduction processing is not needed.

In effect, this second embodiment deactivates the noise reduction steps b), c) and d) when noise is not present. The distortions that the noise reduction processing could introduce when noise is absent or nearly absent are thereby eliminated.

To make the method more robust, it would be advantageous for the same decision (noise present or not) to be taken over a succession of time frames l.

In a particular embodiment, step g) implements the following calculation algorithm, for each time frame l:

g.1) calculating an average noise component D̅_l from the noise component D̂_k,l estimated in step b), according to the equation: ${\overline{D}}_{l} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, l}$

where M is a predetermined constant, preferably equal to N or to [1 + N/2], N being the number of points of the Fourier transform;
g.2) calculating an average squared modulus of the noisy signal Y_k,l, according to the equation: $\overline{{|Y_{k, l}|}^{2}} = \frac{1}{M} \sum_{k = 0}^{M - 1} {|Y_{k, l}|}^{2}$
g.3) calculating a filtered component P_l of the average squared modulus, according to the equation: $P_{l} = λ P_{l - 1} + (1 - λ) \overline{{|Y_{k, l}|}^{2}}$

where:
- λ is a predetermined constant characteristic of a low-pass filter, preferably between 0.80 and 0.99;
- $P_{0} = {\overline{D}}_{0} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, 0}$ initializes the algorithm;
g.4) calculating the average signal-to-noise ratio r_l according to:
- if D̅_l > 0 then $r_{l} = \frac{P_{l}}{{\overline{D}}_{l}},$
- if D̅_l ≤ 0 then r_l = 0.
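As an illustration only, steps g.1) to g.4) and the selection of step i) can be sketched in Python as follows. The class and function names, the default value λ = 0.9 and the handling of the case r_l = Ψ_TH (which the text leaves open) are assumptions made for this sketch; M is taken to be the length of the lists passed in.

```python
def mean(values):
    """Arithmetic mean over the M bins supplied by the caller."""
    return sum(values) / len(values)


class FrameSnrDetector:
    """Per-frame average signal-to-noise ratio r_l (steps g.1 to g.4)."""

    def __init__(self, lam=0.9):
        # lam is the low-pass constant (text: preferably 0.80 to 0.99).
        self.lam = lam
        self.p_prev = None  # P_{l-1}; None until frame 0 has been seen

    def update(self, y_sq, d_hat):
        """y_sq: |Y_k,l|^2 per bin; d_hat: noise estimate per bin."""
        d_bar = mean(d_hat)      # g.1: average noise component
        y_sq_bar = mean(y_sq)    # g.2: average squared modulus
        if self.p_prev is None:
            p = d_bar            # initialisation: P_0 = mean noise of frame 0
        else:
            # g.3: first-order low-pass filtering of the average power
            p = self.lam * self.p_prev + (1.0 - self.lam) * y_sq_bar
        self.p_prev = p
        # g.4: ratio, forced to 0 when the average noise is not positive
        return p / d_bar if d_bar > 0 else 0.0


def select_frame(r, threshold, sd_frame, y_frame):
    """Step i): processed frame if noise is present, raw frame otherwise.

    Assumption: r equal to the threshold is treated as "noise absent".
    """
    return sd_frame if r < threshold else y_frame
```

With this structure, the denoising chain can run unconditionally while `select_frame` decides, frame by frame, which signal is handed to the time reconstruction.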

Advantageously, the conversion steps a) and f) implement an overlap-and-add method known as "OLA", with:

for step a), framing of the noisy acoustic signal y(t) into time frames with an overlap between successive time frames;
for step f.2), reconstruction of the denoised output speech signal sd(t) by successively adding the overlapping parts of the signals of two successive time frames.

This overlap-and-add method, known as "OLA" (for "OverLap and Add method"), is a classic time reconstruction method that uses juxtaposed weighting windows (that is, windows that partially overlap one another) and then adds the output signals, taking the overlap of the time frames into account.

Another example also relates to a system for reducing noise on a noisy acoustic signal y(t) from a microphone operating in a noisy environment, comprising:

a unit for converting the noisy acoustic signal y(t) in the time domain into a noisy signal Y_k,l in the frequency domain, comprising:
- a module for framing the noisy acoustic signal y(t) into sampled signals y_l in successive time frames l;
- at the output of the framing module, a module for windowing the sampled signals y_l by applying a weighting window;
- at the output of the windowing module, a module for calculating a discrete Fourier transform, which outputs the noisy signal Y_k,l;
a digital processing unit in the frequency domain comprising, at the output of the conversion unit:
- a first module for extracting the squared modulus |Y_k,l|² of the noisy signal Y_k,l, and optionally a second module for extracting the phase θ_k,l of the noisy signal Y_k,l;
- at the output of the first extraction module, an estimation module, called "MCRA", for estimating a noise component D̂_k,l contained in the noisy signal Y_k,l from the squared modulus |Y_k,l|² supplied by the first extraction module, using an algorithm that estimates the power spectral density of the noise component by the minima controlled recursive averaging method known as "MCRA";
- at the output of the first extraction module, a module for dividing the frequency band into several frequency sub-bands SB_i = [e_i, b_i];
- at the output of the "MCRA" estimation module and of the frequency band division module, an estimation module, called "SSMB", for estimating the squared modulus |X̂_k,l,i|² of a denoised sub-band component X̂_k,l,i specific to each sub-band SB_i of a denoised signal X̂_k,l, using a multi-band spectral subtraction algorithm operating on sub-band squared moduli |Y_k,l,i|² and sub-band noise components D̂_k,l,i;
- at the output of the "SSMB" estimation module, and optionally of the second extraction module, a module for determining a denoised output signal SD_k,l from the squared moduli |X̂_k,l,i|², and optionally from the phases θ_k,l;
a unit for conversion into the time domain comprising, at the output of the digital processing unit:
- a module for calculating an output signal Sd_l specific to each time frame l by applying an inverse Fourier transform to the denoised output signal SD_k,l; and
- a module for reconstructing a denoised output speech signal sd(t) in the time domain from said output signals Sd_l.

According to this example, the module for determining the denoised output signal SD_k,l comprises:

a square-root sub-module for calculating the modulus |X̂_k,l,i| of the denoised sub-band components X̂_k,l,i, and
a sub-module for recombining the denoised sub-band components X̂_k,l,i to obtain the denoised signal X̂_k,l from the moduli |X̂_k,l,i|, and optionally from the phases θ_k,l, so that the denoised output signal SD_k,l corresponds to the denoised signal X̂_k,l, i.e. SD_k,l = X̂_k,l.

According to the invention, as a variant of this example, the module for determining a denoised output signal SD_k,l comprises:

at the output of the first extraction module, an amplification sub-module applying an amplification coefficient γ, preferably between 0.01 and 0.1, in order to deliver an amplified signal γ|Y_k,l|²;
at the output of the "SSMB" estimation module, an adder sub-module for adding the amplified signal γ|Y_k,l|² and the squared moduli |X̂_k,l,i|², in order to output the squared moduli ${|{\overset{‿}{X}}_{k, l, i}|}^{2}$ of combined denoised sub-band components ${\overset{‿}{X}}_{k, l, i}$ specific to each sub-band SB_i of a combined denoised signal ${\overset{‿}{X}}_{k, l}$, according to the equation: ${|{\overset{‿}{X}}_{k, l, i}|}^{2} = {|{\hat{X}}_{k, l, i}|}^{2} + γ {|Y_{k, l, i}|}^{2};$
a square-root sub-module for calculating the modulus $|{\overset{‿}{X}}_{k, l, i}|$ of the combined denoised sub-band components ${\overset{‿}{X}}_{k, l, i}$; and
a sub-module for recombining the combined sub-band components ${\overset{‿}{X}}_{k, l, i}$ to obtain the combined denoised signal ${\overset{‿}{X}}_{k, l}$ from the moduli $|{\overset{‿}{X}}_{k, l, i}|$, and optionally from the phases θ_k,l, so that the denoised output signal SD_k,l corresponds to the combined denoised signal ${\overset{‿}{X}}_{k, l}$, i.e. ${SD}_{k, l} = {\overset{‿}{X}}_{k, l} .$

In this second possibility, the system reinjects a very small fraction of the noisy signal Y_k,l from the microphone into the denoised signal X̂_k,l in order to remedy, at least in part, the distortion problems induced by the "MCRA" and "SSMB" estimation modules.
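The reinjection can be pictured with the short Python sketch below, which adds γ|Y|² to the denoised squared modulus of each bin, takes the square root and reattaches the phase. The function names and the default γ = 0.05 (inside the preferred range 0.01 to 0.1) are illustrative assumptions, and the inputs are flat per-bin lists rather than the sub-band structure of the system.

```python
import math


def reinject(x_hat_sq, y_sq, gamma=0.05):
    """Combined squared modulus per bin: |X_hat|^2 + gamma * |Y|^2."""
    return [xs + gamma * ys for xs, ys in zip(x_hat_sq, y_sq)]


def recombine(combined_sq, cos_theta, sin_theta):
    """Square-root sub-module followed by phase recombination.

    Returns one complex value per bin: |X| * (cos(theta) + j * sin(theta)).
    """
    out = []
    for m_sq, c, s in zip(combined_sq, cos_theta, sin_theta):
        mag = math.sqrt(m_sq)  # modulus of the combined component
        out.append(complex(mag * c, mag * s))
    return out
```

A bin that spectral subtraction has driven to zero thus keeps a small residue of the microphone signal instead of an abrupt spectral hole, which is the kind of distortion the reinjection is meant to soften.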

In an advantageous embodiment, the digital processing unit further comprises, at the output of the "MCRA" estimation module, a noise detection module comprising:

a module for calculating an average signal-to-noise ratio r_l specific to each time frame l from the squared modulus |Y_k,l|² and the noise component D̂_k,l;
a module for comparing the average signal-to-noise ratio r_l specific to each time frame l with a predetermined threshold Ψ_TH;
a module for controlling the module that reconstructs the denoised output speech signal sd(t), designed so that:
- if the average signal-to-noise ratio r_l is less than said threshold Ψ_TH for the time frame l, then the signal considered before reconstruction for this time frame l is the output signal Sd_l from the module that calculates said output signal Sd_l;
- if the average signal-to-noise ratio r_l is greater than said threshold Ψ_TH for the time frame l, then the signal considered before reconstruction for this time frame l is the sampled signal y_l from the module that frames the noisy acoustic signal y(t).

Other features and advantages of the present invention will become apparent on reading the following detailed description of a non-limiting example of implementation, given with reference to the appended figures, in which:

Figure 1 is a schematic view of a first exemplary noise reduction system;
Figure 2 is a schematic view of a noise reduction system according to the invention;
Figure 3 is a schematic view of a variant of the second system of Figure 2.

A system 1 for reducing noise on a noisy acoustic signal y(t) from a single microphone operating in a noisy environment, and the associated noise reduction method, are described with reference to Figures 1 to 3.

In all three embodiments, the system 1 comprises a conversion unit 2 that converts the noisy acoustic signal y(t) in the time domain into a noisy signal Y_k,l in the frequency domain.

This conversion unit 2 comprises a framing module 21 that splits the noisy acoustic signal y(t) into sampled signals y_l in successive time frames l.

In this framing module 21, the noisy acoustic signal y(t) is split into frames of 240 samples, which at a sampling frequency of 8 kHz corresponds to time frames of 30 milliseconds.

Frames of 256 samples are also conceivable, which at a sampling frequency of 8 kHz corresponds to time frames of 32 milliseconds.

In this framing module 21, successive time frames overlap. For example, successive time frames overlap by 120 samples, which corresponds to fifty percent (50%) overlap.

This overlap of the time frames is intended to allow the use of the overlap-and-add method known as "OLA", which provides the initial time framing of the noisy acoustic signal y(t) and then the final reconstruction in the time domain at the output of the system 1.

This conversion unit 2 comprises, at the output of the framing module 21, a windowing module 22 that applies a weighting window to the sampled signals y_l, in particular a Hanning window or a Hamming window, in order to output weighted sampled signals {y_l}.

The time frames are thus apodized with a weighting window before the Fourier transform is applied, in order to minimize the edge effects caused by the overlapping framing performed by the framing module 21.
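The framing, windowing and overlap-add chain can be sketched in Python as follows, using the 240-sample frames and 120-sample overlap quoted above. The function names are hypothetical, and the choice of a periodic Hanning window is an assumption of this sketch: with 50% overlap its shifted copies sum to one, so the overlap-add of unmodified windowed frames returns the original samples wherever two frames overlap.

```python
import math


def split_frames(signal, frame_len=240, hop=120):
    """Cut the signal into frames of frame_len samples, advancing by hop."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def hann_periodic(frame_len):
    """Periodic Hanning window: w[n] + w[n + frame_len/2] = 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
            for n in range(frame_len)]


def apply_window(frame, window):
    """Weighted frame {y_l} = w * y_l, sample by sample."""
    return [w * v for w, v in zip(window, frame)]


def overlap_add(frames, hop=120):
    """Rebuild a time signal by summing frames at their original offsets."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for n, value in enumerate(frame):
            out[start + n] += value
    return out
```

In the actual system each windowed frame would pass through the frequency-domain processing and an inverse transform before `overlap_add`; the sketch omits that stage to isolate the reconstruction property.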

This conversion unit 2 comprises, at the output of the windowing module 22, a calculation module 23 for a discrete Fourier transform, which outputs the noisy signal Y_k,l.

Mathematically, we write: $y (t) = x (t) + d (t),$

where x(t) is the useful speech signal and d(t) is the noise component.

The framing module 21 receives the noisy acoustic signal y(t) as input and outputs the sampled signal y_l, where l is the time index (or time frame index).

The windowing module 22 receives the sampled signal y_l as input and outputs the weighted sampled signal {y_l}, with: $\{y_{l}\} = ω_{l} \cdot y_{l}$

where ω_l is the signal representing the weighting window.

The calculation module 23 receives the weighted sampled signal {y_l} as input and outputs the noisy signal Y_k,l, which is the discrete Fourier transform of {y_l}, where k is the frequency index, i.e.: $Y_{k, l} = DFT (\{y_{l}\}) = r_{k, l} + j i_{k, l}$

The discrete Fourier transform (DFT) is computed, for example, by a fast Fourier transform (FFT) of size N, which may be equal to 256 (N is the number of points of the Fourier transform).

The system 1 comprises, at the output of the conversion unit 2, a digital processing unit 3 in the frequency domain that performs the denoising, or speech enhancement, processing on the noisy signal Y_k,l.

This digital processing unit 3 comprises a first extraction module 31 that extracts the squared modulus |Y_k,l|² of the noisy signal Y_k,l.

Mathematically, the first extraction module 31 performs the following calculation: ${|Y_{k, l}|}^{2} = r_{k, l}^{2} + i_{k, l}^{2} .$

In the embodiments of Figures 1 and 2, this digital processing unit 3 comprises a second extraction module 32 that extracts the phase θ_k,l of the noisy signal Y_k,l. As detailed later, this second extraction module 32 may also be dispensed with.

Mathematically, the second extraction module 32 performs the following calculation: $\cos θ_{k, l} = \frac{r_{k, l}}{|Y_{k, l}|} \text{ and } \sin θ_{k, l} = \frac{i_{k, l}}{|Y_{k, l}|}$

with $|Y_{k, l}| = \sqrt{r_{k, l}^{2} + i_{k, l}^{2}}$
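For illustration, the calculations of modules 23, 31 and 32 can be written out in plain Python as below. The direct O(N²) transform stands in for the FFT mentioned in the text, the function names are hypothetical, and the convention returned for a bin of zero modulus (cos θ = 1, sin θ = 0) is an assumption, since the text does not address that case.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one weighted frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def squared_modulus(y):
    """|Y|^2 = r^2 + i^2 for one complex bin (module 31)."""
    return y.real ** 2 + y.imag ** 2


def phase_terms(y):
    """(cos(theta), sin(theta)) for one complex bin (module 32)."""
    mag = math.sqrt(squared_modulus(y))
    if mag == 0.0:
        return 1.0, 0.0  # assumed convention for an empty bin
    return y.real / mag, y.imag / mag
```

Keeping the phase as a cosine and sine pair avoids any inverse trigonometric call: the denoised bin is rebuilt simply by multiplying the new modulus by these two terms.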

The digital processing unit 3 comprises, at the output of the first extraction module 31, an estimation module 33, called "MCRA", which estimates a noise component D̂_k,l contained in the noisy signal Y_k,l from the squared modulus |Y_k,l|² supplied by the first extraction module 31, using an algorithm that estimates the power spectral density of the noise component by the minima controlled recursive averaging method known as "MCRA".

Mathematically, the algorithm for estimating the power spectral density of the noise component by the minima controlled recursive averaging method known as "MCRA" implements the following calculation phases:

b.1) calculating a filtered noisy component S_k,l according to the equation: $S_{k, l} = α_{s} S_{k, l - 1} + (1 - α_{s}) {|Y_{k, l}|}^{2}$

where α_s is a predetermined constant characteristic of a low-pass filter;
b.2) calculating a speech presence probability density p̃_k,l through the following step-by-step calculation:
1. (i) calculating a spectral minimum component Smin_k,l with:
  - if rem(k,l) = 0
  then Smin_k,l = min (Smin_k,l-1 ; S_k,l) and Stmp_k,l = S_k,l
  - if rem(k,l) ≠ 0
  then Smin_k,l = min (Stmp_k,l-1 ; S_k,l) and
  Stmp_k,l = min (Stmp_k,l-1 ; S_k,l)
  where rem(k,l) is the remainder of the integer division of k by l, then
2. (ii) calculating a spectral ratio Sr_k,l according to the equation: ${Sr}_{k, l} = \frac{S_{k, l}}{{Smin}_{k, l}}$
3. (iii) calculating an indicator variable I_k,l with:
  - if Sr_k,l > δ_TH then I_k,l = 1
  - if Sr_k,l ≤ δ_TH then I_k,l = 0
    where δ_TH is a predetermined fixed threshold parameter for speech detection;
4. (iv) calculating the speech presence probability density p̃_k,l with: ${\tilde{p}}_{k, l} = α_{p} {\tilde{p}}_{k, l - 1} + (1 - α_{p}) I_{k, l}$
  
  where α_p is a predetermined constant;
b.3) calculating a coefficient α̃_k,l according to the following equation: ${\tilde{α}}_{k, l} = α + (1 - α) {\tilde{p}}_{k, l},$

where α is a predetermined constant;
b.4) calculating the noise component D̂_k,l according to the following equation: ${\hat{D}}_{k, l} = {\tilde{α}}_{k, l} {\hat{D}}_{k, l - 1} + (1 - {\tilde{α}}_{k, l}) {|Y_{k, l}|}^{2} .$

The "MCRA" estimation module 33 thus outputs the noise component ${\hat{D}}_{k, l}$.
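The phases b.1 to b.4 can be sketched as a per-frame update, as below. This is only an illustrative sketch: the numerical values of `alpha_s`, `alpha_p`, `alpha` and `delta_th` are placeholders, not values given in the description, and the test that triggers the minimum-tracking branch (`l % window == 0`, with a hypothetical `window` length) is an assumption standing in for the condition rem(k, l) = 0.

```python
def mcra_init(y_pow):
    """Start every recursion from the first frame's squared moduli |Y_k,0|^2."""
    return {"S": list(y_pow), "Smin": list(y_pow), "Stmp": list(y_pow),
            "p": [0.0] * len(y_pow), "D": list(y_pow)}


def mcra_update(state, y_pow, l, alpha_s=0.8, alpha_p=0.2, alpha=0.95,
                delta_th=5.0, window=8, eps=1e-12):
    """Apply phases b.1 to b.4 to frame l (in place) and return the noise estimate."""
    for k, y in enumerate(y_pow):
        # b.1) low-pass filtering of the noisy squared modulus
        s = alpha_s * state["S"][k] + (1.0 - alpha_s) * y
        # b.2 (i) minimum tracking; the trigger below is an assumed reading of rem(.) = 0
        if l % window == 0:
            s_min = min(state["Smin"][k], s)
            s_tmp = s
        else:
            s_min = min(state["Stmp"][k], s)
            s_tmp = min(state["Stmp"][k], s)
        # b.2 (ii)-(iii) spectral ratio compared with the fixed threshold
        indicator = 1.0 if s / max(s_min, eps) > delta_th else 0.0
        # b.2 (iv) recursive smoothing of the indicator
        p = alpha_p * state["p"][k] + (1.0 - alpha_p) * indicator
        # b.3) time-varying smoothing coefficient
        a = alpha + (1.0 - alpha) * p
        # b.4) recursive noise estimate
        state["D"][k] = a * state["D"][k] + (1.0 - a) * y
        state["S"][k], state["Smin"][k] = s, s_min
        state["Stmp"][k], state["p"][k] = s_tmp, p
    return state["D"]
```

With a stationary input the estimate stays at the input level. When one bin suddenly rises, the indicator pushes ${\tilde{α}}_{k, l}$ towards 1, so the noise estimate for that bin moves only slightly.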

The digital processing unit 3 comprises, at the output of the first extraction module 31, a module 34 for dividing the frequency band into several frequency sub-bands $SB_{i} = [e_{i}, b_{i}]$.

The frequency band can, for example, be divided into three frequency sub-bands, namely $SB_{1}$ for $f_{i} < 1000$ Hz, $SB_{2}$ for 1000 Hz $≤ f_{i} ≤$ 2000 Hz and finally $SB_{3}$ for $f_{i} > 2000$ Hz, where $f_{i}$ is the sub-band frequency.
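As a rough illustration of this three-band division, the sketch below maps the bins of an N-point transform to index ranges $[e_{i}, b_{i}]$. The sampling frequency `fs` is a hypothetical parameter (it is not stated in this passage), and the function name is illustrative only.

```python
def split_into_subbands(n_fft, fs, edges=(1000.0, 2000.0)):
    """Return [(e_1, b_1), (e_2, b_2), (e_3, b_3)] as inclusive bin indices."""
    low, high = edges
    bands = [[], [], []]
    for k in range(n_fft // 2 + 1):          # only the first N/2 + 1 bins are needed
        f = k * fs / n_fft                   # centre frequency of bin k in Hz
        if f < low:
            bands[0].append(k)               # SB1: f < 1000 Hz
        elif f <= high:
            bands[1].append(k)               # SB2: 1000 Hz <= f <= 2000 Hz
        else:
            bands[2].append(k)               # SB3: f > 2000 Hz
    return [(b[0], b[-1]) for b in bands if b]
```

For example, with N = 256 and an assumed `fs` of 8000 Hz the bin spacing is 31.25 Hz, so `split_into_subbands(256, 8000.0)` gives the ranges (0, 31), (32, 64) and (65, 128).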

The digital processing unit 3 comprises, at the output of the "MCRA" estimation module 33 and of the frequency band division module 34, an estimation module 35, called "SSMB", which estimates the squared modulus ${|{\hat{X}}_{k, l, i}|}^{2}$ of a denoised sub-band component ${\hat{X}}_{k, l, i}$ specific to each sub-band $SB_{i}$ of a denoised signal ${\hat{X}}_{k, l}$, by a multi-band spectral subtraction algorithm operating on the sub-band squared moduli ${|Y_{k, l, i}|}^{2}$ and the sub-band noise components ${\hat{D}}_{k, l, i}$.

The multi-band spectral subtraction (SSMB) algorithm is a generalization of the spectral subtraction algorithm, which consists in subtracting from the power spectral density of the noisy signal $Y_{k, l}$ from the microphone a portion of the power spectral density of the noise component estimated by the "MCRA" method. The multi-band spectral subtraction algorithm applies the same principle after dividing the spectrum into several frequency sub-bands $SB_{i}$; the spectral subtraction operation is then applied in each sub-band $SB_{i}$ as follows: ${|{\hat{X}}_{k, l, i}|}^{2} = {|Y_{k, l, i}|}^{2} - μ_{i} {|{\hat{D}}_{k, l, i}|}^{2}$

with $μ_{i}$ a predetermined coefficient.

This multi-band spectral subtraction algorithm is combined with the overlap-add method, known as "OLA".

It is of course possible to refine the multi-band spectral subtraction relation given above, as described below.

From a mathematical point of view, the multi-band spectral subtraction algorithm "SSMB" implemented by the "SSMB" estimation module 35 carries out the following calculation phases for each of the sub-bands $SB_{i}$:

d.1) calculation of a signal-to-noise ratio ${SNR}_{k, l, i}$ specific to each sub-band $SB_{i}$ satisfying the equation: ${SNR}_{k, l, i} = 10 \log_{10} \left(\frac{\sum_{k = e_{i}}^{b_{i}} {|Y_{k, l, i}|}^{2}}{\sum_{k = e_{i}}^{b_{i}} |{\hat{D}}_{k, l, i}|}\right)$
d.2) calculation of the squared modulus ${|{\hat{X}}_{k, l, i}|}^{2}$ of the denoised sub-band component ${\hat{X}}_{k, l, i}$ specific to each sub-band $SB_{i}$, according to the equation: ${|{\hat{X}}_{k, l, i}|}^{2} = {\begin{cases} {|Y_{k, l, i}|}^{2} - α_{i} δ_{i} |{\hat{D}}_{k, l, i}| & \text{if } {|Y_{k, l, i}|}^{2} > α_{i} δ_{i} |{\hat{D}}_{k, l, i}| \\ β {|Y_{k, l, i}|}^{2} & \text{if } {|Y_{k, l, i}|}^{2} \leq α_{i} δ_{i} |{\hat{D}}_{k, l, i}| \end{cases}}$

where:
- $δ_{i}$ is a variable parameter that depends on the corresponding sub-band $SB_{i}$ and takes distinct values from one sub-band to another;
- $α_{i}$ is a variable parameter that depends on the value of the signal-to-noise ratio ${SNR}_{k, l, i}$ calculated in the corresponding sub-band $SB_{i}$; and
- β is a constant.

The value of the signal-to-noise ratio ${SNR}_{k, l, i}$ can be compared with two thresholds ${SNR}_{1}$ and ${SNR}_{2}$ to establish the value of the parameters $α_{i}$. The parameters $α_{i}$ then satisfy the following equations: $α_{i} = {\begin{cases} α_{c1} & \text{if } {SNR}_{k, l, i} < {SNR}_{1} \\ α_{c2} + α_{c3} {SNR}_{k, l, i} & \text{if } {SNR}_{1} \leq {SNR}_{k, l, i} \leq {SNR}_{2} \\ α_{c4} & \text{if } {SNR}_{k, l, i} > {SNR}_{2} \end{cases}}$

where $α_{c1}$, $α_{c2}$, $α_{c3}$ and $α_{c4}$ are predetermined constants, and ${SNR}_{1}$ and ${SNR}_{2}$ are the predetermined thresholds.

For example, the parameters $α_{i}$ can satisfy the following relations: $α_{i} = {\begin{cases} 5 & \text{if } {SNR}_{k, l, i} < - 5 \\ 4 - \frac{3}{20} {SNR}_{k, l, i} & \text{if } - 5 \leq {SNR}_{k, l, i} \leq 20 \\ 1 & \text{if } {SNR}_{k, l, i} > 20 \end{cases}}$
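The example relation for $α_{i}$ translates directly into a small function. The sketch below uses the numerical values of the example above; the function name is illustrative.

```python
def over_subtraction_factor(snr_db):
    """Example alpha_i as a function of the sub-band SNR (in dB)."""
    if snr_db < -5.0:
        return 5.0                            # very noisy sub-band: strongest subtraction
    if snr_db <= 20.0:
        return 4.0 - (3.0 / 20.0) * snr_db    # linear decrease between the two thresholds
    return 1.0                                # clean sub-band: plain subtraction
```

The factor decreases as the sub-band signal-to-noise ratio increases: it equals 4 at 0 dB and reaches 1 at 20 dB, so less noise is subtracted where the speech dominates.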

As regards the variable parameters $δ_{i}$, the following relations can be used in the case of the division into three frequency sub-bands described above: $δ_{i} = {\begin{cases} 1 & \text{if } f_{i} < 1000 \text{ (first sub-band } {SB}_{1}) \\ 2.75 & \text{if } 1000 \leq f_{i} \leq 2000 \text{ (second sub-band } {SB}_{2}) \\ 1.75 & \text{if } f_{i} > 2000 \text{ (third sub-band } {SB}_{3}) \end{cases}}$

As regards the constant β, values of the order of 0.002 or 0.0015 can be used, for example β = 0.002 or β = 0.0015.
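Putting phases d.1 and d.2 together with the example values of $α_{i}$ and β, the processing of one sub-band of one frame might look like the sketch below. It assumes strictly positive noise estimates in the sub-band (so that the ratio in d.1 is defined); the function names are illustrative.

```python
import math


def alpha_from_snr(snr_db):
    """Example alpha_i values from the description."""
    if snr_db < -5.0:
        return 5.0
    if snr_db <= 20.0:
        return 4.0 - (3.0 / 20.0) * snr_db
    return 1.0


def ssmb_subband(y_pow, d_hat, delta_i, beta=0.002):
    """Return (SNR in dB, denoised squared moduli) for one sub-band of one frame.

    y_pow : |Y_k,l,i|^2 for the bins k = e_i .. b_i
    d_hat : noise estimates for the same bins (assumed > 0)
    """
    # d.1) sub-band signal-to-noise ratio
    snr_db = 10.0 * math.log10(sum(y_pow) / sum(d_hat))
    alpha_i = alpha_from_snr(snr_db)
    # d.2) subtraction with a spectral floor of beta * |Y|^2
    x_pow = []
    for y, d in zip(y_pow, d_hat):
        threshold = alpha_i * delta_i * d
        x_pow.append(y - threshold if y > threshold else beta * y)
    return snr_db, x_pow
```

The floor term keeps every output value strictly positive for a non-zero input, which is what allows the square root to be taken afterwards.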

The digital processing unit 3 also comprises, at the output of the "SSMB" estimation module 35 and of the second extraction module 32, a module 36 for determining a denoised output signal ${SD}_{k, l}$ from the squared moduli ${|{\hat{X}}_{k, l, i}|}^{2}$ and the phases $θ_{k, l}$.

In the first embodiment, illustrated in figure 1, the module 36 for determining the denoised output signal ${SD}_{k, l}$ comprises:

a square-root sub-module 361 for calculating the modulus $|{\hat{X}}_{k, l, i}|$ of the denoised sub-band components ${\hat{X}}_{k, l, i}$, and
a sub-module 362 for recombining the denoised sub-band components ${\hat{X}}_{k, l, i}$ to obtain the denoised signal ${\hat{X}}_{k, l}$ from the moduli $|{\hat{X}}_{k, l, i}|$ and the phases $θ_{k, l}$, so that the denoised output signal ${SD}_{k, l}$ corresponds to the denoised signal ${\hat{X}}_{k, l}$, that is ${SD}_{k, l} = {\hat{X}}_{k, l}$.

From a mathematical point of view:

the square-root sub-module 361 performs the calculation: $|{\hat{X}}_{k, l}| = \sqrt{{|{\hat{X}}_{k, l}|}^{2}};$
and
the recombination sub-module 362 reinjects the phase as follows: ${SD}_{k, l} = {\hat{X}}_{k, l} = |{\hat{X}}_{k, l}| \cos θ_{k, l} + j |{\hat{X}}_{k, l}| \sin θ_{k, l}$.
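The square root and the phase reinjection of the first embodiment can be illustrated by the short sketch below, which works on plain Python lists; the function name is illustrative.

```python
import math


def recombine(x_pow, theta):
    """Build SD_k,l = |X| cos(theta) + j |X| sin(theta) from squared moduli and phases."""
    out = []
    for p, t in zip(x_pow, theta):
        modulus = math.sqrt(p)                          # sub-module 361
        out.append(complex(modulus * math.cos(t),       # sub-module 362
                           modulus * math.sin(t)))
    return out
```

Each output bin has the denoised modulus and the phase of the noisy signal; only the magnitude is modified by the noise reduction.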

In the second embodiment, illustrated in figure 2, the module 36 for determining the denoised output signal ${SD}_{k, l}$ comprises:

at the output of the first extraction module 31, an amplification sub-module 363 applying an amplification coefficient γ, preferably between 0.01 and 0.1, in order to deliver an amplified signal $γ {|Y_{k, l}|}^{2}$;
at the output of the "SSMB" estimation module 35, an adder sub-module 364 which adds the amplified signal $γ {|Y_{k, l}|}^{2}$ and the squared moduli ${|{\hat{X}}_{k, l, i}|}^{2}$, in order to output the squared moduli ${|{\overset{‿}{X}}_{k, l, i}|}^{2}$ of combined denoised sub-band components ${\overset{‿}{X}}_{k, l, i}$ specific to each sub-band $SB_{i}$ of a combined denoised signal ${\overset{‿}{X}}_{k, l}$, satisfying the equation: ${|{\overset{‿}{X}}_{k, l, i}|}^{2} = {|{\hat{X}}_{k, l, i}|}^{2} + γ {|Y_{k, l, i}|}^{2};$
a square-root sub-module 361 for calculating the modulus $|{\overset{‿}{X}}_{k, l, i}|$ of the combined denoised sub-band components ${\overset{‿}{X}}_{k, l, i}$; and
a sub-module 362 for recombining the combined sub-band components ${\overset{‿}{X}}_{k, l, i}$ to obtain the combined denoised signal ${\overset{‿}{X}}_{k, l}$ from the moduli $|{\overset{‿}{X}}_{k, l, i}|$ and the phases $θ_{k, l}$, so that the denoised output signal ${SD}_{k, l}$ corresponds to the combined denoised signal ${\overset{‿}{X}}_{k, l}$, that is ${SD}_{k, l} = {\overset{‿}{X}}_{k, l}$.

From a mathematical point of view:

the amplification sub-module 363 outputs the amplified signal $γ {|Y_{k, l}|}^{2};$
the adder sub-module 364 performs the calculation: ${|{\overset{‿}{X}}_{k, l, i}|}^{2} = {|{\hat{X}}_{k, l, i}|}^{2} + γ {|Y_{k, l, i}|}^{2};$
the square-root sub-module 361 performs the calculation: $|{\overset{‿}{X}}_{k, l, i}| = \sqrt{{|{\overset{‿}{X}}_{k, l, i}|}^{2}};$
and
the recombination sub-module 362 reinjects the phase as follows: ${SD}_{k, l} = {\overset{‿}{X}}_{k, l} = |{\overset{‿}{X}}_{k, l}| \cos θ_{k, l} + j |{\overset{‿}{X}}_{k, l}| \sin θ_{k, l}$.

Thus, in this second embodiment, the system 1 carries out a step of reinjecting a very small part of the noisy signal $Y_{k, l}$ from the microphone into the denoised signal ${\hat{X}}_{k, l}$; the reinjected signal is the amplified signal $γ {|Y_{k, l}|}^{2}$. In this way, the distortion problems introduced by the "MCRA" estimation module 33 and the "SSMB" estimation module 35 are at least partly remedied.
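The adder of the second embodiment can be sketched in a few lines. The default `gamma` below is simply one value inside the preferred range 0.01 to 0.1; the function name is illustrative.

```python
def reinject_noisy(x_pow, y_pow, gamma=0.05):
    """Add gamma * |Y|^2 to the denoised squared moduli (adder sub-module 364)."""
    return [x + gamma * y for x, y in zip(x_pow, y_pow)]
```

For a bin that was set to the floor value $β {|Y|}^{2}$ by the subtraction, the result becomes $(β + γ) {|Y|}^{2}$, so a small fraction of the original spectrum is always preserved.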

The system 1 comprises, at the output of the digital processing unit 3, a unit 4 for conversion into the time domain.

This conversion unit 4 comprises a module 41 for calculating an output signal ${Sd}_{l}$ specific to each time frame l by applying an inverse Fourier transform to the denoised output signal ${SD}_{k, l}$.

From a mathematical point of view, this calculation module 41 computes the inverse fast Fourier transform (IFFT) with a size N equal to 256 (N being, as a reminder, the number of points of the Fourier transform), with the following relation: ${Sd}_{l} = IDFT ({SD}_{k, l})$

where IDFT is the inverse discrete Fourier transform function, which can be of the inverse fast Fourier transform (IFFT) type.

In addition, because of the symmetry of the amplitude of the Fourier transform (DFT or FFT) of real signals, the noise reduction processing and the reconstruction according to the "OLA" method are carried out only on the first [N/2 + 1] sampling points, that is on the first 129 sampling points for N equal to 256, given the following Hermitian symmetry relation: ${SD}_{- k, l} = {SD}_{k, l}^{*}$
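The Hermitian symmetry relation is what allows the full N-point spectrum to be rebuilt from its first N/2 + 1 points before the inverse transform. The sketch below illustrates this with a naive inverse DFT written in plain Python (an FFT would be used in practice); both function names are illustrative.

```python
import cmath


def full_spectrum(half):
    """Rebuild N bins from the first N/2 + 1 using SD[N - k] = conj(SD[k])."""
    n = 2 * (len(half) - 1)
    return list(half) + [half[n - k].conjugate() for k in range(len(half), n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform, O(N^2), for illustration only."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n)
                for k in range(n)) / n
            for m in range(n)]
```

When the bins of index 0 and N/2 are real, the rebuilt spectrum is Hermitian and the inverse transform is a real signal up to rounding errors.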

In the case of the first embodiment, illustrated in figure 1, the relation is: ${Sd}_{l} = {\hat{x}}_{l} = IFFT ({\hat{X}}_{k, l})$.

In the case of the second embodiment, illustrated in figure 2, the relation is: ${Sd}_{l} = {\overset{‿}{x}}_{l} = IFFT ({\overset{‿}{X}}_{k, l})$.

This conversion unit 4 comprises, at the output of the calculation module 41, a module 42 for reconstructing a denoised output speech signal sd(t) in the time domain from the output signals ${Sd}_{l}$.

The time-domain signal is restored by the overlap-add method "OLA": the denoised output speech signal sd(t) is reconstructed by successively adding the overlapping parts of the signals of two successive time frames, according to the principle: $\{{\tilde{sd}}_{l}\} = \{{Sd}_{l - 1}\} + \{{Sd}_{l}\},$

which becomes, in the first embodiment: $\{{\tilde{sd}}_{l}\} = \{{\hat{x}}_{l - 1}\} + \{{\hat{x}}_{l}\},$

and, in the second embodiment: $\{{\tilde{sd}}_{l}\} = \{{\overset{‿}{x}}_{l - 1}\} + \{{\overset{‿}{x}}_{l}\} .$

Each time a time frame of the denoised output signal ${SD}_{k, l}$ is delivered in the frequency domain and its inverse Fourier transform ${Sd}_{l}$ is computed, the first N/2 sampling points are added to the last N/2 sampling points of the previously processed frame. The last N/2 sampling points of the frame currently being processed are stored in memory so that they can in turn be used when the next frame is processed.

In other words, after processing in the spectral domain, the time frame of the denoised output signal ${SD}_{k, l}$ passes through the inverse Fourier transform calculation module 41; its first half (first N/2 sampling points) is then added to the saved second half (last N/2 sampling points) of the previous frame, while its second half (last N/2 sampling points) is saved for the next block.

With N equal to 256, N/2 is equal to 128, bearing in mind that the overlap ratio of two successive frames is fifty percent (50 %).
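The buffering described above (add the first half of the current frame to the saved second half of the previous frame, then save the new second half) can be sketched as follows. The frames are assumed to be already windowed and inverse-transformed; the function name is illustrative.

```python
def overlap_add(frames):
    """Reconstruct a signal from frames that overlap by 50 %."""
    half = len(frames[0]) // 2
    saved_tail = [0.0] * half            # nothing saved before the first frame
    output = []
    for frame in frames:
        # first N/2 points of the current frame + last N/2 points of the previous one
        output.extend(a + b for a, b in zip(frame[:half], saved_tail))
        # the last N/2 points wait in memory for the next frame
        saved_tail = list(frame[half:])
    return output
```

Each processed frame therefore produces N/2 output samples, that is 128 samples per frame for N equal to 256.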

Optionally and advantageously, the digital processing unit 3 further comprises, at the output of the "MCRA" estimation module 33, a noise detection module 37 which controls the reconstruction module 42 according to the following principle:

- calculation of an average signal-to-noise ratio $r_{l}$ in order to perform noise detection;
- comparison of this average signal-to-noise ratio $r_{l}$ with a threshold $ψ_{TH}$ to establish whether noise is present ($r_{l} < ψ_{TH}$) or whether noise is absent or at least extremely low ($r_{l} > ψ_{TH}$); and
- control of the reconstruction module 42 according to the following rules:
  - if noise is present ($r_{l} < ψ_{TH}$), the digitally processed signal, that is to say the output signal ${Sd}_{l}$, is used for the time-domain reconstruction;
  - if noise is absent or extremely low ($r_{l} > ψ_{TH}$), the output signal ${Sd}_{l}$ is not used and the sampled signal $y_{l}$ is used directly for the time-domain reconstruction, which amounts to bypassing the noise reduction processing ("MCRA", "SSMB") for this frame l, with the advantage of avoiding unnecessary distortion when the noise level is such that noise reduction is not necessary.

As a result, when noise is absent or nearly absent, the distortions that the noise reduction processing may introduce are eliminated.

From a structural point of view, the noise detection module 37 comprises:

a module for calculating an average signal-to-noise ratio $r_{l}$ specific to each time frame l from the squared modulus ${|Y_{k, l}|}^{2}$ and the noise component ${\hat{D}}_{k, l}$;
a module for comparing the average signal-to-noise ratio $r_{l}$ specific to each time frame l with a predetermined threshold $ψ_{TH}$;
a module for controlling the module 42 for reconstructing the denoised output speech signal sd(t), designed so that:
- if the average signal-to-noise ratio $r_{l}$ is lower than said threshold $ψ_{TH}$ for the time frame l, then the signal considered before reconstruction for this time frame l is the output signal ${Sd}_{l}$ from the module for calculating said output signal ${Sd}_{l}$;
- if the average signal-to-noise ratio $r_{l}$ is higher than said threshold $ψ_{TH}$ for the time frame l, then the signal considered before reconstruction for this time frame l is the sampled signal $y_{l}$ from the module for slicing the noisy acoustic signal y(t).

From a mathematical point of view, the noise detection module 37 implements the following calculation algorithm for each time frame l:

g.1) calculation of an average noise component ${\overline{D}}_{l}$ from the noise component ${\hat{D}}_{k, l}$ estimated by the "MCRA" estimation module 33, satisfying the equation: ${\overline{D}}_{l} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, l}$

where M is a predetermined constant equal to N/2, N being, as a reminder, the number of points of the Fourier transform;
g.2) calculation of a mean squared modulus $\overline{{|Y_{k, l}|}^{2}}$ of the noisy signal $Y_{k, l}$ satisfying the equation: $\overline{{|Y_{k, l}|}^{2}} = \frac{1}{M} \sum_{k = 0}^{M - 1} {|Y_{k, l}|}^{2}$
g.3) calculation of a filtered component $P_{l}$ of the mean squared modulus $\overline{{|Y_{k, l}|}^{2}}$ satisfying the equation: $P_{l} = λ P_{l - 1} + (1 - λ) \overline{{|Y_{k, l}|}^{2}}$

where:
- λ is a predetermined constant characteristic of a low-pass filter, preferably between 0.80 and 0.99;
- $P_{0} = {\overline{D}}_{0} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, 0}$ is used to initialize the algorithm.
g.4) calculation of the average signal-to-noise ratio $r_{l}$ satisfying:
- if ${\overline{D}}_{l} > 0$ then $r_{l} = \frac{P_{l}}{{\overline{D}}_{l}},$
- if ${\overline{D}}_{l} < 0$ then $r_{l} = 0$.
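The phases g.1 to g.4 and the resulting frame selection can be sketched as below. The default `lam` is one value inside the preferred range for λ; the threshold `psi_th` is left as a parameter since no value is given here. How the case $r_{l} = ψ_{TH}$ and the case ${\overline{D}}_{l} = 0$ are handled in the sketch is an assumption, and the function names are illustrative.

```python
def average_snr_step(y_pow, d_hat, p_prev, lam=0.9):
    """Return (r_l, P_l) for one frame from the first M = N/2 bins.

    y_pow  : |Y_k,l|^2 for k = 0 .. M-1
    d_hat  : MCRA noise estimates for the same bins
    p_prev : P_(l-1); for the first frame, use the mean of the noise estimates
    """
    m = len(d_hat)
    d_mean = sum(d_hat) / m                        # g.1) average noise component
    y_mean = sum(y_pow) / m                        # g.2) mean squared modulus
    p = lam * p_prev + (1.0 - lam) * y_mean        # g.3) low-pass filtering
    r = p / d_mean if d_mean > 0 else 0.0          # g.4) assumption: r = 0 when the mean is 0
    return r, p


def select_frame(r, psi_th, sd_frame, y_frame):
    """Choose the frame handed to the reconstruction module 42."""
    # Assumption: equality with the threshold is treated like the noise-free case.
    return sd_frame if r < psi_th else y_frame
```

A frame whose ratio falls below the threshold goes through the denoised path, while any other frame is passed through unprocessed.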

Of course, the embodiment described above is in no way limiting, and other improvements and details can be made to the noise reduction system according to the invention without departing from the scope of the invention.

Ainsi, il est envisageable de se passer de l'extraction de la phase de la phase θ_k,l, du signal bruité Y_k,l comme illustré sur la figure 3 où le système 1 ne comporte pas le deuxième module d'extraction 32 de la phase θ _k,l du signal bruité Y_k,l. Dans ce cas, le sous-module de recombinaison 362 des composantes débruitées de sous-bande X̂_k,l,i effectue le calcul du signal débruité X̂_k,l à partir des modules |X_k,l,i| et du signal bruité Y_k,l, En effet, la réinjection de la phase peut être effectuée directement à partir du signal bruité Y_k,l qui intègre intrinsèquement cette phase.Thus, it is conceivable to dispense with the extraction of the phase of the phase θ _{k, 1,} from the noisy signal Y _{k, l} as illustrated in FIG. figure 3 where the system 1 does not include the second extraction module 32 of the phase θ _{k, 1} of the noisy signal Y _{k, l} . In this case, the recombination sub-module 362 of the de-banded components of sub-band X _{k, l, i} calculates the denoised signal X _{k, l} from the modules | X _{k, l, i} | and the noisy signal Y _{k, l} , Indeed, the reinjection of the phase can be carried out directly from the noisy signal Y _{k, l} which intrinsically integrates this phase.

For this purpose, the recombination sub-module 362 performs the following calculation: ${\hat{X}}_{k, l} = |{\hat{X}}_{k, l}| \frac{r_{k, l}}{|Y_{k, l}|} + j |{\hat{X}}_{k, l}| \frac{i_{k, l}}{|Y_{k, l}|} = G_{k, l} r_{k, l} + j G_{k, l} i_{k, l},$

that is, X̂_k,l = G_k,l Y_k,l,

with G_k,l the gain of the noise reduction algorithm. With this calculation, it is therefore no longer necessary to calculate, store and reinject the phase.
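
As an illustration only, the recombination X̂_k,l = G_k,l Y_k,l can be sketched in a few lines of Python. The sketch reads r_k,l and i_k,l as the real and imaginary parts of Y_k,l, which the identity X̂_k,l = G_k,l Y_k,l implies, and takes G_k,l = |X̂_k,l| / |Y_k,l|. The function name `recombine_without_phase`, the list-based data layout and the handling of bins where |Y_k,l| = 0 are assumptions made for this example, not details taken from the description.

```python
def recombine_without_phase(Y, X_mag):
    """Return X_hat[k] = G[k] * Y[k], with G[k] = |X_hat[k]| / |Y[k]|.

    Y     : list of complex DFT bins of the noisy frame (Y_k,l)
    X_mag : list of denoised moduli |X_hat_k,l| (non-negative floats)
    """
    X_hat = []
    for y, mag in zip(Y, X_mag):
        y_abs = abs(y)
        # Assumption: a bin with zero modulus carries no phase, so output 0.
        gain = mag / y_abs if y_abs > 0.0 else 0.0
        # Scaling the complex bin keeps its phase without computing it.
        X_hat.append(gain * y)
    return X_hat
```

For Y = 3 + 4j and |X̂| = 2.5, the gain is 0.5 and the result is 1.5 + 2j, which has the phase of Y although no phase was ever extracted.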

The system 1 of figure 3 is a variant of the system 1 of figure 2 without the second extraction module 32, but it is of course also conceivable to remove the second extraction module 32 from the system 1 of figure 1.

Claims

A method for reducing noise on a noisy acoustic signal y(t) from a microphone operating in a noisy environment, including the following successive steps:
a) converting the noisy acoustic signal y(t) in the time domain into a noisy signal Y_k,l in the frequency domain, by time slicing the noisy acoustic signal y(t) into sampled signals y_l in successive time frames l, windowing the sampled signals y_l by applying a weighting window, and applying a discrete Fourier transform, with extraction of the square module |Y_k,l|², and possibly the phase θ_k,l, from the noisy signal Y_k,l;

b) estimating a noise component D̂_k,l contained in the noisy signal Y_k,l from the square module |Y_k,l|², by an algorithm of power spectral density estimation of the noise component according to a method of minima controlled recursive averaging called "MCRA";
characterized in that it further comprises, after the step b), the following successive steps:

c) slicing the frequency band into several frequency sub-bands SB_i=[e_i, b_i], followed by multi-band decomposing the square module |Y_k,l|² and the noise component D̂_k,l, consisting in decomposing the square module |Y_k,l|² and the noise component D̂_k,l, into respectively several square modules of sub-band |Y_k,l,i|² and several noise components of sub-band D̂_k,l,i specific to each sub-band SB_i;

d) estimating, for each sub-band SB_i, the square module |X̂_k,l,i|² of a denoised component of sub-band X̂_k,l,i specific to each sub-band SB_i of a denoised signal X̂_k,l, by an algorithm of multi-band spectral subtraction called "SSMB" from the square modules of sub-band |Y_k,l,i|² and noise components of sub-band D̂_k,l,i;

e) determining an output denoised signal SD_k,l from the square modules |X̂_k,l,i|² from step d), and possibly the phases θ_k,l extracted during step a);

f) converting the output denoised signal SD_k,l into an output denoised vocal signal sd(t) in the time domain, by a step f.1) of calculating an output sampled signal sd_l specific to each time frame l by applying an inverse Fourier transform of the output denoised signal SD_k,l, followed by a step f.2) of time reconstructing the output denoised vocal signal sd(t) from the output sampled signals sd_l;
and wherein the step e) consists of:
- determining, for each sub-band SB_i, the square module ${|{\overset{‿}{X}}_{k, l, i}|}^{2}$ of a combined denoised component of sub-band ${\overset{‿}{X}}_{k, l, i}$ specific to each sub-band SB_i of a combined denoised signal ${\overset{‿}{X}}_{k, l},$ satisfying the corresponding equation: ${|{\overset{‿}{X}}_{k, l, i}|}^{2} = {|{\hat{X}}_{k, l, i}|}^{2} + γ {|Y_{k, l, i}|}^{2}$

where γ is a predetermined amplification coefficient, preferably ranging between 0.01 and 0.1;
- determining a combined denoised signal ${\overset{‿}{X}}_{k, l}$ from the square modules ${|{\overset{‿}{X}}_{k, l, i}|}^{2}$ of the combined denoised components of sub-band ${\overset{‿}{X}}_{k, l, i},$ and possibly the phases θ_k,l extracted during the step a), such that the output denoised signal SD_k,l corresponds to the combined denoised signal ${\overset{‿}{X}}_{k, l},$ namely ${SD}_{k, l} = {\overset{‿}{X}}_{k, l} .$
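
Purely as an illustration, step e) of the method above can be sketched as follows. The sketch assumes that the sub-bands SB_i do not overlap, so that the per-sub-band equation can be applied bin by bin, and that the phase θ_k,l is read from the noisy bin Y_k,l. The function name `combined_output` and the default γ = 0.05 (a value inside the preferred range) are choices made for this example only.

```python
import cmath
import math


def combined_output(X_hat_sq, Y, gamma=0.05):
    """Build SD[k] from |X_hat[k]|^2 + gamma * |Y[k]|^2 and the phase of Y[k].

    X_hat_sq : list of denoised squared moduli |X_hat_k,l|^2 from step d)
    Y        : list of complex noisy bins Y_k,l from step a)
    gamma    : amplification coefficient (0.05 is an illustrative default)
    """
    SD = []
    for x_sq, y in zip(X_hat_sq, Y):
        # Reinject a small fraction of the noisy power into the denoised power.
        combined_sq = x_sq + gamma * abs(y) ** 2
        # Rebuild the complex bin from the combined modulus and the noisy phase.
        SD.append(cmath.rect(math.sqrt(combined_sq), cmath.phase(y)))
    return SD
```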
The method according to claim 1, wherein the algorithm of power spectral density estimation of the noise component according to the method of minima controlled recursive averaging called "MCRA" during step b) implements the following calculation phases:
b.1) calculating a filtered noisy component S_k,l satisfying the equation: $S_{k, l} = α_{s} S_{k, l - 1} + (1 - α_{s}) {|Y_{k, l}|}^{2}$

where α_s is a predetermined constant characteristic of a low-pass filter;

b.2) calculating a speech presence probability density p̃_k,l by implementing the following progressive calculation:
(i) calculating a spectral minimum component Smin_k,l with:
- if rem(k,l) = 0
then Smin_k,l = min (Smin_k,l-1; S_k,l) and Stmp_k,l = S_k,l

- if rem(k,l) ≠ 0
then Smin_k,l = min (Stmp_k,l-1; S_k,l) and Stmp_k,l = min (Stmp_k,l-1; S_k,l)
where rem(k,l) is the remainder of the integer division of k by l, then

(ii) calculating a spectral ratio Sr_k,l satisfying the equation: ${Sr}_{k, l} = \frac{S_{k, l}}{{Smin}_{k, l}}$

(iii) calculating an indicator variable I_k,l with:
- if Sr_k,l > δ_TH then I_k,l = 1

- if Sr_k,l ≤ δ_TH then I_k,l = 0
where δ_TH is a predetermined fixed threshold for speech detection;

(iv) calculating the speech presence probability density p̃_k,l with:
p̃_k,l = α_p p̃_k,l-1 + (1 - α_p) I_k,l

where α_p is a predetermined constant;

b.3) calculating a coefficient α̃_k,l satisfying the following equation:
α̃_k,l = α + (1 - α) p̃_k,l,
where α is a predetermined constant;

b.4) calculating the noise component D̂_k,l satisfying the following equation: ${\hat{D}}_{k, l} = {\tilde{α}}_{k, l} {\hat{D}}_{k, l - 1} + (1 - {\tilde{α}}_{k, l}) {|Y_{k, l}|}^{2} .$
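
The calculation phases b.1) to b.4) can be sketched as below, as a hedged illustration and not as the patented implementation. The update rules for Smin and Stmp are copied from the claim as written. Three points are assumptions of this sketch: the switching condition rem(k,l) = 0 is replaced by a periodic test on the frame index (`l % window_len == 0`), every recursion is seeded with the first frame, and the ratio Sr is set to 0 when the tracked minimum is not positive. All names and default values (`alpha_s`, `alpha_p`, `alpha`, `delta_th`, `window_len`) are hypothetical.

```python
def mcra_init(Y_sq):
    """Assumed start-up: seed every recursion with the first frame |Y_k,0|^2."""
    n = len(Y_sq)
    return {"S": list(Y_sq), "Smin": list(Y_sq), "Stmp": list(Y_sq),
            "p": [0.0] * n, "D": list(Y_sq)}


def mcra_update(state, Y_sq, l, alpha_s=0.8, alpha_p=0.2, alpha=0.95,
                delta_th=5.0, window_len=8):
    """Update the noise estimate D_hat_k,l for frame l and return it as a list."""
    for k, y_sq in enumerate(Y_sq):
        # b.1) low-pass filtered noisy power S_k,l
        S = alpha_s * state["S"][k] + (1.0 - alpha_s) * y_sq
        # b.2)(i) minimum tracking; the periodic test on l is an assumed
        # stand-in for the condition rem(k,l) = 0 of the claim.
        if l % window_len == 0:
            Smin = min(state["Smin"][k], S)
            Stmp = S
        else:
            Smin = min(state["Stmp"][k], S)
            Stmp = min(state["Stmp"][k], S)
        # b.2)(ii) spectral ratio (0 when the minimum is not positive: assumption)
        Sr = S / Smin if Smin > 0.0 else 0.0
        # b.2)(iii) indicator variable
        I = 1.0 if Sr > delta_th else 0.0
        # b.2)(iv) smoothed speech presence probability
        p = alpha_p * state["p"][k] + (1.0 - alpha_p) * I
        # b.3) time-varying smoothing coefficient
        alpha_t = alpha + (1.0 - alpha) * p
        # b.4) recursive noise estimate
        D = alpha_t * state["D"][k] + (1.0 - alpha_t) * y_sq
        state["S"][k], state["Smin"][k], state["Stmp"][k] = S, Smin, Stmp
        state["p"][k], state["D"][k] = p, D
    return list(state["D"])
```

With these illustrative defaults, a stationary input keeps p̃ at 0, so the estimate follows the input with α̃ = α, while a sudden rise in power drives p̃ toward 1 and nearly freezes the estimate.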
The method according to any one of claims 1 and 2, wherein the algorithm of multi-band spectral subtraction called "SSMB" of step d) implements the following calculation phases, for each sub-band SB_i:
d.1) calculating a signal-to-noise ratio SNR_k,l,i specific to each sub-band SB_i satisfying the following equation: ${SNR}_{k, l, i} = 10 \log_{10} \left( \frac{\sum_{k = e_{i}}^{b_{i}} {|Y_{k, l, i}|}^{2}}{\sum_{k = e_{i}}^{b_{i}} |{\hat{D}}_{k, l, i}|} \right)$

d.2) calculating the square module |X̂_k,l,i|² of the denoised component of sub-band X̂_k,l,i specific to each sub-band SB_i, according to the equation: ${|{\hat{X}}_{k, l, i}|}^{2} = \left\{ \begin{array}{ll} {|Y_{k, l, i}|}^{2} - α_{i} δ_{i} |{\hat{D}}_{k, l, i}| & \text{if } {|Y_{k, l, i}|}^{2} > α_{i} δ_{i} |{\hat{D}}_{k, l, i}| \\ β {|Y_{k, l, i}|}^{2} & \text{if } {|Y_{k, l, i}|}^{2} \leq α_{i} δ_{i} |{\hat{D}}_{k, l, i}| \end{array} \right.$

where - δ_i is a variable parameter according to the corresponding sub-band SB_i, taking distinct values from one sub-band to another;
- α_i is a variable parameter which depends on the value of the signal-to-noise ratio SNR_k,l,i calculated in the corresponding sub-band SB_i; and

- β is a constant.
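
As a minimal sketch of phases d.1) and d.2) for a single sub-band, assuming the bins of that sub-band are passed as plain lists: the function names, the guard `eps` against a zero denominator and the default β = 0.002 are assumptions of this example, and α_i and δ_i are supplied by the caller.

```python
import math


def subband_snr_db(Y_sq, D_hat, eps=1e-12):
    """d.1) SNR of one sub-band in dB; eps is an assumed guard against log(0)."""
    num = sum(Y_sq)
    den = sum(abs(d) for d in D_hat)
    return 10.0 * math.log10((num + eps) / (den + eps))


def subtract_subband(Y_sq, D_hat, alpha_i, delta_i, beta=0.002):
    """d.2) Squared moduli |X_hat_k,l,i|^2 for the bins of one sub-band."""
    X_hat_sq = []
    for y_sq, d in zip(Y_sq, D_hat):
        removed = alpha_i * delta_i * abs(d)
        if y_sq > removed:
            X_hat_sq.append(y_sq - removed)  # ordinary spectral subtraction
        else:
            X_hat_sq.append(beta * y_sq)     # floor: keep a fraction of |Y|^2
    return X_hat_sq
```

With |Y|² = [10, 1], D̂ = [2, 2], α_i = 2 and δ_i = 1.5, the amount removed is 6 per bin, so the first bin becomes 4 and the second falls back to the floor β·1 = 0.002.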
The method according to claim 3, wherein the parameters α_i satisfy the following equations: $α_{i} = \left\{ \begin{array}{ll} α_{c 1} & \text{if } {SNR}_{k, l, i} < {SNR}_{1} \\ α_{c 2} + α_{c 3} {SNR}_{k, l, i} & \text{if } {SNR}_{1} \leq {SNR}_{k, l, i} \leq {SNR}_{2} \\ α_{c 4} & \text{if } {SNR}_{k, l, i} > {SNR}_{2} \end{array} \right.$

where α_c1, α_c2, α_c3 and α_c4 are predetermined constants, and SNR₁ and SNR₂ are predetermined thresholds.
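
The piecewise definition of α_i can be written as a small function. The claim leaves the constants and thresholds open; the default values below are illustrative assumptions chosen so that the three pieces join continuously, and are not taken from the patent.

```python
def oversubtraction_factor(snr_db, snr1=-5.0, snr2=20.0,
                           a_c1=4.75, a_c2=4.0, a_c3=-0.15, a_c4=1.0):
    """Return alpha_i for a sub-band whose SNR (in dB) is snr_db.

    All default constants are hypothetical; they only illustrate the shape:
    a plateau at low SNR, a linear segment, and a plateau at high SNR.
    """
    if snr_db < snr1:
        return a_c1
    if snr_db <= snr2:
        return a_c2 + a_c3 * snr_db
    return a_c4
```

With a negative slope α_c3, as assumed here, sub-bands with a poor signal-to-noise ratio are subtracted more aggressively than sub-bands where speech dominates.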
The method according to any one of claims 1 to 4, wherein the step e) consists of determining the denoised signal X̂_k,l from the square modules |X̂_k,l,i|² of the denoised components of sub-band X̂_k,l,i, and possibly of phases θ_k,l extracted during step a), such that the denoised output signal SD_k,l corresponds to the denoised signal X̂_k,l, namely SD_k,l = X̂_k,l.
The method according to any one of claims 1 to 5, wherein the step f.2) consists of reconstructing the output denoised vocal signal sd(t) only from the output signals Sd_l from the step f.1), said output signals Sd_l corresponding to the inverse Fourier transforms of the output denoised signal SD_k,l specific to each time frame l.
The method according to any one of claims 1 to 5, wherein the step f.2) consists, for each time frame l, of:
g) calculating a mean signal-to-noise ratio r_l specific to the time frame l from the square module |Y_k,l|² and the noise component D̂_k,l;

h) comparing the mean signal-to-noise ratio r_l with a predetermined threshold ψ_TH;

i) reconstructing the output denoised vocal signal sd(t) by taking into consideration that:
- if the mean signal-to-noise ratio r_l is lower than said threshold ψ_TH for the time frame l, then the considered signal before time reconstruction for this time frame l corresponds to the output signal Sd_l from step f.1);

- if the mean signal-to-noise ratio r_l is higher than said threshold ψ_TH for the time frame l, then the considered signal before time reconstruction for this time frame l corresponds to the sampled signal y_l from the slicing step of step a).
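
The selection made in steps h) and i) amounts to a per-frame switch, sketched below. The claim does not say what happens when r_l equals ψ_TH exactly; treating that case like "higher" is an assumption of this sketch, as is the function name `select_frame`.

```python
def select_frame(r_l, psi_th, sd_l, y_l):
    """Return the frame handed to time reconstruction for the current frame.

    r_l    : mean signal-to-noise ratio of the frame
    psi_th : predetermined threshold
    sd_l   : denoised frame from step f.1)
    y_l    : original sampled frame from step a)
    """
    if r_l < psi_th:
        return sd_l  # low mean SNR: keep the denoised frame
    return y_l       # otherwise (equality included, by assumption): bypass
```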
The method according to claim 7, wherein the step g) implements the following calculation algorithm, for each time frame l:
g.1) calculating a mean noise component ${\overline{D}}_{l}$ from the noise component D̂_k,l estimated during the step b) and satisfying the equation: ${\overline{D}}_{l} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, l}$

where M is a predetermined constant, preferably equal to N or to N/2, N being the number of sampling points of the Fourier transform;

g.2) calculating a mean square module $\overline{{|Y_{k, l}|}^{2}}$ of the noisy signal Y_k,l satisfying the equation: $\overline{{|Y_{k, l}|}^{2}} = \frac{1}{M} \sum_{k = 0}^{M - 1} {|Y_{k, l}|}^{2}$

g.3) calculating a filtered component P_l of the mean square module $\overline{{|Y_{k, l}|}^{2}}$ satisfying the equation: $P_{l} = λ P_{l - 1} + (1 - λ) \overline{{|Y_{k, l}|}^{2}}$

where - λ is a predetermined constant characteristic of a low-pass filter, preferably ranging between 0.80 and 0.99;
- $P_{0} = {\overline{D}}_{0} = \frac{1}{M} \sum_{k = 0}^{M - 1} {\hat{D}}_{k, 0}$ for initializing the algorithm.

g.4) calculating the mean signal-to-noise ratio r_l satisfying the equation:
- if ${\overline{D}}_{l} > 0$ then $r_{l} = \frac{P_{l}}{{\overline{D}}_{l}}$

- if ${\overline{D}}_{l} \leq 0$ then r_l = 0.
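
Phases g.1) to g.4) can be illustrated with a small stateful helper. This is a sketch under stated assumptions: M is taken as the number of bins supplied, the first call is treated as frame 0 and applies the initialisation P_0 = D̄_0 directly, and the class name `MeanSnrTracker` and the default λ = 0.9 (inside the preferred range) are choices made for the example.

```python
class MeanSnrTracker:
    """Track the filtered mean power P_l and return the mean SNR r_l per frame."""

    def __init__(self, lam=0.9):
        self.lam = lam
        self.P = None  # P_l is undefined before the first frame

    def update(self, Y_sq, D_hat):
        M = len(D_hat)                 # assumption: M = number of bins supplied
        D_bar = sum(D_hat) / M         # g.1) mean noise component
        Y_bar = sum(Y_sq) / M          # g.2) mean square module
        if self.P is None:
            self.P = D_bar             # initialisation P_0 = D_bar_0
        else:
            # g.3) first-order low-pass filtering of the mean square module
            self.P = self.lam * self.P + (1.0 - self.lam) * Y_bar
        # g.4) mean signal-to-noise ratio, forced to 0 when D_bar is not positive
        return self.P / D_bar if D_bar > 0.0 else 0.0
```

With λ = 0.9, a first frame with D̄ = 2 gives r_0 = 1; a following frame whose mean square module is 12 moves P to 0.9·2 + 0.1·12 = 3, hence r_1 = 1.5.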
The method according to any one of the preceding claims, wherein the steps a) and f) of conversion implement an overlap-add method called "OLA", with:
- for step a), a slicing of the noisy acoustic signal y(t) into time frames with an overlapping between the successive time frames;

- for step f.2), the reconstructing of the output denoised vocal signal sd(t) is achieved by the successive additions of the overlapping portions of the signals of two successive time frames.
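
The overlap-add principle can be illustrated without any frequency-domain processing between slicing and reconstruction. The claim fixes neither the window nor the amount of overlap; the periodic Hann window and the 50 % overlap used below are assumptions chosen because the overlapping windows then sum to 1, so interior samples are recovered unchanged. The function names are hypothetical.

```python
import math


def slice_frames(y, frame_len, hop):
    """Cut y into frames of frame_len samples that start every hop samples."""
    return [y[s:s + frame_len] for s in range(0, len(y) - frame_len + 1, hop)]


def hann_periodic(n_points):
    """Periodic Hann window; with hop = n_points / 2 shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_points)
            for n in range(n_points)]


def overlap_add(frames, hop, total_len):
    """Rebuild a signal by adding each frame at its original position."""
    out = [0.0] * total_len
    for idx, frame in enumerate(frames):
        start = idx * hop
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


def window_frames(frames, window):
    """Apply the weighting window to every frame."""
    return [[w * v for w, v in zip(window, frame)] for frame in frames]
```

In a complete chain, each windowed frame would be transformed, denoised and inverse-transformed before `overlap_add`; the first and last half-frames are covered by a single window only and are therefore attenuated in this sketch.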
A system (1) for reducing noise on a noisy acoustic signal y(t) from a microphone operating in a noisy environment, comprising:
- a unit (2) for converting the noisy acoustic signal y(t) in the time domain into a noisy signal Y_k,l in the frequency domain, including:

- a module (21) for slicing the noisy acoustic signal y(t) into sampled signals y_l in successive time frames l;

- at the output of the slicing module (21), a module (22) for windowing the sampled signals y_l by applying a weighting window;

- at the output of the windowing module (22), a module (23) for calculating a discrete Fourier transform which outputs the noisy signal Y_k,l;

- a digital processing unit (3) in the frequency domain including, at the output of the conversion unit:

- a first module (31) for extracting the square module |Y_k,l|² from the noisy signal Y_k,l; and possibly a second module (32) for extracting the phase θ_k,l from the noisy signal Y_k,l;

- at the output of the first extraction module (31), an estimation module (33) called "MCRA", for estimating a noise component D̂_k,l contained in the noisy signal Y_k,l from the square module |Y_k,l|² from the first extraction module (31), by an algorithm of power spectral density estimation of the noise component according to a minima controlled recursive averaging method called "MCRA";

- at the output of the first extraction module (31), a module (34) for slicing the frequency band into several frequency sub-bands SB_i=[e_i, b_i], in particular of filter bank type;

- at the output of the estimation module "MCRA" (33) and the module (34) for slicing the frequency band, an estimation module (35), called "SSMB", for estimating the square module |X̂_k,l,i|² of a denoised component of sub-band X̂_k,l,i specific to each sub-band SB_i of a denoised signal X̂_k,l, by a multi-band spectral subtraction algorithm from the square modules of sub-bands |Y_k,l,i|² and noise components of sub-band D̂_k,l,i;

- at the output of the estimation module "SSMB" (35), and possibly the second extraction module (32), a module (36) for determining an output denoised signal SD_k,l from the square module |X̂_k,l,i|² and possibly the phases θ_k,l;

- a conversion unit (4) in the time domain including, at the output of the digital processing unit (3):

- a module (41) for calculating an output signal Sd_l specific to each time frame l by applying an inverse Fourier transform of the output denoised signal SD_k,l; and

- a module (42) for reconstructing an output denoised vocal signal sd(t) in the time domain from said output signals Sd_l;
wherein the module (36) for determining an output denoised signal SD_k,l comprises:
- at the output of the first extraction module (31), an amplification sub-module (363) according to an amplification coefficient γ, preferably ranging between 0.01 and 0.1, in order to output an amplified signal γ |Y_k,l|²;

- at the output of the estimation module "SSMB" (35), an adder sub-module (364) suitable for adding the amplified signal γ |Y_k,l|² and the square modules |X̂_k,l,i|², in order to output the square module ${|{\overset{‿}{X}}_{k, l, i}|}^{2}$ of combined denoised components of sub-band ${\overset{‿}{X}}_{k, l, i}$ specific to each sub-band SB_i of a combined denoised signal ${\overset{‿}{X}}_{k, l},$ satisfying the corresponding equation: ${|{\overset{‿}{X}}_{k, l, i}|}^{2} = {|{\hat{X}}_{k, l, i}|}^{2} + γ {|Y_{k, l, i}|}^{2};$

- a square root sub-module (361) for calculating the module $|{\overset{‿}{X}}_{k, l, i}|$ of the combined denoised components of sub-band ${\overset{‿}{X}}_{k, l, i};$ and

- a sub-module (362) for recombining combined components of sub-band ${\overset{‿}{X}}_{k, l, i}$ in order to obtain the combined denoised signal ${\overset{‿}{X}}_{k, l},$ from the modules $|{\overset{‿}{X}}_{k, l, i}|,$ and possibly the phases θ_k,l, such that the output denoised signal SD_k,l corresponds to the combined denoised signal ${\overset{‿}{X}}_{k, l},$ namely ${SD}_{k, l} = {\overset{‿}{X}}_{k, l} .$
The system (1) according to claim 10, wherein the module (36) for determining the output denoised signal SD_k,l comprises:
- a square root sub-module (361) for calculating the module |X̂_k,l,i| of the denoised components of sub-band X̂_k,l,i; and

- a sub-module (362) for recombining the denoised components of sub-band X̂_k,l,i in order to obtain the denoised signal X̂_k,l based on the modules |X̂_k,l,i|, and possibly the phases θ_k,l, such that the output denoised signal SD_k,l corresponds to the denoised signal X̂_k,l, namely SD_k,l = X̂_k,l.
The system (1) according to any one of claims 10 and 11, wherein the digital processing unit (3) further comprises, at the output of the estimation module "MCRA" (33), a noise detection module (37) comprising:
- a module for calculating a mean signal-to-noise ratio r_l specific to each time frame l from the square module |Y_k,l|² and the noise component D̂_k,l;

- a module for comparing the mean signal-to-noise ratio r_l specific to each time frame l with a predetermined threshold ψ_TH;

- a module for controlling the module (42) for reconstructing the output denoised vocal signal sd(t) which is designed in such a manner that:

- if the mean signal-to-noise ratio r_l is lower than said threshold ψ_TH for the time frame l, then the signal considered before reconstruction for this time frame l corresponds to the output signal Sd_l from the calculation module of said output signal Sd_l;

- if the mean signal-to-noise ratio r_l is higher than said threshold ψ_TH for the time frame l, then the signal considered before reconstruction for this time frame l corresponds to the sampled signal y_l from the module for slicing the noisy acoustic signal y(t).