CN100492495C - Apparatus and method for detecting noise - Google Patents


Info

Publication number
CN100492495C
Authority
CN
China
Prior art keywords
module
signal
current frame
voice
frame signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2005101301670A
Other languages
Chinese (zh)
Other versions
CN1787079A (en
Inventor
林中松
邓昊
王箫程
冯宇红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vimicro Corp
Original Assignee
Vimicro Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vimicro Corp filed Critical Vimicro Corp
Priority to CNB2005101301670A priority Critical patent/CN100492495C/en
Publication of CN1787079A publication Critical patent/CN1787079A/en
Application granted granted Critical
Publication of CN100492495C publication Critical patent/CN100492495C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Noise Elimination (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a noise detection device and method. A current frame signal is received, and a count value is increased by one step each time a frame signal is received. When the count value and a preset constant do not satisfy a fixed proportional relation, or satisfy it while the current frame signal is predicted to be a voice signal, the local minimum spectral energy of the current frame signal is calculated from the local minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; in the latter case, one step is also subtracted from the current count value. When the relation is satisfied but the current frame signal is not predicted to be a voice signal, the local minimum spectral energy of the current frame signal is calculated from the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal. Whether the current frame signal is pure noise is then judged from its spectral energy and its local minimum spectral energy. This avoids voice being wrongly estimated as noise and improves the accuracy of noise detection.

Description

Noise detection device and method
Technical Field
The invention relates to the technical field of signal analysis, in particular to a noise detection device and method.
Background
At present, noise is a main cause of degraded call voice quality in mobile terminal communication: while recording the user's voice, the microphone of the mobile terminal also picks up background noise and passes it into the communication module of the mobile terminal. Because the user's environment is complex and changeable, the background noise signal is usually non-stationary and its spectral characteristics vary. In addition, the energy of the background noise signal relative to the speech signal differs greatly from one environment to another: offices usually have weak background noise, while subway stations and similar places usually have strong background noise.
At present, before the voice is sent to the communication module, it is usually denoised by a speech enhancement method in order to improve voice quality. A speech enhancement technique applied to mobile terminal communication must have the following characteristics: its delay must be very small so that it does not interfere with normal communication; it must adapt to changes in the background noise and suppress the noise effectively; and it must not introduce processing artifacts, such as musical noise, or otherwise damage speech quality. In addition, the speech enhancement technique must be able to run on a Digital Signal Processing (DSP) chip or other dedicated chip. Noise detection is an essential part of single-channel speech enhancement. Noise detection determines whether a signal is noise by estimating spectral characteristic values of the noise signal, such as its spectral energy. Estimating the noise spectral characteristic values makes it possible to distinguish the speech from the noise in the signal, and thereby to track the continuously changing background noise and eliminate the noise in the signal.
The prior art noise detection process is shown in fig. 1, and includes the following main steps:
Step 101: Smooth the spectral energy $Y[i,n]$ of the current frame input signal at frequency point $i$ to obtain the smoothed spectral energy $S_f[i,n]$ of the current frame input signal at frequency point $i$.
The smoothing formula is:
$$S_f[i,n] = \sum_{j=-w}^{w} b[i+j]\,\lvert Y[i+j,n] \rvert^2 \tag{1}$$
where $w$ is a frequency value, $(2w+1)$ is the length of the Hanning window, $b[i+j]$ is the Hanning window value at frequency point $(i+j)$, $j$ is a frequency value, $n$ is an integer denoting the total frame number of the current input signal, and $Y[i+j,n]$ is the spectral energy of the current frame, i.e. the $n$th frame, input signal at frequency point $(i+j)$, for example the FFT coefficient at frequency point $(i+j)$.
Step 102: Apply a time recursion to $S_f[i,n]$ to obtain the recursively smoothed spectral energy $S[i,n]$ of the current frame input signal at frequency point $i$.
The temporal recursion formula is:
$$S[i,n] = \alpha_s S[i,n-1] + (1-\alpha_s)\,S_f[i,n] \tag{2}$$
where $S[i,n-1]$ is the recursively smoothed spectral energy at frequency point $i$ of the input signal of the frame preceding the current frame, and $\alpha_s$ is a constant satisfying $0 < \alpha_s < 1$.
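Formulas (1) and (2) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function names, the default `alpha_s = 0.8`, the Hann-shaped window normalized to sum to 1 and indexed by the offset j (rather than by i + j), and the clamping of out-of-range frequency points at the band edges are all assumptions introduced here.

```python
import math

def hann_window(w):
    """Hann-shaped weights of length 2w+1, normalized to sum to 1 (assumed normalization)."""
    length = 2 * w + 1
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (length + 1))
           for k in range(length)]
    total = sum(raw)
    return [v / total for v in raw]

def smooth_frequency(power, w):
    """Formula (1): weighted sum of |Y[i+j, n]|^2 over j = -w..w for each frequency point i.

    `power` holds |Y[k, n]|^2 for every frequency point k of one frame.
    """
    b = hann_window(w)
    last = len(power) - 1
    out = []
    for i in range(len(power)):
        acc = 0.0
        for j in range(-w, w + 1):
            k = min(max(i + j, 0), last)  # clamp at the band edges (assumption)
            acc += b[j + w] * power[k]
        out.append(acc)
    return out

def smooth_time(prev_s, s_f, alpha_s=0.8):
    """Formula (2): S[i,n] = alpha_s * S[i,n-1] + (1 - alpha_s) * S_f[i,n]."""
    return [alpha_s * p + (1.0 - alpha_s) * f for p, f in zip(prev_s, s_f)]
```

A larger `alpha_s` gives a smoother but slower-reacting energy estimate; the value used here is only a placeholder.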
Step 103: Calculate the local minimum recursively smoothed spectral energy $S_{\min}[i,n]$ of the current frame input signal.
When $n$ is not an integer multiple of the constant $L$:
$$S_{\min}[i,n] = \min\{S_{\min}[i,n-1],\ S[i,n]\}, \qquad S_{tmp}[i,n] = \min\{S_{tmp}[i,n-1],\ S[i,n]\} \tag{3}$$
When $n$ is an integer multiple of the constant $L$:
$$S_{\min}[i,n] = \min\{S_{tmp}[i,n-1],\ S[i,n]\}, \qquad S_{tmp}[i,n] = S[i,n] \tag{4}$$
where $S_{\min}[i,n-1]$ is the local minimum recursively smoothed spectral energy at frequency point $i$ of the input signal of the previous frame, i.e. the $(n-1)$th frame; $S_{tmp}[i,n-1]$ is the temporary minimum recursively smoothed spectral energy at frequency point $i$ of the input signal of the previous frame; and $S_{tmp}[i,n]$ is the temporary minimum recursively smoothed spectral energy at frequency point $i$ of the input signal of the current frame.
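The two-case update of formulas (3) and (4) can be written as a single function for one frequency point. This is a sketch only; the function name and argument order are illustrative and do not come from the patent.

```python
def update_minima(s_min_prev, s_tmp_prev, s_now, n, L):
    """Return (S_min[i,n], S_tmp[i,n]) for one frequency point.

    s_min_prev, s_tmp_prev: S_min[i,n-1] and S_tmp[i,n-1]
    s_now: S[i,n]; n: frame number; L: the constant window length in frames
    """
    if n % L != 0:
        # Formula (3): keep tracking both running minima.
        return min(s_min_prev, s_now), min(s_tmp_prev, s_now)
    # Formula (4): hand the temporary minimum over and restart it.
    return min(s_tmp_prev, s_now), s_now
```

The temporary minimum is what lets the local minimum rise again after the noise floor increases: every L frames the older history is discarded.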
Step 104: Judge whether $\dfrac{S[i,n]}{S_{\min}[i,n]} < \delta$ holds. If so, the current frame input signal is determined to be pure noise and the noise detection factor is set to $I[i,n] = 0$; otherwise, the current frame input signal is determined not to be pure noise, i.e. speech is present in the current frame input signal, and $I[i,n] = 1$. Here $\delta$ is a constant.
Thereafter, the probability $\tilde{p}[i,n]$ that speech is present at frequency point $i$ in the current frame input signal can be calculated according to the following formula:
$$\tilde{p}[i,n] = \alpha_p\,\tilde{p}[i,n-1] + (1-\alpha_p)\,I[i,n] \tag{5}$$
where $\tilde{p}[i,n-1]$ is the probability that speech is present at frequency point $i$ in the input signal of the frame preceding the current frame, and $\alpha_p$ is a constant satisfying $0 < \alpha_p < 1$. It can be seen that when $I[i,n] = 0$, $\tilde{p}[i,n] = 0$.
Then, the noise spectral energy $\lambda_d[i,n]$ of the current frame input signal at frequency point $i$ is calculated according to the following formulas:
$$\tilde{\alpha}_d[i,n] = \alpha_d[i,n] + (1-\alpha_d[i,n])\,\tilde{p}[i,n];$$
$$\lambda_d[i,n] = \tilde{\alpha}_d[i,n]\,\lambda_d[i,n-1] + (1-\tilde{\alpha}_d[i,n])\,\lvert Y[i,n] \rvert^2 \tag{6}$$
where $\lambda_d[i,n-1]$ is the noise spectral energy at frequency point $i$ of the input signal of the frame preceding the current frame, and $\alpha_d[i,n]$ is a constant satisfying $0 < \alpha_d[i,n] < 1$.
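Step 104 together with formulas (5) and (6) can be sketched for a single frequency point as below. The function name and the default values of `delta`, `alpha_p` and `alpha_d` are placeholders chosen for illustration, not values given in the patent, and the sketch assumes the local minimum is strictly positive.

```python
def detect_and_update(s_now, s_min, p_prev, lam_prev, y_power,
                      delta=5.0, alpha_p=0.2, alpha_d=0.95):
    """Return (I[i,n], p~[i,n], lambda_d[i,n]) for one frequency point.

    s_now: S[i,n]; s_min: S_min[i,n] (assumed > 0)
    p_prev: p~[i,n-1]; lam_prev: lambda_d[i,n-1]; y_power: |Y[i,n]|^2
    """
    # Step 104: ratio test against the threshold delta.
    indicator = 0 if s_now / s_min < delta else 1
    # Formula (5): recursive speech presence probability.
    p = alpha_p * p_prev + (1.0 - alpha_p) * indicator
    # Formula (6): the smoothing factor moves toward 1 as speech becomes
    # more likely, which freezes the noise estimate during speech.
    alpha_tilde = alpha_d + (1.0 - alpha_d) * p
    lam = alpha_tilde * lam_prev + (1.0 - alpha_tilde) * y_power
    return indicator, p, lam
```

When `p` is close to 1 the noise estimate barely changes, so a frame wrongly flagged as noise (indicator 0) is the case that lets speech energy leak into `lam`.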
As can be seen from the above steps, when the local minimum recursively smoothed spectral energy $S_{\min}[i,n]$ of the current frame input signal is computed and the total frame number $n$ of the input signal is an integer multiple of $L$, the temporary minimum recursively smoothed spectral energy $S_{tmp}[i,n]$ of the current frame input signal is calculated according to $S_{tmp}[i,n] = S[i,n]$ in formula (4). Suppose that $m$ is an integer, $n = mL$, and no speech is present in the input signal of the $(n-1)$th frame. Then, from formula (3), the local minimum recursively smoothed spectral energy of the input signal of the $(n-1)$th frame is $S_{\min}[i,n-1] = \min\{S_{\min}[i,n-2],\ S[i,n-1]\}$, that is, $S_{\min}[i,n-1] \le S[i,n-1]$. If high-energy speech is now present in the input signal of the $n$th frame, formula (4) gives $S_{tmp}[i,n] = S[i,n]$ with $S[i,n] > S[i,n-1]$, so that $S_{tmp}[i,n] > S[i,n-1] \ge S_{\min}[i,n-1]$. If speech is present in the input signals of the $n$th to $(n+L)$th frames, the temporary minimum recursively smoothed spectral energies calculated for those frames, $S_{tmp}[i,n]$ through $S_{tmp}[i,n+L]$, are all greater than $S_{\min}[i,n-1]$. At the $(n+L)$th frame, formula (4) gives $S_{\min}[i,n+L] = \min\{S_{tmp}[i,n+L-1],\ S[i,n+L]\}$, and since $S[i,n+L] > S[i,n-1]$, it follows that $S_{\min}[i,n+L] > S[i,n-1] \ge S_{\min}[i,n-1]$. The prior-art noise detection method can therefore make the local minimum recursively smoothed spectral energy too large, so that part of the speech is wrongly estimated as noise. When this noise detection method is applied to speech enhancement, the speech signal is wrongly subtracted from the original input signal, so that speech in some time intervals is suddenly attenuated or sounds unnatural.
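The failure described above can be reproduced with a small numeric experiment. The sketch below applies formulas (3) and (4) to one frequency point; the energy values, L = 4, δ = 5 and the initialization of both minima from the first frame are all made-up assumptions chosen only to make the effect visible.

```python
def track_prior_art(energies, L):
    """Apply formulas (3)/(4) to S[i,1..N] at one frequency point.

    Returns the local minimum S_min for every frame.
    """
    s_min = s_tmp = energies[0]  # initialization from frame 1 (assumption)
    history = [s_min]
    for n, s in enumerate(energies[1:], start=2):
        if n % L != 0:
            s_min, s_tmp = min(s_min, s), min(s_tmp, s)   # formula (3)
        else:
            s_min, s_tmp = min(s_tmp, s), s               # formula (4)
        history.append(s_min)
    return history

L = 4
delta = 5.0
# Frames 1-3 contain only noise; speech starts exactly at frame n = L
# and lasts through frame n + L = 8.
energies = [1.0] * 3 + [10.0] * 5
s_min_track = track_prior_art(energies, L)
flags = ["noise" if s / m < delta else "speech"
         for s, m in zip(energies, s_min_track)]
```

In this run the local minimum stays at the noise level for frames 4 to 7, but at frame 8 it jumps to the speech energy, so the ratio test labels that speech frame as noise.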
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a noise detection apparatus and method to avoid the speech being estimated as noise incorrectly, and to improve the accuracy of noise detection.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a noise detection apparatus, the apparatus comprising:
the spectrum energy calculation module is used for receiving the signals and outputting the calculated spectrum energy of the current frame signal to the minimum spectrum energy calculation module and the noise detection module;
the voice prediction judging module is used for receiving signals, increasing the count value of the module per se by one step when a frame of signal is received at the current frequency point, judging whether the current frame of signal is predicted as voice or not when the current count value of the module per se and a preset constant meet a fixed proportional relation, if so, reducing the current count value of the module per se by one step, and outputting a voice indicating signal to the minimum spectral energy calculating module;
the minimum spectral energy calculating module is used for receiving the spectral energy of the current frame signal output by the spectral energy calculating module, and when a spectral energy value is received at a current frequency point, the counting value of the minimum spectral energy calculating module is increased by one step, and when the current counting value of the minimum spectral energy calculating module and a preset constant do not meet a fixed proportional relation, the obtained local minimum spectral energy of the current frame signal is output to the noise detecting module according to the local minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; when the current count value of the module per se and a preset constant meet a fixed proportional relation and a voice indication signal output by the voice prediction judgment module is not received, outputting the obtained local minimum spectral energy of the current frame signal to the noise detection module according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; when receiving a voice indication signal output by a voice prediction judgment module, outputting the obtained local minimum spectrum energy of the current frame signal to a noise detection module according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal, and subtracting a step length from the current count value of the module per se;
and the noise detection module is used for judging whether the current frame signal is pure noise or not according to the spectral energy of the current frame signal output by the spectral energy calculation module and the local minimum spectral energy of the current frame signal output by the minimum spectral energy calculation module.
The device further comprises a voice existence probability calculation module for calculating the voice existence probability of the current frame signal according to the pure noise indication signal or the non-pure noise indication signal output by the noise detection module,
the noise detection module is further configured to output a pure noise indication signal to the speech existence probability calculation module when the current frame signal is pure noise; and when the current frame signal is not pure noise, outputting a non-pure noise indication signal to a voice existence probability calculation module.
The voice existence probability calculation module is further used for outputting the voice existence probability of the current frame signal to the voice prediction judgment module,
the voice prediction judging module is further used for receiving and storing the voice existence probability of the current frame signal output by the voice existence probability calculating module, judging whether the voice existence probability of the previous frame signal stored by the voice prediction judging module is larger than a constant stored by the voice prediction judging module when the current counting value and the preset constant meet a fixed proportional relation, and outputting a voice indicating signal to the minimum spectrum energy calculating module if the voice existence probability of the previous frame signal stored by the voice prediction judging module is larger than the constant stored by the voice prediction judging module.
The device further comprises a noise spectrum energy calculating module which is used for calculating the noise spectrum energy of the current frame signal according to the voice existence probability of the current frame signal output by the voice existence probability calculating module and the spectrum energy of the current frame signal output by the spectrum energy calculating module.
The voice prediction judging module comprises a voice existence posterior probability calculating module and a judging module, wherein:
the voice posterior probability calculating module is used for receiving signals, adding one step to the count value of the self module when a frame of signal is received at the current frequency point, calculating the posterior probability of voice existing on the previous frame of signal when the current count value of the self module and a preset constant meet a fixed proportional relation, and outputting the posterior probability of voice existing on the previous frame of signal to the judging module, and is used for subtracting one step from the current count value of the self module after the voice indicating signal output by the judging module is received;
and the judging module is used for judging whether the posterior probability of the voice existing on the previous frame signal output by the voice existence posterior probability calculating module is larger than a constant stored by the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module and the voice existence posterior probability calculating module.
The voice prediction judging module comprises a prior signal-to-noise ratio calculating module and a judging module, wherein:
the prior signal-to-noise ratio calculation module is used for receiving signals, increasing the count value of the module per se by one step when a frame of signal is received at the current frequency point, calculating the prior signal-to-noise ratio of the current frame of signal when the current count value of the module per se and a preset constant meet a fixed proportional relation, outputting the prior signal-to-noise ratio of the previous frame of signal to the judgment module, and reducing the current count value of the module per se by one step after a voice indication signal output by the judgment module is received;
and the judging module is used for judging whether the prior signal-to-noise ratio of the previous frame signal output by the prior signal-to-noise ratio calculating module is greater than a constant stored in the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module and the prior signal-to-noise ratio calculating module.
A noise detection apparatus, the apparatus comprising:
the counter module is used for counting the number of frames of signals on the current frequency point, adding a step length to the count value when receiving a frame of signal, outputting the frame of signal to the spectrum energy calculation module and the voice prediction judgment module, and outputting the count value to the voice prediction judgment module and the minimum spectrum energy calculation module; when receiving the interrupt signal output by the voice prediction judging module, reducing the current count value by one step length;
the frequency spectrum energy calculating module is used for calculating the frequency spectrum energy of the signal output by the counter module and outputting the frequency spectrum energy to the minimum frequency spectrum energy calculating module and the noise detecting module;
the voice prediction judging module is used for judging whether the signal output by the counter module is predicted to be voice or not when the count value output by the counter module and a preset constant meet a fixed proportional relation, if so, sending an interrupt signal to the counter module, and outputting a voice indicating signal to the minimum spectrum energy calculating module;
the minimum spectrum energy calculating module is used for outputting the local minimum spectrum energy of the current frame signal obtained according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal to the noise detecting module when the counting value output by the counter module and the predetermined constant do not meet the fixed proportional relation; when the count value output by the counter module and a preset constant satisfy a fixed proportional relation and a voice indication signal output by the voice prediction judgment module is not received, outputting the obtained local minimum spectrum energy of the current frame signal to the noise detection module according to the temporary minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal; when receiving a voice indication signal output by the voice prediction judgment module, outputting the obtained local minimum spectrum energy of the current frame signal to the noise detection module according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal;
and the noise detection module is used for judging whether the current frame input signal is pure noise or not according to the frequency spectrum energy output by the frequency spectrum energy calculation module and the local minimum frequency spectrum energy output by the minimum frequency spectrum energy calculation module.
The device further comprises a voice existence probability calculation module for calculating the voice existence probability of the current frame signal according to the pure noise indication signal or the non-pure noise indication signal output by the noise detection module,
the noise detection module is further configured to output a pure noise indication signal to the speech existence probability calculation module when the current frame signal is pure noise; and when the current frame signal is not pure noise, outputting a non-pure noise indication signal to a voice existence probability calculation module.
The voice existence probability calculation module is further used for outputting the voice existence probability of the current frame signal to the voice prediction judgment module,
the voice prediction judging module is further used for receiving and storing the voice existence probability of the current frame signal output by the voice existence probability calculating module, judging whether the voice existence probability of the previous frame signal stored by the voice prediction judging module is larger than a constant stored by the voice prediction judging module when the current counting value and the preset constant meet a fixed proportional relation, and outputting a voice indicating signal to the minimum spectrum energy calculating module if the voice existence probability of the previous frame signal stored by the voice prediction judging module is larger than the constant stored by the voice prediction judging module.
The device further comprises a noise spectrum energy calculating module which is used for calculating the noise spectrum energy of the current frame signal according to the voice existence probability of the current frame signal output by the voice existence probability calculating module and the spectrum energy of the current frame signal output by the spectrum energy calculating module.
The voice prediction judging module comprises a voice existence posterior probability calculating module and a judging module, wherein:
the voice existence posterior probability calculation module is used for receiving the signals, calculating the posterior probability of the voice existing on the previous frame of signals when the current count value output by the counter module and a preset constant meet a fixed proportional relation, and outputting the posterior probability of the voice existing on the previous frame of signals to the judgment module;
and the judging module is used for judging whether the posterior probability of the voice existing on the previous frame signal output by the voice posterior probability calculating module is greater than a constant stored by the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
The voice prediction judging module comprises a prior signal-to-noise ratio calculating module and a judging module, wherein:
the prior signal-to-noise ratio calculation module is used for receiving the signal, calculating the prior signal-to-noise ratio of the current frame signal when the current count value output by the counter module and a preset constant meet a fixed proportional relation, and outputting the prior signal-to-noise ratio of the previous frame signal to the judgment module;
and the judging module is used for judging whether the prior signal-to-noise ratio of the previous frame signal output by the prior signal-to-noise ratio calculating module is greater than a constant stored in the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
A noise detection method, which performs the following steps for all signal frames at each frequency point, the method comprising:
A. receiving a current frame signal, increasing the count value by one step, calculating and storing the frequency spectrum energy of the current frame signal, judging whether the current count value and a preset constant meet a fixed proportional relation, and if not, executing the step B; otherwise, judging whether the current frame signal is predicted as voice, if so, subtracting a step length from the current count value and executing the step B; if not, executing step C;
B. calculating and storing the local minimum spectral energy of the current frame signal according to the local minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; calculating and storing the temporary minimum spectral energy of the current frame signal according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal, and then executing the step D;
C. calculating and storing the local minimum spectral energy of the current frame signal according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; calculating and storing the temporary minimum spectral energy of the current frame signal according to the spectral energy of the current frame signal, and then executing the step D;
D. and judging whether the ratio of the spectral energy of the current frame signal to the local minimum spectral energy is smaller than a preset value, if so, judging that the current frame signal is pure noise, and otherwise, judging that the current frame signal is not the pure noise.
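Steps A to D can be sketched for one frequency point as follows. Several things here are assumptions rather than statements from the text: the "fixed proportional relation" is taken to mean that the count value is an integer multiple of a constant `L`, the speech prediction is passed in as a ready-made boolean `predicted_speech`, the state is kept in a plain dictionary, and spectral energies are assumed strictly positive.

```python
def detect_frame(state, s_now, predicted_speech, L, delta):
    """Run steps A-D once; return True if the frame is judged pure noise.

    state: dict with keys "count", "s_min", "s_tmp" (updated in place)
    s_now: spectral energy of the current frame at this frequency point
    """
    # Step A: count the frame and test the (assumed) multiple-of-L relation.
    state["count"] += 1
    at_boundary = state["count"] % L == 0
    if at_boundary and predicted_speech:
        state["count"] -= 1      # hold the counter back by one step
        at_boundary = False      # and fall through to step B
    if not at_boundary:
        # Step B: both minima keep tracking their previous values.
        state["s_min"] = min(state["s_min"], s_now)
        state["s_tmp"] = min(state["s_tmp"], s_now)
    else:
        # Step C: hand over the temporary minimum and restart it.
        state["s_min"] = min(state["s_tmp"], s_now)
        state["s_tmp"] = s_now
    # Step D: ratio test against the preset value delta.
    return s_now / state["s_min"] < delta

state = {"count": 0, "s_min": float("inf"), "s_tmp": float("inf")}
energies = [1.0] * 3 + [10.0] * 5
# Toy speech predictor for the demo only: "energy above 5" counts as speech.
results = [detect_frame(state, s, s > 5.0, L=4, delta=5.0) for s in energies]
```

With these toy inputs the counter is held at 3 for as long as speech is predicted, so the temporary minimum is never restarted from a speech frame and the local minimum stays at the noise level.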
The method further comprises the following steps: presetting a posterior probability of voice existence;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the posterior probability of the voice existing on the previous frame signal, then judging whether the posterior probability is greater than the posterior probability of the preset voice, if so, judging that the current frame signal is predicted as the voice; otherwise, the current frame signal is judged not to be predicted as voice.
The method further comprises the following steps: presetting a speech existence prior signal-to-noise ratio;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the prior signal-to-noise ratio of the voice existing on the current frame signal, then judging whether the prior signal-to-noise ratio is larger than the predetermined voice existing prior signal-to-noise ratio, if so, judging that the current frame signal is predicted as the voice; otherwise, the current frame signal is judged not to be predicted as voice.
The method further comprises the following steps: presetting a voice existence probability;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the probability of the existence of voice on the previous frame signal, judging whether the probability of the existence of voice on the previous frame signal is greater than the preset voice existence probability, and if so, judging that the current frame signal is predicted as voice; otherwise, the current frame signal is judged not to be predicted as voice.
The step B specifically comprises the following steps:
comparing the local minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the local minimum spectral energy of the current frame signal, and storing the local minimum spectral energy of the current frame signal; comparing the temporary minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the temporary minimum spectral energy of the current frame signal, storing the temporary minimum spectral energy of the current frame signal, and then executing the step D.
The step C is specifically as follows:
comparing the temporary minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the local minimum spectral energy of the current frame signal, and storing the local minimum spectral energy of the current frame signal; taking the spectral energy of the current frame signal as the temporary minimum spectral energy of the current frame signal, storing the temporary minimum spectral energy of the current frame signal, and then executing the step D.
The technical scheme of the invention prevents speech from being wrongly estimated as noise and improves the accuracy of noise detection. When applied to noise spectrum estimation and noise reduction of noisy signals, the invention can estimate the noise spectrum effectively and eliminate the noise effectively.
Drawings
FIG. 1 is a flow chart of noise detection in the prior art;
FIG. 2 is a flow chart of noise detection provided by the present invention;
FIG. 3 is a block diagram of a first embodiment of noise detection provided by the present invention;
FIG. 4 is a block diagram of a second embodiment of noise detection provided by the present invention;
FIG. 5 is a block diagram of a first embodiment of a speech prediction module according to the present invention;
FIG. 6 is a block diagram of a second embodiment of the speech prediction module according to the present invention;
FIG. 7 is a block diagram of a third embodiment of noise detection provided by the present invention;
FIG. 8 is a block diagram of a fourth embodiment of noise detection provided by the present invention;
FIG. 9 is a block diagram of a third embodiment of the speech prediction module of the present invention;
FIG. 10 is a block diagram of the speech prediction judging module according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 2 is a flow chart of noise detection according to the present invention, and as shown in fig. 2, the specific steps are as follows:
Step 201: setting a counter at frequency point i and adding 1 to the counter value each time a frame of input signal is received, denoting the current counter value by F[i], and calculating and storing the spectral energy S[i, n] of the current frame input signal at frequency point i.
Here n is an integer denoting the frame number of the current input signal; S[i, n] can be calculated from the spectral energy Y[i, n] of the current input signal at frequency point i using equations (1) and (2).
Step 202: judging whether F[i] is an integer multiple of a preset constant L; if not, executing step 203; if so, executing step 204.
The value of L can be determined empirically, for example: at a sampling rate of 8 kbit/s, L is typically 60.
Step 203: calculating and saving the local minimum spectral energy $S_{\min}[i,n]$ and the temporary minimum spectral energy $S_{tmp}[i,n]$ of the current frame input signal according to formula (3), namely: comparing the local minimum spectral energy of the previous frame input signal at frequency point i with the spectral energy of the current frame input signal, and taking the smaller one as the local minimum spectral energy $S_{\min}[i,n]$ of the current frame input signal; at the same time, comparing the temporary minimum spectral energy of the previous frame input signal at frequency point i with the spectral energy of the current frame input signal, and taking the smaller one as the temporary minimum spectral energy $S_{tmp}[i,n]$ of the current frame input signal. Then, step 207 is performed.
Step 204: judging whether the current frame input signal is predicted as voice, if yes, executing step 205; otherwise, step 206 is performed.
Judging whether the current frame input signal is predicted as speech can be done in one of three ways:
The first method: calculating the posterior probability p[i, n-1] that speech is present at frequency point i in the previous frame input signal, then judging whether p[i, n-1] is greater than or equal to C1; if so, the current frame input signal is predicted as speech at frequency point i; otherwise, it is not predicted as speech at frequency point i. Here, C1 is a constant satisfying 0 < C1 < 1, whose specific value can be determined empirically; n-1 denotes the frame preceding the current frame.
The second method: calculating the prior signal-to-noise ratio ξ[i, n] of the current frame input signal and judging whether ξ[i, n] is greater than or equal to C2; if so, the current frame input signal is predicted as speech at frequency point i; otherwise, it is not predicted as speech at frequency point i. Here, C2 is a constant that normally satisfies 0 < C2 < 10, whose specific value can be determined empirically.
The third method: calculating the probability $\tilde{p}[i,n-1]$ that speech is present at frequency point i in the previous frame input signal, and judging whether $\tilde{p}[i,n-1] \ge C3$ holds; if so, the current frame input signal is predicted as speech at frequency point i; otherwise, it is not predicted as speech at frequency point i. Here, C3 is a constant satisfying 0 < C3 < 1, whose specific value may be determined empirically.
Here, $\tilde{p}[i,n-1]$ can be obtained from formula (5): $\tilde{p}[i,n-1] = \alpha_p\,\tilde{p}[i,n-2] + (1-\alpha_p)\,I[i,n-1]$, where $\tilde{p}[i,n-2]$ is the probability that speech is present in the input signal two frames before the current frame, and I[i, n-1] takes the following values: when $\frac{S[i,n-1]}{S_{\min}[i,n-1]} < \delta$ holds, I[i, n-1] = 0; when it does not hold, I[i, n-1] = 1. Here S[i, n-1] is the spectral energy of the previous frame input signal and $S_{\min}[i,n-1]$ is the local minimum spectral energy of the previous frame input signal.
Step 205: calculating and saving $S_{\min}[i,n]$ and $S_{tmp}[i,n]$ according to formula (3), updating F[i] to (F[i] - 1), and then executing step 207.
Step 206: calculating and saving $S_{\min}[i,n]$ and $S_{tmp}[i,n]$ according to formula (4), namely: comparing the temporary minimum spectral energy of the previous frame input signal at frequency point i with the spectral energy of the current frame input signal, and taking the smaller one as the local minimum spectral energy $S_{\min}[i,n]$ of the current frame input signal; at the same time, taking the spectral energy of the current frame input signal as the temporary minimum spectral energy $S_{tmp}[i,n]$ of the current frame input signal. Then, step 207 is performed.
Step 207: judging whether $\frac{S[i,n]}{S_{\min}[i,n]} < \delta$ holds; if so, the current frame input signal is judged to be pure noise; otherwise, the current frame input signal is judged not to be pure noise.
The value of δ is determined empirically and typically satisfies: δ > 1.
If $\frac{S[i,n]}{S_{\min}[i,n]} < \delta$ holds, then I[i, n] = 0; if it does not hold, I[i, n] = 1. After that, the probability $\tilde{p}[i,n]$ that speech is present at frequency point i in the current frame input signal can be calculated according to formula (5).
Then, the noise spectral energy $\lambda_d[i,n]$ of the current frame input signal at frequency point i can be calculated according to formula (6).
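The indicator I, the recursion of formula (5) and the third speech-prediction method can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the values chosen for α_p, δ and C3 in the usage note are placeholders rather than values given in this description, and the local minimum spectral energy is assumed to be strictly positive so that the ratio is defined.

```python
def speech_indicator(s: float, s_min: float, delta: float) -> int:
    """Indicator I: 0 when S / S_min < delta (pure noise), otherwise 1.

    Assumes s_min > 0 (an assumption of this sketch, not stated in the text).
    """
    return 0 if s / s_min < delta else 1


def update_speech_probability(p_prev: float, indicator: int, alpha_p: float) -> float:
    """Formula (5): p~[n] = alpha_p * p~[n-1] + (1 - alpha_p) * I[n]."""
    return alpha_p * p_prev + (1.0 - alpha_p) * indicator


def predicted_as_speech(p_prev_frame: float, c3: float) -> bool:
    """Third method: predict speech when the previous frame's probability is >= C3."""
    return p_prev_frame >= c3
```

For example, with the placeholder values α_p = 0.2 and δ = 5, a frame whose spectral energy is ten times its local minimum gives I = 1, and a previous probability of 0.5 is pulled up to 0.2 × 0.5 + 0.8 × 1 = 0.9, which exceeds a placeholder threshold C3 = 0.5.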
It should be noted that, in a specific application, the processing in steps 201 to 207 above is performed for each frame of input signal at frequency point i.
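The per-frame flow of steps 201 to 207 at a single frequency point can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `BinState` and `process_frame` are hypothetical, the speech prediction is passed in as a boolean instead of being computed by one of the three methods, the minima are initialised to infinity, spectral energies are assumed strictly positive, and the branch order follows the apparatus description (formula (3) when the count is not an integer multiple of L, the speech check when it is).

```python
from dataclasses import dataclass


@dataclass
class BinState:
    """State kept for one frequency point i."""
    count: int = 0                 # F[i]
    s_min: float = float("inf")    # local minimum spectral energy S_min
    s_tmp: float = float("inf")    # temporary minimum spectral energy S_tmp


def process_frame(state: BinState, s_cur: float, speech_predicted: bool,
                  period: int = 60, delta: float = 5.0) -> bool:
    """Run steps 201-207 for one frame; return True if judged pure noise.

    `period` plays the role of the constant L; `delta` is a placeholder value.
    """
    state.count += 1                                   # step 201
    if state.count % period != 0:                      # step 202 -> step 203
        state.s_min = min(state.s_min, s_cur)          # formula (3)
        state.s_tmp = min(state.s_tmp, s_cur)
    elif speech_predicted:                             # step 204 -> step 205
        state.s_min = min(state.s_min, s_cur)          # formula (3)
        state.s_tmp = min(state.s_tmp, s_cur)
        state.count -= 1                               # retry the reset next frame
    else:                                              # step 204 -> step 206
        state.s_min = min(state.s_tmp, s_cur)          # formula (4)
        state.s_tmp = s_cur
    return s_cur / state.s_min < delta                 # step 207
```

Because step 205 decrements the counter, the count reaches a multiple of L again on the very next frame, so the window reset of formula (4) is deferred until a frame that is not predicted as speech.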
Fig. 3 is a block diagram of a noise detection apparatus provided in the present invention, as shown in fig. 3, which mainly includes:
spectral energy calculation module 31: for receiving the input signal, calculating the spectral energy of the current frame input signal at the current frequency point, and outputting the spectral energy to the minimum spectral energy calculation module 33 and the noise detection module 34.
Speech prediction judging module 32: used for receiving the input signal, setting a counter at the current frequency point, and adding 1 to the count value of its own counter each time a frame of input signal is received at the current frequency point; when the count value of its own counter is an integer multiple of a preset constant L, judging whether the current frame input signal is predicted as speech, and if so, subtracting 1 from the count value of its own counter and outputting a speech indication signal to the minimum spectral energy calculation module 33.
Minimum spectral energy calculation module 33: used for receiving the spectral energy of the current frame input signal at the current frequency point output by the spectral energy calculation module 31, setting a counter at the current frequency point, and adding 1 to the count value of its own counter each time a spectral energy value is received at the current frequency point. When the count value of its own counter is not an integer multiple of L, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (3) and outputs the obtained local minimum spectral energy to the noise detection module 34. When the count value of its own counter is an integer multiple of L and no speech indication signal output by the speech prediction judging module 32 is received, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (4) and outputs the obtained local minimum spectral energy to the noise detection module 34. When a speech indication signal output by the speech prediction judging module 32 is received, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (3), subtracts 1 from the count value of its own counter, and outputs the obtained local minimum spectral energy to the noise detection module 34.
Noise detection module 34: used for receiving the spectral energy of the current frame input signal at the current frequency point output by the spectral energy calculation module 31, judging whether the current frame input signal is pure noise at the current frequency point according to this spectral energy and the local minimum spectral energy output by the minimum spectral energy calculation module 33, and outputting the judgment result to the outside.
From the above, it can be seen that: the counter values of the counters of the speech prediction judging module 32 and the minimum spectral energy calculating module 33 are kept consistent.
Further, as shown in fig. 4, the apparatus includes a speech existence probability calculation module 35, used for calculating the speech existence probability of the current frame input signal at the current frequency point according to the signal value output by the noise detection module 34 and a constant $\alpha_p$ stored in the module itself, and outputting the speech existence probability to the outside.
The noise detection module 34 is further configured to output a signal 0 to the speech existence probability calculation module 35 when the current frame input signal is determined to be pure noise at the current frequency point, and output a signal 1 to the speech existence probability calculation module 35 when the current frame input signal is determined not to be pure noise at the current frequency point.
Further, as shown in fig. 4, the apparatus includes: noise spectral energy calculation module 36: for calculating the noise spectrum energy of the current frame input signal at the current frequency point according to the speech existence probability of the current frame input signal at the current frequency point output by the speech existence probability calculation module 35 and the spectrum energy of the current frame input signal at the current frequency point output by the spectrum energy calculation module 31, and outputting the noise spectrum energy to the outside.
Further, as shown in fig. 4, the speech existence probability calculation module 35 is further configured to output the obtained speech existence probability of the current frame signal at the current frequency point to the speech prediction judging module 32. Meanwhile, the speech prediction judging module 32 is further configured to receive and store the speech existence probability of the current frame signal at the current frequency point output by the speech existence probability calculation module 35, and, when the count value of its own counter is an integer multiple of L, to judge whether the stored speech existence probability of the previous frame input signal at the current frequency point is greater than a constant C3 stored in the module itself; if so, it outputs a speech indication signal to the minimum spectral energy calculation module 33.
Alternatively, as shown in fig. 5, the speech prediction determining module 32 includes: a speech existence posterior probability calculating module 3211 and a judging module 3212, wherein:
Speech existence posterior probability calculation module 3211: used for receiving the input signal, setting a counter at the current frequency point, and adding 1 to the count value of its own counter each time a frame of input signal is received at the current frequency point; when the count value of its own counter is an integer multiple of L, calculating the posterior probability that speech is present at the current frequency point in the previous frame input signal and outputting the posterior probability to the judging module 3212; and subtracting 1 from the count value of its own counter after receiving a speech indication signal sent by the judging module 3212.
The count value of the counter of the speech existence posterior probability calculation module 3211 is consistent with the count values of the counters of the speech prediction judgment module 32 and the minimum spectral energy calculation module 33.
Judging module 3212: used for judging whether the posterior probability output by the speech existence posterior probability calculation module 3211 is greater than a constant C1 stored in the module itself, and if so, outputting a speech indication signal to the minimum spectral energy calculation module 33 and the speech existence posterior probability calculation module 3211.
Alternatively, as shown in fig. 6, the speech prediction judging module 32 includes a prior signal-to-noise ratio calculation module 3221 and a judging module 3222, wherein:
Prior signal-to-noise ratio calculation module 3221: used for receiving the input signal, setting a counter at the current frequency point, and adding 1 to the count value of its own counter each time a frame of input signal is received at the current frequency point; when the count value of its own counter is an integer multiple of L, calculating the prior signal-to-noise ratio of the current frame input signal at the current frequency point and outputting the prior signal-to-noise ratio to the judging module 3222; and subtracting 1 from the count value of its own counter after receiving a speech indication signal sent by the judging module 3222.
The count value of the counter of the prior snr calculating module 3221 is consistent with the count values of the counters of the speech prediction determining module 32 and the minimum spectral energy calculating module 33.
Judging module 3222: used for judging whether the prior signal-to-noise ratio of the current frame input signal at the current frequency point output by the prior signal-to-noise ratio calculation module 3221 is greater than a constant C2 stored in the module itself, and if so, outputting a speech indication signal to the minimum spectral energy calculation module 33 and the prior signal-to-noise ratio calculation module 3221.
Fig. 7 is a block diagram of a third embodiment of noise detection provided by the present invention, as shown in fig. 7, which mainly includes:
the counter module 40: the device is used for counting the number of frames of input signals on the current frequency point, adding 1 to the count value when receiving one frame of input signals, outputting the frame of input signals to the spectrum energy calculation module 41 and the voice prediction judgment module 42, and outputting the count value to the voice prediction judgment module 42 and the minimum spectrum energy calculation module 43; and decrements the current count value by 1 upon receipt of the interrupt signal output by the speech prediction determination module 42.
Spectral energy calculation module 41: for receiving the current frame input signal at the current frequency point output by the counter module 40, calculating the spectral energy of the current frame input signal at the current frequency point, and outputting the spectral energy to the minimum spectral energy calculation module 43 and the noise detection module 44.
The speech prediction determination module 42: the device is configured to receive a current frame input signal at a current frequency point output by the counter module 40, determine whether the current frame input signal is predicted to be speech when a count value output by the counter module 40 is an integer multiple of a predetermined constant L, send an interrupt signal to the counter module 40 if the current frame input signal is predicted to be speech, and output a speech indication signal to the minimum spectral energy calculation module 43.
Minimum spectral energy calculation module 43: used for receiving the spectral energy of the current frame input signal at the current frequency point output by the spectral energy calculation module 41. When the count value output by the counter module 40 is not an integer multiple of L, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (3) and outputs the obtained local minimum spectral energy to the noise detection module 44. When the count value output by the counter module 40 is an integer multiple of L and no speech indication signal output by the speech prediction determination module 42 is received, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (4) and outputs the obtained local minimum spectral energy to the noise detection module 44. When a speech indication signal output by the speech prediction determination module 42 is received, the module calculates the local minimum spectral energy and the temporary minimum spectral energy of the current frame input signal at the current frequency point according to formula (3) and outputs the obtained local minimum spectral energy to the noise detection module 44.
Noise detection module 44: used for receiving the spectral energy of the current frame input signal at the current frequency point output by the spectral energy calculation module 41, judging whether the current frame input signal is pure noise at the current frequency point according to this spectral energy and the local minimum spectral energy output by the minimum spectral energy calculation module 43, and outputting the judgment result to the outside.
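The interaction between the shared counter module 40 and the speech prediction determination module 42 in fig. 7 can be illustrated with a small sketch. The class and method names here are hypothetical, the interrupt signal is modelled as a direct method call, and the speech prediction itself is supplied as a boolean; the sketch only shows how a single counter replaces the per-module counters of the fig. 3 apparatus.

```python
class CounterModule:
    """Counts frames at one frequency point (counter module 40 in fig. 7)."""

    def __init__(self, step: int = 1) -> None:
        self.count = 0
        self.step = step

    def on_frame(self) -> int:
        """Add one step for each received frame and return the count value."""
        self.count += self.step
        return self.count

    def on_interrupt(self) -> None:
        """Subtract one step when the interrupt signal is received."""
        self.count -= self.step


class SpeechPredictionModule:
    """Decides whether to raise the speech indication signal (module 42)."""

    def __init__(self, counter: CounterModule, period: int) -> None:
        self.counter = counter
        self.period = period       # the preset constant L

    def check(self, count: int, speech_predicted: bool) -> bool:
        """Return True (speech indication signal) at a multiple of L with speech."""
        if count % self.period == 0 and speech_predicted:
            self.counter.on_interrupt()   # interrupt signal to the counter module
            return True
        return False
```

With a single counter, the minimum spectral energy calculation module only needs the count value and the speech indication signal, so no counters have to be kept consistent across modules.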
Fig. 8 is a block diagram of a fourth embodiment of noise detection provided by the present invention. As shown in fig. 8, compared with fig. 7, the apparatus further includes:
a speech existence probability calculation module 45, used for calculating the speech existence probability of the current frame input signal at the current frequency point according to the signal value output by the noise detection module 44 and a constant $\alpha_p$ stored in the module itself, and outputting the speech existence probability to the outside.
The noise detection module 44 is further configured to output a signal 0 to the speech existence probability calculation module 45 when the current frame input signal is determined to be pure noise at the current frequency point, and output a signal 1 to the speech existence probability calculation module 45 when the current frame input signal is determined not to be pure noise at the current frequency point.
As shown in fig. 8, the apparatus further comprises: noise spectral energy calculation module 46: for calculating the noise spectrum energy of the current frame input signal at the current frequency point according to the speech existence probability of the current frame input signal at the current frequency point output by the speech existence probability calculation module 45 and the spectrum energy of the current frame input signal at the current frequency point output by the spectrum energy calculation module 41, and outputting the noise spectrum energy to the outside.
Further, as shown in fig. 8, the speech existence probability calculation module 45 is further configured to output the obtained speech existence probability of the current frame signal at the current frequency point to the speech prediction determination module 42. Meanwhile, the speech prediction determination module 42 is further configured to receive and store the speech existence probability of the current frame signal at the current frequency point output by the speech existence probability calculation module 45, and, when the count value output by the counter module 40 is an integer multiple of L, to judge whether the stored speech existence probability of the previous frame input signal at the current frequency point is greater than a constant C3 stored in the module itself; if so, it outputs a speech indication signal to the minimum spectral energy calculation module 43.
Alternatively, as shown in fig. 9, the speech prediction determining module 42 includes: a speech existence posterior probability calculation module 4211 and a judgment module 4212, wherein:
the speech existence posterior probability calculation module 4211: the module is configured to receive and store the current frame input signal at the current frequency point output by the counter module 40, calculate the posterior probability that the previous frame input signal of the current frame has speech at the current frequency point when the count value output by the counter module 40 is an integer multiple of L, and output the posterior probability to the determining module 4212.
Judgment module 4212: used for judging whether the posterior probability output by the speech existence posterior probability calculation module 4211 is greater than a constant C1 stored in the module itself, and if so, outputting a speech indication signal to the minimum spectral energy calculation module 43.
Alternatively, as shown in fig. 10, the speech prediction determining module 42 includes: a priori signal-to-noise ratio calculation module 4221 and a judgment module 4222, wherein:
a priori signal-to-noise ratio calculation module 4221: the module is configured to receive and store a current frame input signal at a current frequency point output by the counter module 40, calculate a priori signal-to-noise ratio of the current frame input signal at the current frequency point when a count value output by the counter module 40 is an integer multiple of L, and output the priori signal-to-noise ratio to the determining module 4222.
The judgment module 4222: the module is configured to determine whether the prior snr of the current frame input signal output by the prior snr computing module 4221 at the current frequency point is greater than a constant C2 stored in the module, and if so, output a voice indication signal to the minimum spectral energy computing module 43.
In the present invention, each time a frame of input signal is received at the current frequency point, a preset step length may be added to the count value; in the above specific embodiments, the step length is 1. In addition, in the present invention, it is determined whether the count value F[i] and the predetermined constant L satisfy a fixed proportional relationship; in the above specific embodiments, the fixed proportional relationship is that F[i] is an integer multiple of L.
In practice, the noise detection apparatus and method provided by the present invention can be applied to noise spectrum estimation and noise reduction processing of noisy signals, such as noisy speech signals.
The above-described embodiments of the process and method are merely exemplary and not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (18)

1. A noise detection apparatus, characterized in that the apparatus comprises:
the spectrum energy calculation module is used for receiving the signals and outputting the calculated spectrum energy of the current frame signal to the minimum spectrum energy calculation module and the noise detection module;
the voice prediction judging module is used for receiving the signal, increasing the count value of the voice prediction judging module by one step when a frame of signal is received at the current frequency point, judging whether the current frame of signal is predicted as voice or not when the current count value of the voice prediction judging module and a preset constant meet a fixed proportional relation, if so, reducing the current count value of the voice prediction judging module by one step, and outputting a voice indicating signal to the minimum spectrum energy calculating module;
the minimum spectral energy calculating module is used for receiving the spectral energy of the current frame signal output by the spectral energy calculating module, adding a step length to the counting value of the minimum spectral energy calculating module when a spectral energy value is received at the current frequency point, and outputting the obtained local minimum spectral energy of the current frame signal to the noise detecting module according to the local minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal when the current counting value of the minimum spectral energy calculating module and a preset constant do not meet a fixed proportional relation; when the current count value of the minimum spectral energy calculation module and a preset constant meet a fixed proportional relation and a voice indication signal output by the voice prediction judgment module is not received, outputting the local minimum spectral energy of the current frame signal obtained according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal to a noise detection module; when receiving a voice indication signal output by a voice prediction judgment module, outputting the obtained local minimum spectrum energy of the current frame signal to a noise detection module according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal, and subtracting a step length from the current count value of a minimum spectrum energy calculation module;
and the noise detection module is used for judging whether the current frame signal is pure noise or not according to the spectral energy of the current frame signal output by the spectral energy calculation module and the local minimum spectral energy of the current frame signal output by the minimum spectral energy calculation module.
2. The apparatus of claim 1, wherein the apparatus further comprises a speech existence probability calculation module for calculating a speech existence probability of the current frame signal based on the pure noise indication signal or the non-pure noise indication signal output from the noise detection module,
the noise detection module is further configured to output a pure noise indication signal to the speech existence probability calculation module when the current frame signal is pure noise; and when the current frame signal is not pure noise, outputting a non-pure noise indication signal to a voice existence probability calculation module.
3. The apparatus of claim 2, wherein the speech existence probability calculation module is further for outputting the speech existence probability of the current frame signal to a speech prediction judgment module,
the voice prediction judging module is further used for receiving and storing the voice existence probability of the current frame signal output by the voice existence probability calculating module, judging whether the voice existence probability of the previous frame signal stored by the voice prediction judging module is larger than the constant stored by the voice prediction judging module or not when the current counting value and the preset constant meet the fixed proportional relation, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
4. The apparatus of claim 2 or 3, further comprising a noise spectral energy calculating module for calculating the noise spectral energy of the current frame signal according to the speech existence probability of the current frame signal output by the speech existence probability calculating module and the spectral energy of the current frame signal output by the spectral energy calculating module.
5. The apparatus of claim 1, wherein the speech prediction judgment module comprises a speech existence posterior probability calculation module and a judgment module, wherein:
the voice existence posterior probability calculation module is used for receiving signals; increasing its count value by one step each time a frame signal is received at the current frequency point; when its current count value and a preset constant satisfy a fixed proportional relation, calculating the posterior probability of voice existing on the previous frame signal and outputting that posterior probability to the judgment module; and, after receiving the voice indication signal output by the judgment module, reducing its current count value by one step;
and the judging module is used for judging whether the posterior probability of the voice existing on the previous frame signal output by the voice existence posterior probability calculating module is greater than the constant stored by the judging module or not, and if so, outputting a voice indicating signal to the minimum spectral energy calculating module and the voice existence posterior probability calculating module.
6. The apparatus of claim 1, wherein the speech prediction judgment module comprises a prior signal-to-noise ratio calculation module and a judgment module, wherein:
the prior signal-to-noise ratio calculation module is used for receiving signals; increasing its count value by one step each time a frame signal is received at the current frequency point; when its current count value and a preset constant satisfy a fixed proportional relation, calculating the prior signal-to-noise ratio of the current frame signal and outputting that prior signal-to-noise ratio to the judgment module; and, after receiving the voice indication signal output by the judgment module, reducing its current count value by one step;
and the judging module is used for judging whether the prior signal-to-noise ratio of the previous frame signal output by the prior signal-to-noise ratio calculating module is greater than the constant stored by the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module and the prior signal-to-noise ratio calculating module.
7. A noise detection apparatus, characterized in that the apparatus comprises:
the counter module is used for counting the number of frames of signals on the current frequency point, adding a step length to the count value when receiving a frame of signal, outputting the frame of signal to the spectrum energy calculation module and the voice prediction judgment module, and outputting the count value to the voice prediction judgment module and the minimum spectrum energy calculation module; when receiving the interrupt signal output by the voice prediction judging module, reducing the current count value by one step length;
the frequency spectrum energy calculating module is used for calculating the frequency spectrum energy of the signal output by the counter module and outputting the frequency spectrum energy to the minimum frequency spectrum energy calculating module and the noise detecting module;
the voice prediction judging module is used for judging whether the signal output by the counter module is predicted to be voice or not when the count value output by the counter module and a preset constant meet a fixed proportional relation, if so, sending an interrupt signal to the counter module, and outputting a voice indicating signal to the minimum spectrum energy calculating module;
the minimum spectrum energy calculating module is used for outputting the local minimum spectrum energy of the current frame signal obtained according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal to the noise detecting module when the counting value output by the counter module and the predetermined constant do not meet the fixed proportional relation; when the count value output by the counter module and a preset constant satisfy a fixed proportional relation and a voice indication signal output by the voice prediction judgment module is not received, outputting the obtained local minimum spectrum energy of the current frame signal to the noise detection module according to the temporary minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal; when receiving a voice indication signal output by the voice prediction judgment module, outputting the obtained local minimum spectrum energy of the current frame signal to the noise detection module according to the local minimum spectrum energy of the previous frame signal and the spectrum energy of the current frame signal;
and the noise detection module is used for judging whether the current frame input signal is pure noise or not according to the frequency spectrum energy output by the frequency spectrum energy calculation module and the local minimum frequency spectrum energy output by the minimum frequency spectrum energy calculation module.
8. The apparatus of claim 7, wherein the apparatus further comprises a speech existence probability calculation module for calculating a speech existence probability of the current frame signal based on the pure noise indication signal or the non-pure noise indication signal output from the noise detection module,
the noise detection module is further configured to output a pure noise indication signal to the speech existence probability calculation module when the current frame signal is pure noise; and when the current frame signal is not pure noise, outputting a non-pure noise indication signal to a voice existence probability calculation module.
9. The apparatus of claim 8, wherein the speech existence probability calculation module is further for outputting the speech existence probability of the current frame signal to a speech prediction judgment module,
the voice prediction judging module is further used for receiving and storing the voice existence probability of the current frame signal output by the voice existence probability calculating module; when the current count value and the preset constant satisfy the fixed proportional relation, judging whether the stored voice existence probability of the previous frame signal is greater than a constant stored in the voice prediction judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
10. The apparatus according to claim 8 or 9, wherein the apparatus further comprises a noise spectral energy calculating module for calculating the noise spectral energy of the current frame signal based on the speech existence probability of the current frame signal output by the speech existence probability calculating module and the spectral energy of the current frame signal output by the spectral energy calculating module.
11. The apparatus of claim 7, wherein the speech prediction judgment module comprises a speech existence posterior probability calculation module and a judgment module, wherein:
the voice existence posterior probability calculation module is used for receiving the signals, calculating the posterior probability of the voice existing on the previous frame of signals when the current count value output by the counter module and a preset constant meet a fixed proportional relation, and outputting the posterior probability of the voice existing on the previous frame of signals to the judgment module;
and the judging module is used for judging whether the posterior probability of the voice existing on the previous frame signal output by the voice existence posterior probability calculating module is greater than the constant stored by the judging module or not, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
12. The apparatus of claim 7, wherein the speech prediction judgment module comprises a prior signal-to-noise ratio calculation module and a judgment module, wherein:
the prior signal-to-noise ratio calculation module is used for receiving the signal, calculating the prior signal-to-noise ratio of the current frame signal when the current count value output by the counter module and a preset constant meet a fixed proportional relation, and outputting the prior signal-to-noise ratio of the previous frame signal to the judgment module;
and the judging module is used for judging whether the prior signal-to-noise ratio of the previous frame signal output by the prior signal-to-noise ratio calculating module is greater than the constant stored by the judging module, and if so, outputting a voice indicating signal to the minimum spectrum energy calculating module.
13. A noise detection method, wherein the following steps are performed for all signal frames at each frequency bin, the method comprising:
A. receiving a current frame signal, increasing the count value by one step, calculating and storing the frequency spectrum energy of the current frame signal, judging whether the current count value and a preset constant meet a fixed proportional relation, and if not, executing the step B; otherwise, judging whether the current frame signal is predicted as voice, if so, subtracting a step length from the current count value and executing the step B; if not, executing step C;
B. calculating and storing the local minimum spectral energy of the current frame signal according to the local minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; calculating and storing the temporary minimum spectral energy of the current frame signal according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal, and then executing the step D;
C. calculating and storing the local minimum spectral energy of the current frame signal according to the temporary minimum spectral energy of the previous frame signal and the spectral energy of the current frame signal; calculating and storing the temporary minimum spectral energy of the current frame signal according to the spectral energy of the current frame signal, and then executing the step D;
D. judging whether the ratio of the spectral energy of the current frame signal to the local minimum spectral energy is smaller than a preset value, if so, judging that the current frame signal is pure noise, and otherwise, judging that the current frame signal is not pure noise.
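As an illustration only, the counter logic of step A and the ratio test of step D can be sketched in Python for a single frequency bin. The function names, the `period` parameter (standing in for the preset constant), and the reading of the "fixed proportional relation" as "the count value is an integer multiple of the preset constant" are assumptions made for this sketch; the claim itself does not specify them.

```python
def choose_branch(count, period, predicted_speech, step=1):
    """Step A (sketch): update the frame counter and pick step B or C.

    count            -- count value before the current frame is received
    period           -- hypothetical name for the preset constant
    predicted_speech -- result of the voice prediction for this frame
    Returns (new_count, branch) where branch is "B" or "C".
    """
    count += step  # one frame received: add one step to the count value
    # Assumption: the "fixed proportional relation" means that the count
    # value is an integer multiple of the preset constant.
    if count % period != 0:
        return count, "B"
    if predicted_speech:
        # Frame predicted as voice: take one step back and use step B.
        return count - step, "B"
    return count, "C"


def is_pure_noise(energy, local_min, threshold):
    """Step D (sketch): the frame is pure noise if energy / local_min < threshold.

    Written as a product so that local_min == 0 does not raise an error;
    for positive local_min this is the same comparison.
    """
    return energy < threshold * local_min
```

With `period = 4`, the fourth frame would normally take step C. If that frame is predicted as voice, the count falls back to 3, so the next frame reaches 4 again and the test is repeated.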
14. The method of claim 13, further comprising: presetting a posterior probability of voice existence;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the posterior probability of voice existing on the previous frame signal, then judging whether the posterior probability is greater than the preset posterior probability of voice existence, if so, judging that the current frame signal is predicted as voice; otherwise, judging that the current frame signal is not predicted as voice.
15. The method of claim 13, further comprising: presetting a speech existence prior signal-to-noise ratio;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the prior signal-to-noise ratio of voice existing on the current frame signal, then judging whether the prior signal-to-noise ratio is greater than the preset speech existence prior signal-to-noise ratio, if so, judging that the current frame signal is predicted as voice; otherwise, judging that the current frame signal is not predicted as voice.
16. The method of claim 13, further comprising: presetting a voice existence probability;
the step a of judging whether the current frame signal is predicted as speech specifically comprises:
calculating the probability of the existence of voice on the previous frame signal, judging whether the probability of the existence of voice on the previous frame signal is greater than the preset voice existence probability, and if so, judging that the current frame signal is predicted as voice; otherwise, the current frame signal is judged not to be predicted as voice.
17. The method according to claim 13, wherein the step B is specifically:
comparing the local minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the local minimum spectral energy of the current frame signal, and storing the local minimum spectral energy of the current frame signal; comparing the temporary minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the temporary minimum spectral energy of the current frame signal, storing the temporary minimum spectral energy of the current frame signal, and then executing the step D.
18. The method according to claim 13, wherein step C is specifically:
comparing the temporary minimum spectral energy of the previous frame signal with the spectral energy of the current frame signal, taking the smaller of the two as the local minimum spectral energy of the current frame signal, and storing the local minimum spectral energy of the current frame signal; taking the spectral energy of the current frame signal as the temporary minimum spectral energy of the current frame signal, storing the temporary minimum spectral energy of the current frame signal, and then executing the step D.
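The minimum updates described in claims 17 and 18 can be sketched as two small Python functions. This is a minimal illustration, not the patented implementation; the function and variable names are hypothetical, and the energies are plain floats for one frequency bin.

```python
def step_b(local_min_prev, tmp_min_prev, energy):
    """Step B as in claim 17 (sketch): both minima track the smaller value."""
    local_min = min(local_min_prev, energy)  # local minimum of the current frame
    tmp_min = min(tmp_min_prev, energy)      # temporary minimum of the current frame
    return local_min, tmp_min


def step_c(tmp_min_prev, energy):
    """Step C as in claim 18 (sketch): restart the local minimum.

    The local minimum is rebuilt from the previous temporary minimum, and
    the temporary minimum restarts from the current spectral energy.
    """
    local_min = min(tmp_min_prev, energy)
    tmp_min = energy
    return local_min, tmp_min


# Example values (made up): previous local minimum 0.2, previous temporary
# minimum 0.5, current spectral energy 0.8.
b_result = step_b(0.2, 0.5, 0.8)  # (0.2, 0.5): both minima are kept
c_result = step_c(0.5, 0.8)       # (0.5, 0.8): the old local minimum 0.2 is dropped
```

In this example step C lets the local minimum rise from 0.2 to 0.5, which is how the estimate can follow an increasing noise floor. Step B never raises it, which is why a frame predicted as voice is routed to step B.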
CNB2005101301670A 2005-12-19 2005-12-19 Apparatus and method for detecting noise Expired - Fee Related CN100492495C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101301670A CN100492495C (en) 2005-12-19 2005-12-19 Apparatus and method for detecting noise

Publications (2)

Publication Number Publication Date
CN1787079A CN1787079A (en) 2006-06-14
CN100492495C true CN100492495C (en) 2009-05-27

Family

ID=36784497

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101301670A Expired - Fee Related CN100492495C (en) 2005-12-19 2005-12-19 Apparatus and method for detecting noise

Country Status (1)

Country Link
CN (1) CN100492495C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531180A (en) * 2016-12-10 2017-03-22 广州酷狗计算机科技有限公司 Noise detection method and device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440870A (en) * 2013-08-16 2013-12-11 北京奇艺世纪科技有限公司 Method and device for voice frequency noise reduction
CN103632681B (en) * 2013-11-12 2016-09-07 广州海格通信集团股份有限公司 A kind of spectral envelope silence detection method
CN105374367B (en) * 2014-07-29 2019-04-05 华为技术有限公司 Abnormal frame detection method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531180A (en) * 2016-12-10 2017-03-22 广州酷狗计算机科技有限公司 Noise detection method and device
CN106531180B (en) * 2016-12-10 2019-09-20 广州酷狗计算机科技有限公司 Noise detecting method and device

Also Published As

Publication number Publication date
CN1787079A (en) 2006-06-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090527

Termination date: 20111219