CN114724574A - Double-microphone noise reduction method with adjustable expected sound source direction - Google Patents

Publication number
CN114724574A
Authority
CN
China
Prior art keywords
noise
omega
signal
calculating
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210157383.8A
Other languages
Chinese (zh)
Inventor
赵清颖
陈喆
殷福亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202210157383.8A
Publication of CN114724574A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a double-microphone noise reduction method with adjustable expected sound source direction, comprising the following steps. Preprocessing: the noisy signals x1(t) and x2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and then transformed by a short-time Fourier transform to obtain the frequency-domain signals X1(ω) and X2(ω). Beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones, and the frequency-domain signals X1(ω) and X2(ω) are differentially transformed according to a central difference scheme to construct the differential signals Y1(ω) and Y2(ω). The statistical averages of the power spectra of Y1(ω) and Y2(ω) are calculated, and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function. X1(ω) is multiplied by λ(ω) to obtain the signal R1(ω), from which the competing-direction noise has been removed. Post-Wiener filtering: the signal energy and noise energy in R1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated to further remove the residual noise in R1(ω).

Description

Double-microphone noise reduction method with adjustable expected sound source direction
Technical Field
The invention relates to the technical field of voice signal noise reduction, in particular to a double-microphone noise reduction method with adjustable expected sound source direction.
Background
Portable devices such as Bluetooth headsets have become useful tools for improving efficiency in daily life. However, when a user makes or receives a call on such a device, interference from background noise, speech from non-target directions, and the like rapidly degrades call quality. In this case, it is desirable to preserve the speech arriving from the user's speaking direction and to suppress background noise and non-target-direction speech as much as possible without distorting the speech.
Existing generalized sidelobe cancellers (GSCs) and delay beamformers use signals recorded by multiple microphones for spatial filtering. For portable devices such as Bluetooth headsets, GSCs are too complex for the processing capability of miniature devices. Delay beamforming techniques such as the first-order differential microphone (FDM) and adaptive null-forming (ANF) require only two microphones, which suits size-limited devices and real-time processing. However, such a fixed beamformer has its maximum gain at 0° and a null at 180°, and cannot eliminate noise arriving from directions other than the null. Algorithms based on the coherence function between the input signals exploit the properties of its real and imaginary parts to derive different noise masks. Coherence-based methods do not rely on noise statistics, but their target direction is not adjustable. A competing-direction noise cancellation method for hearing aids combines spectral estimation with array beamforming to suppress noise; its directivity coefficients are estimated in noise-only intervals and updated to track moving noise. This method likewise can set the desired direction only within a limited range around 0°. Because the position of the sound source is not always fixed, a noise reduction algorithm with an adjustable sound source direction is important in practical applications.
To address the problems that beamforming applied directly to a closely spaced dual-microphone system cannot accurately eliminate sound from non-target directions, and that the desired sound source direction cannot be set as required, a two-step denoising method based on beamforming and Wiener filtering is proposed. Test results show that, at low signal-to-noise ratios and with multiple types of noise sources present, the method effectively recovers the energy distribution of the original signal, reduces background noise and non-target-direction speech, and markedly improves the signal-to-noise ratio.
Disclosure of Invention
In view of the problems in the prior art, the invention discloses a double-microphone noise reduction method with adjustable expected sound source direction, which specifically comprises the following steps:
Preprocessing: the noisy signals x1(t) and x2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and a short-time Fourier transform is applied to obtain the frequency-domain signals X1(ω) and X2(ω);
Beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones; the frequency-domain signals X1(ω) and X2(ω) are differentially transformed according to a central difference scheme to construct the differential signals Y1(ω) and Y2(ω); the statistical averages of the power spectra of Y1(ω) and Y2(ω) are calculated and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function; and X1(ω) is multiplied by λ(ω) to obtain the signal R1(ω) with the competing-direction noise removed;
Post-Wiener filtering: the signal energy and noise energy in R1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated to remove the residual noise in R1(ω).
Further, the preprocessing comprises:
discretely sampling the noisy signals x1(t) and x2(t), and then applying pre-emphasis to the high-frequency part of the speech;
dividing the sampled signals x1(n) and x2(n) into frames of length 10 ms, applying a Hamming window w(n) of the same length, passing the windowed signals into a buffer for processing, obtaining the frequency-domain signal of the current frame by short-time Fourier transform, and, by the conjugate symmetry of the Fourier transform of a real sequence, outputting only the first half of the frequency bins for beamforming.
Further, the beamforming process comprises amplitude alignment, power spectrum calculation, directivity function calculation, threshold calculation, and normalized mapping;
amplitude alignment: the frequency-domain signals X1(ω) and X2(ω) are each multiplied by a scaling factor to align their amplitudes;
power spectrum calculation: the desired beam is assumed to be S(ω), with its direction preset to α; a virtual microphone is introduced at the midpoint between the two microphones to receive the desired beam S(ω); the differential signals Y1(ω) and Y2(ω) are constructed according to the central difference scheme and the spatial relationship of the frequency-domain signals X1(ω) and X2(ω) to the desired beam S(ω), and their power spectra are calculated;
directivity function calculation: the ratio of the statistical averages of the power spectra of the differential signals Y1(ω) and Y2(ω) is the value of the directivity function Γ(ω, θ); Γ(ω, θ) tends to infinity when the actual sound source incidence direction θ equals the given desired incidence direction α, and is monotonic and approximately symmetric on either side of the axis θ = α;
Figure BDA0003513328410000031
threshold calculation and normalized mapping: because Γ(ω, θ) tends to infinity, a threshold Ω is calculated from a preset main-lobe width Θ; Γ(ω, θ) is then mapped directly, through normalization by a sigmoid function, to the noise masking value λ(ω) of the corresponding frequency bin, and X1(ω) is multiplied by λ(ω) to obtain the signal R1(ω) with the competing-direction noise removed.
Figure BDA0003513328410000032
Figure BDA0003513328410000033
Further, the post-Wiener filtering process comprises calculating a signal-to-noise ratio index, calculating a log-spectral deviation, modifying or resetting a noise flag, and calculating a gain function value;
when calculating the signal-to-noise ratio index, the signal R1(ω) is divided into a number of channels according to a critical-bandwidth criterion, the energy of each channel is estimated, the channel noise energy estimate is initialized to the channel energy of the first four frames, and the channel signal-to-noise ratio index is calculated from the channel noise energy estimate;
when calculating the log-spectral deviation, a nonlinear data table is designed as a voice index table that maps the signal-to-noise ratio index to a set of numbers measuring voice quality; the sum of the voice indices over a certain frequency range is taken as the voice quality evaluation of the current channel; the logarithm of the current channel signal energy is taken, and the deviation between the long-term and short-term log-spectral energy is calculated;
when modifying or resetting the noise flag, the current frame is judged to be a speech frame or a noise frame from the calculated voice index sum, signal-to-noise ratio index and log-spectral deviation, and the noise update flag is reset; the update flags of the preceding frames are then checked, and if the noise has not been updated for a long time the result is considered unreliable and the signal-to-noise ratio index is forcibly updated;
when calculating the gain function value, the channel signal-to-noise ratio index is used to calculate the channel gain that removes the residual background noise, and the noise energy estimate for the next frame is updated according to the noise update flag.
With the above technical solution, the double-microphone noise reduction method with adjustable expected sound source direction provided by the invention first preprocesses the dual-microphone signals, then calculates the ratio of the statistical averages of the power spectra of the constructed differential signals as the value of the directivity function, and obtains the noise masking value by mapping it through a normalization function. A Wiener filter is then applied as a subsequent stage, reducing the residual noise by estimating the signal-to-noise ratio index and calculating a gain function. The proposed algorithm is simple and efficient; after enhancement, the signal-to-noise ratio and quality of speech disturbed by non-target sound are markedly improved in different noise scenes.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic overall view of the process of the present invention;
FIG. 2 is a schematic diagram of the pretreatment process of the present invention;
FIG. 3 is a schematic diagram of a beamforming process in the present invention;
FIG. 4 is a schematic view of the sound source propagation in the present invention;
FIG. 5 is a diagram illustrating directional functions in accordance with the present invention;
FIG. 6 is a diagram illustrating a post-wiener filtering process in accordance with the present invention;
FIG. 7 is a diagram showing the PESQ comparison results of the present invention with other noise reduction methods for a single noise source with different SNR;
FIG. 8 is a diagram showing the PESQ comparison result between the present invention and other noise reduction methods when multiple noise sources have different signal-to-noise ratios;
FIG. 9 is a graph of SegSNR comparison results for a single noise source with different SNR for the present invention and other noise reduction methods;
FIG. 10 is a graph of SegSNR comparison results for multiple noise sources with different SNR;
FIG. 11 is a diagram of the results of the present invention for different desired sound source directions.
Detailed Description
To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings:
FIG. 1 shows the double-microphone noise reduction method with adjustable expected sound source direction. In implementation, the method comprises a preprocessing process, a beamforming process and a post-Wiener filtering process. The specific steps are as follows:
S1: Preprocessing, as shown in FIG. 2. The noisy signals x1(t) and x2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and the frequency-domain signals X1(ω) and X2(ω) are obtained by short-time Fourier transform, specifically as follows:
S11: The continuous noisy signals x1(t) and x2(t) are first discretely sampled at a sampling frequency of 16 kHz. Pre-emphasis of the high-frequency part of the speech is implemented by a first-order FIR high-pass digital filter, where EMP_FAC is the pre-emphasis coefficient. Let x(n) be the sample value of the speech signal at time n; the result after pre-emphasis is:
z(n)=x(n)-EMP_FAC*x(n-1) (1)
EMP_FAC is taken as 0.8. After noise reduction, the time-domain signal is synthesized by the inverse short-time Fourier transform, and a de-emphasis operation is applied to it to restore the high-frequency part.
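Formula (1) and its inverse can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the function names are hypothetical, the sample before the first one is assumed to be zero, and the de-emphasis filter is assumed to be the exact inverse of formula (1), which the text does not spell out.

```python
EMP_FAC = 0.8  # pre-emphasis coefficient given in the description


def pre_emphasis(x, emp_fac=EMP_FAC):
    """Formula (1): z(n) = x(n) - EMP_FAC * x(n-1), assuming x(-1) = 0."""
    z = []
    prev = 0.0
    for sample in x:
        z.append(sample - emp_fac * prev)
        prev = sample
    return z


def de_emphasis(z, emp_fac=EMP_FAC):
    """Assumed inverse filter: y(n) = z(n) + EMP_FAC * y(n-1), with y(-1) = 0."""
    y = []
    prev = 0.0
    for sample in z:
        prev = sample + emp_fac * prev
        y.append(prev)
    return y
```

Applying `de_emphasis` to the output of `pre_emphasis` returns the original samples up to rounding, which is why the de-emphasis step can restore the spectral balance after noise reduction.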
S12: The sampled signals x1(n) and x2(n) are divided into frames of length 10 ms, and a Hamming window of the same length is applied to each frame; the window function is given by formula (2). The windowed signals are passed into a buffer whose length is 5 times the number of FFT points.
Figure BDA0003513328410000051
S13: The frequency-domain signal of the current frame is obtained by fast Fourier transform. By the conjugate symmetry of the Fourier transform of a real sequence, only the first half of the frequency bins is output for subsequent processing.
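Steps S12 and S13 can be illustrated as follows. This is a sketch under several assumptions that the text does not confirm: frames do not overlap, each windowed frame is zero-padded to the transform size, the window uses the standard Hamming coefficients 0.54 and 0.46 (formula (2) is not reproduced in this text), and a direct DFT stands in for the FFT to keep the example dependency-free. All function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16000                   # Hz, from step S11
FRAME_LEN = SAMPLE_RATE * 10 // 1000  # 10 ms frame -> 160 samples


def hamming(length):
    """Standard Hamming window (assumed form of formula (2))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def split_frames(x, frame_len=FRAME_LEN):
    """Cut the sample list into consecutive, non-overlapping frames."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def half_spectrum(frame, n_fft):
    """Window the frame, zero-pad to n_fft, and return the first n_fft/2 bins."""
    window = hamming(len(frame))
    padded = [s * w for s, w in zip(frame, window)]
    padded += [0.0] * (n_fft - len(frame))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft))
            for k in range(n_fft // 2)]
```

Only `n_fft // 2` bins are kept because the remaining bins of a real input are complex conjugates of these and carry no extra information.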
S2: Beamforming, as shown in FIG. 3. A virtual microphone is introduced at the midpoint of the line connecting the two microphones, and the frequency-domain signals X1(ω) and X2(ω) are differentially transformed according to the central difference scheme to construct the differential signals Y1(ω) and Y2(ω). The statistical averages of the power spectra of Y1(ω) and Y2(ω) are calculated, and their ratio is taken as the directivity function Γ(ω, θ). Based on the properties of Γ(ω, θ), it is mapped directly to a noise masking value by a normalization function, as follows:
S21: Amplitude alignment. Although the sound field is assumed to be far-field, the signals received by the two microphones differ slightly in amplitude. To better satisfy the far-field assumption, the two frequency-domain signals X1(ω) and X2(ω) are first multiplied by their respective scaling factors to align their amplitudes.
S22: The desired beam is assumed to be S(ω), with its direction preset to α. A virtual microphone is introduced at the midpoint between the two microphones to receive this signal; the sound source propagation is shown in FIG. 4. The spatial relationship between X1(ω), X2(ω) and S(ω) is:
Figure BDA0003513328410000052
Figure BDA0003513328410000053
where d is the microphone spacing, v is the speed of sound, and θ is the actual sound source incidence direction. According to the central difference scheme and the spatial relationship between X1(ω), X2(ω) and S(ω), the differential signals Y1(ω) and Y2(ω) are constructed and their power spectra are calculated.
Figure BDA0003513328410000061
S23: The ratio of the statistical averages of the power spectra of Y1(ω) and Y2(ω) is the value of the directivity function Γ(ω, θ).
Figure BDA0003513328410000062
The graph of Γ(ω, θ) is shown in FIG. 5. Γ(ω, θ) tends to infinity when the actual sound source incidence direction θ equals the given desired incidence direction α, and the function is monotonic and approximately symmetric on either side of the axis θ = α.
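The behaviour described for Γ(ω, θ) can be reproduced with a simple far-field model. The formulas for X1(ω), X2(ω), Y1(ω), Y2(ω) and Γ(ω, θ) are not reproduced in this text, so the construction below is an assumption chosen only because it has the stated properties: the two microphones see the virtual-microphone signal with opposite phase shifts of ω·d·cos(θ)/(2v), the signals are steered toward α, and their half-sum and half-difference serve as the two differential signals. Function names, the sound speed of 340 m/s, and the 2 cm spacing used in the usage note are all hypothetical.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, assumed value of v


def half_phase(omega, d, angle_deg):
    """Phase shift between one microphone and the midpoint (far-field model)."""
    return omega * d * math.cos(math.radians(angle_deg)) / (2.0 * SOUND_SPEED)


def mic_signals(s, omega, d, theta_deg):
    """Assumed model: X1 and X2 lead and lag the midpoint signal S."""
    phi = half_phase(omega, d, theta_deg)
    return s * cmath.exp(1j * phi), s * cmath.exp(-1j * phi)


def differential_signals(x1, x2, omega, d, alpha_deg):
    """Steer toward alpha, then take half-sum (Y1) and half-difference (Y2)."""
    phi_a = half_phase(omega, d, alpha_deg)
    a1 = x1 * cmath.exp(-1j * phi_a)
    a2 = x2 * cmath.exp(1j * phi_a)
    return (a1 + a2) / 2.0, (a1 - a2) / 2.0


def directivity(y1_frames, y2_frames):
    """Ratio of frame-averaged power spectra; infinite when Y2 vanishes."""
    p1 = sum(abs(y) ** 2 for y in y1_frames) / len(y1_frames)
    p2 = sum(abs(y) ** 2 for y in y2_frames) / len(y2_frames)
    return math.inf if p2 == 0.0 else p1 / p2
```

With this model Y2(ω) has a null at θ = α, so the ratio grows without bound there and falls off on either side. For example, at 1 kHz with d = 0.02 m and α = 90°, the value at θ = 80° exceeds the value at θ = 60°, and θ = 60° and θ = 120° give nearly the same value.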
S24: Because Γ(ω, θ) tends to infinity as θ approaches α, a value that cannot be represented in actual computation, a threshold Ω is calculated from the preset main-lobe width Θ.
Figure BDA0003513328410000063
The sigmoid function is an S-shaped function with range (0, 1). After a suitable transformation, it normalizes Γ(ω, θ) and maps it directly to the noise masking value λ(ω) of the corresponding frequency bin.
Figure BDA0003513328410000064
S25: The signal after masking the competing-direction noise is:
R1(ω)=λ(ω)X1(ω) (8)
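The mapping from Γ(ω, θ) to λ(ω) and the masking of formula (8) can be sketched as below. The exact transformed sigmoid of formula (7) and the threshold of formula (6) are not reproduced in this text, so the form used here is purely an assumption: a logistic function of the log-ratio between Γ and the threshold Ω, which gives λ = 0.5 at Γ = Ω and approaches 1 as Γ grows. The `slope` parameter and the function names are hypothetical.

```python
import math


def noise_mask(gamma, threshold, slope=4.0):
    """Assumed sigmoid mapping of the directivity value to a mask in (0, 1]."""
    if gamma == math.inf:
        return 1.0  # source exactly in the desired direction
    z = slope * (math.log(max(gamma, 1e-12)) - math.log(threshold))
    z = max(-50.0, min(50.0, z))  # keep exp() well inside floating-point range
    return 1.0 / (1.0 + math.exp(-z))


def apply_mask(x1_bins, gamma_bins, threshold_bins):
    """Formula (8): R1(w) = lambda(w) * X1(w), evaluated per frequency bin."""
    return [noise_mask(g, t) * x
            for x, g, t in zip(x1_bins, gamma_bins, threshold_bins)]
```

Bins whose directivity value is far above the threshold pass almost unchanged, while bins dominated by sound from outside the main lobe are attenuated toward zero.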
s3: post wiener filtering process as shown in FIG. 6, by pair R1Estimating signal energy and noise energy in (omega) to obtain channel signal-to-noise ratio and calculate gain function, further eliminating R1The residual noise in (ω) is specifically as follows:
s31: r is to be1(ω) are divided into NUM _ CHAN channels according to a critical bandwidth criterion. Because the voice energy is mainly concentrated at 0.3-3.4 kHz and at low frequencyThe corresponding channel is narrower; at high frequencies, the corresponding channel is wider. And estimating the energy of each channel, wherein beta is a smoothing factor, M is the number of the frequency points in the current channel, M represents the label of the channel of the current frame, i is the label of the current frame, and k is the label of the frequency points in the current channel.
Figure BDA0003513328410000065
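The channel energy estimate of step S31 can be sketched from the symbol definitions given there. Formula (9) itself is not reproduced in this text, so the recursion below is an assumption: the mean bin power of each channel is smoothed across frames with the factor β. The band edges, the value of β, and the function name are hypothetical.

```python
def channel_energy(prev_energy, bins, band_edges, beta=0.55):
    """Assumed form: E(i, m) = beta * E(i-1, m) + (1 - beta) * mean |R1(k)|^2.

    prev_energy: channel energies of the previous frame, one per channel m
    bins:        complex spectrum R1 of the current frame i
    band_edges:  (first_bin, last_bin) per channel, inclusive (hypothetical)
    """
    energies = []
    for m, (lo, hi) in enumerate(band_edges):
        count = hi - lo + 1  # M, the number of bins in channel m
        mean_power = sum(abs(bins[k]) ** 2 for k in range(lo, hi + 1)) / count
        energies.append(beta * prev_energy[m] + (1.0 - beta) * mean_power)
    return energies
```

In a real channel layout the low-frequency entries of `band_edges` would span fewer bins than the high-frequency ones, matching the narrower low-frequency channels described in S31.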
S32: initializing the channel noise estimate to the channel energy of the first four frames, the signal-to-noise ratio can be calculated by (10):
Figure BDA0003513328410000071
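Step S32 can be illustrated with a short sketch. Formula (10) is not reproduced in this text; the code assumes the usual definition of a channel signal-to-noise ratio in decibels, 10·log10 of channel energy over noise energy, and reads "initialized to the channel energy of the first four frames" as copying the channel energy into the noise estimate during those frames. Both readings, the floor value, and the function names are assumptions.

```python
import math

INIT_FRAMES = 4  # number of initial frames used to seed the noise estimate


def seed_noise(frame_index, ch_enrg, ch_noise):
    """During the first four frames, set the noise estimate to the channel energy."""
    if frame_index < INIT_FRAMES:
        return list(ch_enrg)
    return list(ch_noise)


def channel_snr_db(ch_enrg, ch_noise, floor=1e-12):
    """Assumed form of formula (10): SNR(m) = 10 * log10(E(m) / N(m))."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(ch_enrg, ch_noise)]
```

The signal-to-noise ratio index mentioned in S33 would then be a quantized version of these decibel values; the quantization step is not given in the text and is left out here.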
s33: a nonlinear data table is designed as a voice index table, and the signal-to-noise ratio index (quantized signal-to-noise ratio value) is mapped to a group of numbers for measuring voice quality. And when the signal-to-noise ratio is high, the voice quality is considered to be high, and the sum of the voice indexes in the frequency range of 0.3-3.4 kHz is calculated.
S34: the total noise energy estimate (tne) and the total energy estimate (tce) for the first HI _ CHAN channels are calculated, i.e., the sum of the noise energy and the sum of the channel energy over a frequency range of 0.3-3.4 kHz.
Figure BDA0003513328410000072
Figure BDA0003513328410000073
S35: The log spectrum of the current channel energy is calculated, and the deviation between the long-term and short-term log-spectral energy is recorded as ch_enrg_dev.
ch_enrg_db(i,m)=10lg(ch_enrg(i,m)) (13)
Figure BDA0003513328410000074
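Formula (13) and the deviation of step S35 can be sketched as follows. Formula (14) is not reproduced in this text, so the deviation is assumed here to be the sum over channels of the absolute difference between the short-term and long-term log-spectral energies. The names `ch_enrg_db` and `ch_enrg_dev` follow the text; `ch_enrg_long_db`, the floor value, and the function names are hypothetical.

```python
import math


def log_energy_db(ch_enrg, floor=1e-12):
    """Formula (13): ch_enrg_db(i, m) = 10 * lg(ch_enrg(i, m))."""
    return [10.0 * math.log10(max(e, floor)) for e in ch_enrg]


def spectral_deviation(ch_enrg_db, ch_enrg_long_db):
    """Assumed form of formula (14): sum of |short-term - long-term| in dB."""
    return sum(abs(s - l) for s, l in zip(ch_enrg_db, ch_enrg_long_db))
```

A small deviation means the current spectrum resembles the long-term average, which is typical of stationary noise; a large deviation suggests speech or another non-stationary event.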
S36: The long-term integration constant alpha is calculated as a function of the total channel energy (tce): for high tce (-40 dB), smoothing is slow (alpha = 0.99); for low tce (-60 dB), smoothing is fast (alpha = 0.50).
Figure BDA0003513328410000075
S37: The long-term log-spectral energy is calculated and updated.
Figure BDA0003513328410000076
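Steps S36 and S37 can be sketched together. The two end points, alpha = 0.99 at -40 dB and alpha = 0.50 at -60 dB, come from the text; the linear interpolation between them, the clamping outside that range, and the first-order recursive update are assumptions, since formulas (15) and (16) are not reproduced here. Function names are hypothetical.

```python
HIGH_TCE_DB, LOW_TCE_DB = -40.0, -60.0  # end points stated in S36
ALPHA_SLOW, ALPHA_FAST = 0.99, 0.50     # matching integration constants


def integration_constant(tce_db):
    """Assumed linear interpolation of alpha between the two stated end points."""
    if tce_db >= HIGH_TCE_DB:
        return ALPHA_SLOW
    if tce_db <= LOW_TCE_DB:
        return ALPHA_FAST
    frac = (tce_db - LOW_TCE_DB) / (HIGH_TCE_DB - LOW_TCE_DB)
    return ALPHA_FAST + frac * (ALPHA_SLOW - ALPHA_FAST)


def update_long_term(ch_enrg_long_db, ch_enrg_db, alpha):
    """Assumed update: long(m) = alpha * long(m) + (1 - alpha) * current(m)."""
    return [alpha * l + (1.0 - alpha) * c
            for l, c in zip(ch_enrg_long_db, ch_enrg_db)]
```

With a high total energy the long-term spectrum changes slowly, so a loud speech burst does not drag the reference along; with a low total energy it adapts quickly to the background.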
S38: The noise update flag Update_flag is reset by comparison of the calculated parameters, such as the voice index sum, the signal-to-noise ratio and the log-spectral deviation. Update_flag = TRUE indicates that the current frame is a noise frame, and Update_flag = FALSE indicates that it is a speech frame. The noise update flags of the preceding frames are then checked; if the noise has not been updated for a long time, the current result is considered unreliable and the signal-to-noise ratio index is forcibly updated.
S39: The channel gain ftmp2 is calculated from the resulting channel signal-to-noise ratio index.
Figure BDA0003513328410000081
Figure BDA0003513328410000082
If the noise update flag Update_flag is TRUE, the current frame is judged to be a noise frame, and the noise energy estimate is updated.
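The gain calculation of step S39 and the flag-controlled noise update can be sketched as below. Formulas (17) and (18) are not reproduced in this text, so everything numeric here is hypothetical: the gain is assumed to rise linearly in decibels with the signal-to-noise ratio index from a fixed floor up to 0 dB, and the noise estimate is assumed to be smoothed toward the current channel energy when the flag is set.

```python
def channel_gain(snr_index, index_threshold=6, slope_db=0.39, floor_db=-13.0):
    """Assumed gain rule: linear in dB above a threshold index, capped at 0 dB."""
    gain_db = floor_db + slope_db * max(0, snr_index - index_threshold)
    gain_db = min(0.0, gain_db)
    return 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude gain


def update_noise(ch_noise, ch_enrg, update_flag, smooth=0.9):
    """Update the noise energy estimate only when the frame is flagged as noise."""
    if not update_flag:
        return list(ch_noise)
    return [smooth * n + (1.0 - smooth) * e
            for n, e in zip(ch_noise, ch_enrg)]
```

Channels with a low index receive the floor gain and are attenuated most, while channels with a high index pass unchanged; the noise estimate stays frozen during speech frames so that speech energy does not leak into it.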
To verify the effectiveness of the present invention, several tests were performed. To verify that the method applies to various types of sound, the speech data used for evaluation were taken from the TIMIT database, and the noise comprised Babble noise and competing-direction speech. The experimental results reported are averages over 10 segments of speech data.
The present invention is compared with the Coherence and SNR-Coherence methods, with α first set to 0°. FIGS. 7 and 8 show the PESQ scores of the different methods after various noises (competing speech and Babble noise) were added. The present invention outperforms the Coherence method and is comparable to SNR-Coherence. Overall, the PESQ score of the present invention is at least 0.5 higher than that of the unprocessed signal, and this improvement is maintained under multiple-noise-source conditions.
FIGS. 9 and 10 show that, at low signal-to-noise ratios (-5 dB and 0 dB), with non-target speech and Babble noise as interference, the SegSNR of the present invention is at least 5 dB higher than that of the unprocessed signal. The SegSNR of the present invention is higher than that of the SNR-Coherence method and almost equal to that of the Coherence method. Furthermore, the invention maintains the best results in the presence of multiple noise sources.
The evaluation results with the desired direction set to other angles are shown in FIG. 11. The invention still maintains good noise suppression relative to the unprocessed sound.
These comparative tests demonstrate the good noise reduction performance and stable operation of the invention.
The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (4)

1. A double-microphone noise reduction method with adjustable expected sound source direction, characterized by comprising the following steps:
preprocessing: the noisy signals x1(t) and x2(t) received by the two microphones are discretely sampled, pre-emphasized, framed and windowed, and a short-time Fourier transform is applied to obtain the frequency-domain signals X1(ω) and X2(ω);
beamforming: a virtual microphone is introduced at the midpoint of the line connecting the two microphones; the frequency-domain signals X1(ω) and X2(ω) are differentially transformed according to a central difference scheme to construct the differential signals Y1(ω) and Y2(ω); the statistical averages of the power spectra of Y1(ω) and Y2(ω) are calculated and their ratio is recorded as the directivity function Γ(ω, θ); based on the properties of Γ(ω, θ), it is mapped directly to the noise masking value λ(ω) by a normalization function; and X1(ω) is multiplied by λ(ω) to obtain the signal R1(ω) with the competing-direction noise removed;
post-Wiener filtering: the signal energy and noise energy in R1(ω) are estimated to obtain the channel signal-to-noise ratio, and a gain function is calculated to remove the residual noise in R1(ω).
2. The method of claim 1, wherein the preprocessing comprises:
discretely sampling the noisy signals x1(t) and x2(t), and then applying pre-emphasis to the high-frequency part of the speech;
dividing the sampled signals x1(n) and x2(n) into frames of length 10 ms, applying a Hamming window w(n) of the same length, passing the windowed signals into a buffer for processing, obtaining the frequency-domain signal of the current frame by short-time Fourier transform, and, by the conjugate symmetry of the Fourier transform of a real sequence, outputting only the first half of the frequency bins for beamforming.
3. The method of claim 1, wherein the beamforming process comprises amplitude alignment, power spectrum calculation, directivity function calculation, threshold calculation and normalized mapping;
amplitude alignment: the frequency-domain signals X1(ω) and X2(ω) are each multiplied by a scaling factor to align their amplitudes;
power spectrum calculation: the desired beam is assumed to be S(ω), with its direction preset to α; a virtual microphone is introduced at the midpoint between the two microphones to receive the desired beam S(ω); the differential signals Y1(ω) and Y2(ω) are constructed according to the central difference scheme and the spatial relationship of the frequency-domain signals X1(ω) and X2(ω) to the desired beam S(ω), and their power spectra are calculated;
directivity function calculation: the ratio of the statistical averages of the power spectra of the differential signals Y1(ω) and Y2(ω) is the value of the directivity function Γ(ω, θ); Γ(ω, θ) tends to infinity when the actual sound source incidence direction θ equals the given desired incidence direction α, and is monotonic and approximately symmetric on either side of the axis θ = α;
Figure FDA0003513328400000021
when calculating the threshold and the normalization mapping: because Γ(ω, θ) tends to infinity, a threshold Ω is calculated from a preset main-lobe width Θ, Γ(ω, θ) is mapped directly to the noise masking value λ(ω) of the corresponding frequency bin through normalized mapping with a sigmoid function, and X1(ω) is multiplied by λ(ω) to obtain a signal R1(ω) from which noise from competing directions has been removed:
Figure FDA0003513328400000022
Figure FDA0003513328400000023
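The mapping from the directivity function to the mask can be illustrated with a short sketch. The exact expressions of claim 3 are contained in the equation images and are not reproduced here, so everything below is an assumed form: a recursive average stands in for the statistical average, the ratio is taken as Y1 over Y2, and a logistic function of the log ratio stands in for the sigmoid mapping. The names `smooth_power`, `noise_mask`, `apply_mask`, `beta` and `slope` are hypothetical.

```python
import numpy as np

def smooth_power(prev_avg, y, beta=0.9):
    """Recursive average of |Y(w)|^2, an assumed stand-in for the
    statistical average of the power spectrum."""
    return beta * prev_avg + (1.0 - beta) * np.abs(y) ** 2

def noise_mask(p1_avg, p2_avg, threshold, slope=1.0, eps=1e-12):
    """Map the directivity ratio to a mask in (0, 1) with a sigmoid.

    The mask equals 0.5 where the ratio equals the threshold and
    approaches 1 as the ratio grows (source near the desired direction).
    """
    gamma = (p1_avg + eps) / (p2_avg + eps)
    # Work in the log domain so that a very large ratio stays well behaved.
    z = slope * (np.log(gamma) - np.log(threshold))
    return 1.0 / (1.0 + np.exp(-z))

def apply_mask(x1, mask):
    """R1(w) = lambda(w) * X1(w), applied bin by bin."""
    return mask * x1
```

In this assumed form the threshold sets the ratio at which a bin is half attenuated, so a threshold derived from a wider main lobe would let sources further from α pass.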
4. The method of claim 1, wherein: the post-Wiener filtering process comprises calculating a signal-to-noise ratio index, calculating a log-spectral deviation, modifying or resetting a noise flag, and calculating a gain function value;
when calculating the signal-to-noise ratio index: the signal R1(ω) is divided into a number of channels according to the critical-bandwidth criterion, the energy of each channel is estimated, the channel noise energy estimate is initialized to the channel energy of the first four frames, and the channel signal-to-noise ratio index is calculated from the channel noise energy estimate;
when calculating the log-spectral deviation: a nonlinear data table is designed as a voice index table that maps the signal-to-noise ratio index to a set of numbers measuring speech quality; the sum of the voice indexes within a certain frequency range is taken as the speech quality evaluation result of the current channel; the logarithm of the signal energy of the current channel is taken, and the deviation between the long-term and short-term log-spectral energy is calculated;
when modifying or resetting the noise flag: whether the current frame is a speech frame or a noise frame is judged from the calculated voice index sum, the signal-to-noise ratio index and the log-spectral deviation, and the noise update flag is reset accordingly; the update flags of the preceding frames are checked, and if the noise has not been updated for a long time, so that the result is unreliable, the signal-to-noise ratio index is forcibly updated;
when calculating the gain function value: the channel gain value is calculated from the channel signal-to-noise ratio index to remove residual background noise, and the noise energy estimate of the next frame is updated according to the noise update flag.
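A simplified sketch of the channel-wise gain and noise update may help make this step concrete. It does not reproduce the claim's signal-to-noise ratio index, voice index table or speech/noise decision. Instead it substitutes the textbook Wiener gain SNR / (1 + SNR) with a power-subtraction SNR estimate, and takes the noise update flag as an input. The channel edges, the gain floor, the smoothing factor `alpha` and the function names are all hypothetical.

```python
import numpy as np

def channel_energies(r1, edges):
    """Mean power of the bins in each channel; `edges` are assumed band limits."""
    power = np.abs(r1) ** 2
    return np.array([power[lo:hi].mean() for lo, hi in zip(edges[:-1], edges[1:])])

def postfilter_frame(r1, noise_energy, edges, update_noise, alpha=0.9, floor=0.1):
    """Apply a per-channel Wiener-type gain to one frame of R1(w).

    update_noise plays the role of the noise update flag; how the flag is
    decided (speech frame or noise frame) is not implemented here.
    """
    energy = channel_energies(r1, edges)
    # Power-subtraction SNR estimate, clipped at zero (assumed estimator).
    snr = np.maximum(energy / np.maximum(noise_energy, 1e-12) - 1.0, 0.0)
    # Textbook Wiener gain with a floor to limit musical noise.
    gain = np.maximum(snr / (1.0 + snr), floor)
    out = np.array(r1, dtype=complex)
    for g, lo, hi in zip(gain, edges[:-1], edges[1:]):
        out[lo:hi] *= g
    if update_noise:
        # Smooth the noise estimate that will be used for the next frame.
        noise_energy = alpha * noise_energy + (1.0 - alpha) * energy
    return out, noise_energy
```

In a full implementation the initial `noise_energy` would come from the channel energies of the first four frames, as the claim states, and `update_noise` would be set by the frame classification step.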
CN202210157383.8A 2022-02-21 2022-02-21 Double-microphone noise reduction method with adjustable expected sound source direction Pending CN114724574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210157383.8A CN114724574A (en) 2022-02-21 2022-02-21 Double-microphone noise reduction method with adjustable expected sound source direction

Publications (1)

Publication Number Publication Date
CN114724574A true CN114724574A (en) 2022-07-08

Family

ID=82235970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210157383.8A Pending CN114724574A (en) 2022-02-21 2022-02-21 Double-microphone noise reduction method with adjustable expected sound source direction

Country Status (1)

Country Link
CN (1) CN114724574A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497500A (en) * 2022-11-14 2022-12-20 北京探境科技有限公司 Audio processing method and device, storage medium and intelligent glasses

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1809105A (en) * 2006-01-13 2006-07-26 北京中星微电子有限公司 Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices
US20080019548A1 (en) * 2006-01-30 2008-01-24 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US20090055170A1 (en) * 2005-08-11 2009-02-26 Katsumasa Nagahama Sound Source Separation Device, Speech Recognition Device, Mobile Telephone, Sound Source Separation Method, and Program
US20100246851A1 (en) * 2009-03-30 2010-09-30 Nuance Communications, Inc. Method for Determining a Noise Reference Signal for Noise Compensation and/or Noise Reduction
CN101916567A (en) * 2009-11-23 2010-12-15 瑞声声学科技(深圳)有限公司 Speech enhancement method applied to dual-microphone system
CN102347027A (en) * 2011-07-07 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
US20120140947A1 (en) * 2010-12-01 2012-06-07 Samsung Electronics Co., Ltd Apparatus and method to localize multiple sound sources
US20120278070A1 (en) * 2011-04-26 2012-11-01 Parrot Combined microphone and earphone audio headset having means for denoising a near speech signal, in particular for a " hands-free" telephony system
CN111063366A (en) * 2019-12-26 2020-04-24 紫光展锐(重庆)科技有限公司 Method and device for reducing noise, electronic equipment and readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HUANG, GONGPING ET, AL.: "《ROBUST AND STEERABLE KRONECKER PRODUCT DIFFERENTIAL BEAMFORMING WITH RECTANGULAR MICROPHONE ARRAYS》", 《IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)》, 2 March 2021 (2021-03-02), pages 211 - 215 *
ZHAO QINGYING ET.AL.: "《Directional Noise Suppression Based on Dual-Microphone With Desired Direction Presetting》", 《IEEE SENSORS JOURNAL》, vol. 24, no. 6, 15 March 2024 (2024-03-15), pages 8427 - 8437 *
徐娜;吴长奇;: "结合差分阵列与幅度谱减的双麦语音增强算法", 信号处理, no. 07, 25 July 2018 (2018-07-25), pages 124 - 129 *
陈震昊: "《基于麦克风阵列的语音增强算法的研究与实现》", 《中国硕士优秀学位论文全文数据库 信息科技辑》, no. 03, 16 February 2022 (2022-02-16), pages 10 - 67 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination