CN111968659B

CN111968659B - Microphone array voice enhancement method based on optimized IMCRA

Info

Publication number: CN111968659B
Application number: CN202010719382.9A
Authority: CN
Inventors: 李秋颖; 张涛
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-07-23
Filing date: 2020-07-23
Publication date: 2023-10-31
Anticipated expiration: 2040-07-23
Also published as: CN111968659A

Abstract

A microphone array speech enhancement method based on optimized IMCRA, comprising: computing the signal collected by each microphone of the microphone array; computing the output signal of the microphone array; computing the average noise signal from the blocking matrix output signals; estimating the noise power spectral density using an optimized IMCRA method; and inputting the output signal of the microphone array and the noise power spectral density into an MMSE-LSA estimator to obtain the final enhanced speech signal. The invention improves the accuracy of noise estimation and reduces the computational load of speech enhancement; it exploits the spatial characteristics of microphone array speech enhancement and further removes the residual noise left by conventional microphone array speech enhancement; and it estimates the noise power spectral density more accurately, enabling the MMSE-LSA estimator to output enhanced speech signals of higher quality.

Description

Microphone array voice enhancement method based on optimized IMCRA

Technical Field

The invention relates to a speech enhancement method, and more particularly to a microphone array speech enhancement method based on optimized IMCRA.

Background

Speech is the most basic means of human communication and the most convenient and direct tool for exchanging information between people. With the rapid development of science and technology, speech has also become a main tool for communication between people and machines. In daily life, however, speech signals are often disturbed by noise. How to reduce noise and improve speech quality, especially speech intelligibility, is therefore an active research topic. The goal of speech enhancement is to suppress noise as much as possible, and many speech enhancement methods have been proposed in recent years for this purpose.

According to the number of microphones used, speech enhancement can be classified into single-channel speech enhancement and microphone array speech enhancement. Single-channel speech enhancement algorithms were developed first. Classical single-channel algorithms include spectral subtraction, Wiener filtering, and Kalman filtering. However, spectral subtraction may generate musical noise, Wiener filtering may perform poorly in non-stationary environments, and Kalman filtering may damage the speech.

Microphone array speech enhancement has advantages over conventional single-channel speech enhancement. It uses not only the time-domain information among samples but also the spatial information among channels, which improves speech enhancement performance. There are currently many well-established microphone array speech enhancement algorithms, such as fixed beamforming, adaptive beamforming, and the generalized sidelobe canceller. Fixed beamforming algorithms are easy to implement but require more microphones to enhance speech effectively. Adaptive beamforming algorithms were developed from fixed beamforming. The key difference lies in the weighting coefficients: in adaptive beamforming they are no longer fixed values but vary with the input. This makes the algorithm more flexible and applicable to more acoustic environments. The generalized sidelobe canceller is convenient and flexible to compute and can eliminate strongly correlated noise, but it suppresses weakly correlated noise and incoherent noise poorly.

Estimating the power spectral density of the noise is an important step in speech enhancement. Noise environments can be classified into stationary and non-stationary noise environments. In a stationary noise environment, the noise distribution is uniform and varies slowly, so the noise spectrum is estimated using only the noise segments of the noisy speech signal, which are typically identified by voice activity detection. In practical applications, background noise is often non-stationary, so studying noise estimation algorithms for non-stationary noise environments is of practical significance. In a non-stationary noise environment, the noise changes constantly. Common estimation methods are minimum statistics, minima controlled recursive averaging (MCRA), and improved minima controlled recursive averaging (IMCRA).

For single-channel speech enhancement algorithms, the noise power spectrum estimator generally exploits the time-frequency characteristics of the signal, and its estimation accuracy decreases in a non-stationary noise environment. This is because these methods update the noise estimate only when speech is absent and do not update it when speech is present, whereas in practice noise is always present. As a result, non-stationary noise cannot be estimated accurately when the noise changes very rapidly.

Microphone arrays, however, can overcome the limitations of single-channel methods by separating the signals spatially, which makes it relatively easy to extract the speech or noise components. Combining microphone array speech enhancement with a noise estimation algorithm can therefore markedly improve the accuracy of noise estimation and reduce the amount of computation.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a microphone array speech enhancement method based on optimized IMCRA that improves both the accuracy of noise estimation and speech enhancement performance.

The technical scheme adopted by the invention is as follows: a microphone array voice enhancement method based on an optimized IMCRA comprises the following steps:

1) Computing the signal $x_n(t)$ collected by each microphone of the microphone array;

2) Computing the output signal $y_a(t)$ of the microphone array;

3) Computing the average noise signal $B_{AV}(t)$ from the blocking matrix output signals;

4) Estimating the noise power spectral density using the optimized IMCRA method;

5) Inputting the output signal $y_a(t)$ of the microphone array and the noise power spectral density into an MMSE-LSA estimator to obtain the final enhanced speech signal.

Step 1) comprises:

Suppose there are $J$ signals $s_j(t)$, $j \in \{1, 2, 3, \ldots, J\}$, and $N$ microphones in space. The signal $x_n(t)$ received by the $n$-th microphone at time $t$ is:

$$x_n(t) = \sum_{j=1}^{J} h_{n,j} * s_j(t) + v_n(t)$$

where $h_{n,j}$ is the channel impulse response from the $j$-th sound source to the $n$-th microphone, $v_n(t)$ is additive noise, $*$ denotes convolution, and $n$ indexes the microphones in the array, $n \in \{1, 2, \ldots, N\}$. Among the signals $s_j(t)$, $j = 1$ denotes the desired speech signal and $j = 2, 3, \ldots, J$ denote interference signals.
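As an illustrative sketch only, the signal model of step 1) can be simulated in Python under the assumption, implied by the definitions above, that each microphone signal is the sum of every source convolved with its channel impulse response plus additive noise. The function names `convolve` and `microphone_signal` and the toy impulse responses are hypothetical and are not part of the invention.

```python
def convolve(h, s):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, h_i in enumerate(h):
        for k, s_k in enumerate(s):
            out[i + k] += h_i * s_k
    return out


def microphone_signal(impulse_responses, sources, noise):
    """Assumed model: x_n(t) = sum_j (h_{n,j} * s_j)(t) + v_n(t).

    The result is truncated to the length of the noise sequence.
    """
    x = list(noise)
    for h, s in zip(impulse_responses, sources):
        for t, value in enumerate(convolve(h, s)[:len(x)]):
            x[t] += value
    return x


# Toy example (hypothetical values): j = 1 is the desired speech,
# j = 2 is an interferer that arrives one sample late at half amplitude.
s1 = [1.0, 0.0, 0.0, 0.0]
s2 = [0.0, 2.0, 0.0, 0.0]
h_n = [[1.0], [0.0, 0.5]]
v_n = [0.0, 0.0, 0.0, 0.0]
x_n = microphone_signal(h_n, [s1, s2], v_n)
```

In this toy case the interferer contributes a single delayed sample, so `x_n` equals `[1.0, 0.0, 1.0, 0.0]`.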

Step 2) comprises:

For the signal $X_N(t)$ received by the microphone array at time $t$, the output signal $y_a(t)$ of the array is calculated using beamforming:

where the input signal of the array is $X_N(t) = [x_1(t), x_2(t), \ldots, x_N(t)]$ and the beamforming weights are $w_a(t) = [w_{a,1}(t), w_{a,2}(t), \ldots, w_{a,N}(t)]$.
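A minimal sketch of the beamforming step follows, assuming real-valued weights and a plain weighted sum of the microphone samples at one time instant. The function name `beamformer_output`, the uniform weights, and the sample values are illustrative assumptions; the weights used by the invention are not reproduced here.

```python
def beamformer_output(weights, snapshot):
    """Weighted sum y_a(t) = sum_n w_{a,n}(t) * x_n(t) for one time instant."""
    if len(weights) != len(snapshot):
        raise ValueError("one weight per microphone is required")
    return sum(w * x for w, x in zip(weights, snapshot))


# Assumed example: N = 4 microphones that are already time-aligned, with
# uniform weights 1/N (a delay-and-sum beamformer after alignment).
# Each sample is a common speech value of 1.0 plus zero-mean noise.
snapshot = [1.5, 0.5, 1.25, 0.75]
weights = [0.25] * 4
y_a = beamformer_output(weights, snapshot)
```

Because the noise terms in this example sum to zero across the array, the output recovers the common speech value of 1.0, which illustrates how averaging across channels raises the signal-to-noise ratio.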

The step 3) comprises the following steps:

Based on the signal $X_N(t)$ received by the microphone array at time $t$, a noise signal is obtained through the blocking matrix $B$ as follows:

The noise signal is then averaged by the following formula to obtain the average noise signal $B_{AV}(t)$:

where $N$ is the total number of microphones.
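The blocking and averaging of step 3) can be sketched as follows. This is an assumption-laden illustration: it uses the adjacent-difference blocking matrix familiar from generalized sidelobe cancellers and averages over the resulting N - 1 outputs, whereas the exact blocking matrix and normalization of the invention are not reproduced in this text. The names `blocking_matrix_outputs` and `average_noise` are hypothetical.

```python
def blocking_matrix_outputs(snapshot):
    """Assumed blocking matrix: differences of adjacent, time-aligned channels.

    A component common to all channels (the aligned speech) cancels,
    leaving a noise reference on each of the N - 1 outputs.
    """
    return [snapshot[i] - snapshot[i + 1] for i in range(len(snapshot) - 1)]


def average_noise(snapshot):
    """Average of the blocking matrix outputs for one time instant."""
    outputs = blocking_matrix_outputs(snapshot)
    if not outputs:
        raise ValueError("at least two microphones are required")
    return sum(outputs) / len(outputs)


# Identical samples on every channel are blocked completely.
blocked = blocking_matrix_outputs([2.0, 2.0, 2.0, 2.0])

# Common speech value 2.0 plus per-channel noise (hypothetical values).
b_av = average_noise([2.5, 1.75, 2.125, 2.0])
```

With this particular blocking matrix the differences telescope, so the average depends only on the first and last channels; a practical design might average powers or use a different blocking matrix.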

In step 4), the output signal $y_a(t)$ of the microphone array and the average noise signal $B_{AV}(t)$ are Fourier transformed to obtain the corresponding frequency-domain signals $y_a(k, l)$ and $B_{AV}(k, l)$, and the noise power spectral density is estimated from these frequency-domain signals using the optimized IMCRA method, where $k$ denotes frequency.

The optimized IMCRA method adds the average noise signal $B_{AV}(t)$ to the update formula for the noise power spectral density in the IMCRA method. The specific formula is as follows:

where the two noise terms are the estimated noise power spectral density at frequency $k$ and frame $l$ and the estimated noise power spectral density at frame $l+1$; $\alpha_d$ is a smoothing factor; and $p(k, l)$ is the signal presence probability.
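Since the update formula itself is not reproduced in this text, the sketch below shows only one possible reading of it, stated here as an assumption: a first-order recursive average with smoothing factor $\alpha_d$ in which the new observation is the array output power when speech is absent and the blocking matrix noise power when speech is present, blended by $p(k, l)$. The function name `update_noise_psd`, its parameters, and the default value of `alpha_d` are hypothetical.

```python
def update_noise_psd(prev_psd, y_power, b_power, p, alpha_d=0.85):
    """One recursive update of the noise PSD for a single bin (k, l).

    Assumed form (not confirmed by the source):
        psd(k, l+1) = alpha_d * psd(k, l)
                      + (1 - alpha_d) * [(1 - p) * |y_a|^2 + p * |B_AV|^2]
    prev_psd : noise PSD estimate at frame l
    y_power  : |y_a(k, l)|^2, power of the beamformer output
    b_power  : |B_AV(k, l)|^2, power of the average noise signal
    p        : signal presence probability p(k, l), between 0 and 1
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    observation = (1.0 - p) * y_power + p * b_power
    return alpha_d * prev_psd + (1.0 - alpha_d) * observation


# Speech absent: the estimate tracks the beamformer output power.
absent = update_noise_psd(1.0, y_power=2.0, b_power=0.5, p=0.0, alpha_d=0.5)
# Speech present: the estimate still moves, driven by the noise reference.
present = update_noise_psd(1.0, y_power=2.0, b_power=0.5, p=1.0, alpha_d=0.5)
```

Under this reading the noise estimate keeps updating while speech is present, which matches the stated motivation that noise is always present, but the actual weighting used by the invention may differ.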

By combining a noise estimation algorithm with microphone array speech enhancement, the microphone array speech enhancement method based on optimized IMCRA improves the accuracy of noise estimation and reduces the computational load of speech enhancement. The method combines microphone array speech enhancement with single-channel speech enhancement: it exploits the spatial characteristics of microphone array speech enhancement and further removes the residual noise left by conventional microphone array speech enhancement. The invention first processes the noisy signals received by the microphone array with a beamforming method, which improves their signal-to-noise ratio. The optimized IMCRA method then estimates the noise power spectral density more accurately, so that the MMSE-LSA estimator can output enhanced speech signals of higher quality.
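For orientation, the gain applied by an MMSE-LSA estimator can be sketched with the standard log-spectral amplitude formula $G = \frac{\xi}{1+\xi}\exp\left(\frac{1}{2}\int_v^\infty \frac{e^{-t}}{t}\,dt\right)$ with $v = \frac{\xi\gamma}{1+\xi}$, where $\gamma$ is the a posteriori SNR (noisy power divided by the noise power spectral density) and $\xi$ is the a priori SNR. The helper `exp1`, the function name `mmse_lsa_gain`, and the example SNR values are assumptions for illustration; how the invention estimates $\xi$ is not stated here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x):
    """Exponential integral E1(x) for x > 0, using the standard library only."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 40):
            term *= -x / k          # term == (-x)^k / k!
            total -= term / k
        return total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def mmse_lsa_gain(xi, gamma):
    """Log-spectral amplitude gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


# Hypothetical bin with xi = 1 and gamma = 2; the enhanced spectral value
# would be this gain multiplied by y_a(k, l).
gain = mmse_lsa_gain(1.0, 2.0)
```

The gain is always at least the Wiener gain $\xi/(1+\xi)$ and approaches it as $v$ grows, so a more accurate noise estimate changes the output mainly through $\gamma$ and $\xi$.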

Drawings

Fig. 1 is a block diagram of a microphone array speech enhancement method based on an optimized IMCRA of the present invention.

Detailed Description

A microphone array speech enhancement method based on an optimized IMCRA of the present invention is described in detail below with reference to the examples and figures.

As shown in fig. 1, the microphone array voice enhancement method based on the optimized IMCRA of the present invention includes the following steps:

1) Computing the signal $x_n(t)$ collected by each microphone of the microphone array, comprising:

2) Computing the output signal $y_a(t)$ of the microphone array, comprising:

where the input signal of the array is $X_N(t) = [x_1(t), x_2(t), \ldots, x_N(t)]$ and the beamforming weights are $w_a(t) = [w_{a,1}(t), w_{a,2}(t), \ldots, w_{a,N}(t)]$.

3) Computing the average noise signal $B_{AV}(t)$ from the blocking matrix output signals, comprising:

where $N$ is the total number of microphones.

4) Estimating the noise power spectral density using the optimized IMCRA method.

Specifically, the output signal $y_a(t)$ of the microphone array and the average noise signal $B_{AV}(t)$ are Fourier transformed to obtain the corresponding frequency-domain signals $y_a(k, l)$ and $B_{AV}(k, l)$, and the noise power spectral density is estimated from these frequency-domain signals using the optimized IMCRA method, where $k$ denotes frequency.
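The Fourier transformation into frame-indexed frequency-domain signals can be sketched as a short-time discrete Fourier transform. The frame length, hop size, and Hann window below are illustrative assumptions, as is the function name `stft`; the analysis parameters of the invention are not given in this text.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Short-time DFT returning frames[l][k] for frame l and frequency bin k.

    Only the non-negative frequency bins 0 .. frame_len // 2 are kept.
    """
    # Periodic Hann window (an assumed choice of analysis window).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = [
            sum(segment[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# A constant signal of 16 samples gives 3 frames of 5 bins each.
spectra = stft([1.0] * 16)
power = [[abs(value) ** 2 for value in frame] for frame in spectra]
```

Applying the same transform to $y_a(t)$ and $B_{AV}(t)$ yields per-bin powers that can serve as inputs to a noise power spectral density estimator.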

The optimized IMCRA method of the invention adds the average noise signal $B_{AV}(t)$ to the update formula for the noise power spectral density in the IMCRA method. The specific formula is as follows:

where the two noise terms are the estimated noise power spectral density at frequency $k$ and frame $l$ and the estimated noise power spectral density at frame $l+1$; $\alpha_p$ is a smoothing factor; and $p(k, l)$ is the signal presence probability.

The effect of a microphone array speech enhancement method based on an optimized IMCRA of the present invention is illustrated by comparing with existing methods in the same simulation environment.

The simulation environment is built on the open-source tool MCRoomSim, which can generate reverberant room impulse responses for a user-defined rectangular room. The room is configured as an anechoic chamber with a sound absorption coefficient of 1, which means that there is no reverberation or other noise in the environment. The microphone array is a uniform circular array (UCA) with a spacing of 18 cm and N = 7 microphones. The UCA is centered at the origin (0, 0) of the coordinate system, with its central axis along the positive X-axis. The target signal is placed at 15 meters from the central axis. One interference source is set up in the simulation, located at (7.5, 7.5, 7.5).

One male and one female speaker are randomly selected from the TIMIT database [20]. Five noise signals are taken from the NOISEX-92 database [21] to simulate different noise environments: pink noise, F16 (aircraft) noise, white noise, Volvo (car) noise, and M109 noise. The 2 selected clean speech signals and the 5 noise signals are resampled to 16 kHz and mixed at SNRs of -10 dB, -5 dB, 0 dB, 5 dB, and 10 dB to generate the target signals. The target signal and the interfering signal propagate to the microphone array in the simulated environment.
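Mixing at a prescribed SNR can be sketched as follows, assuming the SNR is defined as the ratio of the average powers of the whole speech and noise sequences. The function name `mix_at_snr` and the toy sequences are hypothetical; the actual mixing procedure used in the experiments is not described in more detail.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy sequences with equal power; at 0 dB the noise is added unscaled.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
mixed_0db = mix_at_snr(speech, noise, 0.0)
mixed_minus10db = mix_at_snr(speech, noise, -10.0)
```

At -10 dB the noise amplitude is scaled by the square root of 10, so the noise carries ten times the power of the speech.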

The microphone array speech enhancement method based on optimized IMCRA is compared experimentally with the fixed beamforming (DS) method and the speech enhancement method based on MVDR. Speech quality is measured by the segmental signal-to-noise ratio (segSNR) and short-time objective intelligibility (STOI). The experimental results are shown in Tables 1 to 5. The simulation results show that the method based on optimized IMCRA improves speech enhancement performance, raising STOI and segSNR by 12% and 88%, respectively.
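As a rough illustration of the segSNR metric, the sketch below averages the per-frame SNR between a clean reference and an enhanced signal. The frame length, the clamping of each frame to the range -10 dB to 35 dB, and the skipping of silent frames are common conventions assumed here, not details taken from the experiments; the function name `segmental_snr` is hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNR values in dB, each clamped to [lo, hi]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if error_energy == 0.0:
            snr = hi  # perfect reconstruction in this frame
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    if not values:
        raise ValueError("no frame with non-zero reference energy")
    return sum(values) / len(values)


# Two frames of 4 samples: a 20 dB frame and a perfectly reconstructed frame.
clean = [10.0] * 8
enhanced = [9.0] * 4 + [10.0] * 4
score = segmental_snr(clean, enhanced, frame_len=4)
```

In this toy case the first frame has an energy ratio of 100 (20 dB) and the second is clamped to 35 dB, giving a mean of 27.5 dB.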

TABLE 1 average STOI score of original and enhanced signals (female)

TABLE 2 average STOI score of original and enhanced signals (Male)

TABLE 3 segSNR score of original and enhanced signals (female)

TABLE 4 segSNR score of original and enhanced signals (Male)

TABLE 5 Performance of OIMCRA for male and female speech

Tables 1 to 4 show that the microphone array speech enhancement method based on optimized IMCRA effectively improves the STOI and segSNR scores of noisy speech under various noise conditions. The experimental results show that the method of the invention improves STOI and segSNR by 12% and 88%, respectively.

Research shows that male voices have more mid- and low-frequency components, while female voices have more mid- and high-frequency components. Table 5 shows that, for the different types of noise, the noise reduction performance of the method is better for female speech than for male speech, by about 20%.

Claims

1. A microphone array speech enhancement method based on optimized IMCRA, comprising the steps of:

in the formula, $h_{n,j}$ is the channel impulse response from the $j$-th sound source to the $n$-th microphone, $v_n(t)$ is additive noise, $*$ denotes convolution, and $n$ indexes the microphones in the array, $n \in \{1, 2, \ldots, N\}$; among the signals $s_j(t)$, $j = 1$ denotes the desired speech signal and $j = 2, 3, \ldots, J$ denote interference signals;

wherein the input signal of the array is $X_N(t) = [x_1(t), x_2(t), \ldots, x_N(t)]$ and the beamforming weights are $w_a(t) = [w_{a,1}(t), w_{a,2}(t), \ldots, w_{a,N}(t)]$;

Wherein N represents the total number of microphones;

4) Estimating the noise power spectral density using the optimized IMCRA method, wherein the output signal $y_a(t)$ of the microphone array and the average noise signal $B_{AV}(t)$ are Fourier transformed to obtain the corresponding frequency-domain signals $y_a(k, l)$ and $B_{AV}(k, l)$, and the noise power spectral density is estimated from these frequency-domain signals using the optimized IMCRA method, wherein $k$ denotes frequency;

wherein the two noise terms are the estimated noise power spectral density at frequency $k$ and frame $l$ and the estimated noise power spectral density at frame $l+1$; $\alpha_p$ is a smoothing factor; and $p(k, l)$ is the signal presence probability;

5) Inputting the output signal $y_a(t)$ of the microphone array and the noise power spectral density into an MMSE-LSA estimator to obtain the final enhanced speech signal.