CN117894332A

CN117894332A - Time-domain multichannel speech noise reduction method based on Kronecker decomposition

Info

Publication number: CN117894332A
Application number: CN202410241313.XA
Authority: CN
Inventors: 王向辉; 李梅; 韩宗乐; 王桂宝; 赵莹珂; 田旭华; 王姣; 郭晶; 陈晓屹
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2024-03-04
Filing date: 2024-03-04
Publication date: 2024-04-16

Abstract

The invention discloses a time-domain multichannel speech noise reduction method based on Kronecker decomposition, comprising the following steps: collecting a noisy speech signal and preprocessing it; estimating the statistical characteristics of the noisy speech signal and of the noise signal; obtaining, from these statistical characteristics, an iterative Wiener noise reduction filter based on Kronecker decomposition; and filtering the noisy speech signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean speech signal. The invention uses singular value decomposition and Kronecker decomposition to obtain a low-rank representation of the conventional multichannel noise reduction filter and decomposes the filter coefficients along the temporal and spatial dimensions, which converts the estimation of one long filter into the estimation of two shorter sub-filters. This reduces the number of parameters to be estimated, lowers the algorithm complexity, and improves the ability to track changes in the statistical characteristics of the noise. A further advantage over the frequency-domain speech noise reduction methods currently used in practical systems is the absence of musical noise.

Description

Time-domain multichannel speech noise reduction method based on Kronecker decomposition

Technical Field

The invention relates to the field of speech noise reduction, and in particular to a time-domain multichannel speech noise reduction method based on Kronecker decomposition.

Background

Everyday environments contain many kinds of noise, so a speech signal picked up by a microphone is inevitably contaminated by ambient noise. The noise degrades the quality and intelligibility of the speech signal and may cause listener fatigue, which makes speech noise reduction particularly important. Speech noise reduction aims to suppress the noise and recover the "clean" speech signal from the noisy one, thereby improving speech quality and intelligibility; it plays an important role in voice communication.

Depending on whether spatial information is used, speech noise reduction algorithms are classified as single-channel or multichannel. Depending on the domain in which they operate, they are further divided into time-domain algorithms and transform-domain algorithms (for example wavelet-domain or frequency-domain algorithms). Frequency-domain methods are currently the most widely used because their complexity is low and they can run in real time. Their drawback is that they tend to produce musical noise, which studies show listeners tolerate less well than everyday noise. Reducing the musical noise produced by frequency-domain algorithms has therefore long been an active research topic. Time-domain speech noise reduction methods do not produce musical noise. The main obstacle to their practical deployment is their high complexity: the filters in time-domain algorithms are typically long. For multichannel algorithms in particular, this makes real-time processing of noisy speech in a practical system difficult.

Long filters also cause two further problems. First, the estimation error of the signal correlation matrix grows with the filter length, which ultimately degrades noise reduction performance. Second, more observation samples are needed to estimate the signal correlation matrix and compute the filter coefficients, which reduces the algorithm's ability to track changes in the statistical characteristics of the signal.

To solve these problems, the invention provides a method for designing a time-domain multichannel iterative noise reduction filter based on Kronecker decomposition.

Disclosure of Invention

The invention aims to provide a time-domain multichannel speech noise reduction method based on Kronecker decomposition, so as to solve the above problems in the prior art.

To achieve this object, the invention provides a time-domain multichannel speech noise reduction method based on Kronecker decomposition, comprising:

collecting a voice signal with noise, and preprocessing the voice signal with noise;

estimating the statistical characteristics of the noisy speech signal and the noise signal;

obtaining, based on the statistical characteristics, an iterative Wiener noise reduction filter based on Kronecker decomposition;

and performing noise reduction filtering on the noisy speech signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean speech signal.

Optionally, collecting the noisy speech signal, and preprocessing the noisy speech signal includes:

in speech noise reduction, the time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \qquad (1)$$

here, $t$ denotes the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array in the invention has $M$ microphones); $x_m(t)$ and $v_m(t)$ denote the clean speech signal and the additive noise signal received by the $m$-th microphone, $y_m(t)$ denotes the noisy speech signal received by the $m$-th microphone, and $x_m(t)$ and $v_m(t)$ are mutually independent; the 1st microphone in the array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal;

by combining $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:

$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \qquad (2)$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined analogously to $\mathbf{y}_m(t)$, namely:

$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$

$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$

$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ denote the desired signal vector and the noise signal vector of the $m$-th channel, $\mathbf{y}_m(t)$ denotes the noisy signal vector of the $m$-th channel, and the superscript $(\cdot)^T$ denotes transposition;

the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are concatenated and written as:

$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \qquad (3)$$

where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.

$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$

$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ denote the overall noisy signal vector, the overall clean speech signal vector, and the overall noise signal vector, respectively.
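The stacking in the signal model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `stack_observation`, the `(M, T)` array layout, and the random test signals are hypothetical and do not come from the patent.

```python
import numpy as np

def stack_observation(signals, t, L):
    """Build the length-M*L stacked vector for time index t.

    `signals` is an (M, T) array, one row per microphone (assumed layout).
    Channel m contributes [s_m(t), s_m(t-1), ..., s_m(t-L+1)], and the M
    blocks are concatenated in channel order.
    """
    signals = np.asarray(signals, dtype=float)
    if t < L - 1:
        raise ValueError("need at least L samples of history")
    # Take the last L samples of every channel, newest sample first.
    frames = signals[:, t - L + 1 : t + 1][:, ::-1]
    return frames.reshape(-1)

# Synthetic stand-ins for clean speech and additive noise (not real data).
rng = np.random.default_rng(0)
M, L, T = 3, 4, 16
x = rng.standard_normal((M, T))
v = 0.1 * rng.standard_normal((M, T))
y = x + v                                  # sample-wise model y_m = x_m + v_m

y_vec = stack_observation(y, t=10, L=L)    # overall noisy vector, length M*L
```

Because stacking is linear, the stacked noisy vector equals the sum of the stacked speech and noise vectors, which is the vector form of the additive model.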

Optionally, the process of estimating the statistical characteristics of the noisy speech signal and the noise signal includes:

estimating the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ recursively: $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix $\mathbf{R}_x(t)$ of the overall clean speech signal vector $\mathbf{x}(t)$ as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$, thereby obtaining the statistical characteristics.

Optionally, the process of determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$ comprises:

taking the first column of $\mathbf{R}_x(t)$ and dividing it by the element of $\mathbf{R}_x(t)$ in the first row and first column (the variance of the desired signal $x_1(t)$) to obtain the vector $\boldsymbol{\rho}(t)$.
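A minimal sketch of this statistics step follows, assuming the noise correlation matrix is already available from some external noise estimator (the patent only refers to "an existing noise estimation algorithm"). The function names, the white-noise `R_v`, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def update_noisy_correlation(R_y_prev, y_vec, alpha):
    """Recursive estimate R_y(t) = alpha * R_y(t-1) + (1 - alpha) * y y^T."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("forgetting factor must satisfy 0 < alpha < 1")
    return alpha * R_y_prev + (1.0 - alpha) * np.outer(y_vec, y_vec)

def speech_statistics(R_y, R_v):
    """Return R_x = R_y - R_v, its (1,1) element, and the normalized first column."""
    R_x = R_y - R_v
    var_x1 = R_x[0, 0]
    if var_x1 <= 0.0:
        # The subtraction can fail with poor estimates; a real system would
        # need some safeguard here (not specified in the source).
        raise ValueError("estimated speech variance is not positive")
    rho = R_x[:, 0] / var_x1
    return R_x, var_x1, rho

# Synthetic demonstration: n stands for M*L, noise is white with variance 0.1.
rng = np.random.default_rng(0)
n = 6
R_v = 0.1 * np.eye(n)          # assumed to be known from a noise estimator
R_y = np.eye(n)                # arbitrary starting value
for _ in range(200):
    y_vec = rng.standard_normal(n) + np.sqrt(0.1) * rng.standard_normal(n)
    R_y = update_noisy_correlation(R_y, y_vec, alpha=0.98)

R_x, var_x1, rho = speech_statistics(R_y, R_v)
```

By construction the first element of `rho` is 1. A smaller `alpha` tracks changes faster but gives a noisier estimate, which is the trade-off behind the forgetting factor.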

Optionally, the process of obtaining, based on the statistical characteristics, the iterative Wiener noise reduction filter based on Kronecker decomposition comprises:

constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on this matrix and approximating it, and obtaining an approximate representation of the linear filter from the approximated singular value decomposition;

rewriting the expression for the desired signal estimate using the approximate representation of the linear filter, and expressing the desired signal estimate as the sum of the filtered speech signal and the residual noise;

and defining the mean square error of the desired signal estimate, rewriting the mean square error using the approximate representation of the linear filter, and obtaining the iterative Wiener filter based on Kronecker decomposition from the rewritten mean square error expression.

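Optionally, constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on this matrix and approximating it, and obtaining the approximate representation of the linear filter from the approximated singular value decomposition comprises:

to achieve noise reduction, the overall noisy signal vector $\mathbf{y}(t)$ of length $ML$ is passed through a linear filter $\mathbf{h}(t)$, i.e.

$$z(t) = \sum_{m=1}^{M}\mathbf{h}_m^T(t)\mathbf{y}_m(t) = \mathbf{h}^T(t)\mathbf{y}(t) \qquad (4)$$

where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of the $m$-th channel, of length $L$, and $\mathbf{h}(t) = [\,\mathbf{h}_1^T(t)\ \cdots\ \mathbf{h}_M^T(t)\,]^T$ is a linear filter of length $ML$;

to derive a time-domain multichannel noise reduction scheme based on Kronecker decomposition, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are written in matrix form, i.e.

$$\mathbf{H}(t) = [\,\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \ \cdots\ \ \mathbf{h}_M(t)\,] \qquad (5)$$

Using singular value decomposition, the matrix $\mathbf{H}(t)$ can be decomposed as:

$$\mathbf{H}(t) = \mathbf{H}_1(t)\boldsymbol{\Sigma}(t)\mathbf{H}_2^T(t) \qquad (6)$$

where $\mathbf{H}_1(t) = [\,\mathbf{h}_{1,1}(t)\ \ \mathbf{h}_{1,2}(t)\ \cdots\ \mathbf{h}_{1,L}(t)\,]$ and $\mathbf{H}_2(t) = [\,\mathbf{h}_{2,1}(t)\ \ \mathbf{h}_{2,2}(t)\ \cdots\ \mathbf{h}_{2,M}(t)\,]$ are the orthogonal matrices formed by the left singular vectors $\mathbf{h}_{1,l}(t)$ ($l = 1, 2, \ldots, L$) and the right singular vectors $\mathbf{h}_{2,m}(t)$ ($m = 1, 2, \ldots, M$) of $\mathbf{H}(t)$, with dimensions $L \times L$ and $M \times M$ respectively; $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$, which are non-negative real numbers; the singular values are arranged in descending order, i.e. $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$;

the matrix $\mathbf{H}(t)$ is approximated using the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.

$$\mathbf{H}(t) \approx \sum_{p=1}^{P}\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}(t)\mathbf{h}_{S,p}^T(t) \qquad (7)$$

where $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ are the left and right singular vectors corresponding to $\sigma_p(t)$, the $p$-th largest singular value of $\mathbf{H}(t)$ ($p = 1, 2, \ldots, P$), and $\mathbf{h}_{T,p}(t)$ and $\mathbf{h}_{S,p}(t)$ are these singular vectors scaled so that their outer product equals $\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t)$;

based on (7), the filter $\mathbf{h}(t)$ can be approximated as

$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) \qquad (8)$$

where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, and $\otimes$ denotes the Kronecker product; the larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$, and when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$;

applying the relation

$$\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) = [\mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L]\,\mathbf{h}_{T,p}(t) = [\mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)]\,\mathbf{h}_{S,p}(t) \qquad (9)$$

$\mathbf{h}_P(t)$ is written as

$$\mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{H}_{S,p}(t)\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P}\mathbf{H}_{T,p}(t)\mathbf{h}_{S,p}(t) \qquad (10)$$

where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimension $L \times L$ and $M \times M$, respectively.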
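The low-rank Kronecker representation described above can be checked numerically. The sketch below is an illustration, not the patent's implementation: the helper names are hypothetical, the filter matrix is random, and splitting each singular value evenly (by its square root) between the temporal and spatial factors is one assumed choice among many.

```python
import numpy as np

def kronecker_factors(H, P):
    """Rank-P factors of the L x M filter matrix H from its SVD.

    Assumption: each singular value is split as sqrt(sigma) on both factors.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    root = np.sqrt(s[:P])
    h_T = [root[p] * U[:, p] for p in range(P)]    # temporal factors, length L
    h_S = [root[p] * Vt[p, :] for p in range(P)]   # spatial factors, length M
    return h_T, h_S, s

def kronecker_filter(h_T, h_S):
    """Sum over p of kron(h_S[p], h_T[p]), a vector of length M*L."""
    return sum(np.kron(hs, ht) for ht, hs in zip(h_T, h_S))

rng = np.random.default_rng(1)
L, M, P = 6, 4, 2
H = rng.standard_normal((L, M))          # columns play the role of h_1 ... h_M
h = H.reshape(-1, order="F")             # vec[H]: stack the columns

h_T, h_S, s = kronecker_factors(H, P)
h_P = kronecker_filter(h_T, h_S)
approx_error = np.linalg.norm(h - h_P)   # equals the norm of the discarded singular values

# The two matrix forms of the Kronecker product for the first pair of factors.
H_S1 = np.kron(h_S[0][:, None], np.eye(L))   # (M*L) x L
H_T1 = np.kron(np.eye(M), h_T[0][:, None])   # (M*L) x M
```

With `P = M` (here 4) the reconstruction is exact, while smaller `P` trades accuracy for fewer coefficients: `P * (M + L)` instead of `M * L`.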

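Optionally, the process of rewriting the expression for the desired signal estimate using the approximate representation of the linear filter comprises:

based on (10), the estimate $z(t)$ of the desired signal $x_1(t)$ is written as:

$$z(t) = \mathbf{h}_P^T(t)\mathbf{y}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{y}_{T,P}(t) \qquad (11)$$

where

$$\mathbf{h}_{T,P}(t) = [\,\mathbf{h}_{T,1}^T(t)\ \cdots\ \mathbf{h}_{T,P}^T(t)\,]^T, \qquad \mathbf{h}_{S,P}(t) = [\,\mathbf{h}_{S,1}^T(t)\ \cdots\ \mathbf{h}_{S,P}^T(t)\,]^T$$

$$\mathbf{y}_{T,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{T,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{T,P}(t)\,]^T = \mathbf{H}_{T,P}(t)\mathbf{y}(t)$$

$$\mathbf{y}_{S,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{S,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{S,P}(t)\,]^T = \mathbf{H}_{S,P}(t)\mathbf{y}(t)$$

$$\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)\,]^T$$

$$\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)\,]^T$$

the dimensions of $\mathbf{h}_{T,P}(t)$, $\mathbf{h}_{S,P}(t)$, $\mathbf{y}_{T,P}(t)$, $\mathbf{y}_{S,P}(t)$, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are $LP \times 1$, $MP \times 1$, $MP \times 1$, $LP \times 1$, $MP \times ML$ and $LP \times ML$, respectively; $\mathbf{h}_{T,P}(t)$ is the sub-filter vector that performs noise reduction in the temporal dimension, $\mathbf{h}_{S,P}(t)$ is the sub-filter vector that performs noise reduction in the spatial dimension, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are the corresponding temporal and spatial sub-filter matrices, $\mathbf{y}_{T,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{T,P}(t)$, and $\mathbf{y}_{S,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{S,P}(t)$;

by rearranging, $z(t)$ is expressed as:

$$z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\mathbf{Y}(t)\mathbf{h}_{S,p}(t) \qquad (12)$$

where $\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_M(t)\,]$ is the noisy signal matrix with $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$; the matrices $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined analogously, and all three have dimension $L \times M$; the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) perform noise reduction in the temporal and spatial dimensions, respectively;

formula (11) may be further written as:

$$z(t) = x_{\mathrm{fd}}(t) + v_{\mathrm{rn}}(t) \qquad (13)$$

where the vectors $\mathbf{x}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{v}(t)$ have length $LP$, the vectors $\mathbf{x}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{v}(t)$ have length $MP$, $x_{\mathrm{fd}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t)$ is the filtered speech signal, and $v_{\mathrm{rn}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t)$ is the residual noise after filtering.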
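The equivalence between the long-filter output, the bilinear form, and the two stacked sub-filter forms can be verified with random data. All values below are synthetic, and the variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, P = 5, 3, 2

# Arbitrary temporal (length L) and spatial (length M) sub-filters.
h_T = [rng.standard_normal(L) for _ in range(P)]
h_S = [rng.standard_normal(M) for _ in range(P)]

# Noisy signal matrix: column m holds the length-L frame of channel m.
Y = rng.standard_normal((L, M))
y_vec = Y.reshape(-1, order="F")         # vec[Y], the stacked length-M*L vector

# (a) One long filter of length M*L applied to the stacked vector.
h_P = sum(np.kron(hs, ht) for ht, hs in zip(h_T, h_S))
z_long = h_P @ y_vec

# (b) Bilinear form: temporal filter on the left, spatial filter on the right.
z_bilinear = sum(ht @ Y @ hs for ht, hs in zip(h_T, h_S))

# (c) Stacked sub-filter matrices, shapes (L*P, M*L) and (M*P, M*L).
H_SP = np.vstack([np.kron(hs[:, None], np.eye(L)).T for hs in h_S])
H_TP = np.vstack([np.kron(np.eye(M), ht[:, None]).T for ht in h_T])
y_SP = H_SP @ y_vec                      # spatially pre-filtered signal, length L*P
y_TP = H_TP @ y_vec                      # temporally pre-filtered signal, length M*P
z_temporal = np.concatenate(h_T) @ y_SP
z_spatial = np.concatenate(h_S) @ y_TP
```

All four values agree up to rounding. Form (b) is usually the cheapest to evaluate because it never builds the `M*L`-length filter or the Kronecker matrices explicitly.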

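Optionally, the process of defining the mean square error of the desired signal estimate and rewriting the mean square error using the approximate representation of the linear filter comprises:

to derive the mean square error of the desired signal estimate $z(t)$, the error of $z(t)$ is defined as

$$\varepsilon(t) = z(t) - x_1(t) \qquad (15)$$

Based on equation (15), the mean square error of $z(t)$ is defined as

$$J[\mathbf{h}(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}^T(t)\boldsymbol{\rho}(t) + \mathbf{h}^T(t)\mathbf{R}_y(t)\mathbf{h}(t) \qquad (16)$$

where $\sigma_{x_1}^2(t) = E[x_1^2(t)]$ is the variance of the desired signal and $\boldsymbol{\rho}(t) = E[\mathbf{x}(t)x_1(t)]/\sigma_{x_1}^2(t)$;

using equation (10), the mean square error (16) of the desired signal estimate $z(t)$ is written as

$$J[\mathbf{h}_P(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t)$$
$$\phantom{J[\mathbf{h}_P(t)]} = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (17)$$

where

$$\mathbf{R}_{\mathbf{y}_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t) \qquad (18)$$

$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t) \qquad (19)$$

$$\mathbf{R}_{\mathbf{y}_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t) \qquad (20)$$

$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t) \qquad (21)$$

fixing $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ in turn, formula (17) is written in the following forms:

$$J[\mathbf{h}_{T,P}(t)\mid\mathbf{h}_{S,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t) \qquad (22)$$

$$J[\mathbf{h}_{S,P}(t)\mid\mathbf{h}_{T,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (23)$$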
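As a hedged aside, the reason each conditional cost has a closed-form minimizer can be sketched as follows. The symbols $\sigma_{x_1}^2$ (the element of $\mathbf{R}_x$ in the first row and first column) and $\mathbf{R}_{\mathbf{y}_{S,P}} = \mathbf{H}_{S,P}\mathbf{R}_y\mathbf{H}_{S,P}^T$ are notational assumptions made here; the time index is dropped for brevity.

```latex
% With the spatial sub-filter fixed, the cost is quadratic in the temporal one:
J(\mathbf{h}_{T,P} \mid \mathbf{h}_{S,P})
  = \sigma_{x_1}^2
  - 2\sigma_{x_1}^2\,\mathbf{h}_{T,P}^T\boldsymbol{\rho}_{S,P}
  + \mathbf{h}_{T,P}^T\mathbf{R}_{\mathbf{y}_{S,P}}\mathbf{h}_{T,P}

% Setting the gradient with respect to h_{T,P} to zero:
\nabla_{\mathbf{h}_{T,P}} J
  = -2\sigma_{x_1}^2\,\boldsymbol{\rho}_{S,P}
  + 2\,\mathbf{R}_{\mathbf{y}_{S,P}}\mathbf{h}_{T,P} = \mathbf{0}
\;\Longrightarrow\;
\mathbf{h}_{T,P} = \sigma_{x_1}^2\,\mathbf{R}_{\mathbf{y}_{S,P}}^{-1}\boldsymbol{\rho}_{S,P}

% The same argument with the roles of T and S exchanged gives
\mathbf{h}_{S,P} = \sigma_{x_1}^2\,\mathbf{R}_{\mathbf{y}_{T,P}}^{-1}\boldsymbol{\rho}_{T,P}
```

The inverse exists when $\mathbf{R}_y$ is positive definite and $\mathbf{H}_{S,P}$ has full row rank, that is, when the $P$ spatial sub-filters are linearly independent. Each update only inverts an $LP \times LP$ or $MP \times MP$ matrix rather than an $ML \times ML$ one.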

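Optionally, the process of obtaining the iterative Wiener filter based on Kronecker decomposition from the rewritten mean square error expression comprises:

step one: the initial value $\mathbf{h}_{T,P}^{(0)}(t)$ of $\mathbf{h}_{T,P}(t)$ is set from the Wiener filters $\mathbf{h}_{W,p}(t)$ of the individual channels, each of length $L$, where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of the $p$-th channel and is computed from: the matrix formed by the elements of $\mathbf{R}_y(t)$ in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy speech signal vector of the $p$-th channel; the vector formed by elements $(p-1)L+1$ to $pL$ of $\boldsymbol{\rho}(t)$; and the element of $\mathbf{R}_x(t)$ located in the $p$-th row and $p$-th column;

step two: using $\mathbf{h}_{T,P}^{(n-1)}(t)$, construct $\mathbf{H}_{T,P}^{(n-1)}(t)$ by the formulas $\mathbf{H}_{T,p}(t) = \mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)$ and $\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)\,]^T$, and substitute it into the formulas $\mathbf{R}_{\mathbf{y}_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t)$ and $\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$, where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;

step three: substitute $\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ into $\mathbf{h}_{S,P}^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)]^{-1}\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ to obtain $\mathbf{h}_{S,P}^{(n)}(t)$;

step four: using $\mathbf{h}_{S,P}^{(n)}(t)$, construct $\mathbf{H}_{S,P}^{(n)}(t)$ by the formulas $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L$ and $\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)\,]^T$, and substitute it into the formulas $\mathbf{R}_{\mathbf{y}_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t)$ and $\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$;

step five: substitute $\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$ into $\mathbf{h}_{T,P}^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)]^{-1}\boldsymbol{\rho}_{S,P}^{(n)}(t)$ to obtain $\mathbf{h}_{T,P}^{(n)}(t)$;

steps two to five are repeated $N$ times, where $N$ is a freely chosen parameter, to obtain the iterative Wiener filter based on Kronecker decomposition, $\mathbf{h}_{W,P}(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}^{(N)}(t)\otimes\mathbf{h}_{T,p}^{(N)}(t)$.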
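The alternating procedure in steps one to five can be sketched as below. This is a sketch under stated assumptions, not the patent's implementation: the correlation matrices are synthetic, all function names are hypothetical, and the initialization (a block-wise Wiener solution per channel) is an assumed stand-in because the exact initial-value formula is not legible in the source.

```python
import numpy as np

def stack_T(h_T, M):
    """Temporal sub-filter matrix, shape (M*P, M*L): blocks (I_M kron h_T[p])^T."""
    return np.vstack([np.kron(np.eye(M), h[:, None]).T for h in h_T])

def stack_S(h_S, L):
    """Spatial sub-filter matrix, shape (L*P, M*L): blocks (h_S[p] kron I_L)^T."""
    return np.vstack([np.kron(h[:, None], np.eye(L)).T for h in h_S])

def kron_filter(h_T, h_S):
    """Long filter of length M*L: sum over p of kron(h_S[p], h_T[p])."""
    return sum(np.kron(s, t) for t, s in zip(h_T, h_S))

def mse(h, R_y, r_x, var_x1):
    """Mean square error; r_x is the first column of R_x (variance times rho)."""
    return var_x1 - 2.0 * h @ r_x + h @ R_y @ h

def iterative_kronecker_wiener(R_y, R_x, L, M, P, N):
    """Alternate closed-form updates of the spatial and temporal sub-filters."""
    r_x, var_x1 = R_x[:, 0], R_x[0, 0]
    # ASSUMED initialization (requires P <= M): per-channel block Wiener solve.
    h_T = [np.linalg.solve(R_y[p*L:(p+1)*L, p*L:(p+1)*L], r_x[p*L:(p+1)*L])
           for p in range(P)]
    history = []
    for _ in range(N):
        H_T = stack_T(h_T, M)                                  # step two
        h_S = list(np.linalg.solve(H_T @ R_y @ H_T.T,
                                   H_T @ r_x).reshape(P, M))   # step three
        H_S = stack_S(h_S, L)                                  # step four
        h_T = list(np.linalg.solve(H_S @ R_y @ H_S.T,
                                   H_S @ r_x).reshape(P, L))   # step five
        history.append(mse(kron_filter(h_T, h_S), R_y, r_x, var_x1))
    return kron_filter(h_T, h_S), history

# Synthetic positive-definite correlation matrices (not speech statistics).
rng = np.random.default_rng(0)
L, M, P, N = 8, 4, 2, 5
A = rng.standard_normal((M * L, 3 * M * L))
B = rng.standard_normal((M * L, 3 * M * L))
R_x = A @ A.T / A.shape[1]
R_y = R_x + 0.5 * B @ B.T / B.shape[1]

h_kron, history = iterative_kronecker_wiener(R_y, R_x, L, M, P, N)
h_wiener = np.linalg.solve(R_y, R_x[:, 0])     # conventional length-M*L filter
J_wiener = mse(h_wiener, R_y, R_x[:, 0], R_x[0, 0])
```

Each update is the exact minimizer of the cost with the other sub-filter held fixed, so the recorded error should not increase from one iteration to the next, and it cannot fall below the error of the conventional Wiener filter. The linear systems solved here have order `M*P` and `L*P` rather than `M*L`.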

The invention has the technical effects that:

By applying singular value decomposition and Kronecker decomposition, the invention obtains a low-rank representation of the conventional multichannel noise reduction filter and decomposes the filter coefficients that perform noise reduction along the temporal and spatial dimensions. This converts the estimation of one long filter into the estimation of two shorter sub-filters and yields a multichannel noise reduction scheme based on Kronecker decomposition. Shorter filters mean fewer parameters to estimate, so the algorithm complexity can be reduced significantly. Fewer signal samples are also needed to estimate shorter filters, so the noise reduction algorithm tracks changes in the signal statistics better and is more suitable for non-stationary noise. Compared with the conventional multichannel noise reduction filter, the multichannel noise reduction filter based on Kronecker decomposition allows more flexible control of the trade-off between noise reduction performance and algorithm complexity when processing stationary noise, and achieves better noise reduction performance at lower complexity when processing non-stationary noise. Compared with the frequency-domain speech noise reduction methods widely used in practical systems, it has the further advantage of producing no musical noise.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 is a schematic flow chart of a method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a system structure according to an embodiment of the invention.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

Example 1

As shown in FIGS. 1-2, this embodiment provides a time-domain multichannel speech noise reduction method based on Kronecker decomposition, which includes:

step 1, collecting a noisy speech signal;

step 2, estimating the statistical characteristics of the noisy speech signal and of the noise signal;

step 3, estimating an iterative Wiener noise reduction filter based on Kronecker decomposition;

and step 4, filtering the noisy speech signal to reduce noise and obtain an estimate of the clean speech signal.

The specific method of the step 2 is as follows:

in speech noise reduction, the time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \qquad (1)$$

here, $t$ denotes the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array in the invention has $M$ microphones); $x_m(t)$ and $v_m(t)$ denote the clean speech signal and the additive noise signal, respectively, $y_m(t)$ denotes the noisy speech signal, and $x_m(t)$ and $v_m(t)$ are mutually independent. In the invention, all signals are zero-mean, broadband, real-valued signals. The 1st microphone in the array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal (the signal to be recovered). In principle, however, any microphone may serve as the reference microphone.

By combining $L$ consecutive samples, the signal received by the $m$-th microphone can be written as a vector of length $L$:

$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \qquad (2)$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined analogously to $\mathbf{y}_m(t)$, namely:

$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$

$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$

$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ denote the desired signal vector and the noise signal vector, respectively, and the superscript $(\cdot)^T$ denotes transposition.

In conventional time-domain multichannel speech enhancement, the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are typically concatenated and written as:

$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \qquad (3)$$

where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.

$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$

The statistical characteristics of the overall noisy speech signal and of the overall noise signal are estimated in step 2 as follows: the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ is estimated with an existing noise estimation algorithm; the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy speech signal vector $\mathbf{y}(t)$ is estimated recursively: $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); the correlation matrix $\mathbf{R}_x(t)$ of the overall clean speech signal vector $\mathbf{x}(t)$ is estimated as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and the vector $\boldsymbol{\rho}(t)$ is determined from the speech signal correlation matrix $\mathbf{R}_x(t)$: the first element of $\mathbf{R}_x(t)$ is $\sigma_{x_1}^2(t)$, and the first column of $\mathbf{R}_x(t)$ divided by $\sigma_{x_1}^2(t)$ is the vector $\boldsymbol{\rho}(t)$.

The specific method of the step 3 is as follows:

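in the conventional method, to achieve noise reduction, the overall noisy signal vector $\mathbf{y}(t)$ of length $ML$ is passed through a linear filter $\mathbf{h}(t)$, i.e.

$$z(t) = \sum_{m=1}^{M}\mathbf{h}_m^T(t)\mathbf{y}_m(t) = \mathbf{h}^T(t)\mathbf{y}(t) \qquad (4)$$

where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of the $m$-th channel, of length $L$, and $\mathbf{h}(t) = [\,\mathbf{h}_1^T(t)\ \cdots\ \mathbf{h}_M^T(t)\,]^T$ is a linear filter of length $ML$. The conventional method therefore has to estimate a filter $\mathbf{h}(t)$ of length $ML$.

In this embodiment, to derive a time-domain multichannel noise reduction scheme based on Kronecker decomposition, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are written in matrix form, i.e.

$$\mathbf{H}(t) = [\,\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \ \cdots\ \ \mathbf{h}_M(t)\,] \qquad (5)$$

Using singular value decomposition, the matrix $\mathbf{H}(t)$ can be decomposed as:

$$\mathbf{H}(t) = \mathbf{H}_1(t)\boldsymbol{\Sigma}(t)\mathbf{H}_2^T(t) \qquad (6)$$

where $\mathbf{H}_1(t) = [\,\mathbf{h}_{1,1}(t)\ \ \mathbf{h}_{1,2}(t)\ \cdots\ \mathbf{h}_{1,L}(t)\,]$ and $\mathbf{H}_2(t) = [\,\mathbf{h}_{2,1}(t)\ \ \mathbf{h}_{2,2}(t)\ \cdots\ \mathbf{h}_{2,M}(t)\,]$ are the orthogonal matrices formed by the left singular vectors $\mathbf{h}_{1,l}(t)$ ($l = 1, 2, \ldots, L$) and the right singular vectors $\mathbf{h}_{2,m}(t)$ ($m = 1, 2, \ldots, M$) of $\mathbf{H}(t)$, with dimensions $L \times L$ and $M \times M$ respectively, and $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$ (non-negative and real). The singular values are arranged in descending order, i.e. $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$.

Because the noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) are strongly correlated, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are strongly correlated as well. As a result, the singular values of $\mathbf{H}(t)$ differ greatly in magnitude, so $\mathbf{H}(t)$ can be approximated using the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.

$$\mathbf{H}(t) \approx \sum_{p=1}^{P}\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}(t)\mathbf{h}_{S,p}^T(t) \qquad (7)$$

where $\mathbf{h}_{T,p}(t)$ and $\mathbf{h}_{S,p}(t)$ are scaled versions of $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ whose outer product equals $\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t)$. It should be noted that this representation is not unique, since the factor $\sigma_p(t)$ can be divided between the two vectors in different ways.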
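One way to see the non-uniqueness, offered as an illustration rather than as the formula used in the patent (which is not legible in the source): any nonzero scalar can be moved between the temporal and spatial factors without changing their outer product. The scalar $c_p$ below is a hypothetical free parameter.

```latex
% For any c_p \neq 0, rescaling the two factors in opposite directions
% leaves each rank-one term unchanged:
\left(c_p\,\mathbf{h}_{T,p}\right)\left(\tfrac{1}{c_p}\,\mathbf{h}_{S,p}\right)^{T}
  = \mathbf{h}_{T,p}\,\mathbf{h}_{S,p}^{T}

% Two equally valid choices that both reproduce \sigma_p h_{1,p} h_{2,p}^T:
\mathbf{h}_{T,p} = \sqrt{\sigma_p}\,\mathbf{h}_{1,p},\quad
\mathbf{h}_{S,p} = \sqrt{\sigma_p}\,\mathbf{h}_{2,p}
\qquad\text{or}\qquad
\mathbf{h}_{T,p} = \sigma_p\,\mathbf{h}_{1,p},\quad
\mathbf{h}_{S,p} = \mathbf{h}_{2,p}
```

The filter output depends only on the products $\mathbf{h}_{T,p}\mathbf{h}_{S,p}^T$, so this scaling ambiguity does not affect the noise reduction result.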

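Based on (7), the filter $\mathbf{h}(t)$ can be approximated as

$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) \qquad (8)$$

where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, and $\otimes$ denotes the Kronecker product. The larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$; when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$.

Applying the relation

$$\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) = [\mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L]\,\mathbf{h}_{T,p}(t) = [\mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)]\,\mathbf{h}_{S,p}(t) \qquad (9)$$

$\mathbf{h}_P(t)$ can be written as

$$\mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{H}_{S,p}(t)\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P}\mathbf{H}_{T,p}(t)\mathbf{h}_{S,p}(t) \qquad (10)$$

where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimension $L \times L$ and $M \times M$, respectively.

The estimate $z(t)$ of the desired signal $x_1(t)$ can then be written as

$$z(t) = \mathbf{h}_P^T(t)\mathbf{y}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{y}_{T,P}(t) \qquad (11)$$

where

$$\mathbf{h}_{T,P}(t) = [\,\mathbf{h}_{T,1}^T(t)\ \cdots\ \mathbf{h}_{T,P}^T(t)\,]^T, \qquad \mathbf{h}_{S,P}(t) = [\,\mathbf{h}_{S,1}^T(t)\ \cdots\ \mathbf{h}_{S,P}^T(t)\,]^T$$

$$\mathbf{y}_{T,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{T,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{T,P}(t)\,]^T = \mathbf{H}_{T,P}(t)\mathbf{y}(t)$$

$$\mathbf{y}_{S,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{S,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{S,P}(t)\,]^T = \mathbf{H}_{S,P}(t)\mathbf{y}(t)$$

$$\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)\,]^T$$

$$\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)\,]^T$$

The dimensions of $\mathbf{h}_{T,P}(t)$, $\mathbf{h}_{S,P}(t)$, $\mathbf{y}_{T,P}(t)$, $\mathbf{y}_{S,P}(t)$, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are $LP \times 1$, $MP \times 1$, $MP \times 1$, $LP \times 1$, $MP \times ML$ and $LP \times ML$, respectively. $\mathbf{h}_{T,P}(t)$ is the sub-filter vector that performs noise reduction in the temporal dimension, $\mathbf{h}_{S,P}(t)$ is the sub-filter vector that performs noise reduction in the spatial dimension, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are the corresponding temporal and spatial sub-filter matrices, $\mathbf{y}_{T,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{T,P}(t)$, and $\mathbf{y}_{S,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{S,P}(t)$.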

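By rearranging, $z(t)$ can be expressed as

$$z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\mathbf{Y}(t)\mathbf{h}_{S,p}(t) \qquad (12)$$

where $\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_M(t)\,]$ is the noisy signal matrix, and the matrices $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined analogously to $\mathbf{Y}(t)$ ($\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$, $\mathrm{vec}[\mathbf{X}(t)] = \mathbf{x}(t)$, $\mathrm{vec}[\mathbf{V}(t)] = \mathbf{v}(t)$); the matrices $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ all have dimension $L \times M$. As equation (12) shows, the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) perform noise reduction in the temporal and spatial dimensions, respectively.

Formula (11) may be further written as:

$$z(t) = x_{\mathrm{fd}}(t) + v_{\mathrm{rn}}(t) \qquad (13)$$

where the vectors $\mathbf{x}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{v}(t)$ have length $LP$, the vectors $\mathbf{x}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{v}(t)$ have length $MP$, $x_{\mathrm{fd}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t)$ is the filtered speech signal, and $v_{\mathrm{rn}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t)$ is the residual noise after filtering.

Because the two parts of the desired signal estimate $z(t)$ are uncorrelated, the variance of $z(t)$ is

$$\sigma_z^2(t) = \mathbf{h}_P^T(t)\mathbf{R}_y(t)\mathbf{h}_P(t) = \mathbf{h}_P^T(t)\mathbf{R}_x(t)\mathbf{h}_P(t) + \mathbf{h}_P^T(t)\mathbf{R}_v(t)\mathbf{h}_P(t) \qquad (14)$$

where $\mathbf{R}_y(t) = E[\mathbf{y}(t)\mathbf{y}^T(t)]$, $\mathbf{R}_x(t) = E[\mathbf{x}(t)\mathbf{x}^T(t)]$, and $\mathbf{R}_v(t) = E[\mathbf{v}(t)\mathbf{v}^T(t)]$ ($\mathbf{R}_v(t)$ is a full-rank matrix). For simplicity, the time index $t$ is omitted in the following description wherever this causes no ambiguity.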

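To derive an iterative noise reduction filter based on Kronecker decomposition, the mean square error of the desired signal estimate $z$ is derived first. The error of $z$ is defined as

$$\varepsilon(t) = z(t) - x_1(t) \qquad (15)$$

Based on equation (15), the mean square error of $z$ can be defined as

$$J[\mathbf{h}(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}^T(t)\boldsymbol{\rho}(t) + \mathbf{h}^T(t)\mathbf{R}_y(t)\mathbf{h}(t) \qquad (16)$$

where $\sigma_{x_1}^2(t) = E[x_1^2(t)]$ is the variance of the desired signal and $\boldsymbol{\rho}(t) = E[\mathbf{x}(t)x_1(t)]/\sigma_{x_1}^2(t)$.

Using equation (10), the mean square error (16) of the desired signal estimate $z$ can be written as

$$J[\mathbf{h}_P(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t)$$
$$\phantom{J[\mathbf{h}_P(t)]} = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (17)$$

where

$$\mathbf{R}_{\mathbf{y}_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t) \qquad (18)$$

$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t) \qquad (19)$$

$$\mathbf{R}_{\mathbf{y}_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t) \qquad (20)$$

$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t) \qquad (21)$$

As these formulas show, when $P$ is small, the matrices required by the noise reduction scheme based on Kronecker decomposition, $\mathbf{R}_{\mathbf{y}_{T,P}}(t)$ (dimension $MP \times MP$) and $\mathbf{R}_{\mathbf{y}_{S,P}}(t)$ (dimension $LP \times LP$), are much smaller than the matrix $\mathbf{R}_y(t)$ (dimension $ML \times ML$) required by the conventional noise reduction scheme. This brings two advantages: 1) the complexity of matrix inversion is reduced significantly; 2) estimating $\mathbf{R}_{\mathbf{y}_{T,P}}(t)$ and $\mathbf{R}_{\mathbf{y}_{S,P}}(t)$ requires fewer observation samples than estimating $\mathbf{R}_y(t)$. A noise reduction scheme based on these two matrices therefore generally has lower complexity and tracks the noise statistics better.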
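To make the size comparison concrete, the sketch below counts coefficients and matrix orders for one example configuration. The values of `M`, `L`, and `P` are arbitrary examples, and the cubic cost model for dense matrix inversion is a rough assumption rather than a figure from the patent.

```python
def filter_sizes(M, L, P):
    """Coefficient counts and matrix orders for both schemes."""
    return {
        "conventional_coefficients": M * L,        # one filter of length M*L
        "kronecker_coefficients": P * (M + L),     # P temporal + P spatial sub-filters
        "conventional_matrix_order": M * L,        # order of the full correlation matrix
        "kronecker_matrix_orders": (M * P, L * P), # orders of the two reduced matrices
    }

def cubic_inversion_cost(order):
    """Rough operation count for inverting a dense matrix (assumed O(n^3))."""
    return order ** 3

sizes = filter_sizes(M=8, L=32, P=2)
conventional_cost = cubic_inversion_cost(sizes["conventional_matrix_order"])
kronecker_cost = sum(cubic_inversion_cost(n) for n in sizes["kronecker_matrix_orders"])
cost_ratio = conventional_cost / kronecker_cost   # per iteration of the alternating scheme
```

The Kronecker scheme repeats its two small inversions in every iteration, so the overall saving also depends on the number of iterations `N`; the ratio above is per iteration.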

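To derive an iterative Wiener filter based on Kronecker decomposition, $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ are fixed in turn and formula (17) is written as

$$J[\mathbf{h}_{T,P}(t)\mid\mathbf{h}_{S,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t) \qquad (22)$$

$$J[\mathbf{h}_{S,P}(t)\mid\mathbf{h}_{T,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (23)$$

The filter is derived as follows:

The initial value $\mathbf{h}_{T,P}^{(0)}(t)$ of $\mathbf{h}_{T,P}(t)$ is set from the Wiener filters $\mathbf{h}_{W,p}(t)$ of the individual channels, where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of the $p$-th channel, of length $L$. It is computed from the matrix formed by the elements of $\mathbf{R}_y$ in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy speech signal vector of the $p$-th channel, from the vector formed by elements $(p-1)L+1$ to $pL$ of $\boldsymbol{\rho}$, and from the element of $\mathbf{R}_x$ located in the $p$-th row and $p$-th column.

Using $\mathbf{h}_{T,P}^{(0)}$ to construct $\mathbf{H}_{T,P}^{(0)}$ and substituting it into formulas (20) and (21) gives

$$\mathbf{R}_{\mathbf{y}_{T,P}}^{(0)} = \mathbf{H}_{T,P}^{(0)}\mathbf{R}_y[\mathbf{H}_{T,P}^{(0)}]^T, \qquad \boldsymbol{\rho}_{T,P}^{(0)} = \mathbf{H}_{T,P}^{(0)}\boldsymbol{\rho}$$

Substituting $\mathbf{R}_{\mathbf{y}_{T,P}}^{(0)}$ and $\boldsymbol{\rho}_{T,P}^{(0)}$ into formula (23) gives

$$J[\mathbf{h}_{S,P}\mid\mathbf{h}_{T,P}^{(0)}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\,\mathbf{h}_{S,P}^T\boldsymbol{\rho}_{T,P}^{(0)} + \mathbf{h}_{S,P}^T\mathbf{R}_{\mathbf{y}_{T,P}}^{(0)}\mathbf{h}_{S,P} \qquad (26)$$

Differentiating (26) with respect to $\mathbf{h}_{S,P}$ and setting the result to zero gives

$$\mathbf{h}_{S,P}^{(1)} = \sigma_{x_1}^2\,[\mathbf{R}_{\mathbf{y}_{T,P}}^{(0)}]^{-1}\boldsymbol{\rho}_{T,P}^{(0)}$$

where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration.

Using $\mathbf{h}_{S,P}^{(1)}$ to construct $\mathbf{H}_{S,P}^{(1)}$ and substituting it into formulas (18) and (19) gives

$$\mathbf{R}_{\mathbf{y}_{S,P}}^{(1)} = \mathbf{H}_{S,P}^{(1)}\mathbf{R}_y[\mathbf{H}_{S,P}^{(1)}]^T, \qquad \boldsymbol{\rho}_{S,P}^{(1)} = \mathbf{H}_{S,P}^{(1)}\boldsymbol{\rho}$$

Substituting $\mathbf{R}_{\mathbf{y}_{S,P}}^{(1)}$ and $\boldsymbol{\rho}_{S,P}^{(1)}$ into formula (22) gives

$$J[\mathbf{h}_{T,P}\mid\mathbf{h}_{S,P}^{(1)}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\,\mathbf{h}_{T,P}^T\boldsymbol{\rho}_{S,P}^{(1)} + \mathbf{h}_{T,P}^T\mathbf{R}_{\mathbf{y}_{S,P}}^{(1)}\mathbf{h}_{T,P} \qquad (30)$$

Differentiating (30) with respect to $\mathbf{h}_{T,P}$ and setting the result to zero gives

$$\mathbf{h}_{T,P}^{(1)} = \sigma_{x_1}^2\,[\mathbf{R}_{\mathbf{y}_{S,P}}^{(1)}]^{-1}\boldsymbol{\rho}_{S,P}^{(1)}$$

Continuing in the same way, after $n$ iterations one obtains

$$\mathbf{h}_{S,P}^{(n)} = \sigma_{x_1}^2\,[\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}]^{-1}\boldsymbol{\rho}_{T,P}^{(n-1)} \qquad (32)$$

$$\mathbf{h}_{T,P}^{(n)} = \sigma_{x_1}^2\,[\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}]^{-1}\boldsymbol{\rho}_{S,P}^{(n)} \qquad (33)$$

where $\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}$ are obtained from formulas (20) and (21) with $\mathbf{H}_{T,P}^{(n-1)}$, and $\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}$ and $\boldsymbol{\rho}_{S,P}^{(n)}$ are obtained from formulas (18) and (19) with $\mathbf{H}_{S,P}^{(n)}$.

Based on equations (32) and (33), the iterative Wiener filter based on Kronecker decomposition after the $n$-th iteration is:

$$\mathbf{h}_{W,P}^{(n)} = \sum_{p=1}^{P}\mathbf{h}_{S,p}^{(n)}\otimes\mathbf{h}_{T,p}^{(n)} \qquad (34)$$

Passing the noisy signal through the designed iterative Wiener noise reduction filter based on Kronecker decomposition, $\mathbf{h}_{W,P}(t)$, yields the noise-reduced estimate of the clean speech signal, $z(t) = \mathbf{h}_{W,P}^T(t)\mathbf{y}(t)$. Alternatively, the noise-reduced speech signal can be obtained as $z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\mathbf{Y}(t)\mathbf{h}_{S,p}(t)$; the two methods are equivalent.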

The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

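1. A time-domain multichannel speech noise reduction method based on Kronecker decomposition, characterized by comprising the following steps:

collecting a noisy speech signal, and preprocessing the noisy speech signal;

estimating the statistical characteristics of the noisy speech signal and of the noise signal;

obtaining, based on the statistical characteristics, an iterative Wiener noise reduction filter based on Kronecker decomposition;

and performing noise reduction filtering on the noisy speech signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean speech signal.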

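2. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 1, wherein

collecting the noisy speech signal and preprocessing the noisy speech signal comprises:

in speech noise reduction, the time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \qquad (1)$$

here, $t$ denotes the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array in the invention has $M$ microphones); $x_m(t)$ and $v_m(t)$ denote the clean speech signal and the additive noise signal received by the $m$-th microphone, $y_m(t)$ denotes the noisy speech signal received by the $m$-th microphone, and $x_m(t)$ and $v_m(t)$ are mutually independent; the 1st microphone in the array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal;

by combining $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:

$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \qquad (2)$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined analogously to $\mathbf{y}_m(t)$, namely:

$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$

$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$

the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are concatenated and written as:

$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \qquad (3)$$

where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.

$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$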

3. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 2, wherein,

the process of estimating the statistical characteristics of the noisy speech signal and of the noise signal comprises:

estimating the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ recursively: $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix $\mathbf{R}_x(t)$ of the overall clean speech signal vector $\mathbf{x}(t)$ as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$, thereby obtaining the statistical characteristics.

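4. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 3, wherein

the process of determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$ comprises:

taking the first column of $\mathbf{R}_x(t)$ and dividing it by the element of $\mathbf{R}_x(t)$ in the first row and first column to obtain the vector $\boldsymbol{\rho}(t)$.

5. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 3, wherein

the process of obtaining, based on the statistical characteristics, the iterative Wiener noise reduction filter based on Kronecker decomposition comprises:

constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on this matrix and approximating it, and obtaining an approximate representation of the linear filter from the approximated singular value decomposition;

rewriting the expression for the desired signal estimate using the approximate representation of the linear filter, and expressing the desired signal estimate as the sum of the filtered speech signal and the residual noise;

and defining the mean square error of the desired signal estimate, rewriting the mean square error using the approximate representation of the linear filter, and obtaining the iterative Wiener filter based on Kronecker decomposition from the rewritten mean square error expression.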

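6. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 5, wherein

the process of constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on this matrix and approximating it, and obtaining the approximate representation of the linear filter from the approximated singular value decomposition comprises:

passing the overall noisy signal vector $\mathbf{y}(t)$ of length $ML$ through a linear filter $\mathbf{h}(t)$, i.e.

$$z(t) = \sum_{m=1}^{M}\mathbf{h}_m^T(t)\mathbf{y}_m(t) = \mathbf{h}^T(t)\mathbf{y}(t) \qquad (4)$$

where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of the $m$-th channel, of length $L$, and $\mathbf{h}(t)$ is a linear filter of length $ML$;

writing $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) in matrix form, i.e.

$$\mathbf{H}(t) = [\,\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \ \cdots\ \ \mathbf{h}_M(t)\,] \qquad (5)$$

decomposing the matrix $\mathbf{H}(t)$ by singular value decomposition as

$$\mathbf{H}(t) = \mathbf{H}_1(t)\boldsymbol{\Sigma}(t)\mathbf{H}_2^T(t) \qquad (6)$$

where $\mathbf{H}_1(t)$ and $\mathbf{H}_2(t)$ are the orthogonal matrices, of dimension $L \times L$ and $M \times M$, formed by the left singular vectors $\mathbf{h}_{1,l}(t)$ and the right singular vectors $\mathbf{h}_{2,m}(t)$ of $\mathbf{H}(t)$, and $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$, arranged in descending order, $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$;

approximating the matrix $\mathbf{H}(t)$ using the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.

$$\mathbf{H}(t) \approx \sum_{p=1}^{P}\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}(t)\mathbf{h}_{S,p}^T(t) \qquad (7)$$

based on (7), the filter $\mathbf{h}(t)$ can be approximated as

$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) \qquad (8)$$

where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, and $\otimes$ denotes the Kronecker product; the larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$, and when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$;

applying the relation

$$\mathbf{h}_{S,p}(t)\otimes\mathbf{h}_{T,p}(t) = [\mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L]\,\mathbf{h}_{T,p}(t) = [\mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)]\,\mathbf{h}_{S,p}(t) \qquad (9)$$

$\mathbf{h}_P(t)$ is written as

$$\mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{H}_{S,p}(t)\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P}\mathbf{H}_{T,p}(t)\mathbf{h}_{S,p}(t) \qquad (10)$$

where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimension $L \times L$ and $M \times M$, respectively.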

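7. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 6, wherein

the process of rewriting the expression for the desired signal estimate using the approximate representation of the linear filter comprises:

based on (10), the estimate $z(t)$ of the desired signal $x_1(t)$ is written as:

$$z(t) = \mathbf{h}_P^T(t)\mathbf{y}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{y}_{T,P}(t) \qquad (11)$$

where

$$\mathbf{y}_{T,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{T,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{T,P}(t)\,]^T = \mathbf{H}_{T,P}(t)\mathbf{y}(t)$$

$$\mathbf{y}_{S,P}(t) = [\,\mathbf{y}^T(t)\mathbf{H}_{S,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{S,P}(t)\,]^T = \mathbf{H}_{S,P}(t)\mathbf{y}(t)$$

$$\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)\,]^T$$

$$\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)\,]^T$$

by rearranging, $z(t)$ is expressed as:

$$z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\mathbf{Y}(t)\mathbf{h}_{S,p}(t) \qquad (12)$$

where $\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_M(t)\,]$ is the noisy signal matrix with $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$; the matrices $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined analogously, and all three have dimension $L \times M$; the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) perform noise reduction in the temporal and spatial dimensions, respectively;

formula (11) may be further written as:

$$z(t) = x_{\mathrm{fd}}(t) + v_{\mathrm{rn}}(t) \qquad (13)$$

where $x_{\mathrm{fd}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{H}_{S,P}(t)\mathbf{x}(t)$ is the filtered speech signal and $v_{\mathrm{rn}}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{H}_{S,P}(t)\mathbf{v}(t)$ is the residual noise after filtering.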

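8. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 7, wherein

the process of defining the mean square error of the desired signal estimate and rewriting the mean square error using the approximate representation of the linear filter comprises:

defining the error of $z(t)$ as

$$\varepsilon(t) = z(t) - x_1(t) \qquad (15)$$

Based on equation (15), the mean square error of $z(t)$ is defined as

$$J[\mathbf{h}(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}^T(t)\boldsymbol{\rho}(t) + \mathbf{h}^T(t)\mathbf{R}_y(t)\mathbf{h}(t) \qquad (16)$$

where $\sigma_{x_1}^2(t) = E[x_1^2(t)]$ and $\boldsymbol{\rho}(t) = E[\mathbf{x}(t)x_1(t)]/\sigma_{x_1}^2(t)$;

using equation (10), the mean square error (16) is written as

$$J[\mathbf{h}_P(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t)$$
$$\phantom{J[\mathbf{h}_P(t)]} = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (17)$$

where

$$\mathbf{R}_{\mathbf{y}_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t) \qquad (18)$$

$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t) \qquad (19)$$

$$\mathbf{R}_{\mathbf{y}_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t) \qquad (20)$$

$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t) \qquad (21)$$

fixing $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ in turn, formula (17) is written in the following forms:

$$J[\mathbf{h}_{T,P}(t)\mid\mathbf{h}_{S,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{\mathbf{y}_{S,P}}(t)\mathbf{h}_{T,P}(t) \qquad (22)$$

$$J[\mathbf{h}_{S,P}(t)\mid\mathbf{h}_{T,P}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{\mathbf{y}_{T,P}}(t)\mathbf{h}_{S,P}(t) \qquad (23)$$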

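9. The time-domain multichannel speech noise reduction method based on Kronecker decomposition according to claim 1, wherein

the process of obtaining the iterative Wiener filter based on Kronecker decomposition from the rewritten mean square error expression comprises:

step one: the initial value $\mathbf{h}_{T,P}^{(0)}(t)$ of $\mathbf{h}_{T,P}(t)$ is set from the Wiener filters $\mathbf{h}_{W,p}(t)$ of the individual channels, each of length $L$, where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of the $p$-th channel and is computed from: the matrix formed by the elements of $\mathbf{R}_y(t)$ in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy speech signal vector of the $p$-th channel; the vector formed by elements $(p-1)L+1$ to $pL$ of $\boldsymbol{\rho}(t)$; and the element of $\mathbf{R}_x(t)$ located in the $p$-th row and $p$-th column;

step two: using $\mathbf{h}_{T,P}^{(n-1)}(t)$, construct $\mathbf{H}_{T,P}^{(n-1)}(t)$ by the formulas $\mathbf{H}_{T,p}(t) = \mathbf{I}_M\otimes\mathbf{h}_{T,p}(t)$ and $\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)\,]^T$, and substitute it into the formulas $\mathbf{R}_{\mathbf{y}_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t)$ and $\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$, where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;

step three: substitute $\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ into $\mathbf{h}_{S,P}^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y}_{T,P}}^{(n-1)}(t)]^{-1}\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ to obtain $\mathbf{h}_{S,P}^{(n)}(t)$;

step four: using $\mathbf{h}_{S,P}^{(n)}(t)$, construct $\mathbf{H}_{S,P}^{(n)}(t)$ by the formulas $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t)\otimes\mathbf{I}_L$ and $\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)\,]^T$, and substitute it into the formulas $\mathbf{R}_{\mathbf{y}_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t)$ and $\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$;

step five: substitute $\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$ into $\mathbf{h}_{T,P}^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y}_{S,P}}^{(n)}(t)]^{-1}\boldsymbol{\rho}_{S,P}^{(n)}(t)$ to obtain $\mathbf{h}_{T,P}^{(n)}(t)$;

steps two to five are repeated $N$ times, where $N$ is a freely chosen parameter, to obtain the iterative Wiener filter based on Kronecker decomposition, $\mathbf{h}_{W,P}(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}^{(N)}(t)\otimes\mathbf{h}_{T,p}^{(N)}(t)$.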