CN117894332A - Time domain multichannel voice noise reduction method based on Kronecker decomposition - Google Patents

Time domain multichannel voice noise reduction method based on Kronecker decomposition

Info

Publication number
CN117894332A
Authority
CN
China
Prior art keywords
signal
matrix
noise reduction
vector
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410241313.XA
Other languages
Chinese (zh)
Inventor
王向辉
李梅
韩宗乐
王桂宝
赵莹珂
田旭华
王姣
郭晶
陈晓屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology
Priority to CN202410241313.XA
Publication of CN117894332A

Landscapes

  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a time-domain multichannel voice noise reduction method based on Kronecker decomposition, comprising the following steps: collecting a noisy voice signal and preprocessing it; estimating the statistical characteristics of the noisy voice signal and of the noise signal; obtaining, from these statistical characteristics, an iterative Wiener noise reduction filter based on Kronecker decomposition; and filtering the noisy voice signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean voice signal. The invention uses singular value decomposition and Kronecker decomposition to obtain a low-rank representation of the conventional multichannel noise reduction filter, decomposing the filter coefficients along the time dimension and the space dimension. The estimation of one long filter is thereby converted into the estimation of two shorter sub-filters, which reduces the number of parameters to be estimated, lowers the algorithm complexity, and improves the ability to track changes in the statistical characteristics of the noise signal. A further advantage over the frequency-domain voice noise reduction methods currently used in practical systems is the absence of musical noise.

Description

Time domain multichannel voice noise reduction method based on Kronecker decomposition
Technical Field
The invention relates to the field of voice noise reduction, in particular to a time-domain multichannel voice noise reduction method based on Kronecker decomposition.
Background
Everyday environments contain many types of noise, so a voice signal picked up by a microphone is inevitably contaminated by environmental noise. This noise degrades the quality and intelligibility of the voice signal and may cause listening fatigue. Voice noise reduction is therefore particularly important: it aims to suppress the influence of the noise and recover the "clean" voice signal from the noisy one, thereby improving voice quality and intelligibility, and it plays an important role in voice communication.
Depending on whether spatial information is used, voice noise reduction algorithms can be classified as single-channel or multichannel. Depending on the domain in which they operate, they can further be divided into time-domain algorithms and transform-domain algorithms (wavelet domain, frequency domain, and so on). At present, frequency-domain methods are the more widely used, because their complexity is low and they can run in real time. Their disadvantage is that they easily generate musical noise, which studies have found listeners tolerate less well than everyday noise. Reducing the musical noise produced by frequency-domain algorithms has therefore long been a research focus. Time-domain methods have the advantage of not generating musical noise. The main bottleneck limiting their practical deployment is their high complexity, which arises because the filters in time-domain algorithms are typically long. Multichannel time-domain algorithms in particular are difficult to deploy in a practical system for real-time processing of noisy voice signals.
A long filter also brings two further problems: first, the estimation error of the signal correlation matrix grows with the filter length, which ultimately degrades noise reduction performance; second, more observation samples are needed to estimate the signal correlation matrix and compute the filter coefficients, which reduces the ability of the algorithm to track changes in the statistical characteristics of the signal.
To solve these problems, the invention provides a design method for a time-domain multichannel iterative noise reduction filter based on Kronecker decomposition.
Disclosure of Invention
The invention aims to provide a time-domain multichannel voice noise reduction method based on Kronecker decomposition so as to solve the problems in the prior art.
To achieve the above object, the invention provides a time-domain multichannel voice noise reduction method based on Kronecker decomposition, comprising:
collecting a noisy voice signal, and preprocessing the noisy voice signal;
estimating the statistical characteristics of the noisy voice signal and the noise signal;
obtaining an iterative Wiener noise reduction filter based on Kronecker decomposition from the statistical characteristics;
and performing noise reduction filtering on the noisy voice signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean voice signal.
Optionally, collecting the noisy voice signal and preprocessing it includes:
in voice noise reduction, the time-domain signal model is:
$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$
where $t$ denotes the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array in the invention has $M$ microphones); $x_m(t)$ and $v_m(t)$ denote, respectively, the clean voice signal and the additive noise signal received by the $m$-th microphone, and $y_m(t)$ denotes the noisy voice signal received by the $m$-th microphone; $x_m(t)$ and $v_m(t)$ are independent of each other; the 1st microphone in the array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal;
by grouping $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:
$$\mathbf{y}_m(t) = [y_m(t)\ \ y_m(t-1)\ \cdots\ y_m(t-L+1)]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \tag{2}$$
where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined similarly to $\mathbf{y}_m(t)$, namely:
$$\mathbf{x}_m(t) = [x_m(t)\ \ x_m(t-1)\ \cdots\ x_m(t-L+1)]^T$$
$$\mathbf{v}_m(t) = [v_m(t)\ \ v_m(t-1)\ \cdots\ v_m(t-L+1)]^T$$
$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ denote the desired signal vector and the noise signal vector of the $m$-th channel, respectively, $\mathbf{y}_m(t)$ denotes the noisy signal vector of the $m$-th channel, and the superscript $(\cdot)^T$ denotes the transpose;
the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are concatenated and written as:
$$\mathbf{y}(t) = [\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \cdots\ \mathbf{y}_M^T(t)]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$
where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.
$$\mathbf{x}(t) = [\mathbf{x}_1^T(t)\ \ \mathbf{x}_2^T(t)\ \cdots\ \mathbf{x}_M^T(t)]^T,\qquad \mathbf{v}(t) = [\mathbf{v}_1^T(t)\ \ \mathbf{v}_2^T(t)\ \cdots\ \mathbf{v}_M^T(t)]^T$$
$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ denote the overall noisy signal vector, the overall clean voice signal vector, and the overall noise signal vector, respectively, each of length $ML$.
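The framing and stacking described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper names `frame_vector` and `stacked_vector`, the array layout (one row per microphone), and the random test signals are assumptions made for the example and do not come from the patent.

```python
import numpy as np

def frame_vector(signal, t, L):
    """Return [s(t), s(t-1), ..., s(t-L+1)] for a 1-D signal (requires t >= L-1)."""
    return signal[t - L + 1 : t + 1][::-1]

def stacked_vector(signals, t, L):
    """Concatenate the M per-channel length-L vectors into one length-ML vector."""
    return np.concatenate([frame_vector(channel, t, L) for channel in signals])

# Hypothetical test data: M microphones, T samples, additive noise model y = x + v.
rng = np.random.default_rng(0)
M, L, T = 3, 4, 50
x = rng.standard_normal((M, T))
v = 0.1 * rng.standard_normal((M, T))
y = x + v

t = 10
y_vec = stacked_vector(y, t, L)
x_vec = stacked_vector(x, t, L)
v_vec = stacked_vector(v, t, L)
```

Because the stacking is linear, the additive model carries over unchanged from the scalar samples to the length-$ML$ vectors.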
Optionally, the process of estimating the statistical characteristics of the noisy voice signal and the noise signal includes:
estimating the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ recursively: $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix $\mathbf{R}_x(t)$ of the overall clean voice signal vector $\mathbf{x}(t)$ by $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from the voice signal correlation matrix $\mathbf{R}_x(t)$, thereby obtaining the statistical characteristics.
Optionally, the process of determining the vector $\boldsymbol{\rho}(t)$ from the voice signal correlation matrix $\mathbf{R}_x(t)$ comprises:
extracting the first column of the voice signal correlation matrix $\mathbf{R}_x(t)$ and dividing it by the element in the first row and first column of $\mathbf{R}_x(t)$ to obtain the vector $\boldsymbol{\rho}(t)$.
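As a rough illustration of these estimation steps, the sketch below implements the recursive update and the extraction of the normalised first column. The function names and the default forgetting factor are assumptions chosen for the example; the noise correlation matrix is taken as given, since the text only says it comes from an existing noise estimation algorithm.

```python
import numpy as np

def update_correlation(R_prev, y_vec, alpha=0.98):
    """One recursive step: R_y(t) = alpha * R_y(t-1) + (1 - alpha) * y(t) y(t)^T."""
    return alpha * R_prev + (1.0 - alpha) * np.outer(y_vec, y_vec)

def speech_statistics(R_y, R_v):
    """Return R_x = R_y - R_v and rho = (first column of R_x) / R_x[0, 0]."""
    R_x = R_y - R_v
    rho = R_x[:, 0] / R_x[0, 0]
    return R_x, rho

# Tiny hand-checkable example (values are arbitrary).
R_y_demo = update_correlation(np.zeros((2, 2)), np.array([1.0, 2.0]), alpha=0.5)
R_x_demo, rho_demo = speech_statistics(np.array([[3.0, 1.0], [1.0, 4.0]]), np.eye(2))
```

In practice the difference of two estimated matrices is not guaranteed to be positive semidefinite, so a real system would likely need some safeguard at this point; the sketch does not include one.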
Optionally, the process of obtaining the iterative Wiener noise reduction filter based on Kronecker decomposition from the statistical characteristics includes:
constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on this matrix and approximating it, and obtaining an approximate representation of the linear filter from the approximated singular value decomposition;
rewriting the expression of the desired signal estimate using the approximate representation of the linear filter, and expressing the desired signal estimate as the sum of the filtered voice signal and the residual noise;
and defining the mean square error of the desired signal estimate, transforming the mean square error using the approximate representation of the linear filter, and obtaining the iterative Wiener filter based on Kronecker decomposition from the transformed mean square error expression.
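Optionally, constructing the matrix corresponding to the linear filter whose length equals that of the overall noisy signal vector, performing singular value decomposition on the matrix and approximating it, and obtaining the approximate representation of the linear filter from the approximated singular value decomposition comprises:
to achieve noise reduction, the overall noisy signal vector $\mathbf{y}(t)$ of length $ML$ is passed through a linear filter $\mathbf{h}(t)$, i.e.
$$z(t) = \sum_{m=1}^{M}\mathbf{h}_m^T(t)\mathbf{y}_m(t) = \mathbf{h}^T(t)\mathbf{y}(t) \tag{4}$$
where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of the $m$-th channel, of length $L$, and $\mathbf{h}(t) = [\mathbf{h}_1^T(t)\ \ \mathbf{h}_2^T(t)\ \cdots\ \mathbf{h}_M^T(t)]^T$ is a linear filter of length $ML$;
to derive a time-domain multichannel noise reduction scheme based on Kronecker decomposition, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are written in matrix form, i.e.
$$\mathbf{H}(t) = [\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \cdots\ \mathbf{h}_M(t)] \tag{5}$$
Using singular value decomposition, the matrix $\mathbf{H}(t)$ can be decomposed as:
$$\mathbf{H}(t) = \mathbf{H}_1(t)\boldsymbol{\Sigma}(t)\mathbf{H}_2^T(t) \tag{6}$$
where $\mathbf{H}_1(t) = [\mathbf{h}_{1,1}(t)\ \ \mathbf{h}_{1,2}(t)\ \cdots\ \mathbf{h}_{1,L}(t)]$ and $\mathbf{H}_2(t) = [\mathbf{h}_{2,1}(t)\ \ \mathbf{h}_{2,2}(t)\ \cdots\ \mathbf{h}_{2,M}(t)]$ are orthogonal matrices formed by the left singular vectors $\mathbf{h}_{1,l}(t)$ ($l = 1, 2, \ldots, L$) and the right singular vectors $\mathbf{h}_{2,m}(t)$ ($m = 1, 2, \ldots, M$) of $\mathbf{H}(t)$, with dimensions $L \times L$ and $M \times M$, respectively; $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$, which are non-negative real numbers; the singular values are arranged in descending order, i.e. $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$;
the matrix $\mathbf{H}(t)$ is approximated with the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.
$$\mathbf{H}(t) \approx \sum_{p=1}^{P}\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}(t)\mathbf{h}_{S,p}^T(t) \tag{7}$$
where $\sigma_p(t)$ is the $p$-th singular value of $\mathbf{H}(t)$ in descending order, $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ are the corresponding left and right singular vectors ($p = 1, 2, \ldots, P$), and $\mathbf{h}_{T,p}(t)$ (length $L$) and $\mathbf{h}_{S,p}(t)$ (length $M$) are scaled versions of $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ whose product absorbs $\sigma_p(t)$;
based on (7), the filter $\mathbf{h}(t)$ can be approximated as
$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) \tag{8}$$
where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, the symbol $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, and the symbol $\otimes$ denotes the Kronecker product; the larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$, and when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$;
applying the relation
$$\mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) = [\mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L]\,\mathbf{h}_{T,p}(t) = [\mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)]\,\mathbf{h}_{S,p}(t) \tag{9}$$
$\mathbf{h}_P(t)$ is written as
$$\mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{H}_{S,p}(t)\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P}\mathbf{H}_{T,p}(t)\mathbf{h}_{S,p}(t) \tag{10}$$
where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimensions $L \times L$ and $M \times M$, respectively.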
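The low-rank construction above can be checked numerically. The following sketch truncates the SVD of a random $L \times M$ filter matrix and rebuilds the long filter as a sum of Kronecker products. The function name `kronecker_approximation` is hypothetical, and splitting each singular value evenly (a square root on each factor) is just one possible choice assumed for the example.

```python
import numpy as np

def kronecker_approximation(H, P):
    """Rank-P approximation of h = vec[H] as sum_p kron(h_S[p], h_T[p])."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # ASSUMPTION: each singular value is split evenly between the two factors.
    h_T = np.sqrt(s[:P])[:, None] * U[:, :P].T   # P x L temporal sub-filters
    h_S = np.sqrt(s[:P])[:, None] * Vt[:P, :]    # P x M spatial sub-filters
    h_P = sum(np.kron(h_S[p], h_T[p]) for p in range(P))
    return h_P, h_T, h_S, s

rng = np.random.default_rng(1)
L, M = 6, 3
H = rng.standard_normal((L, M))      # columns play the role of the per-channel filters
h = H.flatten(order="F")             # column stacking, i.e. vec[H]

h_2, h_T, h_S, s = kronecker_approximation(H, P=2)
h_3, _, _, _ = kronecker_approximation(H, P=3)

# The two matrix forms of one Kronecker term.
H_S1 = np.kron(h_S[0][:, None], np.eye(L))   # ML x L
H_T1 = np.kron(np.eye(M), h_T[0][:, None])   # ML x M
```

Note that NumPy flattens row by row by default, so `order="F"` is needed to match the column-stacking convention of the vec operator.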
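Optionally, the process of rewriting the expression of the desired signal estimate using the approximate representation of the linear filter includes:
based on (10), the estimate $z(t)$ of the desired signal $x_1(t)$ is written as:
$$z(t) = \mathbf{h}_P^T(t)\mathbf{y}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{y}_{T,P}(t) \tag{11}$$
where
$$\mathbf{h}_{T,P}(t) = [\mathbf{h}_{T,1}^T(t)\ \ \mathbf{h}_{T,2}^T(t)\ \cdots\ \mathbf{h}_{T,P}^T(t)]^T,\qquad \mathbf{h}_{S,P}(t) = [\mathbf{h}_{S,1}^T(t)\ \ \mathbf{h}_{S,2}^T(t)\ \cdots\ \mathbf{h}_{S,P}^T(t)]^T$$
$$\mathbf{y}_{T,P}(t) = [\mathbf{y}^T(t)\mathbf{H}_{T,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{T,P}(t)]^T = \mathbf{H}_{T,P}(t)\mathbf{y}(t)$$
$$\mathbf{y}_{S,P}(t) = [\mathbf{y}^T(t)\mathbf{H}_{S,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{S,P}(t)]^T = \mathbf{H}_{S,P}(t)\mathbf{y}(t)$$
$$\mathbf{H}_{T,P}(t) = [\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)]^T$$
$$\mathbf{H}_{S,P}(t) = [\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)]^T$$
the dimensions of $\mathbf{h}_{T,P}(t)$, $\mathbf{h}_{S,P}(t)$, $\mathbf{y}_{T,P}(t)$, $\mathbf{y}_{S,P}(t)$, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are $LP \times 1$, $MP \times 1$, $MP \times 1$, $LP \times 1$, $MP \times ML$, and $LP \times ML$, respectively; $\mathbf{h}_{T,P}(t)$ is the sub-filter vector that performs noise reduction in the time domain, $\mathbf{h}_{S,P}(t)$ is the sub-filter vector that performs noise reduction in the space domain, $\mathbf{H}_{T,P}(t)$ is the sub-filter matrix that performs noise reduction in the time domain, $\mathbf{H}_{S,P}(t)$ is the sub-filter matrix that performs noise reduction in the space domain, $\mathbf{y}_{T,P}(t)$ is the noisy voice signal vector filtered by the sub-filter matrix $\mathbf{H}_{T,P}(t)$, and $\mathbf{y}_{S,P}(t)$ is the noisy voice signal vector filtered by the sub-filter matrix $\mathbf{H}_{S,P}(t)$;
by rearranging, $z(t)$ is expressed as:
$$z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_{S,p}(t) \tag{12}$$
where $\mathbf{Y}(t) = [\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_M(t)]$ is the noisy signal matrix, with $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$ and dimension $L \times M$; the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) achieve noise reduction in the time dimension and the space dimension, respectively;
formula (11) may be further written as:
$$z(t) = \mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t) \tag{13}$$
where the vectors $\mathbf{x}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{v}(t)$ have length $LP$, the vectors $\mathbf{x}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{v}(t)$ have length $MP$, the term $\mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t)$ is the filtered voice signal, and the term $\mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t)$ is the residual noise.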
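The equivalence of the different expressions for $z(t)$ can be verified with random data. The sketch below is an illustration only: the sub-filters are random rather than estimated, and all variable names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, P = 5, 4, 2
h_T = rng.standard_normal((P, L))    # row p: temporal sub-filter of length L
h_S = rng.standard_normal((P, M))    # row p: spatial sub-filter of length M
Y = rng.standard_normal((L, M))      # column m: noisy vector of channel m
y = Y.flatten(order="F")             # stacked vector, vec[Y]

# Long filter built from Kronecker products, applied directly.
h_P = sum(np.kron(h_S[p], h_T[p]) for p in range(P))
z_long = h_P @ y

# Bilinear form: sum over p of h_T[p]^T Y h_S[p].
z_bilinear = sum(h_T[p] @ Y @ h_S[p] for p in range(P))

# Stacked sub-filter matrices (LP x ML and MP x ML).
H_S = np.vstack([np.kron(h_S[p][:, None], np.eye(L)).T for p in range(P)])
H_T = np.vstack([np.kron(np.eye(M), h_T[p][:, None]).T for p in range(P)])
z_temporal = h_T.ravel() @ (H_S @ y)   # temporal filter on spatially filtered data
z_spatial = h_S.ravel() @ (H_T @ y)    # spatial filter on temporally filtered data
```

All four values agree up to rounding, which is what lets the long filter be replaced by the two short sub-filter vectors.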
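Optionally, defining the mean square error of the desired signal estimate and transforming the mean square error using the approximate representation of the linear filter includes:
to derive the mean square error of the desired signal estimate $z(t)$, the error of $z(t)$ is defined as
$$\varepsilon(t) = z(t) - x_1(t) \tag{15}$$
Based on equation (15), the mean square error of $z(t)$ is defined as
$$J[\mathbf{h}_P(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_P^T(t)\boldsymbol{\rho}(t) + \mathbf{h}_P^T(t)\mathbf{R}_y(t)\mathbf{h}_P(t) \tag{16}$$
where $\sigma_{x_1}^2 = E[x_1^2(t)]$ is the variance of the desired signal;
using equation (10), the mean square error (16) of the desired signal estimate $z(t)$ is written as
$$J = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{y_{S,P}}(t)\mathbf{h}_{T,P}(t) = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{y_{T,P}}(t)\mathbf{h}_{S,P}(t) \tag{17}$$
where
$$\mathbf{R}_{y_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t) \tag{18}$$
$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t) \tag{19}$$
$$\mathbf{R}_{y_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t) \tag{20}$$
$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t) \tag{21}$$
fixing $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ in turn, formula (17) is written in the forms:
$$J[\mathbf{h}_{T,P}(t) \mid \mathbf{h}_{S,P}(t)] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{y_{S,P}}(t)\mathbf{h}_{T,P}(t) \tag{22}$$
$$J[\mathbf{h}_{S,P}(t) \mid \mathbf{h}_{T,P}(t)] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{y_{T,P}}(t)\mathbf{h}_{S,P}(t) \tag{23}$$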
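Optionally, the process of obtaining the iterative Wiener filter based on Kronecker decomposition from the transformed mean square error expression includes:
step one: the initial value $\mathbf{h}_{T,P}^{(0)}(t)$ of $\mathbf{h}_{T,P}(t)$ is set from the single-channel Wiener filters $\mathbf{h}_{W,p}(t)$ ($p = 1, 2, \ldots, P$) [defining equation not reproduced in the source text], where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of the $p$-th channel, of length $L$; it is computed from the matrix formed by the elements of $\mathbf{R}_y(t)$ located in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy voice signal vector of the $p$-th channel, from the vector formed by elements $(p-1)L+1$ to $pL$ of $\boldsymbol{\rho}(t)$, and from the element of $\mathbf{R}_x(t)$ located in the $p$-th row and $p$-th column;
step two: using $\mathbf{h}_{T,P}^{(n-1)}(t)$, construct $\mathbf{H}_{T,P}^{(n-1)}(t)$ through the formulas $\mathbf{H}_{T,p}(t) = \mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)$ and $\mathbf{H}_{T,P}(t) = [\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)]^T$, and substitute it into the formulas $\mathbf{R}_{y_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t)$ and $\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{y_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$, where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;
step three: substitute $\mathbf{R}_{y_{T,P}}^{(n-1)}(t)$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ into $\mathbf{h}_{S,P}^{(n)}(t) = \sigma_{x_1}^2\,[\mathbf{R}_{y_{T,P}}^{(n-1)}(t)]^{-1}\boldsymbol{\rho}_{T,P}^{(n-1)}(t)$ to obtain $\mathbf{h}_{S,P}^{(n)}(t)$;
step four: using $\mathbf{h}_{S,P}^{(n)}(t)$, construct $\mathbf{H}_{S,P}^{(n)}(t)$ through the formulas $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L$ and $\mathbf{H}_{S,P}(t) = [\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)]^T$, and substitute it into the formulas $\mathbf{R}_{y_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t)$ and $\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t)$ to obtain $\mathbf{R}_{y_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$;
step five: substitute $\mathbf{R}_{y_{S,P}}^{(n)}(t)$ and $\boldsymbol{\rho}_{S,P}^{(n)}(t)$ into $\mathbf{h}_{T,P}^{(n)}(t) = \sigma_{x_1}^2\,[\mathbf{R}_{y_{S,P}}^{(n)}(t)]^{-1}\boldsymbol{\rho}_{S,P}^{(n)}(t)$ to obtain $\mathbf{h}_{T,P}^{(n)}(t)$;
repeating steps two to five $N$ times, where $N$ is a freely chosen parameter, yields the iterative Wiener filter based on Kronecker decomposition, $\mathbf{h}_{W,P}^{(N)}(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}^{(N)}(t) \otimes \mathbf{h}_{T,p}^{(N)}(t)$.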
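The alternating procedure of steps two to five can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the temporal sub-filters start from unit vectors instead of the per-channel Wiener filters of step one (whose formula is not legible here), a tiny diagonal loading `reg` is added for numerical safety, and the correlation matrices in the demo are synthetic.

```python
import numpy as np

def mse(h, R_y, rho, sigma2):
    """J(h) = sigma2 - 2*sigma2*h^T rho + h^T R_y h."""
    return sigma2 - 2.0 * sigma2 * h @ rho + h @ R_y @ h

def iterative_kronecker_wiener(R_y, R_x, L, M, P, n_iter=5, reg=1e-10):
    sigma2 = R_x[0, 0]               # variance of the desired signal
    rho = R_x[:, 0] / sigma2         # normalised first column of R_x
    # ASSUMPTION: unit-vector start (needs P <= L), not the patent's initialisation.
    h_T = np.eye(P, L)
    history = []
    for _ in range(n_iter):
        # Steps two and three: fix h_T, solve for the spatial sub-filters.
        H_T = np.vstack([np.kron(np.eye(M), h_T[p][:, None]).T for p in range(P)])
        R_T = H_T @ R_y @ H_T.T      # MP x MP
        h_S = np.linalg.solve(R_T + reg * np.eye(M * P), sigma2 * (H_T @ rho)).reshape(P, M)
        # Steps four and five: fix h_S, solve for the temporal sub-filters.
        H_S = np.vstack([np.kron(h_S[p][:, None], np.eye(L)).T for p in range(P)])
        R_S = H_S @ R_y @ H_S.T      # LP x LP
        h_T = np.linalg.solve(R_S + reg * np.eye(L * P), sigma2 * (H_S @ rho)).reshape(P, L)
        h_P = sum(np.kron(h_S[p], h_T[p]) for p in range(P))
        history.append(mse(h_P, R_y, rho, sigma2))
    return h_P, history

# Synthetic statistics for a quick check (not real speech data).
rng = np.random.default_rng(0)
L, M, P = 4, 3, 2
A = rng.standard_normal((M * L, M * L))
R_x = A @ A.T / (M * L)
R_y = R_x + 0.5 * np.eye(M * L)
h_P, history = iterative_kronecker_wiener(R_y, R_x, L, M, P, n_iter=6)

sigma2, rho = R_x[0, 0], R_x[:, 0] / R_x[0, 0]
h_full = sigma2 * np.linalg.solve(R_y, rho)   # conventional length-ML Wiener filter
```

Each half-step minimises the same quadratic cost over one block of coefficients, so the cost cannot increase from one iteration to the next, and it can never fall below the cost of the unconstrained length-$ML$ Wiener filter.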
The invention has the technical effects that:
The invention obtains a low-rank representation of the conventional multichannel noise reduction filter by applying singular value decomposition and Kronecker decomposition, and decomposes the noise reduction filter coefficients along the time dimension and the space dimension. The estimation of one long filter is thereby converted into the estimation of two shorter sub-filters, which yields a multichannel noise reduction scheme based on Kronecker decomposition. A shorter filter means fewer parameters to estimate, so the algorithm complexity can be significantly reduced. In addition, fewer signal samples are needed to estimate a shorter filter, so the noise reduction algorithm can better track changes in the signal statistics and is better suited to non-stationary noise. Compared with the conventional multichannel noise reduction filter, the multichannel noise reduction filter based on Kronecker decomposition offers more flexible control of the trade-off between noise reduction performance and algorithm complexity when processing stationary noise, and achieves better noise reduction performance at lower complexity when processing non-stationary noise. Compared with the frequency-domain voice noise reduction methods widely used in practical systems, it has the further advantage of producing no musical noise.
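To give a feel for the savings, the following back-of-the-envelope sketch compares coefficient counts and a rough inversion cost. The cubic cost model and the example sizes ($M = 4$, $L = 32$, $P = 2$) are assumptions made for illustration; they are not figures from the patent.

```python
def parameter_count(M, L, P):
    """Coefficients to estimate: one long filter vs. P temporal plus P spatial sub-filters."""
    return M * L, P * (M + L)

def inversion_cost(M, L, P):
    """Rough cubic cost model (an assumption) for the matrix inversions involved."""
    return (M * L) ** 3, (M * P) ** 3 + (L * P) ** 3

conventional_params, kronecker_params = parameter_count(M=4, L=32, P=2)
conventional_cost, kronecker_cost = inversion_cost(M=4, L=32, P=2)
```

With these sizes the long filter has 128 coefficients against 72 for the sub-filters, and the cost model gives 2,097,152 against 262,656 per pass. The iterative scheme repeats its two smaller inversions $N$ times, so the overall saving depends on the chosen $P$ and $N$.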
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a system structure according to an embodiment of the invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in FIGS. 1-2, this embodiment provides a time-domain multichannel voice noise reduction method based on Kronecker decomposition, which includes:
step 1, collecting a noisy voice signal;
step 2, estimating the statistical characteristics of the noisy voice signal and the noise signal;
step 3, estimating an iterative Wiener noise reduction filter based on Kronecker decomposition;
step 4, filtering the noisy voice signal to reduce the noise and obtain an estimate of the clean voice signal.
The specific method of step 2 is as follows:
in voice noise reduction, the time-domain signal model is:
$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$
where $t$ denotes the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array in the invention has $M$ microphones); $x_m(t)$ and $v_m(t)$ denote the clean voice signal and the additive noise signal, respectively, $y_m(t)$ denotes the noisy voice signal, and $x_m(t)$ and $v_m(t)$ are independent of each other. In the invention, all signals are zero-mean, broadband, real-valued signals. The 1st microphone in the microphone array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal (the signal to be recovered). In theory, however, any microphone may be used as the reference microphone.
By grouping $L$ consecutive samples, the signal received by the $m$-th microphone can be written as a vector of length $L$:
$$\mathbf{y}_m(t) = [y_m(t)\ \ y_m(t-1)\ \cdots\ y_m(t-L+1)]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \tag{2}$$
where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined similarly to $\mathbf{y}_m(t)$, namely:
$$\mathbf{x}_m(t) = [x_m(t)\ \ x_m(t-1)\ \cdots\ x_m(t-L+1)]^T$$
$$\mathbf{v}_m(t) = [v_m(t)\ \ v_m(t-1)\ \cdots\ v_m(t-L+1)]^T$$
$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ denote the desired signal vector and the noise signal vector, respectively, and the superscript $(\cdot)^T$ denotes the transpose.
In conventional time-domain multichannel voice enhancement, the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are typically concatenated and written as:
$$\mathbf{y}(t) = [\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \cdots\ \mathbf{y}_M^T(t)]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$
where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.
$$\mathbf{x}(t) = [\mathbf{x}_1^T(t)\ \ \mathbf{x}_2^T(t)\ \cdots\ \mathbf{x}_M^T(t)]^T,\qquad \mathbf{v}(t) = [\mathbf{v}_1^T(t)\ \ \mathbf{v}_2^T(t)\ \cdots\ \mathbf{v}_M^T(t)]^T$$
$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ denote the overall noisy signal vector, the overall clean voice signal vector, and the overall noise signal vector, respectively.
The statistical characteristics of the overall noisy voice signal and the overall noise signal in step 2 are estimated as follows: the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ is estimated with an existing noise estimation algorithm; the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy voice signal vector $\mathbf{y}(t)$ is estimated recursively: $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); the correlation matrix $\mathbf{R}_x(t)$ of the overall clean voice signal vector $\mathbf{x}(t)$ is estimated by $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and the vector $\boldsymbol{\rho}(t)$ is determined from the voice signal correlation matrix $\mathbf{R}_x(t)$ (the 1st element of $\mathbf{R}_x(t)$ is $\sigma_{x_1}^2$; the 1st column of $\mathbf{R}_x(t)$ divided by $\sigma_{x_1}^2$ is the vector $\boldsymbol{\rho}(t)$).
The specific method of step 3 is as follows:
in the conventional method, to achieve noise reduction, the overall noisy signal vector $\mathbf{y}(t)$ of length $ML$ is passed through a linear filter $\mathbf{h}(t)$, i.e.
$$z(t) = \sum_{m=1}^{M}\mathbf{h}_m^T(t)\mathbf{y}_m(t) = \mathbf{h}^T(t)\mathbf{y}(t) \tag{4}$$
where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of the $m$-th channel, of length $L$, and $\mathbf{h}(t) = [\mathbf{h}_1^T(t)\ \ \mathbf{h}_2^T(t)\ \cdots\ \mathbf{h}_M^T(t)]^T$ is a linear filter of length $ML$. The conventional method therefore has to estimate a filter $\mathbf{h}(t)$ of length $ML$.
In this embodiment, to derive a time-domain multichannel noise reduction scheme based on Kronecker decomposition, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are written in matrix form, i.e.
$$\mathbf{H}(t) = [\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \cdots\ \mathbf{h}_M(t)] \tag{5}$$
Using singular value decomposition, the matrix $\mathbf{H}(t)$ can be decomposed as:
$$\mathbf{H}(t) = \mathbf{H}_1(t)\boldsymbol{\Sigma}(t)\mathbf{H}_2^T(t) \tag{6}$$
where $\mathbf{H}_1(t) = [\mathbf{h}_{1,1}(t)\ \ \mathbf{h}_{1,2}(t)\ \cdots\ \mathbf{h}_{1,L}(t)]$ and $\mathbf{H}_2(t) = [\mathbf{h}_{2,1}(t)\ \ \mathbf{h}_{2,2}(t)\ \cdots\ \mathbf{h}_{2,M}(t)]$ are orthogonal matrices formed by the left singular vectors $\mathbf{h}_{1,l}(t)$ ($l = 1, 2, \ldots, L$) and the right singular vectors $\mathbf{h}_{2,m}(t)$ ($m = 1, 2, \ldots, M$) of $\mathbf{H}(t)$, with dimensions $L \times L$ and $M \times M$, respectively, and $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$ (non-negative and real). The singular values are arranged in descending order, i.e. $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$.
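Because the noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) are strongly correlated, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are also strongly correlated. As a result, the singular values of $\mathbf{H}(t)$ differ greatly in magnitude, so $\mathbf{H}(t)$ can be approximated with the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.
$$\mathbf{H}(t) \approx \sum_{p=1}^{P}\sigma_p(t)\mathbf{h}_{1,p}(t)\mathbf{h}_{2,p}^T(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}(t)\mathbf{h}_{S,p}^T(t) \tag{7}$$
where $\mathbf{h}_{T,p}(t)$ (length $L$) and $\mathbf{h}_{S,p}(t)$ (length $M$) are scaled versions of $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ whose product absorbs $\sigma_p(t)$. It should be noted that this representation is not unique, since $\sigma_p(t)$ can be divided between the two factors in any proportion without changing their product.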
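The claim that strongly correlated per-channel filters lead to rapidly decaying singular values can be illustrated with a toy matrix. In the sketch below the columns of the filter matrix are scaled copies of one common filter plus a small perturbation. The gains, the perturbation level, and the sizes are arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
L, M = 16, 4
common = rng.standard_normal(L)              # filter shape shared by all channels
gains = np.array([1.0, 0.9, 0.8, 0.7])       # hypothetical per-channel scaling
H = np.outer(common, gains) + 0.01 * rng.standard_normal((L, M))

s = np.linalg.svd(H, compute_uv=False)       # singular values in descending order
energy_in_first = s[0] ** 2 / np.sum(s ** 2)
```

Here almost all of the energy sits in the first singular value, so a single Kronecker term ($P = 1$) would already represent this particular matrix closely. Real filters are less extreme, which is why $P$ is left as a design parameter.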
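Based on (7), the filter $\mathbf{h}(t)$ can be approximated as
$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) \tag{8}$$
where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, the symbol $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, and the symbol $\otimes$ denotes the Kronecker product. The larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$; when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$.
Applying the relation
$$\mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) = [\mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L]\,\mathbf{h}_{T,p}(t) = [\mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)]\,\mathbf{h}_{S,p}(t) \tag{9}$$
$\mathbf{h}_P(t)$ can be written as
$$\mathbf{h}_P(t) = \sum_{p=1}^{P}\mathbf{H}_{S,p}(t)\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P}\mathbf{H}_{T,p}(t)\mathbf{h}_{S,p}(t) \tag{10}$$
where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimensions $L \times L$ and $M \times M$, respectively.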
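The estimate $z(t)$ of the desired signal $x_1(t)$ can then be written as
$$z(t) = \mathbf{h}_P^T(t)\mathbf{y}(t) = \mathbf{h}_{T,P}^T(t)\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{y}_{T,P}(t) \tag{11}$$
where
$$\mathbf{h}_{T,P}(t) = [\mathbf{h}_{T,1}^T(t)\ \ \mathbf{h}_{T,2}^T(t)\ \cdots\ \mathbf{h}_{T,P}^T(t)]^T,\qquad \mathbf{h}_{S,P}(t) = [\mathbf{h}_{S,1}^T(t)\ \ \mathbf{h}_{S,2}^T(t)\ \cdots\ \mathbf{h}_{S,P}^T(t)]^T$$
$$\mathbf{y}_{T,P}(t) = [\mathbf{y}^T(t)\mathbf{H}_{T,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{T,P}(t)]^T = \mathbf{H}_{T,P}(t)\mathbf{y}(t)$$
$$\mathbf{y}_{S,P}(t) = [\mathbf{y}^T(t)\mathbf{H}_{S,1}(t)\ \ \mathbf{y}^T(t)\mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{y}^T(t)\mathbf{H}_{S,P}(t)]^T = \mathbf{H}_{S,P}(t)\mathbf{y}(t)$$
$$\mathbf{H}_{T,P}(t) = [\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \cdots\ \mathbf{H}_{T,P}(t)]^T$$
$$\mathbf{H}_{S,P}(t) = [\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \cdots\ \mathbf{H}_{S,P}(t)]^T$$
The dimensions of $\mathbf{h}_{T,P}(t)$, $\mathbf{h}_{S,P}(t)$, $\mathbf{y}_{T,P}(t)$, $\mathbf{y}_{S,P}(t)$, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are $LP \times 1$, $MP \times 1$, $MP \times 1$, $LP \times 1$, $MP \times ML$, and $LP \times ML$, respectively. $\mathbf{h}_{T,P}(t)$ is the sub-filter vector that performs noise reduction in the time domain, $\mathbf{h}_{S,P}(t)$ is the sub-filter vector that performs noise reduction in the space domain, $\mathbf{H}_{T,P}(t)$ is the sub-filter matrix that performs noise reduction in the time domain, $\mathbf{H}_{S,P}(t)$ is the sub-filter matrix that performs noise reduction in the space domain, $\mathbf{y}_{T,P}(t)$ is the noisy voice signal vector filtered by the sub-filter matrix $\mathbf{H}_{T,P}(t)$, and $\mathbf{y}_{S,P}(t)$ is the noisy voice signal vector filtered by the sub-filter matrix $\mathbf{H}_{S,P}(t)$.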
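By rearranging, $z(t)$ can be expressed as
$$z(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_{S,p}(t) = \sum_{p=1}^{P}\mathbf{h}_{T,p}^T(t)\,[\mathbf{X}(t) + \mathbf{V}(t)]\,\mathbf{h}_{S,p}(t) \tag{12}$$
where $\mathbf{Y}(t) = [\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_M(t)]$ is the noisy signal matrix, the matrices $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined similarly to $\mathbf{Y}(t)$ ($\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$, $\mathrm{vec}[\mathbf{X}(t)] = \mathbf{x}(t)$, $\mathrm{vec}[\mathbf{V}(t)] = \mathbf{v}(t)$), and the dimensions of $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are all $L \times M$. As can be seen from equation (12), the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) achieve noise reduction in the time dimension and the space dimension, respectively.
Formula (11) may be further written as:
$$z(t) = \mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t) \tag{13}$$
where the vectors $\mathbf{x}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{S,P}(t) = \mathbf{H}_{S,P}(t)\mathbf{v}(t)$ have length $LP$, the vectors $\mathbf{x}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{x}(t)$ and $\mathbf{v}_{T,P}(t) = \mathbf{H}_{T,P}(t)\mathbf{v}(t)$ have length $MP$, the term $\mathbf{h}_{T,P}^T(t)\mathbf{x}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{x}_{T,P}(t)$ is the filtered voice signal, and the term $\mathbf{h}_{T,P}^T(t)\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\mathbf{v}_{T,P}(t)$ is the residual noise.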
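Because the two parts of the desired signal estimate $z(t)$ are uncorrelated, the variance of $z(t)$ is
$$\sigma_z^2 = \mathbf{h}_P^T(t)\mathbf{R}_y(t)\mathbf{h}_P(t) = \mathbf{h}_P^T(t)\mathbf{R}_x(t)\mathbf{h}_P(t) + \mathbf{h}_P^T(t)\mathbf{R}_v(t)\mathbf{h}_P(t) \tag{14}$$
where $\mathbf{R}_y(t) = E[\mathbf{y}(t)\mathbf{y}^T(t)]$, $\mathbf{R}_x(t) = E[\mathbf{x}(t)\mathbf{x}^T(t)]$, and $\mathbf{R}_v(t) = E[\mathbf{v}(t)\mathbf{v}^T(t)]$ ($\mathbf{R}_v(t)$ is a full-rank matrix). For simplicity, the time index $t$ is omitted in the following wherever this causes no ambiguity.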
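To derive an iterative noise reduction filter based on Kronecker decomposition, the mean square error of the desired signal estimate $z$ is derived first. The error of $z$ is defined as
$$\varepsilon(t) = z(t) - x_1(t) \tag{15}$$
Based on equation (15), the mean square error of $z$ can be defined as
$$J[\mathbf{h}_P(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_P^T(t)\boldsymbol{\rho}(t) + \mathbf{h}_P^T(t)\mathbf{R}_y(t)\mathbf{h}_P(t) \tag{16}$$
where $\sigma_{x_1}^2 = E[x_1^2(t)]$ is the variance of the desired signal.
Using equation (10), the mean square error (16) of the desired signal estimate $z$ can be written as
$$J = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{T,P}^T(t)\boldsymbol{\rho}_{S,P}(t) + \mathbf{h}_{T,P}^T(t)\mathbf{R}_{y_{S,P}}(t)\mathbf{h}_{T,P}(t) = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{S,P}^T(t)\boldsymbol{\rho}_{T,P}(t) + \mathbf{h}_{S,P}^T(t)\mathbf{R}_{y_{T,P}}(t)\mathbf{h}_{S,P}(t) \tag{17}$$
where
$$\mathbf{R}_{y_{S,P}}(t) = \mathbf{H}_{S,P}(t)\mathbf{R}_y(t)\mathbf{H}_{S,P}^T(t) \tag{18}$$
$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\boldsymbol{\rho}(t) \tag{19}$$
$$\mathbf{R}_{y_{T,P}}(t) = \mathbf{H}_{T,P}(t)\mathbf{R}_y(t)\mathbf{H}_{T,P}^T(t) \tag{20}$$
$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\boldsymbol{\rho}(t) \tag{21}$$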
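For readers who want the intermediate step, the quadratic form of the mean square error follows directly from the signal model. The derivation below is a sketch that uses only the assumptions already stated in the text (zero-mean signals, speech and noise independent of each other); the symbol $\sigma_{x_1}^2$ for the variance of the desired signal is notation adopted here.

```latex
\begin{aligned}
E[\varepsilon^2(t)]
  &= E\big[(\mathbf{h}_P^T\mathbf{y} - x_1)^2\big]
   = \mathbf{h}_P^T\mathbf{R}_y\mathbf{h}_P
     - 2\,\mathbf{h}_P^T E[\mathbf{y}\,x_1] + E[x_1^2], \\
E[\mathbf{y}\,x_1]
  &= E[\mathbf{x}\,x_1] + E[\mathbf{v}\,x_1]
   = E[\mathbf{x}\,x_1]
   = \text{first column of } \mathbf{R}_x
   = \sigma_{x_1}^2\,\boldsymbol{\rho}, \\
\Rightarrow\quad E[\varepsilon^2(t)]
  &= \sigma_{x_1}^2 - 2\sigma_{x_1}^2\,\mathbf{h}_P^T\boldsymbol{\rho}
     + \mathbf{h}_P^T\mathbf{R}_y\mathbf{h}_P .
\end{aligned}
```

Substituting $\mathbf{h}_P = \mathbf{H}_{S,P}^T\mathbf{h}_{T,P} = \mathbf{H}_{T,P}^T\mathbf{h}_{S,P}$ then turns $\mathbf{h}_P^T\boldsymbol{\rho}$ into $\mathbf{h}_{T,P}^T\mathbf{H}_{S,P}\boldsymbol{\rho}$ or $\mathbf{h}_{S,P}^T\mathbf{H}_{T,P}\boldsymbol{\rho}$, and $\mathbf{h}_P^T\mathbf{R}_y\mathbf{h}_P$ into a quadratic form in the smaller matrices $\mathbf{H}_{S,P}\mathbf{R}_y\mathbf{H}_{S,P}^T$ or $\mathbf{H}_{T,P}\mathbf{R}_y\mathbf{H}_{T,P}^T$.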
As can be seen from the above formulas, when $P$ is small, the matrices required by the noise reduction scheme based on Kronecker decomposition, $\mathbf{R}_{y_{T,P}}(t)$ (dimension $MP \times MP$) and $\mathbf{R}_{y_{S,P}}(t)$ (dimension $LP \times LP$), are much smaller than the matrix $\mathbf{R}_y(t)$ (dimension $ML \times ML$) required by conventional noise reduction schemes. This brings two advantages: 1) the complexity of matrix inversion is significantly reduced; 2) compared with $\mathbf{R}_y(t)$, fewer observation samples are needed to estimate $\mathbf{R}_{y_{T,P}}(t)$ and $\mathbf{R}_{y_{S,P}}(t)$. Noise reduction schemes based on $\mathbf{R}_{y_{T,P}}(t)$ and $\mathbf{R}_{y_{S,P}}(t)$ therefore generally have lower complexity and a better ability to track the noise statistics.
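To derive an iterative Wiener filter based on Kronecker decomposition, $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ are fixed in turn and formula (17) is written as
$$J[\mathbf{h}_{T,P} \mid \mathbf{h}_{S,P}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{T,P}^T\boldsymbol{\rho}_{S,P} + \mathbf{h}_{T,P}^T\mathbf{R}_{y_{S,P}}\mathbf{h}_{T,P} \tag{22}$$
$$J[\mathbf{h}_{S,P} \mid \mathbf{h}_{T,P}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{S,P}^T\boldsymbol{\rho}_{T,P} + \mathbf{h}_{S,P}^T\mathbf{R}_{y_{T,P}}\mathbf{h}_{S,P} \tag{23}$$
The filter is derived as follows:
the initial value $\mathbf{h}_{T,P}^{(0)}$ of $\mathbf{h}_{T,P}(t)$ is set from the single-channel Wiener filters $\mathbf{h}_{W,p}(t)$ ($p = 1, 2, \ldots, P$) [defining equation not reproduced in the source text],
where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of the $p$-th channel, of length $L$. It is computed from the matrix formed by the elements of $\mathbf{R}_y$ in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy voice signal vector of the $p$-th channel, from the vector formed by elements $(p-1)L+1$ to $pL$ of $\boldsymbol{\rho}$, and from the element of $\mathbf{R}_x$ located in the $p$-th row and $p$-th column.
Using $\mathbf{h}_{T,P}^{(0)}$ to construct $\mathbf{H}_{T,P}^{(0)}$ and substituting it into formulas (20) and (21) gives
$$\mathbf{R}_{y_{T,P}}^{(0)} = \mathbf{H}_{T,P}^{(0)}\mathbf{R}_y\mathbf{H}_{T,P}^{(0)T},\qquad \boldsymbol{\rho}_{T,P}^{(0)} = \mathbf{H}_{T,P}^{(0)}\boldsymbol{\rho}$$
Substituting $\mathbf{R}_{y_{T,P}}^{(0)}$ and $\boldsymbol{\rho}_{T,P}^{(0)}$ into formula (23) gives
$$J[\mathbf{h}_{S,P} \mid \mathbf{h}_{T,P}^{(0)}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{S,P}^T\boldsymbol{\rho}_{T,P}^{(0)} + \mathbf{h}_{S,P}^T\mathbf{R}_{y_{T,P}}^{(0)}\mathbf{h}_{S,P} \tag{26}$$
Differentiating (26) with respect to $\mathbf{h}_{S,P}$ and setting the result to zero gives
$$\mathbf{h}_{S,P}^{(1)} = \sigma_{x_1}^2\,[\mathbf{R}_{y_{T,P}}^{(0)}]^{-1}\boldsymbol{\rho}_{T,P}^{(0)}$$
where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration.
Using $\mathbf{h}_{S,P}^{(1)}$ to construct $\mathbf{H}_{S,P}^{(1)}$ and substituting it into formulas (18) and (19) gives
$$\mathbf{R}_{y_{S,P}}^{(1)} = \mathbf{H}_{S,P}^{(1)}\mathbf{R}_y\mathbf{H}_{S,P}^{(1)T},\qquad \boldsymbol{\rho}_{S,P}^{(1)} = \mathbf{H}_{S,P}^{(1)}\boldsymbol{\rho}$$
Substituting $\mathbf{R}_{y_{S,P}}^{(1)}$ and $\boldsymbol{\rho}_{S,P}^{(1)}$ into formula (22) gives
$$J[\mathbf{h}_{T,P} \mid \mathbf{h}_{S,P}^{(1)}] = \sigma_{x_1}^2 - 2\sigma_{x_1}^2\mathbf{h}_{T,P}^T\boldsymbol{\rho}_{S,P}^{(1)} + \mathbf{h}_{T,P}^T\mathbf{R}_{y_{S,P}}^{(1)}\mathbf{h}_{T,P} \tag{30}$$
Differentiating (30) with respect to $\mathbf{h}_{T,P}$ and setting the result to zero gives
$$\mathbf{h}_{T,P}^{(1)} = \sigma_{x_1}^2\,[\mathbf{R}_{y_{S,P}}^{(1)}]^{-1}\boldsymbol{\rho}_{S,P}^{(1)}$$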
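The "differentiate and set to zero" step is the same for both sub-filters. As a worked sketch, take a generic quadratic cost of the form used in the text, with $\mathbf{g}$ standing for either sub-filter vector, $\mathbf{R}$ for the matching reduced correlation matrix, and $\mathbf{r}$ for the matching reduced vector; these three generic symbols are introduced here purely for illustration.

```latex
\begin{aligned}
J(\mathbf{g}) &= \sigma_{x_1}^2 - 2\sigma_{x_1}^2\,\mathbf{g}^T\mathbf{r}
                + \mathbf{g}^T\mathbf{R}\,\mathbf{g}, \\
\frac{\partial J}{\partial \mathbf{g}}
              &= -2\sigma_{x_1}^2\,\mathbf{r} + 2\,\mathbf{R}\,\mathbf{g} = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{g} = \sigma_{x_1}^2\,\mathbf{R}^{-1}\mathbf{r}.
\end{aligned}
```

The gradient uses the symmetry of $\mathbf{R}$, and the solution is the unique minimiser provided $\mathbf{R}$ is positive definite and hence invertible. Each half-step therefore has a closed-form solution that requires inverting only an $MP \times MP$ or an $LP \times LP$ matrix.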
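Continuing in the same way, after $n$ iterations one obtains
$$\mathbf{h}_{S,P}^{(n)} = \sigma_{x_1}^2\,[\mathbf{R}_{y_{T,P}}^{(n-1)}]^{-1}\boldsymbol{\rho}_{T,P}^{(n-1)} \tag{32}$$
$$\mathbf{h}_{T,P}^{(n)} = \sigma_{x_1}^2\,[\mathbf{R}_{y_{S,P}}^{(n)}]^{-1}\boldsymbol{\rho}_{S,P}^{(n)} \tag{33}$$
where $\mathbf{R}_{y_{T,P}}^{(n-1)}$ and $\boldsymbol{\rho}_{T,P}^{(n-1)}$ are obtained from (20) and (21) with $\mathbf{H}_{T,P}^{(n-1)}$, and $\mathbf{R}_{y_{S,P}}^{(n)}$ and $\boldsymbol{\rho}_{S,P}^{(n)}$ are obtained from (18) and (19) with $\mathbf{H}_{S,P}^{(n)}$.
Based on equations (32) and (33), the iterative Wiener filter based on Kronecker decomposition after the $n$-th iteration is:
$$\mathbf{h}_{W,P}^{(n)} = \sum_{p=1}^{P}\mathbf{h}_{S,p}^{(n)} \otimes \mathbf{h}_{T,p}^{(n)} \tag{34}$$
Passing the noisy signal through the designed iterative Wiener noise reduction filter $\mathbf{h}_{W,P}(t)$ based on Kronecker decomposition gives the noise-reduced estimate of the clean voice signal, $z(t) = \mathbf{h}_{W,P}^T(t)\mathbf{y}(t)$. Alternatively, the noise-reduced voice signal $z(t)$ may be obtained from the sub-filters through formula (12). The two methods are equivalent.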
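Applying a length-$ML$ filter to a whole multichannel recording can be sketched as below. The function name `apply_filter` and the sample-by-sample loop are choices made for clarity in this example; the filter used in the check is a trivial pass-through of the reference channel, not an estimated Wiener filter.

```python
import numpy as np

def apply_filter(y, h, L):
    """y: (M, T) noisy signals; h: length-ML filter. Returns z(t) for t = L-1, ..., T-1."""
    M, T = y.shape
    z = np.empty(T - L + 1)
    for k, t in enumerate(range(L - 1, T)):
        # Stacked vector y(t): for each channel, samples t, t-1, ..., t-L+1.
        y_vec = np.concatenate([y[m, t - L + 1 : t + 1][::-1] for m in range(M)])
        z[k] = h @ y_vec
    return z

rng = np.random.default_rng(4)
M, L, T = 3, 4, 20
y = rng.standard_normal((M, T))

h_pass = np.zeros(M * L)
h_pass[0] = 1.0                      # picks the current sample of the reference channel
z_pass = apply_filter(y, h_pass, L)

h_delay = np.zeros(M * L)
h_delay[1] = 1.0                     # picks the previous sample of the reference channel
z_delay = apply_filter(y, h_delay, L)
```

The first $L-1$ output samples are skipped here because a full length-$L$ history is not yet available; a real-time system would have to decide how to handle that start-up period.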
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A time-domain multichannel voice noise reduction method based on Kronecker decomposition, characterized by comprising the following steps:
collecting a noisy voice signal, and preprocessing the noisy voice signal;
estimating the statistical characteristics of the noisy voice signal and the noise signal;
obtaining an iterative Wiener noise reduction filter based on Kronecker decomposition from the statistical characteristics;
and performing noise reduction filtering on the noisy voice signal with the iterative Wiener noise reduction filter to obtain an estimate of the clean voice signal.
2. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 1, wherein,
the method for collecting the noisy speech signal and preprocessing the noisy speech signal comprises the following steps:
in speech noise reduction, the time domain signal model is:
y m (t)=x m (t)+v m (t) (1)
here, t represents a discrete point in time, subscript (.) m Represents the signal received by the mth microphone (M microphones are arranged in the microphone array in the invention), x m (t) and v m (t) representing the clean speech signal and the additive noise signal received by the mth microphone, y m (t) represents the noisy speech signal received by the mth microphone, x m (t) and v m (t) independent of each other; the 1 st microphone in the microphone array is selected as the reference microphone, i.e. x 1 (t) as a desired signal;
by stacking $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:
$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \quad (2)$$
where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined analogously to $\mathbf{y}_m(t)$, namely:
$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$
$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$
$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ denote the desired-signal vector and the noise-signal vector of the $m$-th channel, respectively; $\mathbf{y}_m(t)$ denotes the noisy-signal vector of the $m$-th channel; the superscript $(\cdot)^T$ denotes transposition;
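the $M$ noisy-signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$) of length $L$ are concatenated and written as:
$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \quad (3)$$
where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.
$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \mathbf{x}_2^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \mathbf{v}_2^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$
$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ denote the overall noisy-signal vector, the overall clean speech signal vector and the overall noise-signal vector, respectively.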
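The stacking described in claim 2 can be sketched in a few lines of NumPy. This is only an illustration: the function name `stack_frames`, the `(M, T)` array layout and the zero-based sample index `t` are assumptions made for the example and do not come from the patent text.

```python
import numpy as np

def stack_frames(signals, t, L):
    """Build the overall vector y(t) of length M*L from an (M, T) sample array.

    Channel m contributes [y_m(t), y_m(t-1), ..., y_m(t-L+1)] (newest sample
    first), and the M channel vectors are concatenated in channel order.
    """
    signals = np.asarray(signals, dtype=float)
    if t - L + 1 < 0 or t >= signals.shape[1]:
        raise ValueError("need L samples ending at index t")
    # Take samples t-L+1..t of every channel, then reverse so the newest leads.
    frames = signals[:, t - L + 1 : t + 1][:, ::-1]
    # Row-major flattening puts channel 1 first, then channel 2, and so on.
    return frames.reshape(-1)
```

For a two-channel recording with samples `0..5` and `6..11`, `stack_frames(signals, t=4, L=3)` returns the six-element vector formed by `[4, 3, 2]` followed by `[10, 9, 8]`.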
3. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 2, wherein,
the process of estimating the statistical properties of the noisy speech signal and the noise signal comprises:
estimating the correlation matrix $\mathbf{R}_v(t)$ of the overall noise-signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy-signal vector $\mathbf{y}(t)$ with the recursion $\mathbf{R}_y(t) = \alpha\,\mathbf{R}_y(t-1) + (1-\alpha)\,\mathbf{y}(t)\,\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix $\mathbf{R}_x(t)$ of the overall clean speech signal vector $\mathbf{x}(t)$ as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$, thereby obtaining the statistical characteristics.
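The recursive update in claim 3 is a one-line exponential average. The sketch below is illustrative only: the function names and the default `alpha=0.95` are assumptions, and the noise correlation matrix `R_v` is taken as given because the claim leaves its estimator to "an existing noise estimation algorithm".

```python
import numpy as np

def update_noisy_correlation(R_prev, y, alpha=0.95):
    """One step of R_y(t) = alpha * R_y(t-1) + (1 - alpha) * y(t) y(t)^T."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("forgetting factor must satisfy 0 < alpha < 1")
    y = np.asarray(y, dtype=float).reshape(-1)
    return alpha * np.asarray(R_prev, dtype=float) + (1.0 - alpha) * np.outer(y, y)

def clean_speech_correlation(R_y, R_v):
    """R_x(t) = R_y(t) - R_v(t), with R_v supplied by a separate noise estimator."""
    return np.asarray(R_y, dtype=float) - np.asarray(R_v, dtype=float)
```

A value of `alpha` close to 1 averages over a long history and gives smoother estimates, while a smaller value tracks changes in the signal statistics more quickly.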
4. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 3, wherein,
the process of determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$ comprises:
extracting the first column of the speech signal correlation matrix $\mathbf{R}_x(t)$ and dividing it by the element in the first row and first column to obtain the vector $\boldsymbol{\rho}(t)$.
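A minimal sketch of the step in claim 4, under the reading that the vector is the first column of the speech correlation matrix normalised by its top-left element (the translated wording is ambiguous, so this reading is an assumption). The function name is hypothetical.

```python
import numpy as np

def normalized_correlation_vector(R_x):
    """Return rho = R_x[:, 0] / R_x[0, 0] (assumed reading of claim 4)."""
    R_x = np.asarray(R_x, dtype=float)
    sigma2 = R_x[0, 0]  # variance of the reference-microphone speech x_1(t)
    if sigma2 <= 0.0:
        raise ValueError("R_x[0, 0] must be positive to normalise")
    return R_x[:, 0] / sigma2
```

Under this reading the first entry of the result is always 1, since the reference signal is perfectly correlated with itself.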
5. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 3, wherein,
the process of obtaining the iterative Wiener noise reduction filter based on the Kronecker decomposition from the statistical characteristics comprises:
constructing the matrix corresponding to a linear filter of the same length as the overall noisy-signal vector, performing singular value decomposition on that matrix, approximating the decomposition, and obtaining from it an approximate representation of the linear filter;
rewriting the expression of the desired-signal estimate using the approximate representation of the linear filter, so that the estimate is expressed in terms of the filtered speech signal and the filtered residual noise;
and defining the mean square error of the desired-signal estimate, rewriting the mean square error using the approximate representation of the linear filter, and obtaining the iterative Wiener filter based on the Kronecker decomposition from the rewritten mean square error expression.
6. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 5, wherein,
the process of constructing the matrix corresponding to a linear filter of the same length as the overall noisy-signal vector, performing singular value decomposition on that matrix, approximating the decomposition, and obtaining from it an approximate representation of the linear filter comprises:
to achieve noise reduction, the overall noisy-signal vector $\mathbf{y}(t)$ of length $ML$ is passed through a linear filter $\mathbf{h}(t)$, i.e.
$$z(t) = \sum_{m=1}^{M} \mathbf{h}_m^T(t)\,\mathbf{y}_m(t) = \mathbf{h}^T(t)\,\mathbf{y}(t) \quad (4)$$
where $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) is the linear filter of length $L$ for the $m$-th channel, and $\mathbf{h}(t)$ is a linear filter of length $ML$;
to derive a time-domain multi-channel noise reduction scheme based on the Kronecker decomposition, the filters $\mathbf{h}_m(t)$ ($m = 1, 2, \ldots, M$) are written in matrix form, i.e.
$$\mathbf{H}(t) = [\,\mathbf{h}_1(t)\ \ \mathbf{h}_2(t)\ \ \cdots\ \ \mathbf{h}_M(t)\,] \quad (5)$$
Using singular value decomposition, the matrix $\mathbf{H}(t)$ can be decomposed into the following form:
$$\mathbf{H}(t) = \mathbf{H}_1(t)\,\boldsymbol{\Sigma}(t)\,\mathbf{H}_2^T(t) \quad (6)$$
where $\mathbf{H}_1(t) = [\,\mathbf{h}_{1,1}(t)\ \ \mathbf{h}_{1,2}(t)\ \ \cdots\ \ \mathbf{h}_{1,L}(t)\,]$ and $\mathbf{H}_2(t) = [\,\mathbf{h}_{2,1}(t)\ \ \mathbf{h}_{2,2}(t)\ \ \cdots\ \ \mathbf{h}_{2,M}(t)\,]$ are orthogonal matrices of dimensions $L \times L$ and $M \times M$, formed from the left singular vectors $\mathbf{h}_{1,l}(t)$ ($l = 1, 2, \ldots, L$) and the right singular vectors $\mathbf{h}_{2,m}(t)$ ($m = 1, 2, \ldots, M$) of $\mathbf{H}(t)$, respectively; $\boldsymbol{\Sigma}(t)$ is a rectangular diagonal matrix of dimension $L \times M$ whose diagonal elements are the singular values of $\mathbf{H}(t)$, which are non-negative real numbers; the singular values are sorted in descending order, i.e. $\sigma_1(t) \ge \sigma_2(t) \ge \cdots \ge \sigma_M(t) \ge 0$;
The matrix $\mathbf{H}(t)$ is approximated with the singular vectors corresponding to the $P$ ($P \le \min(M, L)$) largest singular values, i.e.
$$\mathbf{H}(t) \approx \sum_{p=1}^{P} \sigma_p(t)\,\mathbf{h}_{1,p}(t)\,\mathbf{h}_{2,p}^T(t) \quad (7)$$
where $\sigma_p(t)$ is the $p$-th singular value of $\mathbf{H}(t)$ in descending order, and $\mathbf{h}_{1,p}(t)$ and $\mathbf{h}_{2,p}(t)$ are the left and right singular vectors of $\mathbf{H}(t)$ corresponding to $\sigma_p(t)$ ($p = 1, 2, \ldots, P$);
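based on (7), the filter $\mathbf{h}(t)$ can be expressed approximately as
$$\mathbf{h}(t) \approx \mathbf{h}_P(t) = \sum_{p=1}^{P} \sigma_p(t)\,\mathbf{h}_{2,p}(t) \otimes \mathbf{h}_{1,p}(t) = \sum_{p=1}^{P} \mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) \quad (8)$$
where $\mathbf{h}(t) = \mathrm{vec}[\mathbf{H}(t)]$, the symbol $\mathrm{vec}[\cdot]$ denotes the vectorization of a matrix, the symbol $\otimes$ denotes the Kronecker product, and the singular value $\sigma_p(t)$ is absorbed into the temporal sub-filter $\mathbf{h}_{T,p}(t)$ of length $L$ and the spatial sub-filter $\mathbf{h}_{S,p}(t)$ of length $M$; the larger $P$ is, the better $\mathbf{h}_P(t)$ approximates $\mathbf{h}(t)$, and when $P = M$, $\mathbf{h}_P(t) = \mathbf{h}(t)$;
applying the relation
$$\mathbf{h}_{S,p}(t) \otimes \mathbf{h}_{T,p}(t) = [\,\mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L\,]\,\mathbf{h}_{T,p}(t) = [\,\mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)\,]\,\mathbf{h}_{S,p}(t) \quad (9)$$
$\mathbf{h}_P(t)$ is written as
$$\mathbf{h}_P(t) = \sum_{p=1}^{P} \mathbf{H}_{S,p}(t)\,\mathbf{h}_{T,p}(t) = \sum_{p=1}^{P} \mathbf{H}_{T,p}(t)\,\mathbf{h}_{S,p}(t) \quad (10)$$
where $\mathbf{H}_{S,p}(t) = \mathbf{h}_{S,p}(t) \otimes \mathbf{I}_L$ has dimension $ML \times L$, $\mathbf{H}_{T,p}(t) = \mathbf{I}_M \otimes \mathbf{h}_{T,p}(t)$ has dimension $ML \times M$, and $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimensions $L \times L$ and $M \times M$, respectively.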
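The low-rank Kronecker representation of claim 6 can be checked numerically. The sketch below is an illustration, not the patent's implementation: the function names are hypothetical, and splitting each singular value evenly (as a square root) between the temporal and spatial factors is an assumption, since the source text does not show how the singular value is distributed.

```python
import numpy as np

def kronecker_factors(H, P):
    """Split an L x M filter matrix into P temporal/spatial sub-filter pairs.

    Returns h_T (L x P) and h_S (M x P); column p holds the p-th sub-filter.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Assumption: each singular value is shared equally between the factors.
    scale = np.sqrt(s[:P])
    h_T = U[:, :P] * scale        # temporal sub-filters, length L each
    h_S = Vt[:P, :].T * scale     # spatial sub-filters, length M each
    return h_T, h_S

def rebuild_filter(h_T, h_S):
    """Return h_P = sum_p kron(h_S[:, p], h_T[:, p]), a vector of length M*L."""
    return sum(np.kron(h_S[:, p], h_T[:, p]) for p in range(h_T.shape[1]))
```

With `P = min(L, M)` the rebuilt vector equals the column-wise vectorization `H.flatten(order="F")`; with a smaller `P` it is the best rank-`P` approximation in the least-squares sense, and only `P * (L + M)` coefficients are stored instead of `L * M`.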
7. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 6, wherein,
the process of rewriting the expression of the desired-signal estimate using the approximate representation of the linear filter comprises:
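based on (10), the estimate $z(t)$ of the desired signal $x_1(t)$ is written as:
$$z(t) = \mathbf{h}_{T,P}^T(t)\,\mathbf{y}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\,\mathbf{y}_{T,P}(t) \quad (11)$$
where
$$\mathbf{h}_{T,P}(t) = [\,\mathbf{h}_{T,1}^T(t)\ \ \mathbf{h}_{T,2}^T(t)\ \ \cdots\ \ \mathbf{h}_{T,P}^T(t)\,]^T$$
$$\mathbf{h}_{S,P}(t) = [\,\mathbf{h}_{S,1}^T(t)\ \ \mathbf{h}_{S,2}^T(t)\ \ \cdots\ \ \mathbf{h}_{S,P}^T(t)\,]^T$$
$$\mathbf{y}_{T,P}(t) = \mathbf{H}_{T,P}(t)\,\mathbf{y}(t), \qquad \mathbf{y}_{S,P}(t) = \mathbf{H}_{S,P}(t)\,\mathbf{y}(t)$$
$$\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \ \cdots\ \ \mathbf{H}_{T,P}(t)\,]^T$$
$$\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \ \cdots\ \ \mathbf{H}_{S,P}(t)\,]^T$$
the dimensions of $\mathbf{h}_{T,P}(t)$, $\mathbf{h}_{S,P}(t)$, $\mathbf{y}_{T,P}(t)$, $\mathbf{y}_{S,P}(t)$, $\mathbf{H}_{T,P}(t)$ and $\mathbf{H}_{S,P}(t)$ are $LP \times 1$, $MP \times 1$, $MP \times 1$, $LP \times 1$, $MP \times ML$ and $LP \times ML$, respectively; $\mathbf{h}_{T,P}(t)$ is the sub-filter vector that performs speech noise reduction in the time domain, $\mathbf{h}_{S,P}(t)$ is the sub-filter vector that performs speech noise reduction in the spatial domain, $\mathbf{H}_{T,P}(t)$ is the sub-filter matrix that performs speech noise reduction in the time domain, $\mathbf{H}_{S,P}(t)$ is the sub-filter matrix that performs speech noise reduction in the spatial domain, $\mathbf{y}_{T,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{T,P}(t)$, and $\mathbf{y}_{S,P}(t)$ is the noisy speech signal vector filtered by the sub-filter matrix $\mathbf{H}_{S,P}(t)$;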
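by rearranging, $z(t)$ is expressed as:
$$z(t) = \sum_{p=1}^{P} \mathbf{h}_{T,p}^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_{S,p}(t) \quad (12)$$
where $\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \ \cdots\ \ \mathbf{y}_M(t)\,]$ is the noisy-signal matrix, with $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$; the dimension of the matrix $\mathbf{Y}(t)$ is $L \times M$; the filters $\mathbf{h}_{T,p}(t)$ ($p = 1, 2, \ldots, P$) and $\mathbf{h}_{S,p}(t)$ ($p = 1, 2, \ldots, P$) achieve noise reduction in the temporal and spatial dimensions, respectively;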
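formula (11) may be further written as follows:
$$z(t) = \mathbf{h}_{T,P}^T(t)\,[\,\mathbf{x}_{S,P}(t) + \mathbf{v}_{S,P}(t)\,] = \mathbf{h}_{S,P}^T(t)\,[\,\mathbf{x}_{T,P}(t) + \mathbf{v}_{T,P}(t)\,]$$
where the vectors $\mathbf{x}_{S,P}(t) = \mathbf{H}_{S,P}(t)\,\mathbf{x}(t)$ and $\mathbf{v}_{S,P}(t) = \mathbf{H}_{S,P}(t)\,\mathbf{v}(t)$ have length $LP$, the vectors $\mathbf{x}_{T,P}(t) = \mathbf{H}_{T,P}(t)\,\mathbf{x}(t)$ and $\mathbf{v}_{T,P}(t) = \mathbf{H}_{T,P}(t)\,\mathbf{v}(t)$ have length $MP$, $\mathbf{h}_{T,P}^T(t)\,\mathbf{x}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\,\mathbf{x}_{T,P}(t)$ is the filtered speech signal, and $\mathbf{h}_{T,P}^T(t)\,\mathbf{v}_{S,P}(t) = \mathbf{h}_{S,P}^T(t)\,\mathbf{v}_{T,P}(t)$ is the filtered residual noise.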
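The equivalent ways of computing the output in claim 7 can be compared numerically. This sketch rests on assumptions that the garbled text does not confirm directly: the sub-filter matrices are taken to be `kron(h_S_p, I_L)` and `kron(I_M, h_T_p)`, which matches the stated dimensions, and all function names are hypothetical.

```python
import numpy as np

def full_filter(h_T, h_S):
    """Long filter h_P = sum_p kron(h_S[:, p], h_T[:, p]) of length M*L."""
    return sum(np.kron(h_S[:, p], h_T[:, p]) for p in range(h_T.shape[1]))

def output_bilinear(h_T, h_S, Y):
    """z = sum_p h_T[:, p]^T Y h_S[:, p], with Y the L x M noisy-signal matrix."""
    return float(np.sum(h_T * (Y @ h_S)))

def sub_filter_matrices(h_T, h_S):
    """Stacked matrices H_S (L*P x M*L) and H_T (M*P x M*L).

    Assumed block structure: block p of H_S is kron(h_S_p, I_L)^T and
    block p of H_T is kron(I_M, h_T_p)^T.
    """
    L, P = h_T.shape
    M = h_S.shape[0]
    H_S = np.vstack([np.kron(h_S[:, [p]], np.eye(L)).T for p in range(P)])
    H_T = np.vstack([np.kron(np.eye(M), h_T[:, [p]]).T for p in range(P)])
    return H_S, H_T
```

The bilinear form needs only the small `L x M` matrix `Y` and never builds the long filter, which is where the reduction in parameters shows up in practice.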
8. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 7, wherein,
the process of defining the mean square error of the desired-signal estimate and rewriting the mean square error using the approximate representation of the linear filter comprises:
to derive the mean square error of the estimate $z(t)$ of the desired signal, the error of $z(t)$ is defined as
$$\varepsilon(t) = z(t) - x_1(t) \quad (15)$$
based on equation (15), the mean square error of $z(t)$ is defined as
[…] (16)
where […];
using equation (10), the mean square error (16) of the desired-signal estimate $z(t)$ is written as
[…] (17)
where
[…] (18)
$$\boldsymbol{\rho}_{S,P}(t) = \mathbf{H}_{S,P}(t)\,\boldsymbol{\rho}(t) \quad (19)$$
[…] (20)
$$\boldsymbol{\rho}_{T,P}(t) = \mathbf{H}_{T,P}(t)\,\boldsymbol{\rho}(t) \quad (21)$$
fixing $\mathbf{h}_{S,P}(t)$ and $\mathbf{h}_{T,P}(t)$ in turn, formula (17) is written in the form: […]
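For orientation, the mean square error of claim 8 can be expanded from the definitions that survive in the text. The derivation below is a sketch under stated assumptions and is not a reproduction of the patent's equations (16) to (18): it assumes $\sigma_{x_1}^2(t) = E[x_1^2(t)]$, $\boldsymbol{\rho}(t) = E[\mathbf{x}(t)\,x_1(t)] / \sigma_{x_1}^2(t)$, uncorrelated speech and noise, and $\mathbf{h}_P(t) = \mathbf{H}_{S,P}^T(t)\,\mathbf{h}_{T,P}(t) = \mathbf{H}_{T,P}^T(t)\,\mathbf{h}_{S,P}(t)$.

```latex
% Assumptions (labelled, not taken from the source):
%   \sigma_{x_1}^2 = E[x_1^2(t)], \quad \rho = E[x(t) x_1(t)] / \sigma_{x_1}^2,
%   x(t) and v(t) uncorrelated, \quad R_y = E[y(t) y^T(t)].
\begin{align*}
J[\mathbf{h}(t)]
  &= E\big[\varepsilon^2(t)\big]
   = E\big\{[\mathbf{h}^T(t)\,\mathbf{y}(t) - x_1(t)]^2\big\} \\
  &= \sigma_{x_1}^2(t)
     - 2\,\sigma_{x_1}^2(t)\,\mathbf{h}^T(t)\,\boldsymbol{\rho}(t)
     + \mathbf{h}^T(t)\,\mathbf{R}_y(t)\,\mathbf{h}(t), \\[4pt]
J[\mathbf{h}_{T,P}(t) \mid \mathbf{h}_{S,P}(t)]
  &= \sigma_{x_1}^2(t)
     - 2\,\sigma_{x_1}^2(t)\,\mathbf{h}_{T,P}^T(t)\,\boldsymbol{\rho}_{S,P}(t)
     + \mathbf{h}_{T,P}^T(t)\,\big[\mathbf{H}_{S,P}(t)\,\mathbf{R}_y(t)\,\mathbf{H}_{S,P}^T(t)\big]\,\mathbf{h}_{T,P}(t), \\[4pt]
J[\mathbf{h}_{S,P}(t) \mid \mathbf{h}_{T,P}(t)]
  &= \sigma_{x_1}^2(t)
     - 2\,\sigma_{x_1}^2(t)\,\mathbf{h}_{S,P}^T(t)\,\boldsymbol{\rho}_{T,P}(t)
     + \mathbf{h}_{S,P}^T(t)\,\big[\mathbf{H}_{T,P}(t)\,\mathbf{R}_y(t)\,\mathbf{H}_{T,P}^T(t)\big]\,\mathbf{h}_{S,P}(t).
\end{align*}
```

Each conditional form is an ordinary quadratic in one sub-filter vector, so with the other sub-filter held fixed its minimiser has the usual Wiener form, for example $\mathbf{h}_{T,P}(t) = \sigma_{x_1}^2(t)\,[\mathbf{H}_{S,P}(t)\,\mathbf{R}_y(t)\,\mathbf{H}_{S,P}^T(t)]^{-1}\,\boldsymbol{\rho}_{S,P}(t)$.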
9. The method for time-domain multi-channel speech noise reduction based on the Kronecker decomposition according to claim 1, wherein,
the process of obtaining the iterative Wiener filter based on the Kronecker decomposition from the rewritten mean square error expression comprises:
step one: the initial value of $\mathbf{h}_{T,P}(t)$ is set to […], where $\mathbf{h}_{W,p}(t)$ is the Wiener filter of length $L$ for the $p$-th channel; […] is the matrix formed by the elements of $\mathbf{R}_y(t)$ located in rows $(p-1)L+1$ to $pL$ and columns $(p-1)L+1$ to $pL$, which is the autocorrelation matrix of the noisy speech signal vector of the $p$-th channel; the vector […] is formed by elements $(p-1)L+1$ to $pL$ of the vector $\boldsymbol{\rho}(t)$; and […] is the element of the matrix $\mathbf{R}_x(t)$ located in the $p$-th row and $p$-th column;
step two: using […], construct […] by means of the formula […] and $\mathbf{H}_{T,P}(t) = [\,\mathbf{H}_{T,1}(t)\ \ \mathbf{H}_{T,2}(t)\ \ \cdots\ \ \mathbf{H}_{T,P}(t)\,]^T$, and substitute it into the formulas […] and […] to obtain […] and […], where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;
step three: substitute […] and […] into […] to obtain […];
step four: using […], construct […] by means of the formula […] and $\mathbf{H}_{S,P}(t) = [\,\mathbf{H}_{S,1}(t)\ \ \mathbf{H}_{S,2}(t)\ \ \cdots\ \ \mathbf{H}_{S,P}(t)\,]^T$, and substitute it into the formulas […] and […] to obtain […] and […];
step five: substitute […] and […] into […] to obtain […];
steps two to five are repeated $N$ times, where $N$ is a freely chosen parameter, yielding the iterative Wiener filter based on the Kronecker decomposition, $\mathbf{h}_{W,P}(t)$.
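The alternating procedure of claim 9 can be sketched as follows. Because the step formulas did not survive extraction, several choices here are assumptions rather than the patent's own expressions: the closed-form updates are the standard minimisers of the quadratic mean square error with one sub-filter fixed; the initial temporal sub-filters are single-channel Wiener filters estimating the reference speech from channels 1 to P (so `P <= M` is required); and every function name is hypothetical.

```python
import numpy as np

def stack_S(h_S, L):
    """H_S of size (L*P) x (M*L); block p is kron(h_S_p, I_L)^T (assumed form)."""
    return np.vstack([np.kron(h_S[:, [p]], np.eye(L)).T for p in range(h_S.shape[1])])

def stack_T(h_T, M):
    """H_T of size (M*P) x (M*L); block p is kron(I_M, h_T_p)^T (assumed form)."""
    return np.vstack([np.kron(np.eye(M), h_T[:, [p]]).T for p in range(h_T.shape[1])])

def mse(h, R_y, R_x):
    """J(h) = E[x_1^2] - 2 h^T E[x x_1] + h^T R_y h, with E[x x_1] = R_x[:, 0]."""
    return R_x[0, 0] - 2.0 * h @ R_x[:, 0] + h @ R_y @ h

def iterative_kronecker_wiener(R_y, R_x, M, L, P, n_iter=5):
    """Alternate between spatial and temporal sub-filter updates n_iter times."""
    r = R_x[:, 0]  # equals sigma_x1^2 * rho(t) under the assumed definition of rho
    # Step one (assumed): per-channel Wiener filters as initial temporal sub-filters.
    h_T = np.column_stack([
        np.linalg.solve(R_y[p * L:(p + 1) * L, p * L:(p + 1) * L], r[p * L:(p + 1) * L])
        for p in range(P)
    ])
    history = []
    for _ in range(n_iter):
        # Steps two and three: fix the temporal part, solve for the spatial part.
        H_T = stack_T(h_T, M)
        h_S = np.linalg.solve(H_T @ R_y @ H_T.T, H_T @ r).reshape(P, M).T
        # Steps four and five: fix the spatial part, solve for the temporal part.
        H_S = stack_S(h_S, L)
        h_T_vec = np.linalg.solve(H_S @ R_y @ H_S.T, H_S @ r)
        h_T = h_T_vec.reshape(P, L).T
        h = H_S.T @ h_T_vec  # long filter sum_p kron(h_S_p, h_T_p)
        history.append(mse(h, R_y, R_x))
    return h, h_T, h_S, history
```

Each update solves a linear system of size `M*P` or `L*P` instead of `M*L`, and since each half-step exactly minimises the error with the other factor fixed, the recorded error cannot increase from one iteration to the next. In practice the matrices being inverted may need regularisation when the sub-filters become nearly linearly dependent; that safeguard is omitted from this sketch.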
CN202410241313.XA 2024-03-04 2024-03-04 Time domain multichannel voice noise reduction method based on Cronecker decomposition Pending CN117894332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410241313.XA CN117894332A (en) 2024-03-04 2024-03-04 Time domain multichannel voice noise reduction method based on Cronecker decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410241313.XA CN117894332A (en) 2024-03-04 2024-03-04 Time domain multichannel voice noise reduction method based on Cronecker decomposition

Publications (1)

Publication Number Publication Date
CN117894332A true CN117894332A (en) 2024-04-16

Family

ID=90649404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410241313.XA Pending CN117894332A (en) 2024-03-04 2024-03-04 Time domain multichannel voice noise reduction method based on Cronecker decomposition

Country Status (1)

Country Link
CN (1) CN117894332A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination