CN109243482B - Micro-array voice noise reduction method for improving ACROC and beam forming - Google Patents


Info

Publication number
CN109243482B
Authority
CN
China
Prior art keywords
signal
filter
voice
noise
microphone
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811275824.4A
Other languages
Chinese (zh)
Other versions
CN109243482A (en)
Inventor
曾庆宁
罗瀛
方韶劻
林凤梅
谢先明
龙超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Aangsi Science & Technology Co ltd
Original Assignee
Shenzhen Aangsi Science & Technology Co ltd
Application filed by Shenzhen Aangsi Science & Technology Co ltd
Priority to CN201811275824.4A
Publication of CN109243482A
Application granted
Publication of CN109243482B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a micro-array speech noise reduction method combining improved ACRANC with beamforming. It belongs to the technical field of speech signal processing and addresses the technical problem of how to further improve the noise suppression performance of the ACRANC method. The method comprises the following steps: (I) improving the ACRANC method: (1) obtaining multiple channels of noise-reduced but distorted speech signals through multi-channel adaptive noise cancellation; (2) feeding the multiple distorted speech signals into the recovery filter of the ACRANC system to obtain noise-reduced speech; (II) beamforming: (1) building several improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multiple channels of noise-reduced speech; (2) beamforming the multiple channels of noise-reduced speech to obtain better noise-reduced speech. The invention improves the quality of the output speech and further improves the noise reduction effect.

Description

Micro-array speech noise reduction method based on improved ACRANC and beamforming
Technical Field
The invention relates to the technical field of speech signal processing, and in particular to a micro-array speech noise reduction method combining improved ACRANC with beamforming.
Background
Speech noise reduction can effectively improve speech quality and the recognition rate of speech recognition systems, and micro-array speech noise reduction is an effective approach. A micro microphone array is an array with a small aperture, usually within 5 cm, and with few elements. Because a micro-array is easy to embed in all kinds of devices, it has wide application value. The Generalized Sidelobe Cancellation (GSC) method based on a VAD (Voice Activity Detector), abbreviated VAD-GSC, is a common and effective method for micro-array speech noise reduction. Array Crosstalk Resistant Adaptive Noise Cancellation (ACRANC) is another effective micro-array speech noise reduction method; in many situations, especially when the speech source is close to the array, ACRANC achieves better noise reduction than VAD-GSC and many of its improved variants.
In ACRANC, the second-stage filter has only one input signal: the output of the first-stage filter, which is in fact a distorted speech signal. The role of the second-stage filter is to recover clean speech from this distorted speech, i.e., to make its output approach the clean speech component in the main microphone. Because of the complexity of audio propagation in real environments and the distortion that the ACRANC first-stage filter introduces, the speech recovered by the second-stage filter is still not good enough.
Disclosure of Invention
In view of these shortcomings of the prior art, the technical problem solved by the invention is how to further improve the noise suppression performance of the ACRANC method.
To solve this problem, the invention adopts the following technical solution: a micro-array speech noise reduction method combining improved ACRANC with beamforming, which inputs multiple channels of distorted speech into the recovery filter and combines this with DAS (Delay And Sum) beamforming. The method comprises the following steps:
(I) Improving the ACRANC method, with the following specific steps:
(1) Obtain multiple channels of noise-reduced, distorted speech signals through multi-channel adaptive noise cancellation. The specific process is as follows:
Suppose the speech signal is $s(k)$ and the noise signal is $n(k)$. Each reaches microphone $M_i$ through multiple paths and is converted into $s_i(k)$ and $n_i(k)$, respectively. The impulse responses from the speech source and from the noise source to microphone $M_i$ are $h_{si}(k)$ and $h_{ni}(k)$. The signal actually picked up by $M_i$ is $x_i(k) = s_i(k) + n_i(k)$, where $i = 1, 2, …, N$, $k = 0, 1, 2, …$, $N$ is the number of microphones in the array and $k$ is the discrete time index. Thus:
$$x_i(k) = s_i(k) + n_i(k) \tag{1}$$
$$s_i(k) = h_{si}(k) * s(k) \tag{2}$$
$$n_i(k) = h_{ni}(k) * n(k), \quad i = 1, 2, …, N \tag{3}$$
where $*$ denotes convolution.
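Equations (1) to (3) can be simulated directly. The minimal Python sketch below, using plain lists and hypothetical short FIR impulse responses (not values from this patent), builds each microphone signal $x_i(k)$ as the convolved speech plus the convolved noise:

```python
def convolve(h, x):
    """Causal FIR convolution y(k) = sum_m h(m) x(k - m), truncated to len(x)."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for m, hm in enumerate(h):
            if k - m >= 0:
                acc += hm * x[k - m]
        y.append(acc)
    return y


def mic_signals(s, n, h_s, h_n):
    """Return x_i(k) = (h_si * s)(k) + (h_ni * n)(k) for every microphone i (eqs. 1-3)."""
    xs = []
    for h_si, h_ni in zip(h_s, h_n):
        s_i = convolve(h_si, s)   # speech component at microphone i, eq. (2)
        n_i = convolve(h_ni, n)   # noise component at microphone i, eq. (3)
        xs.append([a + b for a, b in zip(s_i, n_i)])  # eq. (1)
    return xs


# Hypothetical 3-microphone scene: a speech impulse at k = 0, a noise impulse at k = 1.
s = [1.0, 0.0, 0.0, 0.0, 0.0]
n = [0.0, 1.0, 0.0, 0.0, 0.0]
h_s = [[1.0], [0.0, 0.8], [0.0, 0.0, 0.6]]  # speech reaches mics 2 and 3 later
h_n = [[0.5], [0.5], [0.5]]
x = mic_signals(s, n, h_s, h_n)
```

Real room responses are much longer, but the structure (one convolution per source and microphone) is the same.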
Let $h^{s}_{ij}(k)$ denote the intermediate propagation impulse response from speech signal $s_i$ to speech signal $s_j$, and $h^{n}_{ij}(k)$ the intermediate propagation impulse response from noise signal $n_i$ to noise signal $n_j$. Then:
$$s_j(k) = h^{s}_{ij}(k) * s_i(k) \tag{4}$$
$$n_j(k) = h^{n}_{ij}(k) * n_i(k) \tag{5}$$
In this sub-step, for each microphone $M_i$, the signal $x_i(k)$ picked up by $M_i$ is taken as the main-channel signal, and the signals $x_j(k)$ ($j = 1, …, i-1, i+1, …, N$) picked up by the other $N-1$ microphones are taken as reference signals. During the global silence stage, i.e., the stage in which all channels are silent, filter $A_i$ adaptively cancels the noise in the main channel using the noise in the multiple reference channels. Outside the global silence stage, the coefficients of $A_i$ are held fixed and the filter only produces output. In this way, multiple channels of distorted speech are obtained, for the following reason:
During the global silence stage, $s_i(k) = 0$ for $i = 1, 2, …, N$, so:
$$x_i(k) = y_{i1}(k) + e_{i1}(k) \tag{6}$$
$$n_i(k) = \mathbf{w}_i \mathbf{n}_i(k) + err_i(k) \tag{7}$$
where $x_i(k) = n_i(k)$, $e_{i1}(k) = err_i(k)$ is the prediction error, $y_{i1}(k) = \mathbf{w}_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $\mathbf{w}_i$ is the $1 \times (N-1)(L+1)$ coefficient row vector of $A_i$:
$$\mathbf{w}_i = (\mathbf{w}_{i1}, …, \mathbf{w}_{i(i-1)}, \mathbf{w}_{i(i+1)}, …, \mathbf{w}_{iN}) \tag{8}$$
where $\mathbf{w}_{ij} = (w_{ij0}, w_{ij1}, …, w_{ijL})$, and $\mathbf{n}_i(k)$ is the $(N-1)(L+1) \times 1$ noise column vector:
$$\mathbf{n}_i(k) = [\mathbf{n}_{i1}^T(k), …, \mathbf{n}_{i(i-1)}^T(k), \mathbf{n}_{i(i+1)}^T(k), …, \mathbf{n}_{iN}^T(k)]^T \tag{9}$$
where $\mathbf{n}_{ij}(k) = [n_j(k), n_j(k-1), …, n_j(k-L)]^T$ and $L$ is the number of delayed samples of each reference-channel noise signal.
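The stacking order of equations (8) and (9) is easy to get wrong, so the following sketch builds the reference vector for main channel $i$ and evaluates the output $y_{i1}(k) = \mathbf{w}_i \mathbf{n}_i(k)$ of filter $A_i$. It assumes 0-based channel indices, plain Python lists and zero samples before $k = 0$; the function names are hypothetical:

```python
def reference_vector(signals, i, k, L):
    """Stack [x_j(k), x_j(k-1), ..., x_j(k-L)] for every j != i, in channel order (eq. 9).

    signals: list of N per-microphone sample lists (0-based channel index).
    Returns a list of length (N - 1) * (L + 1).
    """
    vec = []
    for j, x_j in enumerate(signals):
        if j == i:
            continue  # the main channel is not part of its own reference vector
        for d in range(L + 1):
            vec.append(x_j[k - d] if k - d >= 0 else 0.0)
    return vec


def filter_output(w_i, signals, i, k, L):
    """y_i1(k) = w_i . n_i(k): output of filter A_i (eq. 7)."""
    ref = reference_vector(signals, i, k, L)
    if len(w_i) != len(ref):
        raise ValueError("w_i must have (N - 1) * (L + 1) coefficients (eq. 8)")
    return sum(w * r for w, r in zip(w_i, ref))


sig = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical N = 3 channels
vec = reference_vector(sig, 0, 2, 1)      # [6, 5, 9, 8]
y = filter_output([1, 0, 0, 1], sig, 0, 2, 1)
```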
Let the minimum error power be
$$\xi_i^{\min} = \min_{\mathbf{w}_i} E[err_i^2(k)] \tag{10}$$
and the corresponding optimal coefficient vector be
$$\mathbf{w}_i^{o} = \arg\min_{\mathbf{w}_i} E[err_i^2(k)] \tag{11}$$
To obtain $\xi_i^{\min}$ and $\mathbf{w}_i^{o}$, it suffices to adjust filter $A_i$ so that the sum of squares of $e_{i1}(k)$ is minimized.
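The description only requires that $A_i$ minimize the sum of squares of $e_{i1}(k)$; the complexity estimate later in this document assumes the LMS algorithm. A minimal LMS sketch on synthetic silence-stage data, assuming a hypothetical main-channel noise that is the reference noise scaled by 0.5 and delayed by one sample, shows the weights converging to that relationship:

```python
import random


def lms_cancel(main, refs, L, mu):
    """Adapt a multichannel FIR filter A_i with LMS so that
    e_i1(k) = x_i(k) - w_i . n_i(k) has a small sum of squares.

    main: main-channel samples; refs: list of reference-channel sample lists.
    Returns (weights, errors).
    """
    w = [0.0] * (len(refs) * (L + 1))
    errors = []
    for k in range(len(main)):
        # Reference vector ordered as in eq. (9): channel by channel, lag 0..L.
        ref = [r[k - d] if k - d >= 0 else 0.0 for r in refs for d in range(L + 1)]
        y = sum(wi * ri for wi, ri in zip(w, ref))
        e = main[k] - y
        w = [wi + mu * e * ri for wi, ri in zip(w, ref)]  # LMS coefficient update
        errors.append(e)
    return w, errors


rng = random.Random(0)
noise_ref = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical silence-stage data: main noise = 0.5 * reference noise delayed by 1 sample.
noise_main = [0.0] + [0.5 * v for v in noise_ref[:-1]]
w, e = lms_cancel(noise_main, [noise_ref], L=1, mu=0.01)
```

Because the hypothetical main-channel noise is an exact linear function of the reference, the residual here tends to zero; in a real room, $\xi_i^{\min}$ stays above zero.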
In the stage immediately following the global silence stage, assuming the noise environment is constant or varies slowly, the optimal coefficients of filter $A_i$ are kept unchanged and the filter only produces output, so that:
$$e_{i1}(k) = x_i(k) - \mathbf{w}_i^{o}\mathbf{x}_i(k) \tag{12}$$
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and clean speech vector of the reference channels, arranged like $\mathbf{n}_i(k)$ in equation (9), so that $\mathbf{x}_i(k) = \mathbf{s}_i(k) + \mathbf{n}_i(k)$. From equations (6) and (11):
$$e_{i1}(k) = p_i(k) + [n_i(k) - \mathbf{w}_i^{o}\mathbf{n}_i(k)] \tag{13}$$
where:
$$p_i(k) = s_i(k) - \mathbf{w}_i^{o}\mathbf{s}_i(k)$$
Here $e_{i1}(k)$ is distorted speech containing residual noise, and $p_i(k)$ is its distorted speech component; as equation (13) shows, $p_i(k)$ is a distortion of the clean speech in all N channels.
Letting $i$ run from 1 to N, taking each channel in turn as the main signal and the remaining channels as reference signals, yields N distorted speech signals containing residual noise, $e_{j1}(k)$ ($j = 1, 2, …, N$).
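The adapt-then-freeze procedure over all N channels can be sketched as follows. This is an illustrative sketch under stated assumptions: LMS adaptation, a known global-silence sample set (in practice supplied by the VAD/AMC), and a hypothetical two-microphone scene with identical noise at both microphones and a speech burst at the first microphone only. Each resulting $e_{i1}(k)$ is a distorted (here, possibly sign-flipped) version of the speech:

```python
import random


def distorted_channels(x, silent, L, mu):
    """For each microphone i, adapt A_i only at samples in `silent` (global
    silence), keep its coefficients frozen elsewhere, and return e_i1(k) for all i."""
    N = len(x)
    outputs = []
    for i in range(N):
        refs = [x[j] for j in range(N) if j != i]  # the other N - 1 microphones
        w = [0.0] * ((N - 1) * (L + 1))
        e_i1 = []
        for k in range(len(x[i])):
            ref = [r[k - d] if k - d >= 0 else 0.0 for r in refs for d in range(L + 1)]
            e = x[i][k] - sum(wi * ri for wi, ri in zip(w, ref))
            if k in silent:  # adapt only during global silence
                w = [wi + mu * e * ri for wi, ri in zip(w, ref)]
            e_i1.append(e)
        outputs.append(e_i1)
    return outputs


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2100)]
speech = [0.0] * 2000 + [1.0] * 100  # hypothetical speech burst after the silence
x = [[a + b for a, b in zip(noise, speech)], noise[:]]
e = distorted_channels(x, silent=range(2000), L=1, mu=0.01)
```

In this toy scene $e_{11}(k)$ recovers the burst while $e_{21}(k)$ returns its negative, which illustrates why the first stage produces distorted speech rather than clean speech.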
(2) Feed the multiple distorted speech signals into the recovery filter of the ACRANC system to obtain noise-reduced speech. The specific process is as follows:
The distorted speech signals $e_{j1}(k)$ ($j = 1, 2, …, N$) are input into the second-stage filter $B_i$ of the ACRANC system. Outside the global silence stage, $B_i$ is adjusted so that the sum of squares of its error output $e_{i2}(k)$ is minimized, where:
$$\begin{aligned}\|e_{i2}(k)\|^2 &= \|x_i(k) - y_{i2}(k)\|^2\\ &= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2\\ &= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2\,n_i(k)[s_i(k) - y_{i2}(k)]\end{aligned} \tag{14}$$
Taking expectations and assuming that the noise $n_i(k)$ is uncorrelated with $s_i(k)$ and $y_{i2}(k)$, the cross term vanishes:
$$E[e_{i2}^2(k)] = E[n_i^2(k)] + E\{[s_i(k) - y_{i2}(k)]^2\} \tag{15}$$
Since $E[n_i^2(k)]$ does not depend on $B_i$, equation (15) shows that minimizing $E[e_{i2}^2(k)]$ is equivalent to minimizing $E\{[s_i(k) - y_{i2}(k)]^2\}$, i.e., the difference between $y_{i2}(k)$ and the speech $s_i(k)$, so the output $y_{i2}(k)$ of $B_i$ approaches the clean speech signal $s_i(k)$. Because the input of $B_i$ is multiple channels of distorted speech rather than a single channel, a better speech noise reduction effect than ACRANC is obtained; this improved noise-reduced speech signal is denoted $\hat{s}_i(k)$.
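The multi-input recovery filter $B_i$ can also be sketched with LMS. In the hypothetical toy below, one distorted channel contains the speech plus an interfering speech component $u(k)$ and another contains $-u(k)$; a single-input $B_i$ cannot separate the two, while a multi-input $B_i$ can combine them. All signals, step sizes and lengths are illustrative assumptions, not values from the patent:

```python
import random


def recovery_filter(desired, inputs, L_B, mu):
    """LMS filter B_i: the inputs are the distorted channels e_j1, each with
    L_B + 1 taps; the error e_i2 = x_i - y_i2 is minimized. Returns y_i2."""
    w = [0.0] * (len(inputs) * (L_B + 1))
    y_out = []
    for k in range(len(desired)):
        u = [e[k - d] if k - d >= 0 else 0.0 for e in inputs for d in range(L_B + 1)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        err = desired[k] - y                         # e_i2(k)
        w = [wi + mu * err * ui for wi, ui in zip(w, u)]
        y_out.append(y)
    return y_out


def mse(y, ref, start):
    """Mean squared difference between y and ref from sample `start` on."""
    return sum((a - b) ** 2 for a, b in zip(y[start:], ref[start:])) / (len(y) - start)


rng = random.Random(2)
K = 20000
s = [rng.gauss(0, 1) for _ in range(K)]    # toy clean speech at the main mic
u = [rng.gauss(0, 1) for _ in range(K)]    # hypothetical cross-channel speech distortion
n = [rng.gauss(0, 0.5) for _ in range(K)]  # noise at the main mic
x_main = [a + b for a, b in zip(s, n)]
e_1 = [a + b for a, b in zip(s, u)]        # distorted channel 1
e_2 = [-b for b in u]                      # distorted channel 2

multi = recovery_filter(x_main, [e_1, e_2], L_B=1, mu=0.005)
single = recovery_filter(x_main, [e_1], L_B=1, mu=0.005)
```

With these settings `mse(multi, s, K // 2)` should be small (on the order of the LMS misadjustment), while `mse(single, s, K // 2)` stays near 0.5, because the best single-input filter can only scale $s + u$.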
(II) Beamforming: the improved ACRANC is combined with beamforming to further improve speech noise reduction, with the following specific steps:
(1) Build several improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multiple channels of noise-reduced speech. The specific process is as follows:
Taking each channel in turn as the main signal and the remaining channels as reference signals, one improved ACRANC is built per channel, giving N subsystems.
In each improved ACRANC, the inputs of filter $B_i$ are the outputs of all filters $A_j$ ($j = 1, 2, …, N$) rather than the output of filter $A_i$ alone. The adaptive mode control (AMC) determines when the filters in these subsystems update their coefficients and when their coefficients are held fixed.
During silent periods without speech, i.e., NVP periods, the coefficients of filter $A_i$ can be adjusted to compensate for errors caused by changes in environmental factors. For this purpose a global silence stage, the ONVP stage, is defined, and the first filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP.
Let the NVP set of the $i$-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ be $\mathrm{NVP}(i)$; it consists of a series of discrete intervals:
$$\mathrm{NVP}(i) = \bigcup_{j} [k'_{ij}, k''_{ij}] \tag{16}$$
where each discrete interval
$$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, …, k''_{ij}\}$$
is the $j$-th NVP of $x_i(k)$. Clearly, $\mathrm{NVP}(i_1)$ is not necessarily equal to $\mathrm{NVP}(i_2)$ for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, …, N\}$; however, $\mathrm{NVP}(i_1)$ is merely a translation of $\mathrm{NVP}(i_2)$ along the time axis.
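ONVP is defined as:
$$\mathrm{ONVP} = \bigcap_{i=1}^{N} \mathrm{NVP}(i) \tag{17}$$
It is then easy to prove that:
$$\mathrm{ONVP} = \bigcup_{j} [k'_j, k''_j] \tag{18}$$
where:
$$k'_j = \max_{1 \le i \le N} k'_{ij}, \qquad k''_j = \min_{1 \le i \le N} k''_{ij}$$
If $k''_j < k'_j$, then $[k'_j, k''_j]$ in equation (18) is defined as the empty set $\varnothing$.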
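Equation (18) amounts to intersecting the $j$-th non-voice interval of every channel. A short sketch, assuming the VAD output is given as lists of inclusive `(start, end)` sample intervals whose $j$-th entries correspond across channels (as the translation property above implies); the function name is hypothetical:

```python
def onvp(nvp):
    """Global silence (ONVP) as the per-index intersection of NVP intervals (eq. 18).

    nvp[i][j] = (k'_ij, k''_ij), the j-th inclusive non-voice interval of channel i.
    Assumes the j-th intervals of all channels correspond (time-shifted copies).
    Empty intersections (k''_j < k'_j) are dropped.
    """
    result = []
    for intervals in zip(*nvp):               # the j-th interval of every channel
        start = max(a for a, _ in intervals)  # k'_j
        end = min(b for _, b in intervals)    # k''_j
        if end >= start:
            result.append((start, end))
    return result


nvp = [
    [(0, 99), (300, 399)],   # channel 1 (hypothetical VAD output)
    [(3, 102), (303, 402)],  # channel 2: same pattern shifted by 3 samples
]
print(onvp(nvp))  # [(3, 99), (303, 399)]
```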
While the optimal coefficients of filter $A_i$ are being adjusted, no channel may contain speech; otherwise the speech would be cancelled together with the noise. Therefore, the coefficients of $A_i$ are adjusted only in the following L-ONVP stage:
$$\text{L-ONVP} = \bigcup_{j} [k'_j + L, k''_j] \tag{19}$$
where $L$ is the number of delayed samples of the reference signals input to filter $A_i$, and:
$$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, …, k''_j\} \tag{20}$$
If $k''_j < k'_j + L$, then $[k'_j + L, k''_j]$ in equation (19) is likewise defined as $\varnothing$.
In the L-ONVP stage, all signals, including the delayed samples used, belong to the silence stage and contain no speech, so filter $A_i$ can be adjusted during L-ONVP. The NVP stage mentioned above refers to L-ONVP or a part of it.
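The L-ONVP rule can be sketched as shifting each ONVP interval start by $L$ samples and dropping intervals that become empty. The mask helper (hypothetical name) shows one way an AMC could turn the intervals into per-sample update flags for $A_i$:

```python
def l_onvp(onvp_intervals, L):
    """Shift each ONVP interval start by L samples (eqs. 19-20) so that every
    delayed reference sample used by A_i also lies in global silence.
    Intervals with k''_j < k'_j + L are empty and dropped."""
    return [(a + L, b) for a, b in onvp_intervals if b >= a + L]


def adapt_mask(intervals, length):
    """Boolean mask: True at samples where A_i may update its coefficients."""
    mask = [False] * length
    for a, b in intervals:
        for k in range(a, min(b, length - 1) + 1):
            mask[k] = True
    return mask


intervals = l_onvp([(3, 99), (303, 399), (500, 510)], L=24)  # last one is too short
mask = adapt_mask(intervals, 400)
```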
The optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP stage:
[Equation (21)]
where $[k'_{i_0 j}, k''_{i_0 j}]$ are the discrete time intervals that make up $\mathrm{NVP}(i_0)$ of the $i_0$-th channel signal. $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VAD decision, to ensure that the intervals used contain only noise. $\Delta$ is also a selectable positive integer, but it must satisfy:
$$\Delta \ge L + \delta + \Delta' \tag{22}$$
where $\delta$ is the delay, counted in samples, with which noise propagates from the other microphones of the array to the $i_0$-th microphone. Its maximum value is:
$$\delta = \max_{i \ne i_0} \left\lceil \frac{d_i f}{c} \right\rceil \tag{23}$$
where $d_i$ is the distance between microphones $M_{i_0}$ and $M_i$, $f$ is the sampling frequency of the array, and $c$ is the speed of sound in air.
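Equations (22) and (23) reduce to simple arithmetic. The sketch below assumes $c = 343$ m/s (air at about 20 °C) and rounds the delay up to whole samples, which is one reading of "counted in samples"; the microphone spacings and $\Delta'$ are hypothetical:

```python
import math


def max_delay_samples(distances_m, f_hz, c=343.0):
    """delta: largest inter-microphone propagation delay in whole samples,
    rounded up (eq. 23). c = 343 m/s is an assumed speed of sound."""
    return max(math.ceil(d * f_hz / c) for d in distances_m)


def min_guard(L, delta, delta_prime):
    """Smallest Delta allowed by Delta >= L + delta + Delta' (eq. 22)."""
    return L + delta + delta_prime


# Hypothetical array: other mics at 1, 2 and 3 cm from M_i0, sampled at 16 kHz.
delta = max_delay_samples([0.01, 0.02, 0.03], 16000)  # 3 cm -> 1.4 samples -> 2
guard = min_guard(L=24, delta=delta, delta_prime=10)  # Delta' = 10 is illustrative
```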
Outside the $(\Delta, \Delta')$-ONVP stage, the optimal coefficients of filter $A_i$ in every subsystem are held fixed, and $A_i$ is used only for filtering.
All filters $B_i$ are adjusted adaptively during all stages other than the global silence stage; for simplicity, $B_i$ may also be adapted continuously from beginning to end.
(2) Obtain the final noise-reduced speech through delay-and-sum (DAS) beamforming. The specific process is as follows:
The output of each subsystem is one channel of noise-reduced speech, and all N outputs can be fed into a beamformer to obtain a better noise reduction effect. With a common DAS beamformer, the input-output relationship can be written as:
$$y(k) = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i(k + \tau_i) \tag{24}$$
where $\hat{s}_i(k)$ is the noise-reduced output of the $i$-th subsystem and $\tau_i$ is the delay with which speech reaches microphone $M_i$ relative to a selected reference microphone $M_r$ in the array. The reference microphone can be any microphone in the array; usually the microphone at or near the center of the array is selected.
The delay $\tau_i$ can be computed with the cross-correlation or generalized cross-correlation method, or as follows:
1) Select a discrete time interval $[k', k'']$ of the $(\delta, T)$-OVP stage such that $k'' \ge k' + \delta$ and $k'' - (k' + \delta)$ is as small as possible;
2) Find $\tau_i$ satisfying:
[Equation (25)]
If the array aperture is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be treated as 0.
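A sketch of equation (24) together with a plain cross-correlation delay estimate, which is one of the options named above (the criterion of step 2 is not reproduced here). It assumes integer-sample delays, treats samples outside the signal as zero, and uses hypothetical function names:

```python
def xcorr_delay(ref, sig, max_lag):
    """Integer tau maximizing sum_k ref(k) * sig(k + tau) (plain cross-correlation)."""
    best_tau, best_val = 0, float("-inf")
    for tau in range(-max_lag, max_lag + 1):
        val = sum(ref[k] * sig[k + tau]
                  for k in range(len(ref)) if 0 <= k + tau < len(sig))
        if val > best_val:
            best_tau, best_val = tau, val
    return best_tau


def delay_and_sum(channels, taus):
    """y(k) = (1/N) * sum_i s_i(k + tau_i); samples outside the signal count as 0."""
    N, K = len(channels), len(channels[0])
    return [sum(ch[k + t] for ch, t in zip(channels, taus) if 0 <= k + t < K) / N
            for k in range(K)]


ref = [0, 1, 0, 0, 0, 0]   # reference-microphone channel (toy impulse)
sig = [0, 0, 0, 1, 0, 0]   # the same impulse arrives 2 samples later
tau = xcorr_delay(ref, sig, max_lag=3)
y = delay_and_sum([ref, sig], [0, tau])
```

After alignment both impulses add coherently at $k = 1$, which is the effect DAS relies on; uncorrelated residual noise adds incoherently.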
Compared with the prior art, the invention has the following beneficial effects:
Compared with the original method, in which only one channel of distorted speech is input into the recovery filter, the improved ACRANC method achieves better speech noise reduction than the conventional ACRANC method, and combining the improved ACRANC method with beamforming further improves the noise reduction effect.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of speech and noise propagation and crosstalk;
FIG. 3 is a schematic diagram of an improved ACRANC system;
FIG. 4 is a schematic diagram of improved ACRANC combined with beamforming.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings, but the present invention is not limited thereto.
As shown in FIG. 1, the micro-array speech noise reduction method combining improved ACRANC with beamforming performs speech noise reduction by inputting multiple channels of distorted speech into the recovery filter of ACRANC and combining this with beamforming. It comprises the following steps:
(I) Improving the ACRANC method, with the following specific steps:
(1) Obtain multiple channels of noise-reduced, distorted speech signals through multi-channel adaptive noise cancellation. The specific process is as follows:
Suppose the speech signal is $s(k)$ and the noise signal is $n(k)$. As shown in FIG. 2, each reaches microphone $M_i$ through multiple paths and is converted into $s_i(k)$ and $n_i(k)$, respectively. The impulse responses from the speech source and from the noise source to microphone $M_i$ are $h_{si}(k)$ and $h_{ni}(k)$. The signal actually picked up by $M_i$ is $x_i(k) = s_i(k) + n_i(k)$, where $i = 1, 2, …, N$, $k = 0, 1, 2, …$, $N$ is the number of microphones in the array and $k$ is the discrete time index. Thus:
$$x_i(k) = s_i(k) + n_i(k) \tag{1}$$
$$s_i(k) = h_{si}(k) * s(k) \tag{2}$$
$$n_i(k) = h_{ni}(k) * n(k), \quad i = 1, 2, …, N \tag{3}$$
where $*$ denotes convolution.
Let $h^{s}_{ij}(k)$ denote the intermediate propagation impulse response from speech signal $s_i$ to speech signal $s_j$, and $h^{n}_{ij}(k)$ the intermediate propagation impulse response from noise signal $n_i$ to noise signal $n_j$. Then:
$$s_j(k) = h^{s}_{ij}(k) * s_i(k) \tag{4}$$
$$n_j(k) = h^{n}_{ij}(k) * n_i(k) \tag{5}$$
In this sub-step, for each microphone $M_i$, the signal $x_i(k)$ picked up by $M_i$ is taken as the main-channel signal, and the signals $x_j(k)$ ($j = 1, …, i-1, i+1, …, N$) picked up by the other $N-1$ microphones are taken as reference signals. During the global silence stage, i.e., the stage in which all channels are silent, filter $A_i$ (as shown in FIG. 3) adaptively cancels the noise in the main channel using the noise in the multiple reference channels. Outside the global silence stage, the coefficients of $A_i$ are held fixed and the filter only produces output. In this way, multiple channels of distorted speech are obtained, for the following reason:
During the global silence stage, $s_i(k) = 0$ for $i = 1, 2, …, N$, so:
$$x_i(k) = y_{i1}(k) + e_{i1}(k) \tag{6}$$
$$n_i(k) = \mathbf{w}_i \mathbf{n}_i(k) + err_i(k) \tag{7}$$
where $x_i(k) = n_i(k)$, $e_{i1}(k) = err_i(k)$ is the prediction error, $y_{i1}(k) = \mathbf{w}_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $\mathbf{w}_i$ is the $1 \times (N-1)(L+1)$ coefficient row vector of $A_i$:
$$\mathbf{w}_i = (\mathbf{w}_{i1}, …, \mathbf{w}_{i(i-1)}, \mathbf{w}_{i(i+1)}, …, \mathbf{w}_{iN}) \tag{8}$$
where $\mathbf{w}_{ij} = (w_{ij0}, w_{ij1}, …, w_{ijL})$, and $\mathbf{n}_i(k)$ is the $(N-1)(L+1) \times 1$ noise column vector:
$$\mathbf{n}_i(k) = [\mathbf{n}_{i1}^T(k), …, \mathbf{n}_{i(i-1)}^T(k), \mathbf{n}_{i(i+1)}^T(k), …, \mathbf{n}_{iN}^T(k)]^T \tag{9}$$
where $\mathbf{n}_{ij}(k) = [n_j(k), n_j(k-1), …, n_j(k-L)]^T$ and $L$ is the number of delayed samples of each reference-channel noise signal.
Let the minimum error power be
$$\xi_i^{\min} = \min_{\mathbf{w}_i} E[err_i^2(k)] \tag{10}$$
and the corresponding optimal coefficient vector be
$$\mathbf{w}_i^{o} = \arg\min_{\mathbf{w}_i} E[err_i^2(k)] \tag{11}$$
To obtain $\xi_i^{\min}$ and $\mathbf{w}_i^{o}$, it suffices to adjust filter $A_i$ so that the sum of squares of $e_{i1}(k)$ is minimized.
In the stage immediately following the global silence stage, assuming the noise environment is constant or varies slowly, the optimal coefficients of filter $A_i$ are kept unchanged and the filter only produces output, so that:
$$e_{i1}(k) = x_i(k) - \mathbf{w}_i^{o}\mathbf{x}_i(k) \tag{12}$$
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and clean speech vector of the reference channels, arranged like $\mathbf{n}_i(k)$ in equation (9), so that $\mathbf{x}_i(k) = \mathbf{s}_i(k) + \mathbf{n}_i(k)$. From equations (6) and (11):
$$e_{i1}(k) = p_i(k) + [n_i(k) - \mathbf{w}_i^{o}\mathbf{n}_i(k)] \tag{13}$$
where:
$$p_i(k) = s_i(k) - \mathbf{w}_i^{o}\mathbf{s}_i(k)$$
Here $e_{i1}(k)$ is distorted speech containing residual noise, and $p_i(k)$ is its distorted speech component; as equation (13) shows, $p_i(k)$ is a distortion of the clean speech in all N channels.
Letting $i$ run from 1 to N, taking each channel in turn as the main signal and the remaining channels as reference signals, yields N distorted speech signals containing residual noise, $e_{j1}(k)$ ($j = 1, 2, …, N$).
(2) Feed the multiple distorted speech signals into the recovery filter of the ACRANC system to obtain noise-reduced speech. The specific process is as follows:
The distorted speech signals $e_{j1}(k)$ ($j = 1, 2, …, N$) are input into the second-stage filter $B_i$ of the ACRANC system. Outside the global silence stage, $B_i$ is adjusted so that the sum of squares of its error output $e_{i2}(k)$ is minimized, where:
$$\begin{aligned}\|e_{i2}(k)\|^2 &= \|x_i(k) - y_{i2}(k)\|^2\\ &= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2\\ &= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2\,n_i(k)[s_i(k) - y_{i2}(k)]\end{aligned} \tag{14}$$
Taking expectations and assuming that the noise $n_i(k)$ is uncorrelated with $s_i(k)$ and $y_{i2}(k)$, the cross term vanishes:
$$E[e_{i2}^2(k)] = E[n_i^2(k)] + E\{[s_i(k) - y_{i2}(k)]^2\} \tag{15}$$
Since $E[n_i^2(k)]$ does not depend on $B_i$, equation (15) shows that minimizing $E[e_{i2}^2(k)]$ is equivalent to minimizing $E\{[s_i(k) - y_{i2}(k)]^2\}$, i.e., the difference between $y_{i2}(k)$ and the speech $s_i(k)$, so the output $y_{i2}(k)$ of $B_i$ approaches the clean speech signal $s_i(k)$.
The input of filter $B_i$ is the N signals $e_{j1}(k)$ ($j = 1, 2, …, N$), all of which are distorted speech signals formed from the N channels of speech according to equation (13). The approximation obtained from these multiple inputs is therefore at least as good as, and in general better than, the approximation obtained from the single input $e_{i1}(k)$: in theory, if $B_i$ sets all coefficients applied to the other inputs $e_{j1}(k)$ ($j = 1, …, i-1, i+1, …, N$) to 0, the N-input case degenerates to the single-input case with only $e_{i1}(k)$. Therefore, the improved ACRANC method also performs better than the existing ACRANC method; this better noise-reduced speech signal is denoted $\hat{s}_i(k)$.
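The degeneration argument can be checked numerically: a multi-input FIR filter whose coefficients on the other inputs are all zero produces exactly the single-input output, so the best multi-input filter can do no worse than the best single-input one. A minimal sketch with hypothetical signals and a hypothetical function name:

```python
def multi_input_fir(inputs, weights, k):
    """y(k) = sum_j sum_d weights[j][d] * inputs[j][k - d] (samples before 0 are skipped)."""
    return sum(w * x[k - d]
               for x, w_j in zip(inputs, weights)
               for d, w in enumerate(w_j) if k - d >= 0)


e1 = [1.0, 2.0, 3.0]    # hypothetical e_i1(k)
e2 = [5.0, -1.0, 4.0]   # hypothetical e_j1(k), j != i
multi = [multi_input_fir([e1, e2], [[0.5, 0.25], [0.0, 0.0]], k) for k in range(3)]
single = [multi_input_fir([e1], [[0.5, 0.25]], k) for k in range(3)]
```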
(II) Beamforming: the improved ACRANC is combined with beamforming to further improve speech noise reduction, with the following specific steps:
(1) Build several improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multiple channels of noise-reduced speech. The specific process is as follows:
Taking each channel in turn as the main signal and the remaining channels as reference signals, one improved ACRANC is built per channel, giving N subsystems.
In each improved ACRANC, the inputs of filter $B_i$ are the outputs of all filters $A_j$ ($j = 1, 2, …, N$) rather than the output of filter $A_i$ alone. As shown in FIG. 4, the adaptive mode control (AMC) determines when the filters in these subsystems update their coefficients and when their coefficients are held fixed.
During silent periods without speech, i.e., NVP periods, the coefficients of filter $A_i$ can be adjusted to compensate for errors caused by changes in environmental factors. For this purpose a global silence stage, the ONVP stage, is defined, and the first filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP.
Let the NVP set of the $i$-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ be $\mathrm{NVP}(i)$; it consists of a series of discrete intervals:
$$\mathrm{NVP}(i) = \bigcup_{j} [k'_{ij}, k''_{ij}] \tag{16}$$
where each discrete interval
$$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, …, k''_{ij}\}$$
is the $j$-th NVP of $x_i(k)$. Clearly, $\mathrm{NVP}(i_1)$ is not necessarily equal to $\mathrm{NVP}(i_2)$ for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, …, N\}$; however, $\mathrm{NVP}(i_1)$ is merely a translation of $\mathrm{NVP}(i_2)$ along the time axis.
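ONVP is defined as:
$$\mathrm{ONVP} = \bigcap_{i=1}^{N} \mathrm{NVP}(i) \tag{17}$$
It is then easy to prove that:
$$\mathrm{ONVP} = \bigcup_{j} [k'_j, k''_j] \tag{18}$$
where:
$$k'_j = \max_{1 \le i \le N} k'_{ij}, \qquad k''_j = \min_{1 \le i \le N} k''_{ij}$$
If $k''_j < k'_j$, then $[k'_j, k''_j]$ in equation (18) is defined as the empty set $\varnothing$.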
While the optimal coefficients of filter $A_i$ are being adjusted, no channel may contain speech; otherwise the speech would be cancelled together with the noise. Therefore, the coefficients of $A_i$ are adjusted only in the following L-ONVP stage:
$$\text{L-ONVP} = \bigcup_{j} [k'_j + L, k''_j] \tag{19}$$
where $L$ is the number of delayed samples of the reference signals input to filter $A_i$, and:
$$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, …, k''_j\} \tag{20}$$
If $k''_j < k'_j + L$, then $[k'_j + L, k''_j]$ in equation (19) is likewise defined as $\varnothing$.
In the L-ONVP stage, all signals, including the delayed samples used, belong to the silence stage and contain no speech, so filter $A_i$ can be adjusted during L-ONVP. The NVP stage mentioned above refers to L-ONVP or a part of it.
The optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP stage:
[Equation (21)]
where $[k'_{i_0 j}, k''_{i_0 j}]$ are the discrete time intervals that make up $\mathrm{NVP}(i_0)$ of the $i_0$-th channel signal. $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VAD decision, to ensure that the intervals used contain only noise. $\Delta$ is also a selectable positive integer, but it must satisfy:
$$\Delta \ge L + \delta + \Delta' \tag{22}$$
where $\delta$ is the delay, counted in samples, with which noise propagates from the other microphones of the array to the $i_0$-th microphone. Its maximum value is:
$$\delta = \max_{i \ne i_0} \left\lceil \frac{d_i f}{c} \right\rceil \tag{23}$$
where $d_i$ is the distance between microphones $M_{i_0}$ and $M_i$, $f$ is the sampling frequency of the array, and $c$ is the speed of sound in air.
Outside the $(\Delta, \Delta')$-ONVP stage, the optimal coefficients of filter $A_i$ in every subsystem are held fixed, and $A_i$ is used only for filtering.
All filters $B_i$ are adjusted adaptively during all stages other than the global silence stage; for simplicity, $B_i$ may also be adapted continuously from beginning to end.
(2) Obtain the final noise-reduced speech through delay-and-sum (DAS) beamforming. The specific process is as follows:
The output of each subsystem is one channel of noise-reduced speech, and all N outputs can be fed into a beamformer to obtain a better noise reduction effect. With a common DAS beamformer, the input-output relationship can be written as:
$$y(k) = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i(k + \tau_i) \tag{24}$$
where $\hat{s}_i(k)$ is the noise-reduced output of the $i$-th subsystem and $\tau_i$ is the delay with which speech reaches microphone $M_i$ relative to a selected reference microphone $M_r$ in the array. The reference microphone can be any microphone in the array; usually the microphone at or near the center of the array is selected.
The delay $\tau_i$ can be computed with the cross-correlation or generalized cross-correlation method, or as follows:
1) Select a discrete time interval $[k', k'']$ of the $(\delta, T)$-OVP stage such that $k'' \ge k' + \delta$ and $k'' - (k' + \delta)$ is as small as possible;
2) Find $\tau_i$ satisfying:
[Equation (25)]
If the array aperture is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be treated as 0.
For example, if the distance from every microphone $M_i$ in the array to the reference microphone $M_r$ is less than 2 cm and the snapshot sampling frequency of the array is 8000 Hz, the maximum propagation delay is less than half the sampling interval, so all $\tau_i$ can be set to 0.
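The 2 cm / 8000 Hz example can be checked with a short calculation, assuming $c = 343$ m/s. The condition holds whenever the largest microphone distance is below $c/(2f)$, about 2.14 cm at 8000 Hz:

```python
def max_delay_vs_half_sample(d_max_m, f_hz, c=343.0):
    """Return (maximum propagation delay, half the sampling interval), both in seconds.
    c = 343 m/s is an assumed speed of sound."""
    return d_max_m / c, 0.5 / f_hz


tau_max, half_ts = max_delay_vs_half_sample(0.02, 8000)
# tau_max is about 58.3 microseconds and half_ts is 62.5 microseconds,
# so rounding every tau_i to 0 samples is justified in this example.
d_limit = 343.0 / (2 * 8000)  # largest distance that still satisfies the condition
```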
(3) Computational complexity
FIG. 4 shows the speech noise reduction process combining ACRANC with DAS beamforming. The AMC and the DAS beamformer both require little computation, and the AMC can be implemented with a VAD (Voice Activity Detector). The computational complexity of the method therefore depends mainly on the improved ACRANC algorithms of the N subsystems, which in turn depends on the adaptive algorithm used by the filters $A_i$ and $B_i$ of each improved ACRANC. If the LMS adaptive algorithm is used, the computation of the improved ACRANC algorithms of the N subsystems does not exceed
$$(2A + 3M)\,[(L+1)(N-1) + (L_B+1)N]\,N f \tag{26}$$
operations per second, where $2A$ denotes 2 additions and $3M$ denotes 3 multiplications, $L$ is the number of delayed reference samples, which determines the order of filter $A_i$, $N$ is the number of microphones in the array, $L_B$ determines the length of filter $B_i$, and $f$ is the sampling rate of the microphone array. Because many chips can complete an addition and a multiplication in a single operation, the actual computation time is much shorter than equation (26) suggests.
For example, if $L = 24$ is chosen for filter $A_i$, $L_B$ is chosen for filter $B_i$, the sampling frequency is $f = 8000$ Hz and the array consists of $N = 5$ microphones, equation (26) shows that the computation does not exceed 41 MFLOPS.
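Equation (26) can be evaluated directly. The sketch counts 2 additions and 3 multiplications per filter tap per sample, as equation (26) does, and treats each as one floating-point operation; the budget helper is hypothetical. Solving the bound for $L_B$ shows that with $L = 24$, $N = 5$ and $f = 8000$ Hz it stays within 41 MFLOPS for $L_B \le 20$:

```python
def acranc_ops_per_second(L, L_B, N, f, adds=2, mults=3):
    """Upper bound of eq. (26): (2A + 3M) * [(L+1)(N-1) + (L_B+1)N] * N * f."""
    taps = (L + 1) * (N - 1) + (L_B + 1) * N  # taps of one A_i plus taps of one B_i
    return (adds + mults) * taps * N * f      # N subsystems, f samples per second


def max_LB_within_budget(budget_flops, L, N, f):
    """Largest L_B whose eq. (26) bound stays within budget_flops (hypothetical helper)."""
    L_B = -1
    while acranc_ops_per_second(L, L_B + 1, N, f) <= budget_flops:
        L_B += 1
    return L_B


bound = acranc_ops_per_second(L=24, L_B=20, N=5, f=8000)
largest = max_LB_within_budget(41e6, L=24, N=5, f=8000)
```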
Compared with the prior art, the invention has the following beneficial effects:
Compared with the original method, in which only one channel of distorted speech is input into the recovery filter, the improved ACRANC method achieves better speech noise reduction than the conventional ACRANC method, and combining the improved ACRANC method with beamforming further improves the noise reduction effect.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention.

Claims (2)

1. A micro-array speech noise reduction method combining improved ACRANC with beamforming, characterized in that speech noise reduction is performed by inputting multiple channels of distorted speech into a recovery filter and combining this with beamforming, the method comprising the following steps:
(I) improving the ACRANC method, with the following specific steps:
(1) obtaining multiple channels of noise-reduced, distorted speech signals through multi-channel adaptive noise cancellation, the specific process being as follows:
suppose that the speech signal is $s(k)$ and the noise signal is $n(k)$, which reach microphone $M_i$ through multiple paths and become $s_i(k)$ and $n_i(k)$, respectively; the impulse responses from the speech source and from the noise source to microphone $M_i$ are $h_{s_i}(k)$ and $h_{n_i}(k)$, respectively; the signal actually picked up by microphone $M_i$ is $x_i(k) = s_i(k) + n_i(k)$, where $i = 1, 2, \dots, N$ and $k = 0, 1, 2, \dots$, N is the number of microphones in the array and k is the discrete time index; we obtain:
$$x_i(k) = s_i(k) + n_i(k) \qquad (1)$$
$$s_i(k) = h_{s_i}(k) * s(k) \qquad (2)$$
$$n_i(k) = h_{n_i}(k) * n(k), \quad i = 1, 2, \dots, N \qquad (3)$$
where $*$ denotes convolution;
let the intermediate propagation impulse response from speech signal $s_i$ to speech signal $s_j$ be
Figure FDA0003485045240000011
and let the intermediate propagation impulse response from noise signal $n_i$ to noise signal $n_j$ be
Figure FDA0003485045240000012
Then:
Figure FDA0003485045240000013
Figure FDA0003485045240000014
in this substep, for each microphone $M_i$, the signal $x_i(k)$ picked up by $M_i$ is taken as the primary signal, and the signals $x_j(k)$ ($j = 1, \dots, i-1, i+1, \dots, N$) picked up by the other N-1 microphones are taken as reference signals; in the global silence phase, i.e. the phase in which all channels are silent, filter $A_i$ adaptively cancels the noise in the primary channel using the noise in the multi-channel reference signals; outside the global silence phase, the coefficients of filter $A_i$ are held fixed and the filter only produces output; in this way, multiple distorted speech signals are obtained, for the following reason:
because $s_i(k) = 0$ for $i = 1, 2, \dots, N$ in the global silence phase, we have:
$$x_i(k) = y_{i1}(k) + e_{i1}(k) \qquad (6)$$
$$n_i(k) = \mathbf{w}_i \mathbf{n}_i(k) + \mathrm{err}_i(k) \qquad (7)$$
where $x_i(k) = n_i(k)$, $e_{i1}(k) = \mathrm{err}_i(k)$ is the prediction error, $y_{i1}(k) = \mathbf{w}_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $\mathbf{w}_i$ is the $1 \times (N-1)(L+1)$ coefficient row vector of filter $A_i$, i.e.:
$$\mathbf{w}_i = (\mathbf{w}_{i1}, \dots, \mathbf{w}_{i(i-1)}, \mathbf{w}_{i(i+1)}, \dots, \mathbf{w}_{iN}) \qquad (8)$$
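where $\mathbf{w}_{ij} = (w_{ij0}, w_{ij1}, \dots, w_{ijL})$, and $\mathbf{n}_i(k)$ is the $(N-1)(L+1) \times 1$ noise column vector:
$$\mathbf{n}_i(k) = [\mathbf{n}_{i1}^T(k), \dots, \mathbf{n}_{i(i-1)}^T(k), \mathbf{n}_{i(i+1)}^T(k), \dots, \mathbf{n}_{iN}^T(k)]^T \qquad (9)$$
where $\mathbf{n}_{ij}(k) = [n_{ij}(k), n_{ij}(k-1), \dots, n_{ij}(k-L)]^T$ and L is the number of samples by which each reference-channel noise signal is delayed;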
let the minimum error power be $P[\mathrm{err}_i^0(k)]$; the corresponding optimal coefficient vector is:
Figure FDA0003485045240000021
to obtain the above
Figure FDA0003485045240000022
and $P[\mathrm{err}_i^0(k)]$, it is only necessary to adjust the coefficients of filter $A_i$ so that the sum of squares of $e_{i1}(k)$ is minimized;
in the phase immediately following the global silence phase, assuming that the noise environment is constant or varies slowly, the optimal coefficients of filter $A_i$ are kept unchanged and the filter only produces output, so that:
Figure FDA0003485045240000031
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and the clean speech vector, respectively; combining equations (6) and (11) gives:
Figure FDA0003485045240000032
wherein:
Figure FDA0003485045240000033
here $e_{i1}(k)$ is distorted speech containing residual noise, and, as equation (13) shows, $p_i(k)$ is the speech distortion produced from the N channels of clean speech;
letting i run from 1 to N, so that each channel in turn serves as the primary signal and the remaining channels serve as reference signals, yields N distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) containing residual noise;
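Step (1) can be sketched as a multi-reference adaptive canceller whose coefficients are frozen outside the global silence phase. This is an illustration under stated assumptions, not the patented implementation: it uses NLMS, a normalized variant of the LMS algorithm mentioned in the description, and the function name `multi_ref_nlms`, the step size and the noise-only toy scene are all hypothetical.

```python
import random


def multi_ref_nlms(primary, refs, L, silent, mu=0.5, eps=1e-8):
    """Sketch of filter A_i as a multi-reference NLMS noise canceller.

    primary: samples of the primary signal x_i(k).
    refs:    the N-1 reference signals x_j(k), j != i.
    L:       delay samples per reference, giving L+1 taps per reference.
    silent:  per-sample flags; True marks the global silence phase, the
             only time the coefficients are updated.
    Returns the error signal e_i1(k) and the final coefficient vector w_i.
    """
    w = [0.0] * ((L + 1) * len(refs))
    errors = []
    for k in range(len(primary)):
        # Stack [x_j(k), x_j(k-1), ..., x_j(k-L)] for every reference j.
        u = [r[k - d] if k - d >= 0 else 0.0 for r in refs for d in range(L + 1)]
        y = sum(wi * ui for wi, ui in zip(w, u))  # output y_i1(k) of A_i
        e = primary[k] - y                        # error e_i1(k)
        errors.append(e)
        if silent[k]:  # adapt only in global silence; otherwise just filter
            norm = eps + sum(ui * ui for ui in u)
            w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
    return errors, w


def power(sig):
    return sum(v * v for v in sig) / len(sig)


# Hypothetical toy scene: one white noise source and no speech. The noise
# reaches the primary microphone 2 samples after reference 1 and 1 sample
# after reference 2 (which also sees it attenuated by 0.8).
random.seed(0)
n = [random.gauss(0.0, 1.0) for _ in range(4000)]
x_primary = [0.0, 0.0] + n[:-2]
x_ref1 = n[:]
x_ref2 = [0.0] + [0.8 * v for v in n[:-1]]

e, w = multi_ref_nlms(x_primary, [x_ref1, x_ref2], L=4, silent=[True] * len(n))
print(power(x_primary[-1000:]), power(e[-1000:]))  # residual is far below input
```

Running the canceller once per choice of primary channel, i = 1, ..., N, would give the N signals $e_{j1}(k)$ described above; when `silent` is False the coefficients stay fixed and the filter only produces output.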
(2) the multiple distorted speech signals are used as the input of the recovery filter in the ACRANC system to obtain noise-reduced speech, as follows:
the N distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) are input to the second-stage filter $B_i$ of the ACRANC system, and, outside the global silence phase, filter $B_i$ is adjusted so that the sum of squares of the error output $e_{i2}(k)$ is minimized, where:
$$\begin{aligned}
\|e_{i2}(k)\|^2 &= \|x_i(k) - y_{i2}(k)\|^2 \\
&= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2 \\
&= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2 n_i(k)[s_i(k) - y_{i2}(k)]
\end{aligned} \qquad (14)$$
Figure FDA0003485045240000041
as equation (15) shows, minimizing
Figure FDA0003485045240000042
is equivalent to minimizing $E[(s_i(k) - y_{i2}(k))^2]$, and the latter is equivalent to minimizing the mean squared error between $y_{i2}(k)$ and the speech $s_i(k)$, so that the output $y_{i2}(k)$ of filter $B_i$ approaches the clean speech signal $s_i(k)$; because the input of filter $B_i$ is not a single distorted speech signal but multiple distorted speech signals, a better speech noise reduction effect than standard ACRANC is obtained, and the resulting noise-reduced speech signal is denoted
Figure 4
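Step (2) can be sketched in the same way: filter $B_i$ takes all N distorted signals as input and is adapted so that $e_{i2}(k) = x_i(k) - y_{i2}(k)$ has minimum power, which drives $y_{i2}(k)$ toward $s_i(k)$ when the noise $n_i(k)$ is uncorrelated with those inputs. The NLMS update, the name `recovery_filter_nlms` and the synthetic signals (white-noise "speech", independent noise) are assumptions for illustration only.

```python
import random


def recovery_filter_nlms(primary, distorted, L_B, adapt, mu=0.05, eps=1e-8):
    """Sketch of filter B_i as a multi-input NLMS recovery filter.

    primary:   noisy speech x_i(k) = s_i(k) + n_i(k), used as the desired signal.
    distorted: the N distorted speech signals e_j1(k) from the A filters.
    L_B:       order of B_i, so each input contributes L_B+1 taps.
    adapt:     per-sample flags, True outside the global silence phase.
    Returns y_i2(k), the estimate of the clean speech s_i(k).
    """
    w = [0.0] * ((L_B + 1) * len(distorted))
    y_out = []
    for k in range(len(primary)):
        u = [d[k - m] if k - m >= 0 else 0.0 for d in distorted for m in range(L_B + 1)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        y_out.append(y)
        if adapt[k]:
            e = primary[k] - y  # e_i2(k) = x_i(k) - y_i2(k)
            norm = eps + sum(ui * ui for ui in u)
            w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
    return y_out


def delayed(sig, d, g=1.0):
    """Delay a sequence by d samples and scale it by g (zero-padded)."""
    return [0.0] * d + [g * v for v in sig[:len(sig) - d]]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical toy signals: the clean speech at M_i is s(k-1); the three
# distorted inputs are differently delayed/scaled copies of s plus small
# residual noise; n_i(k) is independent of all of them.
random.seed(1)
K = 6000
s = [random.gauss(0.0, 1.0) for _ in range(K)]
s_i = delayed(s, 1)
noise_i = [random.gauss(0.0, 1.0) for _ in range(K)]
x_i = [a + b for a, b in zip(s_i, noise_i)]
distorted = [
    [a + 0.1 * random.gauss(0.0, 1.0) for a in delayed(s, 0)],
    [a + 0.1 * random.gauss(0.0, 1.0) for a in delayed(s, 2, 0.5)],
    [a + 0.1 * random.gauss(0.0, 1.0) for a in delayed(s, 1, -0.7)],
]
y = recovery_filter_nlms(x_i, distorted, L_B=2, adapt=[True] * K)
print(mse(y[-2000:], s_i[-2000:]), mse(x_i[-2000:], s_i[-2000:]))
```

In this toy run the error of $y_{i2}(k)$ against the clean speech should come out well below that of the noisy input $x_i(k)$, which is the behaviour the argument around equation (14) predicts.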
(II) beamforming: combining the improved ACRANC with beamforming further improves the speech noise reduction, with the following specific steps:
(1) multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem are established to obtain multi-channel noise-reduced speech, as follows:
taking each channel in turn as the primary signal and the remaining channels as reference signals, one improved ACRANC is built per channel, giving N subsystems;
in each improved ACRANC, the input of filter $B_i$ is the outputs of all filters $A_i$ ($i = 1, 2, \dots, N$) rather than the output of a single filter $A_i$; the adaptive mode control (AMC) decides when the filters in these subsystems update their coefficients and when their coefficients are held fixed;
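The AMC switching rule described here can be written as a small decision function. This is a hypothetical sketch: `amc_mode` and `AmcDecision` are illustrative names, per-channel VAD flags are assumed as input, and the L-ONVP margin that the claim later imposes on filters $A_i$ is ignored.

```python
from dataclasses import dataclass


@dataclass
class AmcDecision:
    adapt_A: bool  # update the coefficients of every filter A_i
    adapt_B: bool  # update the coefficients of every filter B_i


def amc_mode(vad_flags):
    """Hypothetical AMC rule for one sample.

    vad_flags[i] is True when a VAD reports speech on channel i.
    Filters A_i adapt only in the global silence phase (no channel contains
    speech); filters B_i adapt in all other phases. Outside its adaptation
    phase a filter keeps its coefficients fixed and only filters.
    """
    global_silence = not any(vad_flags)
    return AmcDecision(adapt_A=global_silence, adapt_B=not global_silence)


print(amc_mode([False, False, False]))  # AmcDecision(adapt_A=True, adapt_B=False)
print(amc_mode([False, True, False]))   # AmcDecision(adapt_A=False, adapt_B=True)
```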
during silence periods without speech, i.e. NVP periods, the optimal coefficients of filter $A_i$ can be readjusted to compensate for errors caused by changes in environmental factors; for this purpose a global silence phase, the ONVP phase, is defined, and the first filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP;
let the silence set of the i-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ be NVP(i), which consists of a series of discrete intervals, i.e.:
Figure FDA0003485045240000051
where the discrete interval
$$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, \dots, k''_{ij}\}$$
is the j-th NVP of $x_i(k)$; clearly, NVP($i_1$) is not necessarily equal to NVP($i_2$) for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, \dots, N\}$, but NVP($i_1$) is only a translation of NVP($i_2$) along the time axis;
define ONVP as:
Figure FDA0003485045240000052
thus, it is easy to prove that:
Figure FDA0003485045240000053
wherein:
Figure FDA0003485045240000054
if $k''_j < k'_j$, the interval $[k'_j, k''_j]$ in equation (18) is defined as $\varnothing$;
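Because ONVP is the phase in which every channel is silent, it can be computed as the intersection of the interval sets NVP(i). The sketch below assumes that reading, ONVP = NVP(1) ∩ ... ∩ NVP(N), represents each interval $[k', k'']$ as an inclusive pair of sample indices, and drops empty intersections in line with the ∅ convention above; the function names and interval values are hypothetical.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of inclusive integer intervals (k', k'')."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:  # an interval with k'' < k' is empty and is dropped
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def onvp(nvp_per_channel):
    """Global silence set: samples lying in NVP(i) for every channel i."""
    result = nvp_per_channel[0]
    for nvp in nvp_per_channel[1:]:
        result = intersect_intervals(result, nvp)
    return result


# Hypothetical NVP sets of three channels, shifted slightly in time.
nvp = [[(0, 99), (300, 499)], [(5, 104), (305, 504)], [(2, 101), (302, 501)]]
print(onvp(nvp))  # [(5, 99), (305, 499)]
```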
when the optimal coefficients of filter $A_i$ are adjusted, no channel may contain a speech signal, otherwise the speech would be cancelled as noise; therefore, the coefficients of filter $A_i$ are adjusted only in the following L-ONVP phase:
Figure FDA0003485045240000061
where L is the number of delay samples of the reference signals input to filter $A_i$, and:
$$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, \dots, k''_j\} \qquad (20)$$
if $k''_j < k'_j + L$, the interval $[k'_j + L, k''_j]$ in equation (19) is likewise defined as $\varnothing$;
in the L-ONVP phase, all signals and all delayed samples used fall within silence periods and contain no speech, so the optimal coefficients of filter $A_i$ can be adjusted during L-ONVP; the NVP phase mentioned above refers to L-ONVP or a part of L-ONVP;
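Equation (20) amounts to trimming L samples from the start of every ONVP interval, so that the delayed reference samples $x_j(k-L)$ also lie inside the silence interval. A minimal sketch, using the hypothetical helpers `l_onvp` and `in_intervals` and made-up interval values:

```python
def l_onvp(onvp_intervals, L):
    """Shrink each ONVP interval [k'_j, k''_j] to [k'_j + L, k''_j] (equation (20)).

    Intervals with k''_j < k'_j + L become empty and are discarded.
    """
    return [(start + L, end) for start, end in onvp_intervals if end >= start + L]


def in_intervals(k, intervals):
    """True if sample index k lies in one of the inclusive intervals."""
    return any(start <= k <= end for start, end in intervals)


# Hypothetical ONVP intervals; with L = 24 the short second interval vanishes.
adapt_set = l_onvp([(5, 99), (305, 320)], L=24)
print(adapt_set)                    # [(29, 99)]
print(in_intervals(40, adapt_set))  # True: A_i may adapt at k = 40
print(in_intervals(10, adapt_set))  # False: x_j(k-L) would reach back before k'_j
```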
the optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP phase:
Figure FDA0003485045240000062
where
Figure FDA0003485045240000063
is a discrete time interval of NVP($i_0$), the silence set of the $i_0$-th channel; $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VAD decision, so as to ensure that the intervals used contain only noise; $\Delta$ is also a freely chosen positive integer, but it must satisfy:
$$\Delta \ge L + \delta + \Delta' \qquad (22)$$
where δ is the propagation delay of noise from the other microphones of the array to the $i_0$-th microphone, counted in delay samples; its maximum value is:
Figure FDA0003485045240000064
where $d_i$ is the distance between microphone
Figure FDA0003485045240000065
and microphone $M_i$, f is the sampling frequency of the array, and c is the speed of sound in air;
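Condition (22) depends on δ, which can be bounded from the array geometry. Equation (23) is not legible in this text, so the sketch assumes the common form $\delta = \max_i \lceil d_i f / c \rceil$ with c = 343 m/s; the function names and the microphone spacings are hypothetical.

```python
import math


def max_delay_samples(distances_m, f, c=343.0):
    """ASSUMED form of equation (23): delta = max_i ceil(d_i * f / c).

    distances_m: distances d_i (metres) from microphone M_i0 to each other microphone M_i.
    f: array sampling frequency in Hz.
    c: speed of sound in air in m/s (343 m/s is an assumed room-temperature value).
    """
    return max(math.ceil(d * f / c) for d in distances_m)


def delta_is_valid(Delta, L, delta, Delta_prime):
    """Condition (22): Delta >= L + delta + Delta'."""
    return Delta >= L + delta + Delta_prime


# Hypothetical small array: the other microphones are 2, 4 and 5 cm from M_i0.
delta = max_delay_samples([0.02, 0.04, 0.05], f=8000)
print(delta)                             # 2
print(delta_is_valid(30, 24, delta, 4))  # True  (30 >= 24 + 2 + 4)
print(delta_is_valid(29, 24, delta, 4))  # False
```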
outside the $(\Delta, \Delta')$-ONVP phase, the optimal coefficients of filter $A_i$ in each subsystem are kept unchanged, and filter $A_i$ is used only for filtering;
in all phases other than the global silence phase, the optimal coefficients of all filters $B_i$ are adjusted adaptively;
(2) the final noise-reduced speech is obtained by delay-and-sum (DAS) beamforming, as follows:
the output of each subsystem is one channel of noise-reduced speech, and all N outputs can be fed into a beamformer for a better noise reduction effect; with an ordinary DAS beamformer, the input-output relationship can be written as:
Figure FDA0003485045240000071
where $\tau_i$ is the delay with which the speech reaches microphone $M_i$ relative to a reference microphone
Figure FDA0003485045240000072
selected in the array; the reference microphone
Figure FDA0003485045240000073
can be any microphone in the array, but a microphone located at or near the center of the array is usually chosen.
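A minimal delay-and-sum sketch, assuming the usual DAS form $y(k) = \frac{1}{N}\sum_{i=1}^{N} \hat{s}_i(k + \tau_i)$ with equal weights and integer sample delays, since equation (24) is not legible in this text. Here $\hat{s}_i(k)$ stands for the output of subsystem i; `das_beamform` is a hypothetical name.

```python
def das_beamform(channels, delays):
    """Delay-and-sum sketch: y(k) = (1/N) * sum_i s_i(k + tau_i).

    channels: N noise-reduced outputs, one per subsystem.
    delays:   integer tau_i in samples, the speech arrival delay at M_i
              relative to the reference microphone (tau = 0 for the reference).
    Samples shifted beyond either end of a channel are treated as zero.
    """
    N = len(channels)
    K = len(channels[0])
    out = []
    for k in range(K):
        acc = 0.0
        for ch, tau in zip(channels, delays):
            idx = k + tau  # advance channel i by tau_i to align it with the reference
            if 0 <= idx < len(ch):
                acc += ch[idx]
        out.append(acc / N)
    return out


# Hypothetical check: the second channel receives the same speech 2 samples later.
ref = [1, 2, 3, 4, 5, 0, 0]
late = [0, 0, 1, 2, 3, 4, 5]
print(das_beamform([ref, late], [0, 2]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0]
```

Fractional delays would need interpolation; the integer version fits the case where $\tau_i$ is estimated in whole samples, or taken as 0 for a small array.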
2. The method according to claim 1, wherein the delay time $\tau_i$ is obtained by the cross-correlation method or the generalized cross-correlation method, or is calculated as follows:
1) select a $(\Delta, T)$-OVP discrete time interval $[k', k'']$ such that $k'' \ge k' + \Delta$ and $k'' - (k' + \Delta)$ is as small as possible;
2) find $\tau_i$ such that:
Figure FDA0003485045240000074
if the aperture of the microphone array is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be taken as 0.
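Claim 2 allows $\tau_i$ to be estimated by cross-correlation. The sketch below shows plain time-domain cross-correlation over a speech segment; it is neither the generalized cross-correlation variant nor the claim's own formula (25), and `estimate_delay` and the test signal are hypothetical.

```python
import random


def estimate_delay(ref, sig, max_lag):
    """Plain cross-correlation delay estimate.

    Returns the integer lag tau in [-max_lag, max_lag] maximizing
    sum_k ref(k) * sig(k + tau); if sig(k) = ref(k - tau0), the peak is at tau0.
    """
    best_tau, best_val = 0, float("-inf")
    for tau in range(-max_lag, max_lag + 1):
        val = sum(ref[k] * sig[k + tau]
                  for k in range(len(ref)) if 0 <= k + tau < len(sig))
        if val > best_val:
            best_tau, best_val = tau, val
    return best_tau


# Hypothetical check: speech reaches M_i 3 samples after the reference microphone.
random.seed(3)
ref = [random.gauss(0.0, 1.0) for _ in range(500)]
sig = [0.0, 0.0, 0.0] + ref[:-3]
print(estimate_delay(ref, sig, max_lag=10))  # 3
```

The search range `max_lag` only needs to cover the largest physical delay across the array, which is small for a compact array; this is consistent with the remark that all $\tau_i$ can then be taken as 0.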
CN201811275824.4A 2018-10-30 2018-10-30 Micro-array voice noise reduction method for improving ACROC and beam forming Active CN109243482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811275824.4A CN109243482B (en) 2018-10-30 2018-10-30 Micro-array voice noise reduction method for improving ACROC and beam forming

Publications (2)

Publication Number Publication Date
CN109243482A CN109243482A (en) 2019-01-18
CN109243482B true CN109243482B (en) 2022-03-18

Family

ID=65079322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811275824.4A Active CN109243482B (en) 2018-10-30 2018-10-30 Micro-array voice noise reduction method for improving ACROC and beam forming

Country Status (1)

Country Link
CN (1) CN109243482B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951260B (en) * 2021-03-02 2022-07-19 Guilin University of Electronic Technology Method for enhancing speech by double microphones
CN117278896B (en) * 2023-11-23 2024-03-19 Shenzhen Aangsi Science & Technology Co., Ltd. Voice enhancement method and device based on double microphones and hearing aid equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529528A (en) * 2003-09-28 2004-09-15 Qingning Zeng Multi sampling rate array signal noise-removing method
US20060222184A1 (en) * 2004-09-23 2006-10-05 Markus Buck Multi-channel adaptive speech signal processing system with noise reduction
CN105575397A (en) * 2014-10-08 2016-05-11 Spreadtrum Communications (Shanghai) Co., Ltd. Voice noise reduction method and voice collection device
CN105814627A (en) * 2013-12-16 2016-07-27 Harman Becker Automotive Systems GmbH Active noise control system
CN106024001A (en) * 2016-05-03 2016-10-12 University of Electronic Science and Technology of China Method used for improving speech enhancement performance of microphone array

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Speech Enhancement by Multi-Channel Crosstalk Resistant Adaptive Noise Cancellation; Qingning Zeng et al.; 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings; 2006-07-24; pp. I-485-I-488 *
Speech enhancement based on array crosstalk-resistant adaptive noise cancellation; Qingning Zeng et al.; Acta Electronica Sinica; 2005-02-25; Vol. 33, No. 2; pp. 241-244 *

Also Published As

Publication number Publication date
CN109243482A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
Zhang et al. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
US10403299B2 (en) Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition
CN109686381B (en) Signal processor for signal enhancement and related method
CN108172231B (en) Dereverberation method and system based on Kalman filtering
CN108141656B (en) Method and apparatus for digital signal processing of microphones
KR100480789B1 (en) Method and apparatus for adaptive beamforming using feedback structure
WO2018119470A1 (en) Online dereverberation algorithm based on weighted prediction error for noisy time-varying environments
EP2364037B1 (en) Adaptive notch filter with variable bandwidth, and method and apparatus for canceling howling by using the adaptive notch filter with variable bandwidth
JP5738488B2 (en) Beam forming equipment
CN108293170B (en) Method and apparatus for adaptive phase distortion free amplitude response equalization in beamforming applications
CN109243482B (en) Micro-array voice noise reduction method for improving ACROC and beam forming
CN112331226B (en) Voice enhancement system and method for active noise reduction system
Braun et al. Task splitting for dnn-based acoustic echo and noise removal
Albu The constrained stability least mean square algorithm for active noise control
CN113362846B (en) Voice enhancement method based on generalized sidelobe cancellation structure
EP2045620B1 (en) Acoustic propagation delay measurement
JP2003250193A (en) Echo elimination method, device for executing the method, program and recording medium therefor
US11195540B2 (en) Methods and apparatus for an adaptive blocking matrix
JP6143702B2 (en) Echo canceling apparatus, method and program
KR102045953B1 (en) Method for cancellating mimo acoustic echo based on kalman filtering
KR102056398B1 (en) Real-time speech derverberation method and apparatus using multi-channel linear prediction with estimation of early speech psd for distant speech recognition
JP4948019B2 (en) Adaptive signal processing apparatus and adaptive signal processing method thereof
Kamo et al. Importance of switch optimization criterion in switching wpe dereverberation
CN113347536B (en) Acoustic feedback suppression algorithm based on linear prediction and sub-band adaptive filtering
Hosseini et al. A novel noise cancellation method for speech enhancement using variable step-size adaptive algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant