CN109243482B - Micro-array voice noise reduction method combining improved ACRANC with beamforming - Google Patents
- Publication number
- CN109243482B (application CN201811275824.4A)
- Authority
- CN
- China
- Prior art keywords
- signal
- filter
- voice
- noise
- microphone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The invention discloses a micro-array voice noise reduction method combining improved ACRANC with beamforming. It relates to the technical field of voice signal processing and addresses the technical problem of how to further improve the noise suppression performance of the ACRANC method. The method comprises: (I) improving the ACRANC method, specifically: (1) obtaining multiple channels of noise-reduced but distorted voice signals through multi-channel adaptive noise cancellation; (2) using the multi-channel distorted voice signals as the input of the recovery filter in the ACRANC system, thereby obtaining noise-reduced voice; and (II) beamforming, specifically: (1) establishing multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multi-channel noise-reduced voice; (2) applying beamforming to the multi-channel noise-reduced voice to obtain better noise-reduced voice. The invention improves the quality of the output voice and further improves the voice noise reduction effect.
Description
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a micro-array voice noise reduction method combining improved ACRANC with beamforming.
Background
Voice noise reduction technology can effectively improve voice quality and the recognition rate of voice recognition systems, and micro-array voice noise reduction is one effective approach. A micro microphone array is an array with a small aperture, usually within 5 cm, and a small number of array elements. Because a micro-array is easy to embed in many kinds of devices, it has wide application value. Generalized Sidelobe Cancellation based on a VAD (Voice Activity Detector), abbreviated VAD-GSC, is a common and effective noise reduction method for micro microphone arrays. Array Crosstalk Resistant Adaptive Noise Cancellation (abbreviated ACRANC) is also an effective micro-array voice noise reduction method. In many situations, especially when the speech source is close to the array, ACRANC reduces noise better than VAD-GSC and many of its improved variants.
In ACRANC, the input of the second-stage filter is a single signal, namely the distorted speech signal output by the first-stage filter. The function of the second-stage filter is to recover a clean speech signal from this distorted speech signal, i.e. to make the output of the second-stage filter approach the clean speech signal in the main microphone. Because audio propagation in a real environment is complex and the first-stage ACRANC filter distorts the speech signal, the quality of the speech recovered by the second-stage filter is still unsatisfactory.
Disclosure of Invention
In view of the shortcomings of the prior art, the technical problem solved by the invention is how to further improve the noise suppression performance of the ACRANC method.
To solve this technical problem, the invention adopts a micro-array voice noise reduction method combining improved ACRANC with beamforming, which performs voice noise reduction by inputting multiple channels of distorted voice into a recovery filter and combining this with DAS (Delay And Sum) beamforming, and comprises the following steps:
(I) Improving the ACRANC method, with the following specific steps:
(1) Obtaining multiple channels of noise-reduced, distorted voice signals through multi-channel adaptive noise cancellation. The specific process is as follows:
Suppose the speech signal is $s(k)$ and the noise signal is $n(k)$. They reach microphone $M_i$ through multiple paths and become the signals $s_i(k)$ and $n_i(k)$. The impulse responses from the speech source and the noise source to microphone $M_i$ are assumed to be $h_{si}(k)$ and $h_{ni}(k)$. The signal actually picked up by microphone $M_i$ is denoted $x_i(k)$, where $i = 1, 2, \dots, N$ and $k = 0, 1, 2, \dots$; $N$ is the number of microphones in the array and $k$ is the discrete time index. Then:
$$x_i(k) = s_i(k) + n_i(k) \tag{1}$$
$$s_i(k) = h_{si}(k) * s(k) \tag{2}$$
$$n_i(k) = h_{ni}(k) * n(k), \quad i = 1, 2, \dots, N \tag{3}$$
where $*$ denotes convolution.
Let the intermediate propagation impulse response from speech signal $s_i$ to speech signal $s_j$ be $h^{s}_{ij}(k)$, and that from noise signal $n_i$ to noise signal $n_j$ be $h^{n}_{ij}(k)$. Then:
$$s_j(k) = h^{s}_{ij}(k) * s_i(k) \tag{4}$$
$$n_j(k) = h^{n}_{ij}(k) * n_i(k) \tag{5}$$
In this sub-step, for each microphone $M_i$, the signal $x_i(k)$ picked up by $M_i$ is taken as the main-channel signal, and the signals $x_j(k)$ ($j = 1, \dots, i-1, i+1, \dots, N$) picked up by the other $N-1$ microphones are taken as reference signals. In the global silence stage, i.e. the stage in which every channel is silent, filter $A_i$ adaptively cancels the noise in the main channel using the noise in the multi-channel reference signals. In the non-global-silence stage, the coefficients of filter $A_i$ are held fixed and the filter only produces its filtered output. In this way multiple channels of distorted speech signals are obtained. The reasoning is as follows.
Since the speech signals satisfy $s_i(k) = 0$, $i = 1, 2, \dots, N$, in the global silence stage, we have:
$$x_i(k) = y_{i1}(k) + e_{i1}(k) \tag{6}$$
$$n_i(k) = w_i \mathbf{n}_i(k) + err_i(k) \tag{7}$$
where $x_i(k) = n_i(k)$, $e_{i1}(k) = err_i(k)$ is the prediction error, $y_{i1}(k) = w_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $w_i$ is the $1 \times (N-1)(L+1)$ coefficient row vector of filter $A_i$, i.e.:
$$w_i = (w_{i1}, \dots, w_{i(i-1)}, w_{i(i+1)}, \dots, w_{iN}) \tag{8}$$
where $w_{ij} = (w_{ij0}, w_{ij1}, \dots, w_{ijL})$, and $\mathbf{n}_i(k)$ is the $(N-1)(L+1) \times 1$ noise signal column vector:
$$\mathbf{n}_i(k) = [\mathbf{n}_{i1}(k), \dots, \mathbf{n}_{i(i-1)}(k), \mathbf{n}_{i(i+1)}(k), \dots, \mathbf{n}_{iN}(k)]^T \tag{9}$$
where $\mathbf{n}_{ij}(k) = [n_{ij}(k), n_{ij}(k-1), \dots, n_{ij}(k-L)]^T$ and $L$ is the number of delay samples applied to the reference-channel noise signals.
Let the minimum error power be $P[err_i^0(k)]$, with the corresponding optimal coefficient vector:
$$w_i^0 = (w_{i1}^0, \dots, w_{i(i-1)}^0, w_{i(i+1)}^0, \dots, w_{iN}^0) \tag{10}$$
To obtain $w_i^0$ and $P[err_i^0(k)]$, it is only necessary to adjust the coefficients of filter $A_i$ so that the sum of squares of $e_{i1}$ is minimized.
In the stage immediately following the global silence stage, under the assumption that the noise environment is constant or slowly varying, the optimal coefficients of filter $A_i$ are held fixed and only the filtered output is produced, so that:
$$y_{i1}(k) = w_i^0 \mathbf{x}_i(k) = w_i^0 [\mathbf{s}_i(k) + \mathbf{n}_i(k)] \tag{11}$$
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and the clean speech vector, respectively. From equations (6) and (11):
$$e_{i1}(k) = x_i(k) - y_{i1}(k) = p_i(k) + err_i^0(k) \tag{12}$$
where:
$$p_i(k) = s_i(k) - w_i^0 \mathbf{s}_i(k) \tag{13}$$
Thus $e_{i1}(k)$ is distorted speech containing residual noise, and $p_i(k)$ is the distorted speech within it; as equation (13) shows, $p_i(k)$ is formed from the clean speech signals of the $N$ channels.
Letting $i$ run from 1 to $N$, with each channel in turn as the main signal and the remaining channels as reference signals, yields $N$ channels of distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) containing residual noise.
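The first-stage behaviour described above (adapt filter $A_i$ only during global silence, otherwise hold its coefficients and just filter) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the normalised LMS update, the step size `mu`, the regulariser `eps`, the function name `first_stage_anc` and the toy signals are all assumptions made for the example.

```python
import random

def first_stage_anc(x, i, L, silent, mu=0.5, eps=1e-8):
    """Return (e_i1, w_i) for main channel i.

    x      : list of N equal-length channels (lists of floats)
    L      : number of delay samples per reference channel
    silent : silent[k] is True when sample k lies in the global silence stage
    mu, eps: NLMS step size and regulariser (assumed values)
    """
    refs = [j for j in range(len(x)) if j != i]
    w = [0.0] * (len(refs) * (L + 1))            # coefficient vector w_i, eq. (8)
    e = []
    for k in range(len(x[i])):
        # Stacked reference vector: L+1 most recent samples of each reference channel.
        u = [x[j][k - d] if k >= d else 0.0 for j in refs for d in range(L + 1)]
        y = sum(wc * uc for wc, uc in zip(w, u))  # filter output y_i1(k)
        err = x[i][k] - y                         # e_i1(k) = x_i(k) - y_i1(k), eq. (6)
        e.append(err)
        if silent[k]:                             # adapt only during global silence
            norm = eps + sum(uc * uc for uc in u)
            w = [wc + mu * err * uc / norm for wc, uc in zip(w, u)]
    return e, w

# Toy check: noise only, main channel is a filtered copy of the reference noise.
rng = random.Random(0)
n = [rng.gauss(0.0, 1.0) for _ in range(2000)]
main = [0.8 * n[k] + (0.3 * n[k - 1] if k else 0.0) for k in range(len(n))]
e, w = first_stage_anc([main, n], i=0, L=2, silent=[True] * len(n))
```

In this toy case the main-channel noise is exactly predictable from the reference, so the residual decays towards zero; with speech present and `silent[k]` false, the same loop leaves `w` untouched and `e` carries the distorted speech plus residual noise.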
(2) Using the multi-channel distorted voice signals as the input of the recovery filter in the ACRANC system to obtain the noise-reduced voice signal. The specific process is as follows:
The multi-channel distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) are input to the second-stage filter $B_i$ of the ACRANC system. In stages other than the global silence stage, the coefficients of filter $B_i$ are adjusted so that the sum of squares of its output error $e_{i2}(k)$ is minimized, where:
$$\begin{aligned} \|e_{i2}(k)\|^2 &= \|x_i(k) - y_{i2}(k)\|^2 \\ &= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2 \\ &= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2 n_i(k)[s_i(k) - y_{i2}(k)] \end{aligned} \tag{14}$$
Taking expectations, with the noise $n_i(k)$ uncorrelated with $s_i(k) - y_{i2}(k)$ so that the cross term vanishes:
$$E[e_{i2}^2(k)] = E[n_i^2(k)] + E[(s_i(k) - y_{i2}(k))^2] \tag{15}$$
As equation (15) shows, minimizing $E[e_{i2}^2(k)]$ is equivalent to minimizing $E[(s_i(k) - y_{i2}(k))^2]$, i.e. minimizing the mean-square difference between $y_{i2}(k)$ and the speech $s_i(k)$, so the output $y_{i2}(k)$ of filter $B_i$ approaches the clean speech signal $s_i(k)$. Because the input of filter $B_i$ is not a single channel but multiple channels of distorted speech, a better noise reduction effect is obtained than with ACRANC. This improved noise-reduced speech signal is denoted $\hat{s}_i(k)$.
(II) Beamforming. The improved ACRANC is combined with beamforming to further improve the noise reduction effect, with the following specific steps:
(1) Establishing multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multi-channel noise-reduced voice. The specific process is as follows:
Taking each channel in turn as the main signal and the remaining channels as reference signals, an improved ACRANC is established, giving $N$ subsystems in total.
In each improved ACRANC, the input of filter $B_i$ is the outputs of all filters $A_i$ ($i = 1, 2, \dots, N$) rather than the output of a single filter $A_i$. Adaptive mode control (AMC) determines when the filters in these subsystems update their coefficients and when the coefficients are held fixed.
During a silence period with no voice, i.e. an NVP, the coefficients of filter $A_i$ can be adjusted to compensate for errors caused by changes in environmental factors. To this end a global silence stage, the ONVP stage, is defined, and the first-stage filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP.
Let the NVP of the $i$-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ be $NVP(i)$, which consists of a series of discrete intervals:
$$NVP(i) = \bigcup_j [k'_{ij}, k''_{ij}] \tag{16}$$
where the discrete interval
$$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, \dots, k''_{ij}\}$$
is the $j$-th NVP of $x_i(k)$. Clearly $NVP(i_1)$ is not necessarily equal to $NVP(i_2)$ for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, \dots, N\}$, but $NVP(i_1)$ is simply $NVP(i_2)$ translated along the time axis.
ONVP is defined as:
$$ONVP = \bigcap_{i=1}^{N} NVP(i) \tag{17}$$
It is then easy to show that:
$$ONVP = \bigcup_j [k'_j, k''_j] \tag{18}$$
where:
$$k'_j = \max_{1 \le i \le N} k'_{ij}, \qquad k''_j = \min_{1 \le i \le N} k''_{ij}$$
If $k''_j < k'_j$, then $[k'_j, k''_j] = \varnothing$ in equation (18).
When the optimal coefficients of filter $A_i$ are adjusted, no channel may contain any speech signal, otherwise the speech would be cancelled together with the noise. The coefficients of filter $A_i$ are therefore adjusted only in the following L-ONVP stage:
$$L\text{-}ONVP = \bigcup_j [k'_j + L, k''_j] \tag{19}$$
where $L$ is the number of delay samples of the reference signals input to filter $A_i$, and:
$$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, \dots, k''_j\} \tag{20}$$
If $k''_j < k'_j + L$, then likewise $[k'_j + L, k''_j] = \varnothing$ in equation (20).
In the L-ONVP stage, all the signals and the delayed samples used belong to the silence stage and contain no speech, so the coefficients of filter $A_i$ can be adjusted in the L-ONVP stage. The NVP stage referred to above means L-ONVP or a part of it.
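The interval bookkeeping above (ONVP as the intersection of the per-channel NVPs, then L-ONVP by moving each start forward by $L$ samples) can be sketched as below. Representing each NVP as a sorted list of closed `(start, end)` sample-index tuples, and the helper names `intersect`, `onvp` and `l_onvp`, are assumptions made for this illustration.

```python
def intersect(a, b):
    """Intersect two sorted lists of closed integer intervals (start, end)."""
    out, p, q = [], 0, 0
    while p < len(a) and q < len(b):
        lo = max(a[p][0], b[q][0])
        hi = min(a[p][1], b[q][1])
        if lo <= hi:                 # empty intersections are dropped
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[p][1] < b[q][1]:
            p += 1
        else:
            q += 1
    return out

def onvp(nvp_per_channel):
    """Global silence: the intersection of NVP(i) over all channels."""
    result = nvp_per_channel[0]
    for nvp in nvp_per_channel[1:]:
        result = intersect(result, nvp)
    return result

def l_onvp(onvp_intervals, L):
    """Shift each start by L so that all L delayed samples are also silent."""
    return [(lo + L, hi) for lo, hi in onvp_intervals if lo + L <= hi]

# Three channels whose silence intervals are time-shifted copies of each other.
nvps = [[(0, 99), (300, 420)], [(2, 101), (302, 422)], [(1, 100), (301, 421)]]
global_silence = onvp(nvps)                  # [(2, 99), (302, 420)]
adapt_intervals = l_onvp(global_silence, 24) # [(26, 99), (326, 420)]
```

Intervals shorter than $L$ samples vanish, which matches the convention that an interval with its end before its start is treated as empty.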
The optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP stage:
$$(\Delta, \Delta')\text{-}ONVP = \bigcup_j [k'_{i_0 j} + \Delta, k''_{i_0 j} - \Delta'] \tag{21}$$
where $[k'_{i_0 j}, k''_{i_0 j}]$ are the discrete time intervals constituting $NVP(i_0)$ of the $i_0$-th channel; $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VAD decision, its purpose being to ensure that the time interval used is a pure-noise interval; and $\Delta$ is also a selectable positive integer, but must satisfy:
$$\Delta \ge L + \delta + \Delta' \tag{22}$$
where $\delta$ is the delay, counted in samples, with which noise propagates from the other microphones of the array to the $i_0$-th microphone. The maximum number of delay samples is:
$$\delta = \max_{1 \le i \le N} \frac{d_i f}{c} \tag{23}$$
where $d_i$ is the distance between microphone $M_{i_0}$ and microphone $M_i$, $f$ is the sampling frequency of the array, and $c$ is the propagation speed of the audio signal in air.
Outside the $(\Delta, \Delta')$-ONVP stage, the optimal coefficients of filter $A_i$ in each subsystem are held fixed, and filter $A_i$ is used only for filtering.
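A small sketch of how the guard margins could be applied in practice. It assumes that the $(\Delta, \Delta')$-ONVP intervals are obtained by trimming $\Delta$ samples from the start and $\Delta'$ samples from the end of each silence interval of the reference channel, that $\delta$ is rounded up to a whole number of samples, and that $c \approx 340$ m/s; the function names are hypothetical.

```python
import math

def max_delay_samples(distances_m, fs_hz, c=340.0):
    """delta: largest inter-microphone propagation delay, rounded up to samples."""
    return max(math.ceil(d * fs_hz / c) for d in distances_m)

def guarded_intervals(nvp_ref, big_delta, delta_prime, L, delta):
    """Trim each reference-channel silence interval by the guard margins.

    Raises ValueError when the constraint big_delta >= L + delta + delta_prime
    of equation (22) is violated.
    """
    if big_delta < L + delta + delta_prime:
        raise ValueError("Delta must be at least L + delta + Delta'")
    return [(lo + big_delta, hi - delta_prime)
            for lo, hi in nvp_ref
            if lo + big_delta <= hi - delta_prime]

delta = max_delay_samples([0.02, 0.05], 8000.0)   # 5 cm at 8 kHz -> 2 samples
usable = guarded_intervals([(0, 99), (300, 420)], big_delta=30,
                           delta_prime=3, L=24, delta=delta)
```

With these example numbers the constraint holds (30 ≥ 24 + 2 + 3), and each silence interval loses 30 samples at its start and 3 at its end.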
In the remaining stages, outside the global silence stage, all filters $B_i$ are adjusted adaptively; for simplicity, $B_i$ may also be adapted continuously from beginning to end.
(2) Obtaining the final noise-reduced voice through delay-and-sum (DAS) beamforming. The specific process is as follows:
The output of each subsystem is one channel of noise-reduced voice signal. All $N$ outputs can be fed to a beamformer to obtain a better noise reduction effect. If an ordinary DAS beamformer is used, its input-output relationship is:
$$y(k) = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i(k + \tau_i) \tag{24}$$
where $\hat{s}_i(k)$ is the noise-reduced output of the $i$-th subsystem and $\tau_i$ is the delay with which speech reaches microphone $M_i$ relative to a selected reference microphone $M_{i_0}$ of the array. The reference microphone $M_{i_0}$ may be any microphone in the array; usually the microphone at or near the centre of the array is selected.
The delay $\tau_i$ can be calculated by the cross-correlation method or the generalized cross-correlation method, or as follows:
1) select a $(\delta, T)$-OVP discrete time interval $[k', k'']$, where $k'' \ge k' + \delta$ and $k'' - (k' + \delta)$ is as small as possible;
2) find the $\tau_i$ that satisfies equation (25) [equation (25) is missing from the source text].
If the array aperture is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be treated as 0.
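A delay-and-sum combination of the subsystem outputs can be sketched as follows. The sketch assumes integer sample delays, an alignment that advances channel $i$ by $\tau_i$ samples, a $1/N$ normalisation, and zero contribution from samples that fall outside the record; none of these details is fixed by the text, and the function name `das_beamform` is hypothetical.

```python
def das_beamform(channels, delays):
    """Average the channels after advancing channel i by delays[i] samples."""
    n_ch = len(channels)
    length = len(channels[0])
    out = []
    for k in range(length):
        acc = 0.0
        for ch, tau in zip(channels, delays):
            idx = k + tau                 # sample of channel i aligned with time k
            if 0 <= idx < length:         # samples outside the record count as 0
                acc += ch[idx]
        out.append(acc / n_ch)
    return out

base = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
late = [0.0] + base[:-1]                  # same signal arriving one sample later
aligned = das_beamform([base, late], [0, 1])
plain = das_beamform([base, late], [0, 0])   # small-aperture case: all delays 0
```

After alignment the first five output samples reproduce `base` exactly; the last sample is attenuated only because the delayed channel runs out of data.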
Compared with the prior art, the beneficial effects of the invention are as follows:
Compared with the original ACRANC method, in which only one channel of distorted voice is input to the recovery filter, the improved ACRANC method achieves a better voice noise reduction effect, and combining the improved ACRANC method with beamforming improves the noise reduction effect further.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of speech and noise propagation and crosstalk;
FIG. 3 is a schematic diagram of the improved ACRANC system;
FIG. 4 is a schematic diagram of improved ACRANC combined with beamforming.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings, but the present invention is not limited thereto.
FIG. 1 shows a micro-array voice noise reduction method combining improved ACRANC with beamforming, which performs voice noise reduction by inputting multiple channels of distorted voice to the recovery filter of ACRANC and combining this with beamforming, and comprises the following steps:
(I) Improving the ACRANC method, with the following specific steps:
(1) Obtaining multiple channels of noise-reduced, distorted voice signals through multi-channel adaptive noise cancellation. The specific process is as follows:
Suppose the speech signal is $s(k)$ and the noise signal is $n(k)$. As shown in FIG. 2, they reach microphone $M_i$ through multiple paths and become the signals $s_i(k)$ and $n_i(k)$. The impulse responses from the speech source and the noise source to microphone $M_i$ are assumed to be $h_{si}(k)$ and $h_{ni}(k)$. The signal actually picked up by microphone $M_i$ is denoted $x_i(k)$, where $i = 1, 2, \dots, N$ and $k = 0, 1, 2, \dots$; $N$ is the number of microphones in the array and $k$ is the discrete time index. Then:
$$x_i(k) = s_i(k) + n_i(k) \tag{1}$$
$$s_i(k) = h_{si}(k) * s(k) \tag{2}$$
$$n_i(k) = h_{ni}(k) * n(k), \quad i = 1, 2, \dots, N \tag{3}$$
where $*$ denotes convolution.
Let the intermediate propagation impulse response from speech signal $s_i$ to speech signal $s_j$ be $h^{s}_{ij}(k)$, and that from noise signal $n_i$ to noise signal $n_j$ be $h^{n}_{ij}(k)$. Then:
$$s_j(k) = h^{s}_{ij}(k) * s_i(k) \tag{4}$$
$$n_j(k) = h^{n}_{ij}(k) * n_i(k) \tag{5}$$
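The signal model of equations (1) to (3) can be illustrated with a few lines of code. This is only a sketch: the causal, truncated convolution and the short impulse responses used in the example are assumptions chosen for readability, and the function names are hypothetical.

```python
def convolve(h, s):
    """Causal convolution h * s, truncated to the length of s."""
    return [sum(h[m] * s[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(s))]

def microphone_signal(h_s, h_n, s, n):
    """x_i(k) = h_si(k) * s(k) + h_ni(k) * n(k), following eqs. (1) to (3)."""
    s_i = convolve(h_s, s)   # speech component at microphone i, eq. (2)
    n_i = convolve(h_n, n)   # noise component at microphone i, eq. (3)
    return [a + b for a, b in zip(s_i, n_i)]

echo = convolve([1.0, 0.5], [1.0, 0.0, 2.0])                      # [1.0, 0.5, 2.0]
x_i = microphone_signal([1.0], [0.5], [1.0, 2.0], [2.0, 4.0])     # [2.0, 4.0]
```

Each microphone gets its own pair of impulse responses, which is what makes the speech and noise components differ from channel to channel.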
In this sub-step, for each microphone $M_i$, the signal $x_i(k)$ picked up by $M_i$ is taken as the main-channel signal, and the signals $x_j(k)$ ($j = 1, \dots, i-1, i+1, \dots, N$) picked up by the other $N-1$ microphones are taken as reference signals. In the global silence stage, i.e. the stage in which every channel is silent, filter $A_i$ adaptively cancels the noise in the main channel using the noise in the multi-channel reference signals, as shown in FIG. 3. In the non-global-silence stage, the coefficients of filter $A_i$ are held fixed and the filter only produces its filtered output. In this way multiple channels of distorted speech signals are obtained. The reasoning is as follows.
Since the speech signals satisfy $s_i(k) = 0$, $i = 1, 2, \dots, N$, in the global silence stage, we have:
$$x_i(k) = y_{i1}(k) + e_{i1}(k) \tag{6}$$
$$n_i(k) = w_i \mathbf{n}_i(k) + err_i(k) \tag{7}$$
where $x_i(k) = n_i(k)$, $e_{i1}(k) = err_i(k)$ is the prediction error, $y_{i1}(k) = w_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $w_i$ is the $1 \times (N-1)(L+1)$ coefficient row vector of filter $A_i$, i.e.:
$$w_i = (w_{i1}, \dots, w_{i(i-1)}, w_{i(i+1)}, \dots, w_{iN}) \tag{8}$$
where $w_{ij} = (w_{ij0}, w_{ij1}, \dots, w_{ijL})$, and $\mathbf{n}_i(k)$ is the $(N-1)(L+1) \times 1$ noise signal column vector:
$$\mathbf{n}_i(k) = [\mathbf{n}_{i1}(k), \dots, \mathbf{n}_{i(i-1)}(k), \mathbf{n}_{i(i+1)}(k), \dots, \mathbf{n}_{iN}(k)]^T \tag{9}$$
where $\mathbf{n}_{ij}(k) = [n_{ij}(k), n_{ij}(k-1), \dots, n_{ij}(k-L)]^T$ and $L$ is the number of delay samples applied to the reference-channel noise signals.
Let the minimum error power be $P[err_i^0(k)]$, with the corresponding optimal coefficient vector:
$$w_i^0 = (w_{i1}^0, \dots, w_{i(i-1)}^0, w_{i(i+1)}^0, \dots, w_{iN}^0) \tag{10}$$
To obtain $w_i^0$ and $P[err_i^0(k)]$, it is only necessary to adjust the coefficients of filter $A_i$ so that the sum of squares of $e_{i1}$ is minimized.
In the stage immediately following the global silence stage, under the assumption that the noise environment is constant or slowly varying, the optimal coefficients of filter $A_i$ are held fixed and only the filtered output is produced, so that:
$$y_{i1}(k) = w_i^0 \mathbf{x}_i(k) = w_i^0 [\mathbf{s}_i(k) + \mathbf{n}_i(k)] \tag{11}$$
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and the clean speech vector, respectively. From equations (6) and (11):
$$e_{i1}(k) = x_i(k) - y_{i1}(k) = p_i(k) + err_i^0(k) \tag{12}$$
where:
$$p_i(k) = s_i(k) - w_i^0 \mathbf{s}_i(k) \tag{13}$$
Thus $e_{i1}(k)$ is distorted speech containing residual noise, and $p_i(k)$ is the distorted speech within it; as equation (13) shows, $p_i(k)$ is formed from the clean speech signals of the $N$ channels.
Letting $i$ run from 1 to $N$, with each channel in turn as the main signal and the remaining channels as reference signals, yields $N$ channels of distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) containing residual noise.
(2) Using the multi-channel distorted voice signals as the input of the recovery filter in the ACRANC system to obtain the noise-reduced voice signal. The specific process is as follows:
The multi-channel distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) are input to the second-stage filter $B_i$ of the ACRANC system. In stages other than the global silence stage, the coefficients of filter $B_i$ are adjusted so that the sum of squares of its output error $e_{i2}(k)$ is minimized, where:
$$\begin{aligned} \|e_{i2}(k)\|^2 &= \|x_i(k) - y_{i2}(k)\|^2 \\ &= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2 \\ &= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2 n_i(k)[s_i(k) - y_{i2}(k)] \end{aligned} \tag{14}$$
Taking expectations, with the noise $n_i(k)$ uncorrelated with $s_i(k) - y_{i2}(k)$ so that the cross term vanishes:
$$E[e_{i2}^2(k)] = E[n_i^2(k)] + E[(s_i(k) - y_{i2}(k))^2] \tag{15}$$
As equation (15) shows, minimizing $E[e_{i2}^2(k)]$ is equivalent to minimizing $E[(s_i(k) - y_{i2}(k))^2]$, i.e. minimizing the mean-square difference between $y_{i2}(k)$ and the speech $s_i(k)$, so the output $y_{i2}(k)$ of filter $B_i$ approaches the clean speech signal $s_i(k)$.
The input of filter $B_i$ is the $N$ channels $e_{j1}(k)$ ($j = 1, 2, \dots, N$), all of which, by equation (13), are distorted speech signals formed from the $N$ channels of speech. The output approximation produced by this multi-channel input is therefore at least as good as that produced by the single input $e_{i1}(k)$: in theory, when all coefficients of filter $B_i$ applied to the other inputs $e_{j1}(k)$ ($j = 1, \dots, i-1, i+1, \dots, N$) are zero, the $N$-input case degenerates to the single-input case $e_{i1}(k)$. The improved ACRANC method thus performs better than the existing ACRANC method. This improved noise-reduced speech signal is denoted $\hat{s}_i(k)$.
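The degeneracy argument can be checked numerically: a multi-input recovery filter whose coefficients for every channel other than $i$ are zero produces exactly the output of the single-input filter. The sketch below only evaluates filter outputs (coefficient adaptation is omitted); the tap layout, the toy data and the function names are assumptions made for the illustration.

```python
def stacked_input(e1, k, LB):
    """Stack the LB+1 most recent samples of every channel e_j1 at time k."""
    return [ch[k - d] if k >= d else 0.0 for ch in e1 for d in range(LB + 1)]

def recovery_output(w, e1, k, LB):
    """Output y_i2(k) of a multi-input filter B_i with coefficient vector w."""
    return sum(wc * uc for wc, uc in zip(w, stacked_input(e1, k, LB)))

def embed_single_channel(w_single, i, N, LB):
    """Multi-input coefficients that use channel i only (all others zero)."""
    w = [0.0] * (N * (LB + 1))
    w[i * (LB + 1):(i + 1) * (LB + 1)] = w_single
    return w

e1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]   # toy e_j1(k), N = 3
w_single = [0.5, 0.25]                                       # LB = 1
w_multi = embed_single_channel(w_single, i=1, N=3, LB=1)
y_multi = [recovery_output(w_multi, e1, k, LB=1) for k in range(3)]
y_single = [recovery_output(w_single, [e1[1]], k, LB=1) for k in range(3)]
```

Since the single-input solution is one point in the multi-input coefficient space, the minimum mean-square error over the larger space cannot be worse.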
(II) Beamforming. The improved ACRANC is combined with beamforming to further improve the noise reduction effect, with the following specific steps:
(1) Establishing multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multi-channel noise-reduced voice. The specific process is as follows:
Taking each channel in turn as the main signal and the remaining channels as reference signals, an improved ACRANC is established, giving $N$ subsystems in total.
In each improved ACRANC, the input of filter $B_i$ is the outputs of all filters $A_i$ ($i = 1, 2, \dots, N$) rather than the output of a single filter $A_i$. As shown in FIG. 4, adaptive mode control (AMC) determines when the filters in these subsystems update their coefficients and when the coefficients are held fixed.
During a silence period with no voice, i.e. an NVP, the coefficients of filter $A_i$ can be adjusted to compensate for errors caused by changes in environmental factors. To this end a global silence stage, the ONVP stage, is defined, and the first-stage filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP.
Let the NVP of the $i$-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ be $NVP(i)$, which consists of a series of discrete intervals:
$$NVP(i) = \bigcup_j [k'_{ij}, k''_{ij}] \tag{16}$$
where the discrete interval
$$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, \dots, k''_{ij}\}$$
is the $j$-th NVP of $x_i(k)$. Clearly $NVP(i_1)$ is not necessarily equal to $NVP(i_2)$ for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, \dots, N\}$, but $NVP(i_1)$ is simply $NVP(i_2)$ translated along the time axis.
ONVP is defined as:
$$ONVP = \bigcap_{i=1}^{N} NVP(i) \tag{17}$$
It is then easy to show that:
$$ONVP = \bigcup_j [k'_j, k''_j] \tag{18}$$
where:
$$k'_j = \max_{1 \le i \le N} k'_{ij}, \qquad k''_j = \min_{1 \le i \le N} k''_{ij}$$
If $k''_j < k'_j$, then $[k'_j, k''_j] = \varnothing$ in equation (18).
When the optimal coefficients of filter $A_i$ are adjusted, no channel may contain any speech signal, otherwise the speech would be cancelled together with the noise. The coefficients of filter $A_i$ are therefore adjusted only in the following L-ONVP stage:
$$L\text{-}ONVP = \bigcup_j [k'_j + L, k''_j] \tag{19}$$
where $L$ is the number of delay samples of the reference signals input to filter $A_i$, and:
$$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, \dots, k''_j\} \tag{20}$$
If $k''_j < k'_j + L$, then likewise $[k'_j + L, k''_j] = \varnothing$ in equation (20).
In the L-ONVP stage, all the signals and the delayed samples used belong to the silence stage and contain no speech, so the coefficients of filter $A_i$ can be adjusted in the L-ONVP stage. The NVP stage referred to above means L-ONVP or a part of it.
The optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP stage:
$$(\Delta, \Delta')\text{-}ONVP = \bigcup_j [k'_{i_0 j} + \Delta, k''_{i_0 j} - \Delta'] \tag{21}$$
where $[k'_{i_0 j}, k''_{i_0 j}]$ are the discrete time intervals constituting $NVP(i_0)$ of the $i_0$-th channel; $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VAD decision, its purpose being to ensure that the time interval used is a pure-noise interval; and $\Delta$ is also a selectable positive integer, but must satisfy:
$$\Delta \ge L + \delta + \Delta' \tag{22}$$
where $\delta$ is the delay, counted in samples, with which noise propagates from the other microphones of the array to the $i_0$-th microphone. The maximum number of delay samples is:
$$\delta = \max_{1 \le i \le N} \frac{d_i f}{c} \tag{23}$$
where $d_i$ is the distance between microphone $M_{i_0}$ and microphone $M_i$, $f$ is the sampling frequency of the array, and $c$ is the propagation speed of the audio signal in air.
Outside the $(\Delta, \Delta')$-ONVP stage, the optimal coefficients of filter $A_i$ in each subsystem are held fixed, and filter $A_i$ is used only for filtering.
In the remaining stages, outside the global silence stage, all filters $B_i$ are adjusted adaptively; for simplicity, $B_i$ may also be adapted continuously from beginning to end.
(2) Obtaining the final noise-reduced voice through delay-and-sum (DAS) beamforming. The specific process is as follows:
The output of each subsystem is one channel of noise-reduced voice signal. All $N$ outputs can be fed to a beamformer to obtain a better noise reduction effect. If an ordinary DAS beamformer is used, its input-output relationship is:
$$y(k) = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i(k + \tau_i) \tag{24}$$
where $\hat{s}_i(k)$ is the noise-reduced output of the $i$-th subsystem and $\tau_i$ is the delay with which speech reaches microphone $M_i$ relative to a selected reference microphone $M_{i_0}$ of the array. The reference microphone $M_{i_0}$ may be any microphone in the array; usually the microphone at or near the centre of the array is selected.
The delay $\tau_i$ can be calculated by the cross-correlation method or the generalized cross-correlation method, or as follows:
1) select a $(\delta, T)$-OVP discrete time interval $[k', k'']$, where $k'' \ge k' + \delta$ and $k'' - (k' + \delta)$ is as small as possible;
2) find the $\tau_i$ that satisfies equation (25) [equation (25) is missing from the source text].
If the array aperture is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be treated as 0.
For example, if the distance from any microphone $M_i$ in the array to the reference microphone $M_{i_0}$ is less than 2 cm and the snapshot sampling frequency of the array is 8000 Hz, the maximum delay is less than half a sampling interval, so all $\tau_i$ may be taken as 0.
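The half-sample claim can be checked with a line of arithmetic. Assuming a speed of sound of roughly $c \approx 340$ m/s (a value not stated in the text), the largest delay in samples for a spacing below $d = 0.02$ m at $f = 8000$ Hz is:

```latex
\tau_{\max} = \frac{d\,f}{c} < \frac{0.02 \times 8000}{340} \approx 0.47 < 0.5
```

Rounding any such delay to the nearest whole sample gives 0, which is why no alignment is needed in this configuration.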
(3) Computational complexity
FIG. 4 shows the voice noise reduction process combining improved ACRANC with DAS beamforming. The computational load of both the AMC and the DAS beamformer is small, and the AMC can be implemented with a VAD (Voice Activity Detector). The computational complexity of the method therefore depends mainly on the improved ACRANC algorithms of the $N$ subsystems, which in turn depends on the adaptive algorithm used for the filters $A_i$ and $B_i$ of each improved ACRANC. If the LMS adaptive algorithm is adopted, it is not difficult to work out that the computational load of the improved ACRANC algorithms of the $N$ subsystems does not exceed
$$(2A + 3M)[(L+1)(N-1) + (L_B+1)N] N f \tag{26}$$
where $2A$ denotes 2 additions, $3M$ denotes 3 multiplications, $L$ is the number of delay samples used by the reference signals, which determines the order of filter $A_i$ in equation (10), $N$ is the number of microphones in the array, $L_B$ is the corresponding number of delay samples determining the order of filter $B_i$, and $f$ is the sampling rate of the microphone array. Since many chips can complete one addition and one multiplication in a single operation, the actual computation time is much shorter than that implied by equation (26).
For example, if the length determining filter $A_i$ is $L = 24$, the length determining filter $B_i$ is $L_B = 20$, the sampling frequency is $f = 8000$ Hz and the array consists of $N = 5$ microphones, then equation (26) gives a computational load of no more than 41 MFLOPS.
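The bound of equation (26) is easy to evaluate directly. In the sketch below the 2 additions and 3 multiplications are counted as 5 operations per coefficient per sample; the value $L_B = 20$ is an assumption (the source text does not legibly state it), chosen because it reproduces the quoted 41 MFLOPS figure exactly. The function name is hypothetical.

```python
def improved_acranc_ops_per_second(N, L, LB, fs, ops_per_tap=5):
    """Upper bound of eq. (26): operations per second for N LMS-based subsystems."""
    taps_a = (L + 1) * (N - 1)    # coefficients of one first-stage filter A_i
    taps_b = (LB + 1) * N         # coefficients of one second-stage filter B_i
    return ops_per_tap * (taps_a + taps_b) * N * fs

# L = 24, N = 5, f = 8000 Hz from the text; LB = 20 is an assumed value.
load = improved_acranc_ops_per_second(N=5, L=24, LB=20, fs=8000)
print(load / 1e6, "MFLOPS")
```

The load grows roughly quadratically with the number of microphones, since each of the $N$ subsystems filters about $N$ channels.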
Compared with the prior art, the beneficial effects of the invention are as follows:
Compared with the original ACRANC method, in which only one channel of distorted voice is input to the recovery filter, the improved ACRANC method achieves a better voice noise reduction effect, and combining the improved ACRANC method with beamforming improves the noise reduction effect further.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention.
Claims (2)
1. A method for improving ACROC and beamforming microarray voice noise reduction is characterized in that voice noise reduction is carried out by inputting multi-path distorted voice to a recovery filter and combining with beamforming, and the method comprises the following steps:
the ACROANC method is improved, and the method comprises the following specific steps:
(1) the distorted voice signals after the multi-path noise reduction are obtained through multi-path adaptive noise cancellation, and the specific process is as follows:
suppose that the speech signal is s (k) and the noise signal is n (k), which reach the microphone M through multiple paths respectivelyiAnd converted into a signal si(k) And ni(k) (ii) a From the speech source and the noise source to the microphone MiIs assumed to be hsi(k) And hni(k) (ii) a Microphone MiThe signal actually picked up is denoted xi(k)=si(k)+ni(k) Where i is 1,2, … N, k is 0,1,2, …, where N represents the number of microphones in the array and k is a discrete time index, we obtain:
xi(k)=si(k)+ni(k) (1)
si(k)=hsi(k)*s(k) (2)
ni(k)=hni(k)*n(k) i=1,2,…,N (3)
wherein, is convolution operation symbol;
setting a speech signal siTo the speech signal sjHas an impact response ofWhile the noise signal niTo the noise signal njHas an intermediate propagation impulse response ofThen:
in this sub-step, for each microphone $M_i$, the signal $x_i(k)$ obtained by microphone $M_i$ is taken as the main-channel signal, and the signals $x_j(k)$ ($j = 1, \dots, i-1, i+1, \dots, N$) obtained by the other $N-1$ microphones are taken as reference signals; in the global silence stage, i.e. the stage in which all channels are silent, filter $A_i$ adaptively cancels the noise in the main channel using the noise in the multi-channel reference signals; in the non-global-silence stage, the coefficients of filter $A_i$ are kept unchanged and only the filtered output is produced; in this way multiple channels of distorted speech signals are obtained, for the following reason:
in the global silence stage the speech signals satisfy $s_i(k) = 0$, $i = 1, 2, \dots, N$, so that:
$x_i(k) = y_{i1}(k) + e_{i1}(k)$ (6)
$n_i(k) = w_i \mathbf{n}_i(k) + err_i(k)$ (7)
where $x_i(k) = n_i(k)$, $e_{i1}(k) = err_i(k)$ is the prediction error, $y_{i1}(k) = w_i \mathbf{n}_i(k)$ is the output of filter $A_i$, and $w_i$ is the coefficient row vector of filter $A_i$, of dimension $1 \times (N-1)(L+1)$, i.e.:
$w_i = (w_{i1}, \dots, w_{i(i-1)}, w_{i(i+1)}, \dots, w_{iN})$ (8)
where $w_{ij} = (w_{ij0}, w_{ij1}, \dots, w_{ijL})$, and $\mathbf{n}_i(k)$ is the noise signal column vector of dimension $(N-1)(L+1) \times 1$:
$\mathbf{n}_i(k) = [\mathbf{n}_{i1}(k), \dots, \mathbf{n}_{i(i-1)}(k), \mathbf{n}_{i(i+1)}(k), \dots, \mathbf{n}_{iN}(k)]^T$ (9)
where $\mathbf{n}_{ij}(k) = [n_{ij}(k), n_{ij}(k-1), \dots, n_{ij}(k-L)]^T$, and $L$ is the number of delay samples of the reference-channel noise signals;
let the minimum error power be $P[err_i^0(k)]$ and the corresponding optimal coefficient vector be $w_i^0$; to obtain $w_i^0$ and $P[err_i^0(k)]$, it is only necessary to adjust the coefficients of filter $A_i$ so that the sum of squares of $e_{i1}(k)$ is minimal;
in the stage immediately following the global silence stage, assuming that the noise environment is constant or slowly varying, the optimal coefficients $w_i^0$ of filter $A_i$ are kept unchanged and only the filtered output is produced, so that:
$y_{i1}(k) = w_i^0 \mathbf{x}_i(k) = w_i^0 [\mathbf{s}_i(k) + \mathbf{n}_i(k)]$ (11)
where $\mathbf{x}_i(k)$ and $\mathbf{s}_i(k)$ denote the picked-up noisy speech vector and the clean speech vector respectively, built from the reference channels in the same way as the noise vector in formula (9); from formulas (6) and (11):
$e_{i1}(k) = x_i(k) - y_{i1}(k) = p_i(k) + err_i^0(k)$ (12)
where:
$p_i(k) = s_i(k) - w_i^0 \mathbf{s}_i(k)$ (13)
here $e_{i1}(k)$ is distorted speech containing residual noise, and $p_i(k)$ is the distorted speech, which, as formula (13) shows, is a distorted combination of the clean speech signals of the $N$ channels;
letting $i$ run from 1 to $N$, with each channel in turn taken as the main signal and the remaining channels as reference signals, $N$ channels of distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) containing residual noise are obtained;
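The first stage described above can be sketched as follows. This is a batch least-squares illustration, not the patent's adaptive update rule: `tapped_references`, `stage_one` and the boolean mask `silent` (marking globally silent samples) are hypothetical names, and a real system would more likely adapt the coefficients sample by sample.

```python
import numpy as np

def tapped_references(x, i, L):
    """Stack x_j(k), x_j(k-1), ..., x_j(k-L) for every j != i into a (K, (N-1)(L+1)) matrix."""
    N, K = x.shape
    cols = []
    for j in range(N):
        if j == i:
            continue
        for d in range(L + 1):
            delayed = np.zeros(K)
            delayed[d:] = x[j, :K - d]  # x_j delayed by d samples, zero-padded at the start
            cols.append(delayed)
    return np.stack(cols, axis=1)

def stage_one(x, i, L, silent):
    """Fit filter A_i on globally silent samples only, then filter every sample
    with the coefficients held fixed. Returns (e_i1, w_i)."""
    R = tapped_references(x, i, L)
    # Least squares: minimise the sum of squares of e_i1 over the silent samples.
    w, *_ = np.linalg.lstsq(R[silent], x[i, silent], rcond=None)
    e_i1 = x[i] - R @ w  # e_i1(k) = x_i(k) - y_i1(k)
    return e_i1, w
```

Calling `stage_one` once per main channel `i` yields the $N$ distorted speech signals that feed the second stage.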
(2) taking the multi-channel distorted speech signals as the input of the recovery filter in the ACRANC system so as to obtain noise-reduced speech, the specific process being as follows:
the multi-channel distorted speech signals $e_{j1}(k)$ ($j = 1, 2, \dots, N$) are input to the second-stage filter $B_i$ of the ACRANC system, and in stages other than the global silence stage the coefficients of filter $B_i$ are adjusted so that the sum of squares of the error $e_{i2}(k) = x_i(k) - y_{i2}(k)$ is minimal, where $y_{i2}(k)$ is the output of filter $B_i$ and:
$\|e_{i2}(k)\|^2 = \|x_i(k) - y_{i2}(k)\|^2$
$= \|s_i(k) + n_i(k) - y_{i2}(k)\|^2$
$= \|n_i(k)\|^2 + \|s_i(k) - y_{i2}(k)\|^2 + 2 n_i(k)[s_i(k) - y_{i2}(k)]$ (14)
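taking the expectation of formula (14), with the noise $n_i(k)$ uncorrelated with the speech so that the cross term vanishes, gives:
$E[e_{i2}^2(k)] = E[n_i^2(k)] + E[(s_i(k) - y_{i2}(k))^2]$ (15)
as can be seen from formula (15), minimizing $E[e_{i2}^2(k)]$ is equivalent to minimizing $E[(s_i(k) - y_{i2}(k))^2]$, which in turn is equivalent to minimizing the difference between $y_{i2}(k)$ and the speech $s_i(k)$, so that the output $y_{i2}(k)$ of filter $B_i$ approaches the clean speech signal $s_i(k)$; because the input of filter $B_i$ is not a single channel but multiple channels of distorted speech, a better speech noise reduction effect than that of ACRANC is obtained, and $y_{i2}(k)$ is taken as the noise-reduced speech signal of channel $i$;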
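A minimal sketch of the second-stage recovery filter follows, again using batch least squares as a stand-in for adaptive adjustment. The names `tapped`, `stage_two` and the mask `voiced` (samples outside the global silence stage) are assumptions for illustration, and giving filter $B_i$ the same tap count $L$ as filter $A_i$ is also an assumption.

```python
import numpy as np

def tapped(signals, L):
    """Stack every channel with delays 0..L into a (K, N(L+1)) regression matrix."""
    N, K = signals.shape
    cols = []
    for j in range(N):
        for d in range(L + 1):
            col = np.zeros(K)
            col[d:] = signals[j, :K - d]
            cols.append(col)
    return np.stack(cols, axis=1)

def stage_two(e1, x_i, L, voiced):
    """Fit filter B_i so that its output y_i2 minimises the sum of squares of
    x_i - y_i2 over the voiced samples, using ALL N distorted speech channels
    e1 (shape (N, K)) as input. Returns y_i2 for every sample."""
    E = tapped(e1, L)
    b, *_ = np.linalg.lstsq(E[voiced], x_i[voiced], rcond=None)
    return E @ b
```

Because the regressors contain (distorted) speech but ideally little noise, the fit can reproduce the speech part of `x_i` but not its noise part, which is the mechanism the derivation above relies on.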
(II) beamforming, wherein the speech noise reduction effect is further improved by combining the improved ACRANC with beamforming, with the following specific steps:
(1) establishing a plurality of improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multi-channel noise-reduced speech, the specific process being as follows:
taking each channel in turn as the main signal and the remaining channels as reference signals, one improved ACRANC is established per channel, so that $N$ subsystems are established;
in each improved ACRANC, the input of filter $B_i$ is the outputs of all the filters $A_j$ ($j = 1, 2, \dots, N$) rather than the output of the single filter $A_i$; adaptive mode control (AMC) is used to control when the filters in these subsystems update their coefficients and when the coefficients are held fixed;
during no-voice periods (NVP), the optimal coefficients of filter $A_i$ can be adjusted to compensate for errors caused by changes in environmental factors; to this end a global silence stage, the ONVP stage, is defined, and the first filter $A_i$ of each subsystem adjusts its optimal coefficients only during ONVP;
the set of no-voice periods of the $i$-th noisy speech signal $x_i(k)$ picked up by microphone $M_i$ is denoted $NVP(i)$; it consists of a series of discrete intervals, namely:
$NVP(i) = \bigcup_j [k'_{ij}, k''_{ij}]$ (16)
where the discrete interval:
$[k'_{ij}, k''_{ij}] = \{k'_{ij}, k'_{ij}+1, \dots, k''_{ij}\}$
is the $j$-th NVP of $x_i(k)$; obviously $NVP(i_1)$ is not necessarily equal to $NVP(i_2)$ for $i_1 \ne i_2$, $i_1, i_2 \in \{1, 2, \dots, N\}$, but $NVP(i_1)$ is merely a translation of $NVP(i_2)$ along the time axis;
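ONVP is defined as:
$ONVP = \bigcap_{i=1}^{N} NVP(i)$ (17)
from which it is easy to show that:
$ONVP = \bigcup_j [k'_j, k''_j]$ (18)
where:
$k'_j = \max_{1 \le i \le N} k'_{ij}, \quad k''_j = \min_{1 \le i \le N} k''_{ij}$
if $k''_j < k'_j$, then $[k'_j, k''_j] = \varnothing$ is defined in formula (18);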
when adjusting the optimal coefficients of filter $A_i$, no channel may contain a speech signal, since otherwise speech would be cancelled as if it were noise; the coefficients of filter $A_i$ are therefore adjusted only in the following L-ONVP stage:
$L\text{-}ONVP = \bigcup_j [k'_j + L, k''_j]$ (19)
where $L$ is the number of delay samples of the reference signals input to filter $A_i$, and:
$[k'_j + L, k''_j] = \{k'_j + L, k'_j + L + 1, \dots, k''_j\}$ (20)
if $k''_j < k'_j + L$, then likewise $[k'_j + L, k''_j] = \varnothing$ is defined in formula (19);
in the L-ONVP stage, all signals and all the delayed samples used belong to the silence stage and contain no speech, so the coefficients of filter $A_i$ can be adjusted in the L-ONVP stage; the NVP stage mentioned above refers to L-ONVP or a part of L-ONVP;
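The ONVP and L-ONVP definitions above can be sketched with boolean masks instead of interval lists. The function names and the mask representation (`nvp[i, k]` is True when channel $i$ is silent at sample $k$, as decided by some voice detector) are assumptions for illustration.

```python
import numpy as np

def onvp_mask(nvp):
    """Global silence: a sample belongs to ONVP only if every channel is silent there."""
    return np.all(nvp, axis=0)

def l_onvp_mask(onvp, L):
    """Keep sample k only if k, k-1, ..., k-L all lie in ONVP, which turns every
    interval [k', k''] into [k' + L, k''] and drops intervals shorter than L + 1."""
    out = onvp.copy()
    for d in range(1, L + 1):
        shifted = np.zeros_like(onvp)
        shifted[d:] = onvp[:-d]  # samples before index 0 count as not silent
        out &= shifted
    return out

# Two channels, eight samples: the silent stretches are shifted copies of each other.
nvp = np.array([[1, 1, 1, 1, 1, 0, 0, 1],
                [0, 1, 1, 1, 1, 1, 0, 1]], dtype=bool)
onvp = onvp_mask(nvp)
safe = l_onvp_mask(onvp, 2)
```

With $L = 2$, the ONVP interval covering samples 1 to 4 shrinks to samples 3 and 4, and the single-sample interval at sample 7 becomes empty.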
the optimal coefficients of filter $A_i$ are adjusted during the $(\Delta, \Delta')$-ONVP stage:
where $[k'_{i_0 j}, k''_{i_0 j}]$ is a discrete time interval constituting $NVP(i_0)$ of the $i_0$-th channel signal; $\Delta'$ is a positive integer that can be chosen freely according to the accuracy of the VD decision, in order to ensure that the time interval used is a pure-noise interval; and $\Delta$ is likewise a freely chosen positive integer, which should however satisfy:
$\Delta \ge L + \delta + \Delta'$ (22)
where $\delta$ is the time delay, counted in samples, for noise to propagate from the other microphones of the array to the $i_0$-th microphone $M_{i_0}$, taken as the maximum number of delay samples:
$\delta = \max_{1 \le i \le N} \lceil d_i f / c \rceil$ (23)
where $d_i$ is the distance between microphone $M_{i_0}$ and microphone $M_i$, $f$ is the sampling frequency of the array, and $c$ is the speed of propagation of the audio signal in air;
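As a worked illustration of the constraint in formula (22), the sketch below converts microphone spacings into a maximum delay in samples and derives the smallest admissible $\Delta$. Rounding up to whole samples and the default speed of sound of 343 m/s are assumptions, as are the function names and the example values.

```python
import math

def max_delay_samples(distances_m, fs_hz, c_mps=343.0):
    """Largest inter-microphone propagation delay, rounded up to whole samples."""
    return max(math.ceil(d * fs_hz / c_mps) for d in distances_m)

def smallest_delta(L, delta, delta_prime):
    """Smallest integer Delta satisfying Delta >= L + delta + Delta'."""
    return L + delta + delta_prime

# Example: microphones 2 cm and 4 cm from the reference microphone, 16 kHz sampling.
delta = max_delay_samples([0.02, 0.04], 16000)
big_delta = smallest_delta(L=8, delta=delta, delta_prime=5)
```

For a small array the propagation term is only a sample or two, so in this example the margin is dominated by the filter delay $L$ and the voice-detection margin $\Delta'$.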
in stages outside $(\Delta, \Delta')$-ONVP, the optimal coefficients of filter $A_i$ of each subsystem are kept unchanged, and filter $A_i$ is used only for filtering;
in all stages other than the global silence stage, the optimal coefficients of all filters $B_i$ are adjusted adaptively;
(2) obtaining the final noise-reduced speech through delay-and-sum (DAS) beamforming, the specific process being as follows:
the output of each subsystem is one channel of noise-reduced speech; all $N$ outputs can be fed into a beamformer to obtain a better speech noise reduction effect; if an ordinary DAS beamformer is used, its input-output relationship can be described as:
where $\tau_i$ is the delay time of the speech arriving at microphone $M_i$ relative to a selected reference microphone in the array; any microphone in the array may be chosen as the reference microphone, and a microphone located at or near the center of the array is typically selected.
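A delay-and-sum beamformer over the $N$ subsystem outputs can be sketched as below. The sign convention (a positive `tau` means the channel lags the reference and is advanced by `tau` samples), the restriction to integer delays, and the $1/N$ averaging are assumptions of this sketch rather than details confirmed by the claim text.

```python
import numpy as np

def delay_and_sum(channels, taus):
    """Align each channel by its integer delay tau_i (in samples) and average.

    channels: array of shape (N, K), one noise-reduced signal per subsystem.
    taus: N integers; channel i is assumed to lag the reference by taus[i] samples.
    """
    N, K = channels.shape
    out = np.zeros(K)
    for y_i, tau in zip(channels, taus):
        aligned = np.zeros(K)
        if tau >= 0:
            aligned[:K - tau] = y_i[tau:]   # advance a lagging channel
        else:
            aligned[-tau:] = y_i[:K + tau]  # delay a leading channel
        out += aligned
    return out / N
```

After alignment the speech components add coherently while residual noise, which is not aligned across channels, tends to average down.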
2. The method of claim 1, wherein the delay time $\tau_i$ is obtained by the cross-correlation method or the generalized cross-correlation method, or is calculated as follows:
1) selecting a $(\delta, T)$-OVP discrete time interval $[k', k'']$, where $k \ge k' + \delta$ and $k - (k' + \delta)$ is as small as possible;
2) finding the $\tau_i$ that satisfies the following condition:
if the aperture of the microphone array is small and the sampling frequency of the array signals is not very high, all $\tau_i$ can be treated as 0.
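The cross-correlation option named in claim 2 can be sketched as a search for the lag that maximises the correlation between the reference channel and channel $i$. This is plain time-domain cross-correlation, not the generalized (frequency-weighted) variant, and the function name and the `max_lag` search bound are assumptions.

```python
import numpy as np

def estimate_delay(ref, sig, max_lag):
    """Return the integer lag in [-max_lag, max_lag] that maximises
    sum_k ref[k] * sig[k + lag]; a positive result means sig lags ref."""
    K = len(ref)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            val = np.dot(ref[:K - lag], sig[lag:])
        else:
            val = np.dot(ref[-lag:], sig[:K + lag])
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

For a small-aperture array the search bound can be set from the maximum propagation delay in samples, and the estimate will often be 0, matching the closing remark of the claim.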
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811275824.4A CN109243482B (en) | 2018-10-30 | 2018-10-30 | Micro-array voice noise reduction method for improving ACROC and beam forming |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811275824.4A CN109243482B (en) | 2018-10-30 | 2018-10-30 | Micro-array voice noise reduction method for improving ACROC and beam forming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109243482A (en) | 2019-01-18 |
CN109243482B (en) | 2022-03-18 |
Family
ID=65079322
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811275824.4A Active CN109243482B (en) | 2018-10-30 | 2018-10-30 | Micro-array voice noise reduction method for improving ACROC and beam forming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109243482B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112951260B (en) * | 2021-03-02 | 2022-07-19 | 桂林电子科技大学 | Method for enhancing speech by double microphones |
CN117278896B (en) * | 2023-11-23 | 2024-03-19 | 深圳市昂思科技有限公司 | Voice enhancement method and device based on double microphones and hearing aid equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1529528A (en) * | 2003-09-28 | 2004-09-15 | 曾庆宁 | Multi sampling rate array signal noise-removing method |
US20060222184A1 (en) * | 2004-09-23 | 2006-10-05 | Markus Buck | Multi-channel adaptive speech signal processing system with noise reduction |
CN105575397A (en) * | 2014-10-08 | 2016-05-11 | 展讯通信(上海)有限公司 | Voice noise reduction method and voice collection device |
CN105814627A (en) * | 2013-12-16 | 2016-07-27 | 哈曼贝克自动***股份有限公司 | Active noise control system |
CN106024001A (en) * | 2016-05-03 | 2016-10-12 | 电子科技大学 | Method used for improving speech enhancement performance of microphone array |
Non-Patent Citations (2)
Title |
---|
Speech Enhancement by Multi-Channel Crosstalk Resistant Adaptive Noise Cancellation; Qingning Zeng et al.; 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings; 2006-07-24; pp. I-485 to I-488 * |
Speech enhancement based on array crosstalk-resistant adaptive noise cancellation; Zeng Qingning et al.; Acta Electronica Sinica; 2005-02-25; Vol. 33, No. 02; pp. 241-244 * |
Also Published As
Publication number | Publication date |
---|---|
CN109243482A (en) | 2019-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | ADL-MVDR: All deep learning MVDR beamformer for target speech separation | |
US10403299B2 (en) | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition | |
CN109686381B (en) | Signal processor for signal enhancement and related method | |
CN108172231B (en) | Dereverberation method and system based on Kalman filtering | |
CN108141656B (en) | Method and apparatus for digital signal processing of microphones | |
KR100480789B1 (en) | Method and apparatus for adaptive beamforming using feedback structure | |
WO2018119470A1 (en) | Online dereverberation algorithm based on weighted prediction error for noisy time-varying environments | |
EP2364037B1 (en) | Adaptive notch filter with variable bandwidth, and method and apparatus for canceling howling by using the adaptive notch filter with variable bandwidth | |
JP5738488B2 (en) | Beam forming equipment | |
CN108293170B (en) | Method and apparatus for adaptive phase distortion free amplitude response equalization in beamforming applications | |
CN109243482B (en) | Micro-array voice noise reduction method for improving ACROC and beam forming | |
CN112331226B (en) | Voice enhancement system and method for active noise reduction system | |
Braun et al. | Task splitting for dnn-based acoustic echo and noise removal | |
Albu | The constrained stability least mean square algorithm for active noise control | |
CN113362846B (en) | Voice enhancement method based on generalized sidelobe cancellation structure | |
EP2045620B1 (en) | Acoustic propagation delay measurement | |
JP2003250193A (en) | Echo elimination method, device for executing the method, program and recording medium therefor | |
US11195540B2 (en) | Methods and apparatus for an adaptive blocking matrix | |
JP6143702B2 (en) | Echo canceling apparatus, method and program | |
KR102045953B1 (en) | Method for cancellating mimo acoustic echo based on kalman filtering | |
KR102056398B1 (en) | Real-time speech derverberation method and apparatus using multi-channel linear prediction with estimation of early speech psd for distant speech recognition | |
JP4948019B2 (en) | Adaptive signal processing apparatus and adaptive signal processing method thereof | |
Kamo et al. | Importance of switch optimization criterion in switching wpe dereverberation | |
CN113347536B (en) | Acoustic feedback suppression algorithm based on linear prediction and sub-band adaptive filtering | |
Hosseini et al. | A novel noise cancellation method for speech enhancement using variable step-size adaptive algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||