CN110517701A - Microphone array speech enhancement method and implementation device - Google Patents

Microphone array speech enhancement method and implementation device Download PDF

Info

Publication number
CN110517701A
CN110517701A (application CN201910677433.3A)
Authority
CN
China
Prior art keywords
branch
microphone array
neural network
noise
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910677433.3A
Other languages
Chinese (zh)
Other versions
CN110517701B (en)
Inventor
张军
梁晟
宁更新
冯义志
余华
季飞
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910677433.3A priority Critical patent/CN110517701B/en
Publication of CN110517701A publication Critical patent/CN110517701A/en
Application granted granted Critical
Publication of CN110517701B publication Critical patent/CN110517701B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L2021/02161 - Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 - Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a microphone array speech enhancement method and an implementation device. A third branch suppresses the signals arriving from the speaker and interference-source directions, yielding the spectral vector of the spatially incoherent noise. A deep neural network maps noisy speech and noise to clean speech, which exploits the nonlinear characteristics and temporal correlation of the speech signal and makes the estimate more accurate and closer to human auditory perception. Because the network takes both the noisy speech and the noise as input, it achieves a better enhancement effect than conventional deep-neural-network speech enhancement that uses noisy speech alone. By combining microphone-array and deep-neural-network speech enhancement, the invention outperforms both traditional microphone-array methods and single-microphone deep-neural-network methods, and can be widely applied in voice communication applications with noisy backgrounds such as video conferencing, vehicle-mounted communication, meeting venues, and multimedia classrooms.

Description

Microphone array speech enhancement method and implementation device
Technical field
The present invention relates to speech signal processing, and in particular to a microphone array speech enhancement method based on a deep neural network (DNN) and an implementation device.
Background technique
In real life, the transmission of voice information is usually subject to interference from ambient noise, which degrades speech quality and impairs voice communication and recognition. Speech enhancement is a class of techniques for extracting the useful speech signal from noise-corrupted speech while suppressing and reducing the noise, i.e., recovering speech as clean as possible from noisy speech. It has wide application in voice communication and speech recognition.
Existing speech enhancement algorithms can be divided into two classes according to the number of microphones used. The first class is single-microphone algorithms such as spectral subtraction, Wiener filtering, MMSE estimation, and Kalman filtering. These receive the speech signal with a single microphone and are small and structurally simple, but their noise-reduction capability is limited: most can handle only stationary noise and perform unsatisfactorily on non-stationary noise. The second class is microphone-array speech enhancement, in which multiple microphones in the acquisition system receive sound from different spatial directions; spatial filtering amplifies the signal from the speaker's direction and suppresses noise and interference from other directions. Compared with traditional methods it offers higher signal gain and stronger interference rejection, and it can address a variety of acoustic estimation problems such as sound source localization, dereverberation, speech enhancement, and blind source separation; its drawbacks are larger size and higher algorithmic complexity. Existing microphone-array enhancement techniques fall roughly into three classes: fixed beamforming, adaptive beamforming, and adaptive post-filtering. Adaptive beamforming adjusts the array weights under an optimality criterion via an adaptive algorithm, adapts well to changes in the environment, and is therefore the most widely used in practice.
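As a concrete illustration of the single-microphone class, the following is a minimal magnitude spectral-subtraction sketch in NumPy; the sinusoidal stand-in for a speech frame and the oracle noise spectrum are assumptions made only for this demonstration, not part of the patent.

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from the noisy
    magnitude spectrum, keeping the noisy phase."""
    spec = np.fft.rfft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

rng = np.random.default_rng(0)
fs, n = 8000, 1024
t = np.arange(n) / fs
speech = np.sin(2 * np.pi * 437.5 * t)        # tone standing in for a speech frame
noise = 0.3 * rng.standard_normal(n)
noisy = speech + noise
noise_mag = np.abs(np.fft.rfft(noise))        # oracle noise spectrum for the demo
enhanced = spectral_subtraction(noisy, noise_mag)
# residual error against the clean frame should shrink after subtraction
assert np.mean((enhanced - speech) ** 2) < np.mean((noisy - speech) ** 2)
```

In practice the noise spectrum must itself be estimated (e.g., from speech pauses), which is exactly where single-microphone methods struggle on non-stationary noise.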
The generalized sidelobe canceller (GSC) is a common structure for adaptive beamforming and consists mainly of two branches. Branch one enhances the signal from the receiving direction with a fixed beamformer; branch two first uses a blocking matrix to prevent the signal from the receiving direction from passing through, then filters the blocking-matrix output with an adaptive filter to estimate the residual noise in the output of branch one, which is cancelled by subtraction. The GSC converts the constrained linearly constrained minimum variance (LCMV) optimization problem into an unconstrained one, so it has high computational efficiency and is simpler to implement than other adaptive beamforming algorithms. Traditional GSC nevertheless has shortcomings: it suppresses spatially incoherent noise poorly, and it neither uses prior knowledge of the speech signal nor is optimized for the characteristics of speech.
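The two-branch GSC described above can be sketched with a narrowband snapshot model; the array geometry, signal model, and LMS step size below are assumptions chosen for the demonstration, not values from the patent.

```python
import numpy as np

def steering(M, theta):
    """ULA steering vector, half-wavelength element spacing."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))

rng = np.random.default_rng(1)
M, T = 8, 4000
d = steering(M, 0.0)                            # speaker at broadside
a_i = steering(M, np.deg2rad(40.0))             # interferer at 40 degrees
s = np.exp(1j * 2 * np.pi * 0.01 * np.arange(T))             # desired signal
v = 3.0 * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
x = np.outer(d, s) + np.outer(a_i, v)           # array snapshots

w_q = d / M                                     # branch one: fixed beamformer
y1 = w_q.conj() @ x                             # = s + leaked interference
B = (np.eye(M) - np.outer(d, d.conj()) / M)[: M - 1]  # rows orthogonal to d
z = B @ x                                       # branch two: speaker blocked
w_a = np.zeros(M - 1, dtype=complex)            # adaptive filter weights
mu = 5e-4
out = np.empty(T, dtype=complex)
for n in range(T):
    e = y1[n] - w_a.conj() @ z[:, n]            # subtract estimated noise
    out[n] = e
    w_a += mu * z[:, n] * np.conj(e)            # complex LMS update
err_before = np.mean(np.abs(y1 - s) ** 2)
err_after = np.mean(np.abs(out[T // 2:] - s[T // 2:]) ** 2)
assert err_after < err_before                   # interference is cancelled
```

Note that the blocking matrix removes only the speaker's (coherent) component; spatially incoherent noise passes into both branches, which is the weakness the patent's third branch targets.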
To address these problems, Chinese invention patent 201711201341.5 provides a microphone array speech enhancement method based on statistical models. It constructs an optimal speech filter from a clean-speech model and a noise model estimated from the output of GSC branch two to enhance the output signal of GSC branch one, which effectively improves the suppression of incoherent noise and, by exploiting prior knowledge of the speech signal, makes the output speech conform better to human auditory perception. That method still has the following drawbacks: (1) it adjusts the update rate of the incoherent-noise estimate using the ratio of the adaptive-filter output energy to the sum of the M-1 adaptive-filter input energies, so when coherent and incoherent noise coexist it is difficult to estimate and track the incoherent noise accurately, which degrades noise suppression; (2) it enhances the fixed-beamformer output with a linear filter, which distorts the speech signal while removing noise and thus considerably limits the enhancement effect; (3) successive speech frames are processed independently, so the temporal correlation of the speech signal cannot be exploited.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a microphone array speech enhancement method based on a deep neural network, together with an implementation device. The method differs from the prior art in that: (1) a third branch for estimating the spatially incoherent noise is added to the traditional GSC, so the residual noise in the output of branch one can be estimated more accurately; (2) a deep neural network is trained with noisy speech and noise as input and clean speech as output, and this network is used to enhance the output of branch one; it better exploits the nonlinear characteristics and temporal correlation of the speech signal and maps the branch-one output to clean speech more accurately. The invention can be widely applied in voice communication applications with noisy backgrounds such as video conferencing, vehicle-mounted communication, meeting venues, and multimedia classrooms.
The first object of the invention can be achieved through the following technical scheme:
A microphone array speech enhancement method based on a deep neural network, which enhances the input speech signal using the following steps:
S1. Train, using a clean-speech library and a noise library, a deep neural network for mapping noisy speech and noise to clean speech.
S2. Use the microphone array to estimate the direction of arrival of the speaker, the number of interference sources, and the directions of arrival of the interference sources.
S3. Split the microphone array received signal into three branches. Branch one enhances the signal from the speaker's direction with a fixed beamformer, yielding the branch-one output speech spectrum S^(f)(ω, t), where t is the frame index. Branch two suppresses the signal from the speaker's direction with a blocking matrix B1 and passes the blocking-matrix output through an adaptive filter, yielding the branch-two output noise-component spectrum N2^(f)(ω, t). Branch three suppresses the signals from the speaker and all interference-source directions with a blocking matrix B2, yielding the branch-three output spectral vector U^(f)(ω, t) of the spatially incoherent noise.
S4. Estimate the noise spectrum N̂^(f)(ω, t) contained in S^(f)(ω, t) from N2^(f)(ω, t) and U^(f)(ω, t).
S5. Feed S^(f)(ω, t) and N̂^(f)(ω, t) into the deep neural network trained in step S1 to obtain the enhanced speech.
Further, in step S1 above, the deep neural network is trained using the following steps:
Step S1.1. Noisy speech is obtained by superimposing noise from the noise library on speech from the clean-speech library. The short-time spectrum of the noisy speech together with the short-time spectrum of the corresponding noise serves as the input, and the short-time spectrum of the corresponding clean speech serves as the target output, yielding the training data set.
Step S1.2. Set the structural parameters of the deep neural network and use the following cost function:

Φ = (1/T) Σ_{t=1}^{T} Σ_ω |X(ω, t) − f(Y(ω, t))|²

where X(ω, t) denotes the short-time spectrum of the t-th frame of clean speech, Y(ω, t) = [S^(f)(ω, t), N̂^(f)(ω, t)] denotes the input sample composed of the t-th frame noisy-speech short-time spectrum S^(f)(ω, t) and noise short-time spectrum N̂^(f)(ω, t), f(Y(ω, t)) denotes the output of the neural network, and T is the number of training speech frames.
Step S1.3. Train the deep neural network until the change in the cost function Φ is smaller than a preset value.
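The training procedure of steps S1.1 to S1.3 can be illustrated with a toy gradient-descent loop on the stated cost; the single-hidden-layer network, its size, and the random stand-in spectra are assumptions for the demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, H = 16, 256, 32          # spectrum bins, training frames, hidden units
X = rng.random((F, T))                     # clean magnitude spectra (toy data)
N = 0.3 * rng.random((F, T))               # noise magnitude spectra (toy data)
Y = np.vstack([X + N, N])                  # input: noisy spectrum stacked with noise

W1 = rng.standard_normal((H, 2 * F)) * 0.1; b1 = np.zeros((H, 1))
W2 = rng.standard_normal((F, H)) * 0.1;     b2 = np.zeros((F, 1))

def forward(Y):
    h = np.maximum(0.0, W1 @ Y + b1)       # ReLU hidden layer
    return W2 @ h + b2, h

def cost(out):                             # Phi = (1/T) sum_t ||X - f(Y)||^2
    return np.mean(np.sum((X - out) ** 2, axis=0))

lr = 0.01
c0 = cost(forward(Y)[0])
for _ in range(200):                       # plain gradient descent on Phi
    out, h = forward(Y)
    g_out = 2.0 * (out - X) / T
    gW2 = g_out @ h.T; gb2 = g_out.sum(1, keepdims=True)
    g_h = (W2.T @ g_out) * (h > 0)
    gW1 = g_h @ Y.T;  gb1 = g_h.sum(1, keepdims=True)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
assert cost(forward(Y)[0]) < c0            # the cost decreases during training
```

The stopping rule in step S1.3 (change of Φ below a preset value) would replace the fixed iteration count used here.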
In steps S3 and S4 above, the input signal is first decomposed into K subbands; after the signal of each subband has been processed by the three branches, the full-band S^(f)(ω, t) and N̂^(f)(ω, t) are synthesized.
In step S3 above, for the i-th subband, the weight vector w_{q,i} of branch one is calculated as

w_{q,i} = C_{1i}(C_{1i}^H C_{1i})^{-1} f

where C_{1i} = d(ω_i, θ_0) is the constraint matrix, d(ω_i, θ_0) = [1, e^{-jω_i τ_{0,1}}, …, e^{-jω_i τ_{0,M-1}}]^T, M is the number of elements of the microphone array, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M-1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0-th element, and f is the response vector.
In step S3 above, for the i-th subband, the blocking matrix B_{1i} of branch two is calculated as follows. The matrix C_{1i} = d(ω_i, θ_0) is subjected to the singular value decomposition

C_{1i} = U_{1i} [Σ_{1ir}; 0] V_{1i}^H

where Σ_{1ir} is an r_1 × r_1 diagonal matrix and r_1 is the rank of C_{1i}. Partition U_{1i}^H = [U_{1ir}; Ū_{1i}], where U_{1ir} consists of the first r_1 rows of U_{1i}^H and Ū_{1i} of the remaining rows; then B_{1i} = Ū_{1i}.
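The blocking-matrix construction can be verified numerically; the directions and frequency below are assumed values for the demonstration.

```python
import numpy as np

M = 8
omega = 1.0
tau0 = 0.5 * np.arange(M) * np.sin(np.deg2rad(25.0))
C1 = np.exp(-1j * omega * tau0).reshape(M, 1)       # constraint matrix, rank r1 = 1

U, s, Vh = np.linalg.svd(C1)                        # C1 = U Sigma V^H, U is M x M
r1 = np.linalg.matrix_rank(C1)
B1 = U[:, r1:].conj().T                             # rows spanning the orthogonal complement
# the blocking matrix annihilates the speaker direction ...
assert np.allclose(B1 @ C1, 0.0, atol=1e-10)
# ... while passing components from other directions
other = np.exp(-1j * omega * 0.5 * np.arange(M) * np.sin(np.deg2rad(60.0)))
assert np.linalg.norm(B1 @ other) > 1e-3
```

The same construction with the wider constraint matrix C_{2i} gives the branch-three blocking matrix that removes both the speaker and all interferers.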
In step S3 above, for the i-th subband, the blocking matrix B_{2i} of branch three is calculated as follows. The matrix C_{2i} = [d(ω_i, θ_0), d(ω_i, θ_1), …, d(ω_i, θ_J)] is subjected to the singular value decomposition

C_{2i} = U_{2i} [Σ_{2ir}; 0] V_{2i}^H

where d(ω_i, θ_0) = [1, e^{-jω_i τ_{0,1}}, …, e^{-jω_i τ_{0,M-1}}]^T, M is the number of array elements, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M-1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0-th element, J is the number of interference sources, θ_j is the direction of arrival of the j-th interference source, d(ω_i, θ_j) = [1, e^{-jω_i τ_{j,1}}, …, e^{-jω_i τ_{j,M-1}}]^T, τ_{j,m} (0 ≤ m ≤ M-1) is the delay difference between the j-th interference source's sound reaching the m-th element and reaching the 0-th element, Σ_{2ir} is an r_2 × r_2 diagonal matrix, and r_2 is the rank of C_{2i}. Partition U_{2i}^H = [U_{2ir}; Ū_{2i}], where U_{2ir} consists of the first r_2 rows of U_{2i}^H and Ū_{2i} of the remaining rows; then B_{2i} = Ū_{2i}.
In step S4 above, for the i-th subband, the noise spectrum N̂^(f)(ω_i, t) contained in the branch-one output speech spectrum S^(f)(ω_i, t) is calculated as

N̂^(f)(ω_i, t) = N2^(f)(ω_i, t) + w_{q,i}^H B_{2i}^H U^(f)(ω_i, t)

where w_{q,i} and w_{a,i} are the weight vectors of the branch-one fixed beamformer and the branch-two adaptive filter respectively, B_{1i} is the blocking matrix of branch two, N2^(f)(ω_i, t) is the noise-component spectrum output by branch two in the i-th subband (the adaptive filter w_{a,i} applied to the output of the blocking matrix B_{1i}), and U^(f)(ω_i, t) is the spectral vector of the spatially incoherent noise output by branch three in the i-th subband.
Another object of the invention can be achieved through the following technical scheme:
An implementation device for the above microphone array speech enhancement method based on a deep neural network, comprising a microphone array receiving module, a subband division module, a subband synthesis module, 24 improved subband GSCs, and a deep neural network. The microphone array receiving module and the subband division module are connected in sequence and are used, respectively, to receive the multichannel audio signal and to divide it into subbands. The subband synthesis module and the deep neural network are connected in sequence and are used, respectively, to synthesize the full-band signal and to filter it with the trained neural network. Each of the 24 improved subband GSC modules is connected to the subband division module and the subband synthesis module and performs GSC filtering on one subband of the signal.

The microphone array receiving module adopts a linear array structure comprising 8 microphones uniformly distributed on a straight line, each element being isotropic. The subband division module decomposes the audio signal collected by each microphone element into 24 subbands, which are sent to the corresponding improved subband GSCs for processing. The subband synthesis module synthesizes the outputs of the 24 improved subband GSCs into a full-band signal and sends it to the deep neural network for enhancement.
Further, the i-th (i = 1, 2, …, 24) improved subband GSC comprises 3 branches. Branch one enhances the signal from the speaker's direction with a fixed beamformer w_{q,i}; branch two suppresses the signal from the speaker's direction with a blocking matrix B_{1i} and passes the blocking-matrix output through an adaptive filter w_{a,i}, yielding the noise-component spectrum N2^(f)(ω_i, t); branch three suppresses the signals from the speaker and all interference-source directions with a blocking matrix B_{2i}, yielding the spectral vector U^(f)(ω_i, t) of the spatially incoherent noise.
Compared with the prior art, the present invention has the following advantages and effects:
1. The invention suppresses the signals from the speaker and interference-source directions through branch three to obtain the spectral vector of the spatially incoherent noise, so the spatially incoherent noise can be estimated and tracked more accurately than in Chinese invention patent 201711201341.5.
2. The invention uses a deep neural network to perform the mapping from noisy speech and noise to clean speech. Compared with the direct subtraction of the traditional GSC, or the linear filter constructed from statistical models such as GMM and HMM in Chinese invention patent 201711201341.5, it effectively exploits the nonlinear characteristics and temporal correlation of the speech signal, making the estimate more accurate and closer to human auditory perception.
3. The deep neural network used in the invention takes both noisy speech and noise as input, and therefore achieves a better enhancement effect than traditional deep-neural-network speech enhancement that takes only noisy speech as input.
4. The invention combines microphone-array and deep-neural-network speech enhancement, and its performance surpasses both traditional microphone-array speech enhancement and single-microphone deep-neural-network speech enhancement.
Detailed description of the invention
Fig. 1 is a structural block diagram of the implementation system of the microphone array speech enhancement method in an embodiment of the present invention;
Fig. 2 is a structural block diagram of the i-th improved subband GSC in an embodiment of the present invention;
Fig. 3 is a flow chart of the microphone array speech enhancement method in an embodiment of the present invention;
Fig. 4 is a structural block diagram of the deep neural network used in an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
Embodiment one
The structure of the implementation system of the microphone array speech enhancement method disclosed in this embodiment is shown in Fig. 1. It consists of a microphone array receiving module, a subband division module, a subband synthesis module, 24 improved subband GSCs, and a deep neural network. The microphone array receiving module and the subband division module are connected in sequence and are used, respectively, to receive the multichannel audio signal and to divide it into subbands. The subband synthesis module and the deep neural network are connected in sequence and are used, respectively, to synthesize the full-band signal and to filter it with the trained neural network. The 24 improved subband GSC modules are each connected to the subband division module and the subband synthesis module, and perform GSC filtering on the subbands of the signal. In this embodiment the microphone array receiving module adopts a linear array structure comprising 8 microphones uniformly distributed on a straight line, each element being isotropic. The subband division module decomposes the audio signal collected by each microphone element into 24 subbands, which are sent to the corresponding improved subband GSCs for processing. The subband synthesis module synthesizes the outputs of the 24 improved subband GSCs into a full-band signal and sends it to the deep neural network for enhancement.
The structure of the i-th improved subband GSC is shown in Fig. 2 and comprises 3 branches. Branch one enhances the signal from the speaker's direction with a fixed beamformer w_{q,i}; branch two suppresses the signal from the speaker's direction with a blocking matrix B_{1i} and passes the blocking-matrix output through an adaptive filter w_{a,i}, yielding the noise-component spectrum N2^(f)(ω_i, t); branch three suppresses the signals from the speaker and all interference-source directions with a blocking matrix B_{2i}, yielding the spectral vector U^(f)(ω_i, t) of the spatially incoherent noise.
Embodiment two
This embodiment discloses a microphone array speech enhancement method based on a deep neural network (DNN), implemented with the microphone array speech enhancement system disclosed in Embodiment One. The flow of enhancing the input speech, shown in Fig. 3, is as follows:
Step S1. Train, using a clean-speech library and a noise library, a deep neural network for mapping noisy speech and noise to clean speech;
In step S1, the deep neural network is trained using the following steps:
Step S1.1. Noisy speech is obtained by superimposing noise from the noise library on speech from the clean-speech library. The short-time spectrum of the noisy speech together with the short-time spectrum of the corresponding noise serves as the input, and the short-time spectrum of the corresponding clean speech serves as the target output, yielding the training data set.
In this embodiment, the noise library contains noise of various kinds, mixed at various signal-to-noise ratios.
Step S1.2. Set the structural parameters of the deep neural network and use the following cost function:

Φ = (1/T) Σ_{t=1}^{T} Σ_ω |X(ω, t) − f(Y(ω, t))|²

where X(ω, t) denotes the short-time spectrum of the t-th frame of clean speech, Y(ω, t) = [S^(f)(ω, t), N̂^(f)(ω, t)] denotes the input sample composed of the t-th frame noisy-speech short-time spectrum S^(f)(ω, t) and noise short-time spectrum N̂^(f)(ω, t), f(Y(ω, t)) denotes the output of the neural network, and T is the number of training speech frames.
In this embodiment, the deep neural network structure, shown in Fig. 4, comprises 1 dimension-reduction layer, 10 fully connected layers, and 3 Dropout layers. After the input vector passes through the dimension-reduction layer, it enters a hidden part consisting of 9 fully connected layers and 3 Dropout layers. Each fully connected layer has 2048 nodes and uses ReLU as the activation function; a Dropout layer follows every 3 fully connected layers, the drop rates of the 3 Dropout layers being 0.1, 0.2 and 0.2 respectively. The output layer of the deep neural network is a fully connected layer with ReLU as the activation function, with the same number of nodes as the dimension of the input Y(ω, t).
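A scaled-down forward-pass sketch of this layer arrangement is given below; the layer widths are reduced from 2048 for illustration, and the He-style weight initialization and input dimensions are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_red, H, D_out = 100, 64, 128, 100   # scaled-down dims (patent uses 2048-node layers)

def layer(n_in, n_out):
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out)

reduce_W, reduce_b = layer(D_in, D_red)                          # dimension-reduction layer
hidden = [layer(D_red if i == 0 else H, H) for i in range(9)]    # 9 hidden FC layers
out_W, out_b = layer(H, D_out)                                   # ReLU output layer
drop_rates = {2: 0.1, 5: 0.2, 8: 0.2}                            # Dropout after every 3rd FC layer

def forward(y, train=False):
    h = reduce_W @ y + reduce_b                                  # linear dimension reduction
    for i, (W, b) in enumerate(hidden):
        h = np.maximum(0.0, W @ h + b)                           # ReLU activation
        if train and i in drop_rates:                            # inverted dropout (train only)
            p = drop_rates[i]
            h *= (rng.random(h.shape) >= p) / (1.0 - p)
    return np.maximum(0.0, out_W @ h + out_b)                    # output matches input dim

y = rng.random(D_in)                    # stand-in for the stacked spectra Y(omega, t)
out = forward(y)
assert out.shape == (D_out,) and np.all(out >= 0.0)
```

The ReLU output is a natural fit for non-negative magnitude spectra.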
Step S1.3. Train the deep neural network with gradient descent until the change in the cost function Φ is smaller than a preset value.
Step S2. Use the microphone array to estimate the direction of arrival of the speaker, the number of interference sources, and the directions of arrival of the interference sources.
In this embodiment, the direction of arrival of the speaker and the number and directions of arrival of the interference sources are estimated as follows:
Step S2.1. Determine the number of sources by eigenvalue decomposition. When J mutually independent far-field wideband signals with incidence angles θ_j (j = 1 … J) impinge on a uniform linear array of M elements, the array received signal is

X(t) = AS(t) + N(t)

where X(t) is the array received signal vector, S(t) is the vector of the J far-field signal sources, A is the array manifold matrix, and N(t) is the additive ambient noise vector. The covariance of the array received signal vector is
R = E[X(t)X(t)^H]
where E denotes expectation. Carry out the eigenvalue decomposition of the covariance R:

R = UΣU^H

where Σ is an M-dimensional diagonal matrix whose M diagonal elements λ_n (n = 1 … M) are the eigenvalues of R, U is the matrix of corresponding eigenvectors, and M is the number of array elements. Arrange the M eigenvalues in descending order:
λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ λ_{n+1} ≥ … ≥ λ_M
In this embodiment, the number of signal sources is estimated with the minimum description length (MDL) criterion:

Ĵ = arg min_n MDL(n),  MDL(n) = −K(M − n) ln Λ(n) + (1/2) n(2M − n) ln K

where

Λ(n) = (∏_{i=n+1}^{M} λ_i)^{1/(M−n)} / ((1/(M−n)) Σ_{i=n+1}^{M} λ_i)

is the ratio of the geometric mean to the arithmetic mean of the M − n smallest eigenvalues, and K is the number of observed signal samples.

In another embodiment, the number of signal sources is estimated with the Akaike information criterion (AIC):

Ĵ = arg min_n AIC(n),  AIC(n) = −2K(M − n) ln Λ(n) + 2n(2M − n)

with Λ(n) and K defined as above.
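The eigenvalue-based source counting can be sketched as follows; the simulated two-source scene, array spacing, and noise level are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, J = 8, 2000, 2                          # elements, snapshots, true source count
angles = np.deg2rad([10.0, 45.0])
A = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(angles)))  # half-wavelength ULA
S = (rng.standard_normal((J, K)) + 1j * rng.standard_normal((J, K))) / np.sqrt(2)
noise = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
X = A @ S + 0.1 * noise
R = X @ X.conj().T / K                        # sample covariance
lam = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, descending

def mdl(n):
    tail = lam[n:]                            # the M - n smallest eigenvalues
    g = np.exp(np.mean(np.log(tail)))         # geometric mean
    a = np.mean(tail)                         # arithmetic mean
    return -K * (M - n) * np.log(g / a) + 0.5 * n * (2 * M - n) * np.log(K)

J_hat = min(range(M), key=mdl)                # MDL minimum gives the source count
assert J_hat == J
```

MDL needs no threshold tuning, which is why it is commonly preferred over a raw eigenvalue cutoff.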
Step S2.2. Estimate the source bearing angles with the MUSIC algorithm: carry out the eigenvalue decomposition of R, form the matrix G from the eigenvectors corresponding to the M − Ĵ smallest eigenvalues, and compute the MUSIC spectrum P_MUSIC(θ); the maxima of the MUSIC spectrum give the directions of arrival. The MUSIC spectrum is calculated as

P_MUSIC(θ) = 1 / (d^H(θ) G G^H d(θ))

where d(θ) is the steering vector of the array in direction θ.
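A minimal MUSIC sketch for a single source follows; the simulated scene and search grid are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 8, 1000
theta_true = np.deg2rad(30.0)
a = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta_true))   # half-wavelength ULA
S = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
noise = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
X = np.outer(a, S) + 0.1 * noise
R = X @ X.conj().T / K                        # sample covariance
w, U = np.linalg.eigh(R)                      # eigenvalues in ascending order
G = U[:, : M - 1]                             # noise subspace: M - J smallest (J = 1)
grid = np.deg2rad(np.arange(-90.0, 90.0, 0.25))
D = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(grid)))
denom = np.sum(np.abs(G.conj().T @ D) ** 2, axis=0)   # d^H(theta) G G^H d(theta)
theta_hat = grid[np.argmin(denom)]            # MUSIC peak = denominator minimum
assert abs(theta_hat - theta_true) < np.deg2rad(1.0)
```

The sharpness of the MUSIC peak comes from the orthogonality between the steering vector of a true source and the noise subspace G.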
Step S3. The subband division module decomposes the microphone array received signal into 24 subbands, and the signal of each subband i is divided into three branches. Branch one enhances the signal from the speaker's direction with a fixed beamformer, yielding the branch-one output speech spectrum S^(f)(ω_i, t), where t is the frame index. Branch two suppresses the signal from the speaker's direction with a blocking matrix B_{1i} and passes the blocking-matrix output through an adaptive filter, yielding the branch-two output noise-component spectrum N2^(f)(ω_i, t). Branch three suppresses the signals from the speaker and all interference-source directions with a blocking matrix B_{2i}, yielding the branch-three output spectral vector U^(f)(ω_i, t) of the spatially incoherent noise.
In this embodiment, the decomposition and synthesis of the subband signals are realized with a cosine-modulated filter bank. The analysis and synthesis filters of the filter bank are obtained by modulating a low-pass filter of bandwidth π/(2K), where K = 24 is the number of subbands. The analysis filter bank coefficients for subband decomposition are calculated as follows: with a low-pass prototype filter of coefficients h_0(l), the analysis filter coefficients are

h_k(l) = 2h_0(l) cos((π/K)(k + 1/2)(l − L/2) + θ_k)

where h_k(l) is the coefficient of the k-th filter in the analysis filter bank, L is the order of the prototype filter, l = 0 … L−1, k = 0 … K−1, and θ_k = (−1)^k π/4.
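The analysis filter bank can be sketched as follows; the windowed-sinc prototype is an assumed design (the patent does not specify h_0), and the check confirms that each modulated filter's passband is centred near its nominal subband frequency.

```python
import numpy as np

K, L = 24, 192                                  # subbands and prototype length
n = np.arange(L)
# lowpass prototype: windowed sinc with cutoff pi/(2K) (assumed design)
h0 = np.sinc((n - (L - 1) / 2) / (2 * K)) / (2 * K) * np.hamming(L)
h = np.array([2 * h0 * np.cos(np.pi / K * (k + 0.5) * (n - (L - 1) / 2)
                              + (-1) ** k * np.pi / 4)
              for k in range(K)])               # cosine-modulated analysis filters
H = np.abs(np.fft.rfft(h, 4096, axis=1))        # magnitude responses
centres = np.argmax(H, axis=1) * np.pi / 2048   # peak frequency of each filter
target = (np.arange(K) + 0.5) * np.pi / K       # nominal subband centres
assert np.all(np.abs(centres - target) < np.pi / K)
```

A practical design would further optimize h_0 for near-perfect reconstruction; the sketch only illustrates the modulation structure.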
In this embodiment, for the i-th subband, the weight vector w_{q,i} of branch one is calculated as

w_{q,i} = C_{1i}(C_{1i}^H C_{1i})^{-1} f

where C_{1i} = d(ω_i, θ_0) is the constraint matrix, d(ω_i, θ_0) = [1, e^{-jω_i τ_{0,1}}, …, e^{-jω_i τ_{0,M-1}}]^T, M is the number of elements of the microphone array, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M-1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0-th element, and f is the response vector. After the i-th subband signal output by the microphone array receiving module is weighted by w_{q,i}, the branch-one output speech spectrum S^(f)(ω_i, t) is obtained.
For the i-th subband, the blocking matrix B_{1i} of branch two is calculated as follows. The matrix C_{1i} = d(ω_i, θ_0) is subjected to the singular value decomposition

C_{1i} = U_{1i} [Σ_{1ir}; 0] V_{1i}^H

where Σ_{1ir} is an r_1 × r_1 diagonal matrix and r_1 is the rank of C_{1i}. Partition U_{1i}^H = [U_{1ir}; Ū_{1i}], where U_{1ir} consists of the first r_1 rows of U_{1i}^H and Ū_{1i} of the remaining rows; then B_{1i} = Ū_{1i}. After the i-th subband signal output by the microphone array receiving module is weighted by the blocking matrix B_{1i} and passed through the adaptive filter w_{a,i}, the branch-two output noise-component spectrum N2^(f)(ω_i, t) is obtained.
For the i-th subband, the blocking matrix B_{2i} of branch three is calculated as follows. The matrix C_{2i} = [d(ω_i, θ_0), d(ω_i, θ_1), …, d(ω_i, θ_J)] is subjected to the singular value decomposition

C_{2i} = U_{2i} [Σ_{2ir}; 0] V_{2i}^H

where d(ω_i, θ_0) = [1, e^{-jω_i τ_{0,1}}, …, e^{-jω_i τ_{0,M-1}}]^T, M is the number of array elements, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M-1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0-th element, J is the number of interference sources, θ_j is the direction of arrival of the j-th interference source, τ_{j,m} (0 ≤ m ≤ M-1) is the delay difference between the j-th interference source's sound reaching the m-th element and reaching the 0-th element, Σ_{2ir} is an r_2 × r_2 diagonal matrix, and r_2 is the rank of C_{2i}. Partition U_{2i}^H = [U_{2ir}; Ū_{2i}], where U_{2ir} consists of the first r_2 rows of U_{2i}^H and Ū_{2i} of the remaining rows; then B_{2i} = Ū_{2i}. After the i-th subband signal output by the microphone array receiving module is weighted by the blocking matrix B_{2i}, the branch-three output spectral vector U^(f)(ω_i, t) of the spatially incoherent noise is obtained.
Step S4. Estimate the noise spectrum N̂^(f)(ω, t) contained in S^(f)(ω, t) from N2^(f)(ω, t) and U^(f)(ω, t).
In this embodiment, for the i-th subband, the noise spectrum N̂^(f)(ω_i, t) contained in the branch-one output speech spectrum S^(f)(ω_i, t) is calculated as

N̂^(f)(ω_i, t) = N2^(f)(ω_i, t) + w_{q,i}^H B_{2i}^H U^(f)(ω_i, t)

where w_{q,i} and w_{a,i} are the weight vectors of the branch-one fixed beamformer and the branch-two adaptive filter respectively, B_{1i} is the blocking matrix of branch two, N2^(f)(ω_i, t) is the noise-component spectrum output by branch two in the i-th subband (the adaptive filter w_{a,i} applied to the output of the blocking matrix B_{1i}), and U^(f)(ω_i, t) is the spectral vector of the spatially incoherent noise output by branch three in the i-th subband.
In this embodiment, the S^(f)(ω_i, t) and N̂^(f)(ω_i, t) of all subbands are combined into the full-band S^(f)(ω, t) and N̂^(f)(ω, t) with the cosine-modulated filter bank. The synthesis filter bank coefficients for subband synthesis are calculated as follows: using the same low-pass prototype filter h_0(l) as the analysis filter bank, the synthesis filter coefficients are

g_k(l) = 2h_0(l) cos((π/K)(k + 1/2)(l − L/2) − θ_k)

where g_k(l) is the coefficient of the k-th filter in the synthesis filter bank, L is the order of the prototype filter, l = 0 … L−1, k = 0 … K−1, and θ_k = (−1)^k π/4.
Step S5. Feed S^(f)(ω, t) and N̂^(f)(ω, t) into the deep neural network trained in step S1 to obtain the enhanced speech.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by them; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and shall be included in the protection scope of the present invention.

Claims (9)

1. A microphone array speech enhancement method based on a deep neural network, characterized in that the input speech signal is enhanced using the following steps:
S1: using a clean-speech corpus and a noise corpus, training a deep neural network that maps noisy speech and noise to clean speech;
S2: using the microphone array to estimate the direction of arrival of the speaker, the number of interference sources, and the directions of arrival of the interference sources;
S3: dividing the signal received by the microphone array into three branches: branch one enhances the signal in the speaker's direction using a fixed beamformer, yielding the speech spectrum S^(f)(ω, t) output by branch one, where t is the frame number; branch two suppresses the signal from the speaker's direction using a blocking matrix B_1 and passes the output of the blocking matrix through an adaptive filter to obtain the noise component spectrum output by branch two; branch three suppresses the signals from the speaker's direction and from all interference-source directions using a blocking matrix B_2, yielding the spectral vector of the spatially incoherent noise output by branch three;
S4: using the branch-two noise component spectrum and the branch-three noise spectral vector to estimate the noise spectrum contained in S^(f)(ω, t);
S5: inputting S^(f)(ω, t) and the estimated noise spectrum into the deep neural network trained in step S1 to obtain the enhanced speech.
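The three-branch split of step S3 can be sketched schematically for a single subband snapshot (hypothetical names and a toy blocking matrix; the real method filters across frames and adapts the branch-two filter, which is omitted here):

```python
import numpy as np

def gsc_branches(x, w_q, B1, w_a, B2):
    """One subband snapshot x (M complex samples) through the three branches
    of step S3. Schematic only: every filter acts on a single snapshot.
    Returns (speech spectrum S, branch-2 noise N2, branch-3 noise vector N3)."""
    S = w_q.conj() @ x             # branch 1: fixed beamformer toward the speaker
    N2 = w_a.conj() @ (B1 @ x)     # branch 2: block the speaker, then adaptive filter
    N3 = B2 @ x                    # branch 3: block speaker and all interferers
    return S, N2, N3

# Toy example (hypothetical values): a broadside plane wave hits all 4
# elements in phase, so branch 1 passes it while the difference-style
# blocking matrices cancel it.
M = 4
x = np.ones(M, dtype=complex)
w_q = np.full(M, 1 / M)
B1 = np.eye(M - 1, M) - np.eye(M - 1, M, k=1)   # adjacent-element differences
w_a = np.ones(M - 1)
B2 = B1[:-1]                                    # stand-in for the branch-3 blocker
S, N2, N3 = gsc_branches(x, w_q, B1, w_a, B2)
print(S, N2)   # branch 1 keeps the look-direction signal; branch 2 outputs zero
```

The point of the split is that branches two and three see only noise, so their outputs can serve as noise references for step S4 without cancelling the speaker's speech.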
2. The microphone array speech enhancement method according to claim 1, characterized in that the training of the deep neural network in step S1 uses the following steps:
S1.1: obtaining noisy speech by superimposing noise from the noise corpus on speech from the clean-speech corpus, and building a training data set in which the short-time spectrum of the noisy speech and the short-time spectrum of the corresponding noise serve as the input and the short-time spectrum of the corresponding clean speech serves as the target output;
S1.2: setting the structural parameters of the deep neural network, and using the following cost function:

Φ = Σ_{t=1}^{T} ‖X(ω, t) − f(Y(ω, t))‖²,

where X(ω, t) denotes the short-time spectrum of the t-th frame of clean speech, Y(ω, t) denotes the input sample composed of the t-th frame of the noisy-speech short-time spectrum S^(f)(ω, t) and the noise short-time spectrum, f(Y(ω, t)) denotes the output of the neural network, and T is the number of speech frames used for training;
S1.3: training the deep neural network until the change of the cost function Φ is less than a preset value.
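The cost function of step S1.2 can be sketched as follows (a minimal numpy illustration; the normalization by T and the toy data sizes are assumptions of this sketch):

```python
import numpy as np

def cost_phi(X, Y, f):
    """Phi = (1/T) * sum_t || X(., t) - f(Y)(., t) ||^2 over T training
    frames; X is the clean spectrum, Y the stacked network input, f the
    network. The normalization by T is an assumption of this sketch."""
    T = X.shape[1]
    return np.sum(np.abs(X - f(Y)) ** 2) / T

# Toy data (hypothetical sizes): 257 frequency bins, T = 10 frames.
rng = np.random.default_rng(1)
X = rng.standard_normal((257, 10))         # clean short-time spectra
N = 0.1 * rng.standard_normal((257, 10))   # noise short-time spectra
Y = np.vstack([X + N, N])                  # input: noisy spectrum stacked with noise estimate
print(cost_phi(X, Y, lambda y: X))         # 0.0 for a network that recovers X exactly
print(cost_phi(X, Y, lambda y: y[:257]))   # > 0: leaving the noise in is penalized
```

Stacking the noise estimate alongside the noisy spectrum is what distinguishes this network from a single-channel denoiser: the beamformer branches supply the noise reference that the network learns to subtract.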
3. The microphone array speech enhancement method according to claim 1, characterized in that, in step S3 and step S4, the input signal is first decomposed into K subbands, the signal of each subband is processed by the three branches, and the results are then synthesized into the full-band S^(f)(ω, t) and the full-band noise spectrum.
4. The microphone array speech enhancement method according to claim 1, characterized in that, in step S3, for the i-th subband, i = 1, 2, …, 24, the weight vector w_{q,i} of branch one is calculated as

w_{q,i} = C_{1i} (C_{1i}^H C_{1i})^{−1} f,

where C_{1i} = d(ω_i, θ_0) is the constraint matrix, d(ω_i, θ_0) = [1, e^{−jω_i τ_{0,1}}, …, e^{−jω_i τ_{0,M−1}}]^T, M is the number of elements of the microphone array, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M−1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0th element, and f is the response vector.
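The constraint C_{1i}^H w = f admits the standard minimum-norm quiescent solution w_q = C (C^H C)^{−1} f, which can be checked numerically (a sketch; the delays and subband frequency below are hypothetical):

```python
import numpy as np

def quiescent_weights(C, f):
    """Minimum-norm weights satisfying the constraint C^H w = f:
    w_q = C (C^H C)^{-1} f, the standard quiescent solution for the
    upper (fixed-beamformer) branch of a GSC."""
    Ch = C.conj().T
    return C @ np.linalg.solve(Ch @ C, f)

# Single distortionless constraint toward the speaker (hypothetical delays).
M = 8
omega_i = 2 * np.pi * 500.0
tau0 = np.linspace(0.0, 7e-4, M)                  # delays relative to element 0
d = np.exp(-1j * omega_i * tau0).reshape(M, 1)    # steering vector d(omega_i, theta_0)
w_q = quiescent_weights(d, np.array([1.0]))
print(np.abs(w_q.conj() @ d.ravel()))             # ~1: unit gain in the look direction
```

With a single constraint and f = [1], this reduces to a delay-and-sum beamformer steered at the speaker: the array response in the look direction is exactly the requested unit gain.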
5. The microphone array speech enhancement method according to claim 1, characterized in that, in step S3, for the i-th subband, i = 1, 2, …, 24, the blocking matrix B_{1i} of branch two is calculated as follows:
the matrix C_{1i} = d(ω_i, θ_0) is decomposed by singular value decomposition as C_{1i} = U_{1i} Σ_{1i} V_{1i}^H, where Σ_{1ir} is the r_1 × r_1 diagonal matrix of nonzero singular values and r_1 is the rank of C_{1i}; let U_{1i} = [U_{1ir}, Ū_{1i}], where U_{1ir} consists of the first r_1 columns of U_{1i} and Ū_{1i} of the remaining columns; then B_{1i} = Ū_{1i}^H.
6. The microphone array speech enhancement method according to claim 1, characterized in that, in step S3, for the i-th subband, i = 1, 2, …, 24, the blocking matrix B_{2i} of branch three is calculated as follows:
the matrix C_{2i} = [d(ω_i, θ_0), d(ω_i, θ_1), …, d(ω_i, θ_J)] is decomposed by singular value decomposition as C_{2i} = U_{2i} Σ_{2i} V_{2i}^H, where d(ω_i, θ_j) = [1, e^{−jω_i τ_{j,1}}, …, e^{−jω_i τ_{j,M−1}}]^T, M is the number of elements of the microphone array, ω_i is the centre frequency of the i-th subband, θ_0 is the direction of arrival of the speaker, τ_{0,m} (0 ≤ m ≤ M−1) is the delay difference between the speaker's sound reaching the m-th element and reaching the 0th element, J is the number of interference sources (1 ≤ j ≤ J), θ_j is the direction of arrival of the j-th interference source, τ_{j,m} (0 ≤ m ≤ M−1) is the delay difference between the sound of the j-th interference source reaching the m-th element and reaching the 0th element, Σ_{2ir} is the r_2 × r_2 diagonal matrix of nonzero singular values, and r_2 is the rank of C_{2i}; let U_{2i} = [U_{2ir}, Ū_{2i}], where U_{2ir} consists of the first r_2 columns of U_{2i} and Ū_{2i} of the remaining columns; then B_{2i} = Ū_{2i}^H.
7. The microphone array speech enhancement method according to claim 6, characterized in that, in step S4, for the i-th subband, i = 1, 2, …, 24, the noise spectrum contained in the speech spectrum output by branch one is calculated from the outputs of branches two and three, where w_{q,i} is the weight vector of the fixed beamformer of branch one, w_{a,i} is the weight vector of the adaptive filter of branch two, and B_{1i} is the blocking matrix of branch two; the estimate combines the spectral vector of the spatially incoherent noise output by branch three in the i-th subband with the noise component spectrum output by branch two in the i-th subband.
8. An implementation device of the microphone array speech enhancement method based on a deep neural network, characterized in that the implementation device comprises a microphone array receiving module, a subband decomposition module, a subband synthesis module, 24 improved subband GSC modules, and a deep neural network, wherein the microphone array receiving module and the subband decomposition module are connected in sequence and are respectively used to receive the multichannel audio signals and to divide them into subbands; the subband synthesis module and the deep neural network are connected in sequence and are respectively used to synthesize the full-band signal and to filter it with the trained neural network; the 24 improved subband GSC modules are connected to the subband decomposition module and the subband synthesis module, respectively, and perform GSC filtering on the subbands of the signal;
wherein the microphone array receiving module uses a linear array structure comprising 8 microphones uniformly distributed on a straight line, each array element being isotropic; the subband decomposition module decomposes the audio signal collected by each microphone element into 24 subbands and sends them to the corresponding improved subband GSC modules for processing; the subband synthesis module synthesizes the outputs of the 24 improved subband GSC modules into a full-band signal and sends it to the deep neural network for enhancement.
9. The implementation device of the microphone array speech enhancement method according to claim 8, characterized in that the i-th improved subband GSC structure, i = 1, 2, …, 24, comprises 3 branches: branch one enhances the signal in the speaker's direction using a fixed beamformer w_{q,i}; branch two suppresses the signal from the speaker's direction using a blocking matrix B_{1i} and passes the output of the blocking matrix through an adaptive filter w_{a,i} to obtain the noise component spectrum; branch three suppresses the signals from the speaker's direction and from all interference-source directions using a blocking matrix B_{2i}, yielding the spectral vector of the spatially incoherent noise.
CN201910677433.3A 2019-07-25 2019-07-25 Microphone array speech enhancement method and implementation device Active CN110517701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910677433.3A CN110517701B (en) 2019-07-25 2019-07-25 Microphone array speech enhancement method and implementation device


Publications (2)

Publication Number Publication Date
CN110517701A true CN110517701A (en) 2019-11-29
CN110517701B CN110517701B (en) 2021-09-21

Family

ID=68624022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910677433.3A Active CN110517701B (en) 2019-07-25 2019-07-25 Microphone array speech enhancement method and implementation device

Country Status (1)

Country Link
CN (1) CN110517701B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076756A1 (en) * 2008-03-28 2010-03-25 Southern Methodist University Spatio-temporal speech enhancement technique based on generalized eigenvalue decomposition
CN102509552A (en) * 2011-10-21 2012-06-20 浙江大学 Method for enhancing microphone array voice based on combined inhibition
US9972315B2 (en) * 2015-01-14 2018-05-15 Honda Motor Co., Ltd. Speech processing device, speech processing method, and speech processing system
CN104835503A (en) * 2015-05-06 2015-08-12 南京信息工程大学 Improved GSC self-adaptive speech enhancement method
CN105792074A (en) * 2016-02-26 2016-07-20 西北工业大学 Voice signal processing method and device
US10074380B2 (en) * 2016-08-03 2018-09-11 Apple Inc. System and method for performing speech enhancement using a deep neural network-based signal
CN109686381A (en) * 2017-10-19 2019-04-26 恩智浦有限公司 Signal processor and correlation technique for signal enhancing
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model
US20190172476A1 (en) * 2017-12-04 2019-06-06 Apple Inc. Deep learning driven multi-channel filtering for speech enhancement
CN108417224A (en) * 2018-01-19 2018-08-17 苏州思必驰信息科技有限公司 The training and recognition methods of two way blocks model and system
CN108615536A (en) * 2018-04-09 2018-10-02 华南理工大学 Time-frequency combination feature musical instrument assessment of acoustics system and method based on microphone array
CN109616136A (en) * 2018-12-21 2019-04-12 出门问问信息科技有限公司 A kind of Adaptive beamformer method, apparatus and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIONG Y: "Model-Based Post Filter for Microphone Array Speech Enhancement", 2018 7th International Conference on Digital Home (ICDH) *
YAN X: "Speech Enhancement Based on Multi-Stream Model", 2016 6th International Conference on Digital Home (ICDH) *
ZHANG HUI: "Research on Speech Separation Based on Deep Learning", China Doctoral Dissertations Full-text Database, Information Science & Technology *
***: "Research on Long-distance Speech Recognition Methods Based on Beamforming and DNN", China Master's Theses Full-text Database, Information Science & Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111341342A (en) * 2020-02-11 2020-06-26 上海应用技术大学 Vehicle-mounted voice extraction method and system based on environmental sound separation
CN111596261A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Sound source positioning method and device
CN111596261B (en) * 2020-04-02 2022-06-14 云知声智能科技股份有限公司 Sound source positioning method and device
CN111866665A (en) * 2020-07-22 2020-10-30 海尔优家智能科技(北京)有限公司 Microphone array beam forming method and device
CN112017681A (en) * 2020-09-07 2020-12-01 苏州思必驰信息科技有限公司 Directional voice enhancement method and system
CN112259113A (en) * 2020-09-30 2021-01-22 清华大学苏州汽车研究院(相城) Preprocessing system for improving accuracy rate of speech recognition in vehicle and control method thereof
CN114373475A (en) * 2021-12-28 2022-04-19 陕西科技大学 Voice noise reduction method and device based on microphone array and storage medium
CN114501283A (en) * 2022-04-15 2022-05-13 南京天悦电子科技有限公司 Low-complexity double-microphone directional sound pickup method for digital hearing aid
CN114613385A (en) * 2022-05-07 2022-06-10 广州易而达科技股份有限公司 Far-field voice noise reduction method, cloud server and audio acquisition equipment

Also Published As

Publication number Publication date
CN110517701B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN110517701A (en) A kind of microphone array voice enhancement method and realization device
CN106251877B (en) Voice Sounnd source direction estimation method and device
CN107845389B (en) Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN107993670B (en) Microphone array speech enhancement method based on statistical model
CA2621940C (en) Method and device for binaural signal enhancement
CN104717587B (en) Earphone and method for Audio Signal Processing
CN105869651B (en) Binary channels Wave beam forming sound enhancement method based on noise mixing coherence
CN110085248B (en) Noise estimation at noise reduction and echo cancellation in personal communications
WO2013033991A1 (en) Method, device, and system for noise reduction in multi-microphone array
CN108447500B (en) Method and device for speech enhancement
CN108122559B (en) Binaural sound source positioning method based on deep learning in digital hearing aid
JP2018513625A (en) Adaptive mixing of subband signals
CN108986832A (en) Ears speech dereverberation method and device based on voice probability of occurrence and consistency
CN109637554A (en) MCLP speech dereverberation method based on CDR
CN110534127A (en) Applied to the microphone array voice enhancement method and device in indoor environment
CN106331969B (en) Method and system for enhancing noisy speech and hearing aid
Niwa et al. Pinpoint extraction of distant sound source based on DNN mapping from multiple beamforming outputs to prior SNR
CN111312275B (en) On-line sound source separation enhancement system based on sub-band decomposition
WO2023108864A1 (en) Regional pickup method and system for miniature microphone array device
Neo et al. Signal compaction using polynomial EVD for spherical array processing with applications
US20190090052A1 (en) Cost effective microphone array design for spatial filtering
Wang et al. Two-stage enhancement of noisy and reverberant microphone array speech for automatic speech recognition systems trained with only clean speech
Shanmugapriya et al. Evaluation of sound classification using modified classifier and speech enhancement using ICA algorithm for hearing aid application
Dam et al. Space constrained beamforming with source PSD updates
CN113763983B (en) Robust speech enhancement method and system based on mouth-binaural room impulse response

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant