CN105096961B - Speech separating method and device - Google Patents
- Publication number
- CN105096961B CN105096961B CN201410189386.5A CN201410189386A CN105096961B CN 105096961 B CN105096961 B CN 105096961B CN 201410189386 A CN201410189386 A CN 201410189386A CN 105096961 B CN105096961 B CN 105096961B
- Authority
- CN
- China
- Prior art keywords
- signal
- value
- masking matrix
- voice signal
- harmonic compensation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
An embodiment of the present invention provides a speech separation method and device. The speech separation method of this embodiment includes: obtaining a first signal; determining an initial ideal two-value masking matrix according to the first signal; performing harmonic compensation on the first signal according to the initial ideal two-value masking matrix, to obtain a separated voice signal after harmonic compensation; and filtering the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal. This reduces the occurrence of energy holes in the target separated voice signal and suppresses its distortion.
Description
Technical field
Embodiments of the present invention relate to the field of signal processing technology, and in particular to a speech separation method and device.
Background
Speech signal processing has been a prominent research field in recent years, and has so far produced a series of notable results in large-vocabulary continuous speech recognition, speech synthesis, voice communication, and related areas. However, existing speech signal processing techniques were developed for clean speech or for speech with little noise, and they do not always give satisfactory results in noisier environments, which to some extent limits the use of speech-related products in everyday life. How to suppress or eliminate background noise so as to isolate the target voice signal has therefore become an important research direction in speech signal processing.
Computational auditory scene analysis is based mainly on research in auditory physiology and psychology. It performs speech separation using an acoustic masking strategy, so that the separated voice better matches the perceptual characteristics of the human ear. In the prior art, computational auditory scene analysis is usually carried out with a threshold-based ideal two-value masking (Ideal Binary Mask, IBM) matrix. The IBM matrix is a 0-1 matrix with the same dimensions as the time-frequency spectrum, in which 1 corresponds to a voice-dominant time-frequency unit and 0 corresponds to a noise-dominant time-frequency unit. In the target voice synthesis stage, the energy of voice-dominant time-frequency units is retained in full, and the energy of noise-dominant time-frequency units is rejected in full. However, estimation errors in the threshold-based IBM matrix cause some voice-dominant time-frequency units to be wrongly rejected and some noise-dominant time-frequency units to be wrongly retained. This leaves many speech-energy holes in the separated voice signal and largely distorts the original voice signal.
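The synthesis rule described above (keep units marked 1, reject units marked 0) can be illustrated with a minimal Python sketch. The function name `apply_binary_mask` and the frames-by-bands array layout are assumptions made for illustration; they do not come from the patent text.

```python
import numpy as np

def apply_binary_mask(spectrum, mask):
    """Keep time-frequency units marked 1 and zero out units marked 0.

    spectrum: real or complex array of shape (T, K), one row per frame
              (assumed layout, not specified in the source).
    mask:     0-1 array of the same shape (the IBM matrix).
    """
    spectrum = np.asarray(spectrum)
    mask = np.asarray(mask)
    if spectrum.shape != mask.shape:
        raise ValueError("mask must have the same dimensions as the spectrum")
    # Element-wise product: voice-dominant energy is kept, the rest is rejected.
    return spectrum * mask
```

A single wrongly assigned 0 in `mask` removes that unit's energy entirely, which is how the energy holes mentioned above arise.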
Summary of the invention
Embodiments of the present invention provide a speech separation method and device that obtain a separated voice signal using computational auditory scene analysis and an ideal floating-value masking strategy, thereby reducing the occurrence of energy holes in the separated voice signal and suppressing its distortion.
In a first aspect, an embodiment of the present invention provides a speech separation method, comprising:
obtaining a first signal, the first signal including a voice signal and a noise signal;
determining an initial ideal two-value masking matrix according to the first signal, the initial ideal two-value masking matrix being used to distinguish the voice signal and the noise signal included in the first signal;
performing harmonic compensation on the first signal according to the initial ideal two-value masking matrix, to obtain a separated voice signal after harmonic compensation;
filtering the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal.
In a first possible implementation of the first aspect, the determining an initial ideal two-value masking matrix according to the first signal comprises:
calculating the average value of the power spectrum of the noise signal;
determining, according to the average value of the power spectrum of the noise signal, the values of all time-frequency units constituting the initial ideal two-value masking matrix;
determining the initial ideal two-value masking matrix according to the values of all time-frequency units constituting the initial ideal two-value masking matrix.
According to the first possible implementation of the first aspect, in a second possible implementation, the calculating the average value of the power spectrum of the noise signal comprises:
calculating the average value of the power spectrum of the noise signal according to the number of frames of the first signal that are used for estimating noise and the power spectral density of the frequency-domain signal in the t-th frame and k-th frequency band after Fourier transformation of the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
According to the first aspect, or either of the first and second possible implementations of the first aspect, in a third possible implementation, the performing harmonic compensation on the first signal according to the initial ideal two-value masking matrix, to obtain a separated voice signal after harmonic compensation, comprises:
updating the initial ideal two-value masking matrix to obtain an updated two-value masking matrix, the updated two-value masking matrix being used to purify the target separated voice signal;
performing harmonic compensation on the first signal according to the updated two-value masking matrix, to obtain the separated voice signal after harmonic compensation.
According to the third possible implementation of the first aspect, in a fourth possible implementation, the updating the initial ideal two-value masking matrix to obtain an updated two-value masking matrix comprises:
updating the values of the voice-dominant time-frequency units in the initial ideal two-value masking matrix according to the current iteration number and the maximum number of iterations;
obtaining the updated two-value masking matrix according to the result of updating the values of the voice-dominant time-frequency units in the initial ideal two-value masking matrix.
According to the third or fourth possible implementation of the first aspect, in a fifth possible implementation, the performing harmonic compensation on the first signal according to the updated two-value masking matrix, to obtain the separated voice signal after harmonic compensation, comprises:
obtaining an initially separated voice signal of the first signal according to the updated two-value masking matrix;
processing the initially separated voice signal to obtain an ideal floating-value masking matrix;
performing harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
According to the fifth possible implementation of the first aspect, in a sixth possible implementation, the processing the initially separated voice signal to obtain an ideal floating-value masking matrix comprises:
performing an inverse Fourier transform on the initially separated voice signal, to obtain the time-domain signal corresponding to the initially separated voice signal;
performing half-wave rectification on the time-domain signal corresponding to the initially separated voice signal, to obtain a half-wave rectified time-domain signal;
performing a short-time Fourier transform on the half-wave rectified time-domain signal, and calculating the power spectral density obtained after the short-time Fourier transform;
smoothing the initially separated voice signal according to the power spectral density obtained after the short-time Fourier transform, to obtain a smoothed result;
obtaining the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
According to the sixth possible implementation of the first aspect, in a seventh possible implementation, the filtering the first signal and the second signal according to the separated voice signal after harmonic compensation, to obtain the target separated voice signal, comprises:
determining, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal;
filtering the first signal and the second signal with the main-channel filter and the secondary-channel filter so determined, to obtain the target separated voice signal.
In a second aspect, an embodiment of the present invention provides a speech separation device, comprising:
an obtaining module, configured to obtain a first signal, the first signal including a voice signal and a noise signal;
a determining module, configured to determine an initial ideal two-value masking matrix according to the first signal, the initial ideal two-value masking matrix being used to distinguish the voice signal and the noise signal included in the first signal;
a harmonic compensation module, configured to perform harmonic compensation on the first signal according to the initial ideal two-value masking matrix, to obtain a separated voice signal after harmonic compensation;
a filter module, configured to filter the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal.
In a first possible implementation of the second aspect, the determining module is specifically configured to: calculate the average value of the power spectrum of the noise signal; determine, according to the average value of the power spectrum of the noise signal, the values of all time-frequency units constituting the initial ideal two-value masking matrix; and determine the initial ideal two-value masking matrix according to the values of all time-frequency units constituting it.
According to the first possible implementation of the second aspect, in a second possible implementation, the determining module is specifically configured to calculate the average value of the power spectrum of the noise signal according to the number of frames of the first signal that are used for estimating noise and the power spectral density of the frequency-domain signal in the t-th frame and k-th frequency band after Fourier transformation of the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
According to the second aspect, or either of the first and second possible implementations of the second aspect, in a third possible implementation, the harmonic compensation module is specifically configured to: update the initial ideal two-value masking matrix to obtain an updated two-value masking matrix, the updated two-value masking matrix being used to purify the target separated voice signal; and perform harmonic compensation on the first signal according to the updated two-value masking matrix, to obtain the separated voice signal after harmonic compensation.
According to the third possible implementation of the second aspect, in a fourth possible implementation, the harmonic compensation module is specifically configured to: update the values of the voice-dominant time-frequency units in the initial ideal two-value masking matrix according to the current iteration number and the maximum number of iterations; and obtain the updated two-value masking matrix according to the result of that update.
According to the third or fourth possible implementation of the second aspect, in a fifth possible implementation, the harmonic compensation module is specifically configured to: obtain an initially separated voice signal of the first signal according to the updated two-value masking matrix; process the initially separated voice signal to obtain an ideal floating-value masking matrix; and perform harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
According to the fifth possible implementation of the second aspect, in a sixth possible implementation, the harmonic compensation module is specifically configured to: perform an inverse Fourier transform on the initially separated voice signal, to obtain the corresponding time-domain signal; perform half-wave rectification on that time-domain signal, to obtain a half-wave rectified time-domain signal; perform a short-time Fourier transform on the half-wave rectified time-domain signal, and calculate the power spectral density obtained after the short-time Fourier transform; smooth the initially separated voice signal according to that power spectral density, to obtain a smoothed result; and obtain the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
According to the sixth possible implementation of the second aspect, in a seventh possible implementation, the filter module is specifically configured to: determine, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal; and filter the first signal and the second signal with the main-channel filter and the secondary-channel filter so determined, to obtain the target separated voice signal.
The speech separation method and device of the embodiments of the present invention obtain a first signal, determine an initial ideal two-value masking matrix according to the first signal, perform harmonic compensation on the first signal according to the initial ideal two-value masking matrix to obtain a separated voice signal after harmonic compensation, and filter the first signal and a second signal according to the separated voice signal after harmonic compensation to obtain a target separated voice signal. This reduces the occurrence of energy holes in the target separated voice signal and suppresses its distortion.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the flow chart of speech separating method provided by the embodiment of the present invention one;
Fig. 2 is the flow chart of speech separating method provided by the embodiment of the present invention two;
Fig. 3 is the structural schematic diagram of speech Separation device 300 provided by the embodiment of the present invention three;
Fig. 4 is the structural schematic diagram of speech Separation device 400 provided by the embodiment of the present invention four.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the speech separation method provided by Embodiment 1 of the present invention. The method of this embodiment is suitable for reducing distortion of the target separated voice signal on the basis of computational auditory scene analysis. The method is executed by a speech separation device, which is usually implemented in hardware and/or software. The method of this embodiment includes the following steps:
S110, obtain a first signal, the first signal including a voice signal and a noise signal.
S120, determine an initial ideal two-value masking matrix according to the first signal, the initial ideal two-value masking matrix being used to distinguish the voice signal and the noise signal included in the first signal.
S130, perform harmonic compensation on the first signal according to the initial ideal two-value masking matrix, to obtain a separated voice signal after harmonic compensation.
The separated voice signal obtained by a speech separation method based on computational auditory scene analysis often exhibits frequency holes, which distort the separated voice signal. To reduce this phenomenon, this embodiment performs harmonic compensation on the first signal according to the initial ideal two-value masking matrix to obtain the separated voice signal after harmonic compensation, thereby reducing frequency holes.
S140, filter the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal.
Existing single-channel speech separation techniques handle stationary noise very well, but they damage the speech when the noise is non-stationary, such as background music or the voices of non-target speakers. By contrast, filtering the first signal and the second signal according to the separated voice signal after harmonic compensation makes full use of the redundancy between the target voice and the noise at different spatial positions, and can therefore further suppress distortion of the target separated voice signal.
Specifically, the first signal is obtained; the initial ideal two-value masking matrix is determined according to the first signal; harmonic compensation is performed on the first signal according to the initial ideal two-value masking matrix, to obtain the separated voice signal after harmonic compensation; and the first signal and the second signal are filtered according to the separated voice signal after harmonic compensation, to obtain the target separated voice signal.
The speech separation method provided in this embodiment obtains a first signal, determines an initial ideal two-value masking matrix according to the first signal, performs harmonic compensation on the first signal according to the initial ideal two-value masking matrix to obtain a separated voice signal after harmonic compensation, and filters the first signal and a second signal according to the separated voice signal after harmonic compensation to obtain a target separated voice signal. This reduces the occurrence of energy holes in the target separated voice signal and suppresses its distortion.
This embodiment further optimizes the above embodiment. Fig. 2 is a flowchart of the speech separation method provided by Embodiment 2 of the present invention. Referring to Fig. 2, the method of this embodiment may include:
S210, the first signal is obtained, the first signal includes voice signal and noise signal.
The first signal may come from the main microphone close to the speaker's mouth.
S220, calculate the average value of the power spectrum of the noise signal.
The average value of the power spectrum of the noise signal can be calculated by a formula (the formula itself is missing from this text) in which Y(t, k) denotes the power spectral density of the frequency-domain signal in the t-th frame and k-th frequency band after Fourier transformation of the first signal, T denotes the total number of frames of the first signal, D denotes the number of frames at the beginning of the T-frame mixed voice signal that are used for estimating noise, and D′ denotes the number of frames at the end of the T-frame mixed voice signal that are used for estimating noise.
It should be noted that, in practical recording, the first several frames and the last several frames of a recording usually contain no speech energy at all, so speech signal processing often uses these frames to estimate noise. For example, 20 frames at the start of the recording and 20 frames at the end can be used, that is, the average power spectral density of the first 20 frames and of the last 20 frames is taken as the average value of the noise power spectrum, so that D equals 20 and D′ equals 20. The values of D and D′ may be the same or different.
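The noise estimate described above can be sketched in Python as follows. Because the original formula is not available here, this sketch assumes the simplest reading: a per-band mean of Y(t, k) over the first D and last D′ frames. The function name `noise_power_average` and the argument name `D_end` (standing in for D′) are hypothetical.

```python
import numpy as np

def noise_power_average(Y, D=20, D_end=20):
    """Per-band average noise power from the leading and trailing frames.

    Y:     array of shape (T, K) holding power spectral densities Y(t, k).
    D:     number of frames at the start used for noise estimation.
    D_end: number of frames at the end used for noise estimation (D' in the text).
    Assumption: a plain mean over the D + D_end selected frames.
    """
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    if D < 0 or D_end < 0 or D + D_end == 0 or D + D_end > T:
        raise ValueError("D and D_end must select between 1 and T frames in total")
    noise_frames = np.concatenate([Y[:D], Y[T - D_end:]], axis=0)
    return noise_frames.mean(axis=0)  # shape (K,)
```

With the defaults, the call `noise_power_average(Y)` corresponds to the 20-frame example in the text.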
S230, determine, according to the average value of the power spectrum of the noise signal, the values of all time-frequency units constituting the initial ideal two-value masking matrix.
The values of all time-frequency units constituting the initial ideal two-value masking matrix can be determined from the average value of the power spectrum of the noise signal by a formula (the formula itself is missing from this text) in which γ denotes a control parameter, 1.5 ≤ γ ≤ 2.5, and M(t, k) denotes the value of the time-frequency unit (t, k) corresponding to the t-th frame and k-th frequency band of the first signal. M(t, k) equal to 1 indicates that the time-frequency unit is a voice-dominant time-frequency unit, and M(t, k) equal to 0 indicates that it is a noise-dominant time-frequency unit.
It should be noted that an excessively high control parameter γ loses more target voice energy, whereas an excessively low control parameter γ retains more noise energy. Setting γ to 2 achieves a good balance between the two.
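A threshold-based mask of this kind can be sketched in Python as follows. Since the original formula is not available here, the sketch assumes that a unit is voice-dominant when Y(t, k) exceeds γ times the per-band noise average; the strict inequality, the function name `initial_binary_mask`, and the array layout are all assumptions.

```python
import numpy as np

def initial_binary_mask(Y, noise_avg, gamma=2.0):
    """Threshold-based initial two-value mask M(t, k).

    Y:         array of shape (T, K), power spectral densities Y(t, k).
    noise_avg: array of shape (K,), average noise power per frequency band.
    gamma:     control parameter; the text gives the range 1.5 to 2.5.
    Assumption: M(t, k) = 1 when Y(t, k) > gamma * noise_avg[k], else 0.
    """
    Y = np.asarray(Y, dtype=float)
    noise_avg = np.asarray(noise_avg, dtype=float)
    # Broadcast the per-band threshold across all frames.
    return (Y > gamma * noise_avg[np.newaxis, :]).astype(np.uint8)
```

Raising `gamma` marks fewer units as voice-dominant, which matches the trade-off described above.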
S240, determine the initial ideal two-value masking matrix according to the values of all time-frequency units constituting the initial ideal two-value masking matrix.
S250, update the initial ideal two-value masking matrix to obtain an updated two-value masking matrix, the updated two-value masking matrix being used to purify the target separated voice signal.
The power spectral density of some noise-dominant time-frequency units may also be much larger than the average value of the power spectrum of the noise signal. The threshold-based ideal two-value masking matrix estimation used in the prior art therefore produces many discrete noise components: noise whose power spectral density exceeds the average value of the power spectrum of the noise signal is still retained. In this embodiment, updating the initial ideal two-value masking matrix effectively suppresses discrete two-value masking errors, so that the separated voice signal is purer.
For example, the initial ideal two-value masking matrix can be updated to obtain the updated two-value masking matrix as follows:
update the values of the voice-dominant time-frequency units in the initial ideal two-value masking matrix according to the current iteration number and the maximum number of iterations; then obtain the updated two-value masking matrix according to the result of that update.
Specifically, the values of the voice-dominant time-frequency units in the initial ideal two-value masking matrix can be updated according to the current iteration number and the maximum number of iterations as follows:
If the current iteration number i is less than the maximum number of iterations Niter, randomly select one time-frequency unit (t, k) from the time-frequency units of the initial ideal two-value masking matrix, where Niter = 3 × K × T, K denotes the number of frequency bands after Fourier transformation of the first signal, and 1 ≤ t ≤ T. If the randomly selected time-frequency unit (t, k) is a voice-dominant time-frequency unit, calculate N and N′ according to the time-frequency unit distribution function, and calculate the values of p(M(t, k) = 1) and p(M(t, k) = 0). Calculate the ratio r0 of p(M(t, k) = 1) to p(M(t, k) = 0) (the defining formula is missing from this text). If r is less than or equal to r0, update the randomly selected time-frequency unit (t, k) to a noise-dominant time-frequency unit, where r denotes a random number generated using a random function. After the iterations, the initial IBM matrix M corresponding to the first signal becomes the updated matrix M′.
Here, N denotes the total number of time-frequency units in the neighborhood of the time-frequency unit (t, k) corresponding to the t-th frame and k-th frequency band of the first signal, and N′ denotes the number of time-frequency units in the neighborhood that have been marked as voice-dominant. The values of p(M(t, k) = 1) and p(M(t, k) = 0) are determined from the time-frequency unit distribution function, whose defining formula is missing from this text.
In that definition, α and δ denote different control parameters. The time-frequency units in the neighborhood of (t, k) are denoted (t′, k′), with the value range {(t′, k′) | |t − t′| ≤ Nt, |k − k′| ≤ Nk}, where Nt and Nk are both 1. exp(α × N′) denotes e raised to the power α × N′, and exp(α × (N − N′)) denotes e raised to the power α × (N − N′); the definition also contains two further exponential terms whose exponents are not recoverable from this text. p(M(t, k) = 1) denotes the probability of setting the time-frequency unit (t, k) to a voice-dominant unit given its neighborhood, and p(M(t, k) = 0) denotes the probability of setting it to a noise-dominant unit; each is proportional to an expression that is likewise missing from this text.
It should be noted that setting control parameter α to 2 and control parameter δ to 0.25 takes both local and neighborhood energy distribution information into account, which effectively suppresses discrete two-value masking errors.
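The neighborhood quantities N and N′ defined above can be sketched in Python as follows. This covers only the parts of the distribution function that survive in the text, namely the counts and the two terms exp(α × N′) and exp(α × (N − N′)); the δ-dependent terms and the final probabilities are not reconstructed. Zero-based indices, clipping the neighborhood at the matrix edges, and the function names are assumptions. The center unit is counted because the set definition |t − t′| ≤ Nt, |k − k′| ≤ Nk includes it.

```python
import numpy as np

def neighborhood_counts(M, t, k, n_t=1, n_k=1):
    """Return (N, N_prime) for unit (t, k) of the 0-1 mask M.

    N:       number of units in the neighborhood (clipped at the edges).
    N_prime: number of those units currently marked voice-dominant (value 1).
    """
    M = np.asarray(M)
    T, K = M.shape
    t_lo, t_hi = max(t - n_t, 0), min(t + n_t, T - 1)
    k_lo, k_hi = max(k - n_k, 0), min(k + n_k, K - 1)
    block = M[t_lo:t_hi + 1, k_lo:k_hi + 1]
    return int(block.size), int(block.sum())

def neighborhood_exponentials(n_total, n_speech, alpha=2.0):
    """Return exp(alpha * N') and exp(alpha * (N - N')), the two terms named in the text."""
    return (float(np.exp(alpha * n_speech)),
            float(np.exp(alpha * (n_total - n_speech))))
```

An isolated voice-dominant unit has a small N′ relative to N, so the second term dominates the first; this is the neighborhood information the update step relies on.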
S260, perform harmonic compensation on the first signal according to the updated two-value masking matrix, to obtain the separated voice signal after harmonic compensation.
For example, harmonic compensation can be performed on the first signal according to the updated two-value masking matrix, to obtain the separated voice signal after harmonic compensation, as follows:
obtain an initially separated voice signal of the first signal according to the updated two-value masking matrix; process the initially separated voice signal to obtain an ideal floating-value masking matrix; then perform harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
The initially separated voice signal can be processed to obtain the ideal floating-value masking matrix as follows:
perform an inverse Fourier transform on the initially separated voice signal, to obtain the corresponding time-domain signal; perform half-wave rectification on that time-domain signal, to obtain a half-wave rectified time-domain signal; perform a short-time Fourier transform on the half-wave rectified time-domain signal, and calculate the power spectral density obtained after the short-time Fourier transform; smooth the initially separated voice signal according to that power spectral density, to obtain a smoothed result; then obtain the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
Specifically, the separated voice signal after harmonic compensation is obtained as follows:
According to the updated binary masking matrix M′, obtain the initial separated voice signal M′(t, k)y(t, k) of the first signal, where y(t, k) denotes the frequency-domain signal of the t-th frame and k-th frequency band of the first signal after Fourier transformation, and M′ is the matrix obtained by iteratively updating the initial ideal binary masking matrix M. Perform an inverse short-time Fourier transform (ISTFT) on M′(t, k)y(t, k) to obtain the corresponding time-domain signal s(t), where s(t) = ISTFT(M′(t, k)y(t, k)). Perform half-wave rectification on s(t) to obtain the half-wave rectified time-domain signal max(s(t), 0), where max(s(t), 0) denotes taking the larger of s(t) and 0. Perform a short-time Fourier transform (STFT) on the half-wave rectified signal and calculate the power spectral density obtained after the STFT. Perform smoothing according to this power spectral density to obtain the smoothed result, where μ denotes a control parameter, 0.5 ≤ μ ≤ 0.9. Determine the ideal floating-value masking matrix R(t, k) according to the average value of the power spectrum of the noise signal and the smoothed result.
According to R(t, k), obtain the frequency-domain signal Q(t, k) of the separated voice signal after harmonic compensation, where Q(t, k) = R(t, k)y(t, k). Perform ISTFT on Q(t, k) to obtain the time-domain signal ISTFT(Q(t, k)) corresponding to Q(t, k), and determine this time-domain signal as the separated voice signal after harmonic compensation, where ISTFT(Q(t, k)) denotes performing an inverse Fourier transform on Q(t, k).
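The effect of the half-wave rectification step can be illustrated with a minimal NumPy sketch. It assumes a single unit-amplitude sinusoid as a stand-in for one harmonic of s(t) that survived masking; the sample count, the number of cycles, and all variable and function names are illustrative and are not taken from the patent. Because max(s(t), 0) is a nonlinearity, it creates energy at multiples of the fundamental, which is presumably what allows the later floating-value mask to restore harmonics that the binary mask removed.

```python
import numpy as np

n = 1024        # number of samples (illustrative)
cycles = 8      # whole cycles in the window, so FFT bins line up exactly
t = np.arange(n)
s = np.sin(2 * np.pi * cycles * t / n)   # stand-in for one surviving harmonic of s(t)

# Half-wave rectification: keep positive samples, zero out negative ones.
s_rect = np.maximum(s, 0.0)

def amplitude_spectrum(x):
    """Single-sided amplitude spectrum (valid for bins 1 .. n/2 - 1)."""
    return np.abs(np.fft.rfft(x)) / (len(x) / 2)

amp_before = amplitude_spectrum(s)
amp_after = amplitude_spectrum(s_rect)

# Compare the energy at twice the fundamental before and after rectification.
second_harmonic_before = amp_before[2 * cycles]
second_harmonic_after = amp_after[2 * cycles]
print(second_harmonic_before, second_harmonic_after)
```

For a unit sinusoid, the Fourier series of the half-wave rectified wave gives a second-harmonic amplitude of 2/(3π), about 0.21, whereas the unrectified signal has essentially none at that frequency.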
S270, determine, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal.
According to the separated voice signal after harmonic compensation, calculate the h1 and h2 at which the calculation formula takes its minimum value, and denote these minimizing h1 and h2 as the filters to be used. Here y1 denotes the first signal, y2 denotes the second signal, h1 denotes the spatial filter of the main channel, h2 denotes the spatial filter of the secondary channel, λ denotes a control parameter, 0.0001 ≤ λ ≤ 0.05, ||·||_2^2 denotes the squared two-norm of its argument, ||h1||_1 denotes the one-norm of h1, and ||h2||_1 denotes the one-norm of h2. The second signal may be the signal from the secondary microphone, which is farther from the speaker's mouth.
It should be noted that the main-channel filter may be the filter corresponding to the main microphone, which is close to the speaker's mouth, and the secondary-channel filter may be the filter corresponding to the secondary microphone, which is farther from the speaker's mouth than the main microphone.
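The filter estimation in S270 can be sketched as a sparse least-squares problem. The exact objective is not legible in this text, so the sketch assumes the form ||ŝ − (h1 ∗ y1 + h2 ∗ y2)||_2^2 + λ(||h1||_1 + ||h2||_1), where ŝ is the separated voice signal after harmonic compensation and ∗ is causal convolution. It solves that assumed objective with the iterative shrinkage-thresholding algorithm (ISTA), a generic solver that is not necessarily the one used in the patent. The function names `conv_matrix` and `estimate_filters`, the filter length, and the test signals are hypothetical.

```python
import numpy as np

def conv_matrix(y, taps):
    """Matrix A such that A @ h equals np.convolve(y, h)[:len(y)] for len(h) == taps."""
    n = len(y)
    cols = [np.concatenate([np.zeros(j), y[:n - j]]) for j in range(taps)]
    return np.stack(cols, axis=1)

def estimate_filters(s_hat, y1, y2, taps=8, lam=0.001, iters=500):
    """ISTA for the ASSUMED objective ||s_hat - A h||_2^2 + lam * ||h||_1, h = [h1; h2]."""
    A = np.hstack([conv_matrix(y1, taps), conv_matrix(y2, taps)])
    # Step size = 1 / Lipschitz constant of the gradient of the quadratic term.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    h = np.zeros(2 * taps)
    for _ in range(iters):
        grad = 2.0 * (A.T @ (A @ h - s_hat))       # gradient of the squared two-norm
        z = h - step * grad
        # Soft thresholding: proximal operator of the one-norm penalty.
        h = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return h[:taps], h[taps:]

# Synthetic check: build s_hat from known sparse filters, then recover them.
rng = np.random.default_rng(0)
y1 = rng.standard_normal(400)
y2 = rng.standard_normal(400)
h1_true = np.zeros(8)
h1_true[[0, 2]] = [1.0, 0.5]
h2_true = np.zeros(8)
h2_true[1] = -0.3
s_hat = np.convolve(y1, h1_true)[:400] + np.convolve(y2, h2_true)[:400]

h1_est, h2_est = estimate_filters(s_hat, y1, y2)
print(np.round(h1_est, 3), np.round(h2_est, 3))
```

The value lam=0.001 was chosen only because it lies inside the stated range 0.0001 ≤ λ ≤ 0.05; the one-norm terms favor filters with few nonzero taps.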
S280, filter the first signal and the second signal according to the main-channel filter and the secondary-channel filter determined for filtering the first signal and the second signal, to obtain the target separated voice signal.
The first signal and the second signal are filtered according to the minimizing h1 and h2, and the result is determined as the target separated voice signal. It should be noted that the minimizing h1 and h2 are, respectively, the main-channel filter and the secondary-channel filter used when filtering the first signal and the second signal.
It should be noted that the first signal and the second signal can each be regarded as the sum of the target voice signal and the background noise after the corresponding time shift and attenuation, and that the target separated voice signal equals the sum of the first signal and the second signal after filtering. The target separated voice signal can therefore be calculated by taking the separated voice signal after harmonic compensation as the target that it should approach. Since y1 and y2 are the first signal and the second signal, respectively, only appropriate h1 and h2 need to be calculated in order to determine the target separated voice signal. In this embodiment, the h1 and h2 at which the calculation formula takes its minimum value are used as the main-channel filter and the secondary-channel filter for filtering the first signal and the second signal, and the target separated voice signal is calculated from them.
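The filter-and-sum relation described above, in which the target separated voice signal equals the sum of the filtered first and second signals, can be sketched as follows. The sketch assumes FIR filters applied by causal convolution and truncated to the input length; `filter_and_sum` and the sample values are hypothetical.

```python
import numpy as np

def filter_and_sum(y1, y2, h1, h2):
    """Filter each channel with its own FIR filter and add the two results."""
    n = len(y1)
    return np.convolve(y1, h1)[:n] + np.convolve(y2, h2)[:n]

y1 = np.array([1.0, 2.0, 3.0, 4.0])   # first signal (main channel)
y2 = np.array([0.5, 0.5, 0.5, 0.5])   # second signal (secondary channel)
h1 = np.array([1.0, 0.0])             # pass the main channel unchanged
h2 = np.array([0.0, -1.0])            # delay the secondary channel by one sample and invert it

target = filter_and_sum(y1, y2, h1, h2)
print(target)
```

With these toy filters, the secondary channel acts as a delayed noise reference that is subtracted from the main channel.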
In the speech separation method provided in this embodiment, the average value of the power spectrum of the noise signal is calculated; the values of all time-frequency units constituting the initial ideal binary masking matrix are determined according to that average value; the initial ideal binary masking matrix is updated to obtain the updated binary masking matrix; harmonic compensation is performed on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation; the main-channel filter and the secondary-channel filter used when filtering the first signal and the second signal are determined according to the separated voice signal after harmonic compensation; and the first signal and the second signal are filtered with these filters to obtain the target separated voice signal. This retains more of the energy of voice-dominated time-frequency units while effectively rejecting the energy of noise-dominated time-frequency units, which makes the target separated voice signal purer, reduces the occurrence of energy volution in the target separated voice signal, and suppresses distortion of the target separated voice signal.
Fig. 3 is a schematic structural diagram of the speech separation device 300 provided by Embodiment 3 of the present invention. The device of this embodiment is suitable for reducing distortion of the target separated voice signal based on computational auditory scene analysis. The device is usually implemented in hardware and/or software. The device of this embodiment includes the following modules: an obtaining module 310, a determining module 320, a harmonic compensation module 330, and a filter module 340.
The obtaining module 310 is configured to obtain a first signal, the first signal including a voice signal and a noise signal. The determining module 320 is configured to determine an initial ideal binary masking matrix according to the first signal, the initial ideal binary masking matrix being used to distinguish the voice signal and the noise signal included in the first signal. The harmonic compensation module 330 is configured to perform harmonic compensation on the first signal according to the initial ideal binary masking matrix, to obtain the separated voice signal after harmonic compensation. The filter module 340 is configured to filter the first signal and the second signal according to the separated voice signal after harmonic compensation, to obtain the target separated voice signal.
Further, the determining module 320 is specifically configured to: calculate the average value of the power spectrum of the noise signal; determine, according to that average value, the values of all time-frequency units constituting the initial ideal binary masking matrix; and determine the initial ideal binary masking matrix according to the values of all the time-frequency units constituting it.
Further, the determining module 320 is specifically configured to calculate the average value of the power spectrum of the noise signal according to the number of frames used for noise estimation in the first signal and the power spectral density of the frequency-domain signal of the t-th frame and k-th frequency band obtained by performing a Fourier transform on the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
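The noise averaging performed by the determining module can be sketched as follows. The sketch assumes that the power spectral density of the frequency-domain signal y(t, k) is its squared magnitude and that the frames used for noise estimation are the first frames of the first signal; both are assumptions made for illustration, and `average_noise_power` and `n_noise_frames` are hypothetical names.

```python
import numpy as np

def average_noise_power(Y, n_noise_frames):
    """Average |Y(t, k)|^2 over the first n_noise_frames frames, per frequency band k.

    Y has shape (frames, bands) and holds the STFT of the first signal.
    """
    psd = np.abs(Y[:n_noise_frames]) ** 2   # power spectral density of the noise-only frames
    return psd.mean(axis=0)

# Toy STFT with 3 frames and 2 bands; the first 2 frames are treated as noise only.
Y = np.array([[1 + 1j, 2.0],
              [1 - 1j, 0.0],
              [10.0, 10.0]])
noise_avg = average_noise_power(Y, n_noise_frames=2)
print(noise_avg)
```

The result has one value per frequency band, which can then be compared with the power of each time-frequency unit when the masking matrix is built.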
Further, the harmonic compensation module 330 is specifically configured to: update the initial ideal binary masking matrix to obtain an updated binary masking matrix, the updated binary masking matrix being used to purify the target separated voice signal; and perform harmonic compensation on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation.
Further, the harmonic compensation module 330 is specifically configured to: update, according to the current iteration number and the maximum number of iterations, the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix; and obtain the updated binary masking matrix according to the result of updating those values.
Further, the harmonic compensation module 330 is specifically configured to: obtain the initial separated voice signal of the first signal according to the updated binary masking matrix; process the initial separated voice signal to obtain an ideal floating-value masking matrix; and perform harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
Further, the harmonic compensation module 330 is specifically configured to: perform an inverse Fourier transform on the initial separated voice signal to obtain the time-domain signal corresponding to the initial separated voice signal; perform half-wave rectification on that time-domain signal to obtain the half-wave rectified time-domain signal; perform a short-time Fourier transform on the half-wave rectified time-domain signal and calculate the power spectral density obtained after the short-time Fourier transform; smooth the initial separated voice signal according to that power spectral density, to obtain the smoothed result; and obtain the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
Further, the filter module 340 is specifically configured to: determine, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal; and filter the first signal and the second signal according to the main-channel filter and the secondary-channel filter, to obtain the target separated voice signal.
The speech separation device provided in this embodiment obtains a first signal, determines an initial ideal binary masking matrix according to the first signal, performs harmonic compensation on the first signal according to the initial ideal binary masking matrix to obtain the separated voice signal after harmonic compensation, and filters the first signal and the second signal according to the separated voice signal after harmonic compensation to obtain the target separated voice signal. This reduces the occurrence of energy volution in the target separated voice signal and suppresses distortion of the target separated voice signal.
Correspondingly, referring to Fig. 4, Fig. 4 is a schematic structural diagram of the speech separation device 400 provided by Embodiment 4 of the present invention. The device 400 includes a processor 401, a memory 402, a communication interface 403, and a bus 404. The processor 401, the memory 402, and the communication interface 403 are connected to one another through the bus 404.
The memory 402 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions.
The processor 401 executes the program stored in the memory 402 to implement the speech separation method, which includes:
The processor 401 is configured to: obtain a first signal, the first signal including a voice signal and a noise signal; determine an initial ideal binary masking matrix according to the first signal, the initial ideal binary masking matrix being used to distinguish the voice signal and the noise signal included in the first signal; perform harmonic compensation on the first signal according to the initial ideal binary masking matrix, to obtain the separated voice signal after harmonic compensation; and filter the first signal and the second signal according to the separated voice signal after harmonic compensation, to obtain the target separated voice signal.
Further, the processor 401 is specifically configured to: calculate the average value of the power spectrum of the noise signal; determine, according to that average value, the values of all time-frequency units constituting the initial ideal binary masking matrix; and determine the initial ideal binary masking matrix according to the values of all the time-frequency units constituting it.
Further, the processor 401 is specifically configured to calculate the average value of the power spectrum of the noise signal according to the number of frames used for noise estimation in the first signal and the power spectral density of the frequency-domain signal of the t-th frame and k-th frequency band obtained by performing a Fourier transform on the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
Further, the processor 401 is specifically configured to: update the initial ideal binary masking matrix to obtain an updated binary masking matrix, the updated binary masking matrix being used to purify the target separated voice signal; and perform harmonic compensation on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation.
Further, the processor 401 is specifically configured to: update, according to the current iteration number and the maximum number of iterations, the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix; and obtain the updated binary masking matrix according to the result of updating those values.
Further, the processor 401 is specifically configured to: obtain the initial separated voice signal of the first signal according to the updated binary masking matrix; process the initial separated voice signal to obtain an ideal floating-value masking matrix; and perform harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
Further, the processor 401 is specifically configured to: perform an inverse Fourier transform on the initial separated voice signal to obtain the time-domain signal corresponding to the initial separated voice signal; perform half-wave rectification on that time-domain signal to obtain the half-wave rectified time-domain signal; perform a short-time Fourier transform on the half-wave rectified time-domain signal and calculate the power spectral density obtained after the short-time Fourier transform; smooth the initial separated voice signal according to that power spectral density, to obtain the smoothed result; and obtain the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
Further, the processor 401 is specifically configured to: determine, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal; and filter the first signal and the second signal according to the main-channel filter and the secondary-channel filter, to obtain the target separated voice signal.
The speech separation device provided in this embodiment obtains a first signal, determines an initial ideal binary masking matrix according to the first signal, performs harmonic compensation on the first signal according to the initial ideal binary masking matrix to obtain the separated voice signal after harmonic compensation, and filters the first signal and the second signal according to the separated voice signal after harmonic compensation to obtain the target separated voice signal. This reduces the occurrence of energy volution in the target separated voice signal and suppresses distortion of the target separated voice signal.
Those of ordinary skill in the art will appreciate that all or some of the steps of the foregoing method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (16)
1. A speech separation method, characterized by comprising:
obtaining a first signal, the first signal comprising a voice signal and a noise signal;
determining an initial ideal binary masking matrix according to the first signal, the initial ideal binary masking matrix being used to distinguish the voice signal and the noise signal comprised in the first signal;
performing harmonic compensation on the first signal according to the initial ideal binary masking matrix, to obtain a separated voice signal after harmonic compensation; and
filtering the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal;
wherein the first signal is the signal of the main channel in a mixed voice signal, and the second signal is the signal of the secondary channel in the mixed voice signal.
2. The method according to claim 1, characterized in that the determining an initial ideal binary masking matrix according to the first signal comprises:
calculating the average value of the power spectrum of the noise signal;
determining, according to the average value of the power spectrum of the noise signal, the values of all time-frequency units constituting the initial ideal binary masking matrix; and
determining the initial ideal binary masking matrix according to the values of all the time-frequency units constituting the initial ideal binary masking matrix.
3. The method according to claim 2, characterized in that the calculating the average value of the power spectrum of the noise signal comprises:
calculating the average value of the power spectrum of the noise signal according to the number of frames used for noise estimation in the first signal and the power spectral density of the frequency-domain signal of the t-th frame and k-th frequency band obtained by performing a Fourier transform on the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
4. The method according to any one of claims 1 to 3, characterized in that the performing harmonic compensation on the first signal according to the initial ideal binary masking matrix, to obtain a separated voice signal after harmonic compensation, comprises:
updating the initial ideal binary masking matrix to obtain an updated binary masking matrix, the updated binary masking matrix being used to purify the target separated voice signal; and
performing harmonic compensation on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation.
5. The method according to claim 4, characterized in that the updating the initial ideal binary masking matrix to obtain an updated binary masking matrix comprises:
updating, according to the current iteration number and the maximum number of iterations, the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix; and
obtaining the updated binary masking matrix according to the result of updating the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix.
6. The method according to claim 5, characterized in that the performing harmonic compensation on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation, comprises:
obtaining an initial separated voice signal of the first signal according to the updated binary masking matrix;
processing the initial separated voice signal to obtain an ideal floating-value masking matrix; and
performing harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
7. The method according to claim 6, characterized in that the processing the initial separated voice signal to obtain an ideal floating-value masking matrix comprises:
performing an inverse Fourier transform on the initial separated voice signal to obtain a time-domain signal corresponding to the initial separated voice signal;
performing half-wave rectification on the time-domain signal corresponding to the initial separated voice signal, to obtain a half-wave rectified time-domain signal;
performing a short-time Fourier transform on the half-wave rectified time-domain signal, and calculating the power spectral density obtained after the short-time Fourier transform;
smoothing the initial separated voice signal according to the power spectral density obtained after the short-time Fourier transform, to obtain a smoothed result; and
obtaining the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
8. The method according to claim 7, characterized in that the filtering the first signal and the second signal according to the separated voice signal after harmonic compensation, to obtain the target separated voice signal, comprises:
determining, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal; and
filtering the first signal and the second signal according to the main-channel filter and the secondary-channel filter used when filtering the first signal and the second signal, to obtain the target separated voice signal.
9. A speech separation device, characterized by comprising:
an obtaining module, configured to obtain a first signal, the first signal comprising a voice signal and a noise signal;
a determining module, configured to determine an initial ideal binary masking matrix according to the first signal, the initial ideal binary masking matrix being used to distinguish the voice signal and the noise signal comprised in the first signal;
a harmonic compensation module, configured to perform harmonic compensation on the first signal according to the initial ideal binary masking matrix, to obtain a separated voice signal after harmonic compensation; and
a filter module, configured to filter the first signal and a second signal according to the separated voice signal after harmonic compensation, to obtain a target separated voice signal;
wherein the first signal is the signal of the main channel in a mixed voice signal, and the second signal is the signal of the secondary channel in the mixed voice signal.
10. The device according to claim 9, characterized in that the determining module is specifically configured to: calculate the average value of the power spectrum of the noise signal; determine, according to the average value of the power spectrum of the noise signal, the values of all time-frequency units constituting the initial ideal binary masking matrix; and determine the initial ideal binary masking matrix according to the values of all the time-frequency units constituting the initial ideal binary masking matrix.
11. The device according to claim 10, characterized in that the determining module is specifically configured to calculate the average value of the power spectrum of the noise signal according to the number of frames used for noise estimation in the first signal and the power spectral density of the frequency-domain signal of the t-th frame and k-th frequency band obtained by performing a Fourier transform on the first signal, where t is an integer greater than or equal to 1 and k is an integer greater than or equal to 1.
12. The device according to any one of claims 9 to 11, characterized in that the harmonic compensation module is specifically configured to: update the initial ideal binary masking matrix to obtain an updated binary masking matrix, the updated binary masking matrix being used to purify the target separated voice signal; and perform harmonic compensation on the first signal according to the updated binary masking matrix, to obtain the separated voice signal after harmonic compensation.
13. The device according to claim 12, characterized in that the harmonic compensation module is specifically configured to: update, according to the current iteration number and the maximum number of iterations, the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix; and obtain the updated binary masking matrix according to the result of updating the values of the voice-dominated time-frequency units in the initial ideal binary masking matrix.
14. The device according to claim 13, characterized in that the harmonic compensation module is specifically configured to: obtain an initial separated voice signal of the first signal according to the updated binary masking matrix; process the initial separated voice signal to obtain an ideal floating-value masking matrix; and perform harmonic compensation on the first signal according to the ideal floating-value masking matrix, to obtain the separated voice signal after harmonic compensation.
15. The device according to claim 14, characterized in that the harmonic compensation module is specifically configured to: perform an inverse Fourier transform on the initial separated voice signal to obtain a time-domain signal corresponding to the initial separated voice signal; perform half-wave rectification on the time-domain signal corresponding to the initial separated voice signal, to obtain a half-wave rectified time-domain signal; perform a short-time Fourier transform on the half-wave rectified time-domain signal, and calculate the power spectral density obtained after the short-time Fourier transform; smooth the initial separated voice signal according to the power spectral density obtained after the short-time Fourier transform, to obtain a smoothed result; and obtain the ideal floating-value masking matrix according to the average value of the power spectrum of the noise signal and the smoothed result.
16. The device according to claim 15, characterized in that the filter module is specifically configured to: determine, according to the separated voice signal after harmonic compensation, the main-channel filter and the secondary-channel filter to be used when filtering the first signal and the second signal; and filter the first signal and the second signal according to the main-channel filter and the secondary-channel filter used when filtering the first signal and the second signal, to obtain the target separated voice signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410189386.5A CN105096961B (en) | 2014-05-06 | 2014-05-06 | Speech separating method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410189386.5A CN105096961B (en) | 2014-05-06 | 2014-05-06 | Speech separating method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105096961A CN105096961A (en) | 2015-11-25 |
CN105096961B true CN105096961B (en) | 2019-02-01 |
Family
ID=54577241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410189386.5A Active CN105096961B (en) | 2014-05-06 | 2014-05-06 | Speech separating method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105096961B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017141542A1 (en) * | 2016-02-16 | 2017-08-24 | 日本電信電話株式会社 | Mask estimation apparatus, mask estimation method, and mask estimation program |
CN110168640B (en) | 2017-01-23 | 2021-08-03 | 华为技术有限公司 | Apparatus and method for enhancing a desired component in a signal |
CN107657962B (en) * | 2017-08-14 | 2020-06-12 | 广东工业大学 | Method and system for identifying and separating throat sound and gas sound of voice signal |
CN107680611B (en) * | 2017-09-13 | 2020-06-16 | 电子科技大学 | Single-channel sound separation method based on convolutional neural network |
WO2019072395A1 (en) | 2017-10-12 | 2019-04-18 | Huawei Technologies Co., Ltd. | An apparatus and a method for signal enhancement |
CN108109619B (en) * | 2017-11-15 | 2021-07-06 | 中国科学院自动化研究所 | Auditory selection method and device based on memory and attention model |
CN110070887B (en) * | 2018-01-23 | 2021-04-09 | 中国科学院声学研究所 | Voice feature reconstruction method and device |
CN110544488B (en) * | 2018-08-09 | 2022-01-28 | 腾讯科技(深圳)有限公司 | Method and device for separating multi-person voice |
CN109584903B (en) * | 2018-12-29 | 2021-02-12 | 中国科学院声学研究所 | Multi-user voice separation method based on deep learning |
CN112216303A (en) * | 2019-07-11 | 2021-01-12 | 北京声智科技有限公司 | Voice processing method and device and electronic equipment |
CN113077807B (en) * | 2019-12-17 | 2023-02-28 | 北京搜狗科技发展有限公司 | Voice data processing method and device and electronic equipment |
CN113516990A (en) * | 2020-04-10 | 2021-10-19 | 华为技术有限公司 | Voice enhancement method, method for training neural network and related equipment |
CN111583954B (en) * | 2020-05-12 | 2021-03-30 | 中国人民解放军国防科技大学 | Speaker independent single-channel voice separation method |
CN112562649B (en) * | 2020-12-07 | 2024-01-30 | 北京大米科技有限公司 | Audio processing method and device, readable storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993005503A1 (en) * | 1991-08-28 | 1993-03-18 | Massachusetts Institute Of Technology | Multi-channel signal separation |
CN101031956A (en) * | 2004-07-22 | 2007-09-05 | 索福特迈克斯有限公司 | Headset for separation of speech signals in a noisy environment |
CN101278337A (en) * | 2005-07-22 | 2008-10-01 | 索福特迈克斯有限公司 | Robust separation of speech signals in a noisy environment |
CN101828335A (en) * | 2007-10-18 | 2010-09-08 | 摩托罗拉公司 | Robust two microphone noise suppression system |
CN101903948A (en) * | 2007-12-19 | 2010-12-01 | 高通股份有限公司 | Systems, methods, and apparatus for multi-microphone based speech enhancement |
CN102157156A (en) * | 2011-03-21 | 2011-08-17 | 清华大学 | Single-channel voice enhancement method and system |
CN103456312A (en) * | 2013-08-29 | 2013-12-18 | 太原理工大学 | Single channel voice blind separation method based on computational auditory scene analysis |
- 2014-05-06: CN application CN201410189386.5A, patent CN105096961B, status Active
Non-Patent Citations (3)
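Title |
---|
"Research on a robust front end for speech recognition based on auditory scene analysis and speaker model information"; 关勇, 李鹏, 刘文举, 徐波; 《自动化学报》; 2009-04-30; Vol. 35, No. 4; full text * |
"A generalization algorithm from binary time-frequency masking to floating-value masking based on noise tracking"; 梁山, 刘文举, 江巍; 《声学学报》; 2013-09-30; Vol. 38, No. 5; full text * |
"Single-channel voiced speech separation *** with improved harmonic organization rules"; 张学良, 刘文举, 李鹏, 徐波; 《声学学报》; 2011-01-31; Vol. 36, No. 1; full text * |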
Also Published As
Publication number | Publication date |
---|---|
CN105096961A (en) | 2015-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105096961B (en) | Speech separating method and device | |
Han et al. | Deep neural network based spectral feature mapping for robust speech recognition. | |
Hoshen et al. | Speech acoustic modeling from raw multichannel waveforms | |
Han et al. | Learning spectral mapping for speech dereverberation | |
Vaseghi | Multimedia signal processing: theory and applications in speech, music and communications | |
CN109767783A (en) | Sound enhancement method, device, equipment and storage medium | |
CN109145123A (en) | Construction method, intelligent interactive method, system and the electronic equipment of knowledge mapping model | |
KR20180080446A (en) | Voice recognizing method and voice recognizing apparatus | |
KR20130133858A (en) | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
CN105225672B (en) | Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information | |
CN103325379A (en) | Method and device used for acoustic echo control | |
Zezario et al. | Self-supervised denoising autoencoder with linear regression decoder for speech enhancement | |
Abdullah et al. | Towards more efficient DNN-based speech enhancement using quantized correlation mask | |
JP2017509014A (en) | A system for speech analysis and perceptual enhancement | |
CN112259112A (en) | Echo cancellation method combining voiceprint recognition and deep learning | |
CN111105809B (en) | Noise reduction method and device | |
CN110268471A (en) | The method and apparatus of ASR with embedded noise reduction | |
Sainath et al. | Raw multichannel processing using deep neural networks | |
CN108806707A (en) | Method of speech processing, device, equipment and storage medium | |
CN106875944A (en) | A kind of system of Voice command home intelligent terminal | |
Jassim et al. | Voice activity detection using neurograms | |
CN113763978B (en) | Voice signal processing method, device, electronic equipment and storage medium | |
Kashani et al. | Speech Enhancement via Deep Spectrum Image Translation Network | |
CN114302286A (en) | Method, device and equipment for reducing noise of call voice and storage medium | |
CN112992167A (en) | Audio signal processing method and device and electronic equipment | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||