CN110085246A - Speech enhancement method, apparatus, device and storage medium - Google Patents
- Publication number
- CN110085246A (application CN201910233376.XA)
- Authority
- CN
- China
- Prior art keywords
- noise
- time
- current frame
- noisy frame signal
- frequency mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The present invention relates to speech signal processing and provides a speech enhancement method, apparatus, device and storage medium, intended to solve the problem that existing speech enhancement methods are computationally expensive and fail to meet real-time requirements. The speech enhancement method includes: obtaining the current noisy frame captured by a microphone array, the current noisy frame containing at least the speech signals emitted by a target speech source and by other sound sources; determining, from the current noisy frame, the time-frequency mask corresponding to that frame; determining, from the time-frequency mask, the filter coefficients corresponding to that frame; and performing speech enhancement processing on the noisy signal using the filter coefficients. Because computing the time-frequency mask only requires processing a single noisy frame, the method has a small computational load and meets real-time requirements.
Description
Technical field
The present invention relates to speech signal processing technologies, and in particular to a speech enhancement method, apparatus, device and storage medium.
Background art
In noisy environments, the performance of many speech processing systems degrades sharply. Speech enhancement, as an effective preprocessing technique against noise pollution, has long been a focus of the speech signal processing field. The purpose of speech enhancement is to extract a speech signal that is as clean as possible from the noisy signal, raising the signal-to-noise ratio and improving speech quality.
In the prior art, the general principle of speech enhancement is as follows: first, filter coefficients are applied to the noisy signal after a Fourier transform or short-time Fourier transform, yielding an enhanced frequency-domain signal; then an inverse Fourier transform is applied to the enhanced frequency-domain signal, yielding the enhanced time-domain signal for output. For determining the filter coefficients, several methods exist in the prior art. In conventional methods the filter coefficients are fixed values; since noise itself generally varies over time, fixed filter coefficients naturally fail to match this behaviour, so speech enhancement with such coefficients is only suited to stationary noise fields and adapts poorly. To overcome this problem, another existing algorithm caches a long segment of the noisy signal and uses an EM algorithm to first compute the time-frequency mask for that segment, and then computes the corresponding filter coefficients from the mask. Although this method can compute the filter coefficients accurately and thus improve the enhancement quality, it must cache a large amount of data over a long period (for example, ten minutes of data), so it is both computationally expensive and unable to meet real-time requirements, and cannot be applied to speech enhancement tasks with real-time constraints.
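The general pipeline just described (Fourier transform, apply filter coefficients, inverse transform) can be sketched as follows. This is only an illustrative sketch of the prior-art principle with fixed coefficients: the Hann window, the 50% hop, and the shape of `coeffs` (one complex gain per frequency bin) are assumptions, not details from the patent.

```python
import numpy as np

def enhance_fixed_filter(noisy, coeffs, frame_len=320, hop=160):
    """Conventional pipeline sketch: filter the short-time spectrum of the
    noisy signal with fixed per-bin coefficients, then invert and overlap-add.
    `coeffs` is one complex gain per rfft bin (a hypothetical layout)."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)                   # Fourier transform of one frame
        enhanced = coeffs * spec                    # apply the filter coefficients
        rec = np.fft.irfft(enhanced, n=frame_len)   # inverse Fourier transform
        out[start:start + frame_len] += rec * window
        norm[start:start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)             # undo the window weighting
```

With `coeffs` all ones the pipeline reconstructs the input in the fully overlapped region, which is a quick sanity check that the transform/inverse pair is wired correctly.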
Summary of the invention
In view of this, the purpose of the present invention is to provide a speech enhancement method, apparatus, device and storage medium, intended to solve the problem that existing speech enhancement methods are computationally expensive and fail to meet real-time requirements.
In a first aspect, an embodiment of the invention provides a speech enhancement method, comprising:
obtaining the current noisy frame captured by a microphone array, the current noisy frame containing at least the speech signals emitted by a target speech source and by other sound sources;
determining, from the current noisy frame, the time-frequency mask corresponding to that frame;
determining, from the time-frequency mask, the filter coefficients corresponding to that frame; and
performing speech enhancement processing on the noisy signal using the filter coefficients.
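The per-frame flow of the first aspect can be sketched as below. All three helpers are illustrative stand-ins invented for this sketch (a magnitude-threshold mask and a simple spectral gain); the patent's own mask and coefficient estimators are described later in the embodiments.

```python
import numpy as np

def estimate_tf_mask(frame_spec):
    # Stand-in: mark low-magnitude bins as noise-dominated (hypothetical rule).
    mag = np.abs(frame_spec)
    return (mag < mag.mean()).astype(float)

def filter_coeffs_from_mask(mask):
    # Stand-in: derive a simple attenuating gain from the mask.
    return 1.0 - 0.9 * mask

def enhance_frame(noisy_frame):
    """One pass of the claimed per-frame pipeline: mask from the current
    frame only -> filter coefficients -> enhanced frame."""
    spec = np.fft.rfft(noisy_frame)
    mask = estimate_tf_mask(spec)
    w = filter_coeffs_from_mask(mask)
    return np.fft.irfft(w * spec, n=len(noisy_frame))
```

The point of the structure is that nothing outside the current frame is consumed, which is what keeps the per-frame computational load small.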
In a second aspect, an embodiment of the invention provides a speech enhancement apparatus, comprising:
an obtaining module, for obtaining the current noisy frame captured by a microphone array, the current noisy frame containing at least the speech signals emitted by a target speech source and by other sound sources;
a time-frequency mask determining module, for determining, from the current noisy frame, the corresponding time-frequency mask;
a filter coefficient determining module, for determining, from the time-frequency mask, the corresponding filter coefficients; and
a speech enhancement module, for performing speech enhancement processing on the noisy signal using the filter coefficients.
In a third aspect, an embodiment of the invention provides a speech enhancement device, comprising: a microphone array, a processor, a memory, and a computer program stored on the memory and runnable on the processor, the processor being connected to the microphone array; when the processor executes the computer program, the speech enhancement method of any embodiment of the invention is realized.
In a fourth aspect, an embodiment of the invention provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the speech enhancement method of any embodiment of the invention is realized.
Compared with the prior art, the invention has the following advantages:
In the speech enhancement method provided by the invention, the current noisy frame captured by a microphone array is obtained; from the current noisy frame, its corresponding time-frequency mask is determined; from the time-frequency mask, the corresponding filter coefficients are determined; and speech enhancement processing is performed on the noisy signal using the filter coefficients, where the noisy signal may be the current noisy frame itself, or the noisy frames immediately before or after it. On the one hand, because the invention computes a time-frequency mask for each noisy frame, the mask is not a fixed value but changes with the specific conditions of every frame; correspondingly, the filter coefficients derived from it also change with the specific conditions of every frame, so performing the final speech enhancement with these coefficients improves the enhancement quality. On the other hand, because computing the time-frequency mask only requires processing a single noisy frame, the computational load of the invention is small and real-time requirements are met.
Brief description of the drawings
Fig. 1 shows a flow diagram of the speech enhancement method provided in the embodiments;
Fig. 2 shows a structural block diagram of the speech enhancement system described in the embodiments;
Fig. 3 shows a flow diagram of the method for determining the time-frequency mask described in the embodiments;
Fig. 4 shows a structural block diagram of the speech enhancement apparatus provided in the embodiments;
Fig. 5 shows a structural block diagram of another speech enhancement apparatus provided in the embodiments.
Specific embodiment
Specific embodiments of the invention are described below. The embodiments are illustrative: they are intended to disclose the specific working process of the invention and should not be understood as further limiting the scope of protection of the claims.
Referring to Fig. 1, the embodiment provides a speech enhancement method. The method can serve as a preprocessing stage for speech-signal applications such as speech recognition and speech coding, and can be applied in a speech enhancement system. As shown in Fig. 2, the speech enhancement system mainly comprises a microphone array 10, an audio decoder 20, a digital signal processor 30, and a D/A converter 40, connected in sequence. The microphone array 10 captures sound and converts the captured sound into an analog noisy signal; the audio decoder 20 performs digital sampling conversion on the noisy signal and feeds the converted data into the digital signal processor 30; the digital signal processor 30 performs speech enhancement processing on the data and sends the processed data to the D/A converter 40; the D/A converter 40 converts the received data into an analog signal and outputs it. In the speech enhancement system shown in Fig. 2, the microphone array 10 contains at least two array elements, each element being one microphone; when the microphone array 10 is a digital microphone array, the audio decoder 20 in the system of Fig. 2 can be omitted.
In the related art, when performing speech enhancement, the digital signal processor 30 first filters the Fourier-transformed or short-time-Fourier-transformed noisy signal with filter coefficients, obtaining an enhanced frequency-domain signal, and then applies an inverse Fourier transform to obtain the enhanced time-domain signal for output. For determining the filter coefficients, the prior art mainly includes the following three methods:
The first existing method uses the covariance matrix of the noisy signal as an approximation of the noise covariance matrix, and computes the filter coefficients from that approximation. Its drawback is that the noisy-signal covariance is not an accurate estimate of the noise covariance, which leads to insufficient noise suppression while also damaging the speaker's speech.
The second existing method assumes the noise is a diffuse noise field and uses a preset diffuse-noise covariance matrix in place of the true noise covariance matrix. When the noise really is diffuse, this scheme suppresses it well; however, practical scenes contain various coherent noises and interfering sounds, in which case the preset matrix deviates substantially from the true noise covariance, again leading to insufficient noise suppression and poor enhancement.
The third existing method clusters time-frequency points with an EM algorithm to compute a time-frequency mask, and then uses the mask to compute the noise covariance matrix. Its benefit is a more accurate estimate of the noise covariance; its drawback is that the EM algorithm is computationally heavy and must cache a sufficiently large amount of data before it can estimate accurately, making it hard to meet real-time computing demands.
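The role the time-frequency mask plays in the third method — weighting time-frequency points when estimating the noise covariance matrix — can be illustrated as follows. The array layout and the convention that larger mask values mark noise-dominated points are assumptions, and the EM clustering that would produce the mask is not shown.

```python
import numpy as np

def noise_covariance_from_mask(Y, mask, eps=1e-6):
    """Estimate a per-bin noise covariance matrix from a time-frequency mask.
    Y: (mics, frames, bins) complex STFT of the noisy signal;
    mask: (frames, bins) values in [0, 1], larger where noise dominates."""
    M, T, F = Y.shape
    Rn = np.zeros((F, M, M), dtype=complex)
    for f in range(F):
        Yf = Y[:, :, f]                  # all mics, all frames, one bin
        w = mask[:, f]                   # per-frame noise weight for this bin
        # Mask-weighted outer-product average: sum_t w_t y_t y_t^H / sum_t w_t
        Rn[f] = (Yf * w) @ Yf.conj().T / (w.sum() + eps)
    return Rn
```

Filter coefficients (e.g. a beamformer) would then be computed per bin from `Rn`; that step is deliberately omitted here.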
Because the speech enhancement method provided in this embodiment has a small computational load and can satisfy real-time requirements, it can be applied to speech-signal processing tasks with real-time constraints. The method is described in detail below with reference to Fig. 1.
Step 101: obtain the current noisy frame captured by the microphone array, the current noisy frame containing at least the speech signals emitted by a target speech source and by other sound sources.
Specifically, when the sound emitted by each source is captured by the elements of the microphone array, each element of the array produces a noisy signal. As an example, the noisy signal of every element in the microphone array is obtained. Because an element generates its noisy signal after receiving the sounds emitted by both the target speech source and the other sources, the noisy signal contains the speech signal emitted by the target speech source as well as the signals emitted by the other sources. It should be appreciated that the target speech source is the source whose speech needs to be enhanced, the other sources are the sources other than the target speech source, and the number of other sources can be one or more. As an example, the microphone array of a mobile phone receives speech from user A and also receives noise emitted by an environmental noise source; the phone needs to enhance the user's speech and transmit the enhanced speech signal in real time to another user B elsewhere. Here, user A is the target speech source and the environmental noise source is the other source.
As an example, the microphone array can be the array on a mobile phone, tablet computer, PDA, laptop, desktop computer, or smart speaker (such as a Tmall Genie or Xiaomi device). Correspondingly, the subject that obtains the noisy signal and performs the subsequent processing on it can be the processor of that phone, tablet, PDA, laptop, desktop computer, or smart speaker. Specifically, the microphone array comprises at least two elements, each element being one microphone; as an example, the array may comprise 6 elements.
As an example, the length of the current noisy frame can be 10 ms, 20 ms, 30 ms, etc. It should be appreciated that the frame length of the noisy signal is not limited to these examples; the invention places no limitation on it. It should be appreciated that precisely because each frame of the noisy signal is processed promptly and a set of filter coefficients is obtained for each frame, more accurate filter coefficients can be computed per frame, which in turn improves the enhancement quality. Moreover, because each noisy frame is short, generally within 100 ms, the amount of signal data is small and the data volume the processor must handle is small; and because processing does not have to wait for a long stretch of signal data to be cached, the real-time behaviour is better.
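Cutting the current noisy frame out of each element's stream, as in step 101, might look like this sketch; a 16 kHz sampling rate, 20 ms frames, and the 6-element array are assumptions chosen from the examples in the text.

```python
import numpy as np

def current_frame(channels, t, frame_len):
    """Cut the current noisy frame from every array element's stream.
    channels: (num_mics, num_samples); returns (num_mics, frame_len)."""
    return channels[:, t:t + frame_len]

# e.g. a 6-element array at 16 kHz with 20 ms frames
fs, frame_ms = 16000, 20
frame_len = fs * frame_ms // 1000      # 320 samples per frame
```

Each call consumes only one short frame per element, which is the property that keeps both the buffered data volume and the latency small.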
Step 102: using the current noisy frame, determine the time-frequency mask corresponding to the current noisy frame.
Because the time-frequency mask corresponding to the current noisy frame is obtained from the current noisy frame itself, the value of the mask is tied to the actual conditions of that frame; the more accurate the mask's value, the better the enhancement once it is applied. As one embodiment, the time-frequency mask can be obtained through steps 201 and 202 below, as shown in Fig. 3.
Step 201: from the current noisy frame, determine the estimated bearing of the target speech source relative to the microphone array.
As an example, any estimation method can be chosen for locating the target speech source; the invention does not limit this. For example, when a spherical microphone array is used, the array can capture the sound-pressure information of the higher-order sound field, the sound field can be decomposed with spherical harmonics to build a signal model, and the bearing of the target speech source can be estimated with the MUSIC algorithm. As another example, an existing TDOA algorithm can be chosen to estimate the bearing of the target speech source. Since the position of the target speech source can be estimated with prior-art techniques, the invention does not describe the specific estimation methods further. It should be appreciated that, to further reduce the computational load, an estimation method with a smaller computational cost can be chosen for estimating the position of the target speech source.
In this step, there is an error between the estimated bearing of the target speech source relative to the microphone array and its true bearing, and the size of that error is influenced by the noise sources (i.e. the other sound sources). For instance, when the sound emitted by a noise source strongly affects the sound emitted by the target speech source, the error is larger.
Step 202: according to the relative position between the estimated bearing and a target area, determine the time-frequency mask corresponding to the current noisy frame, where the target area is the physical location region in which the target speech source lies.
Here, the physical location region is an angular interval. For example, if the true bearing of the target speech source relative to the microphone array is 15°, the physical location region is [15 − a, 15 + a]. As an example, the specific size of a can be preset; if a is set to 30°, the physical location region is [−15°, 45°]. As an example, the size of a can also be adjusted and optimized automatically according to the enhancement results: when the enhancement effect is weak, i.e. the output speech signal still contains considerable noise, a can be reduced automatically; when the original target speech signal (i.e. the speech signal corresponding to the target speech source) in the output has been damaged after enhancement, a can be enlarged automatically.
As an example, when the target speech source has a fixed position, e.g. its true bearing relative to the microphone array is 15° and stays constant, the physical location region is always [15 − a, 15 + a]. As an example, when the position of the target speech source is not fixed, the physical location region is [b − a, b + a], where b is the true bearing of the target speech source; b can be determined by non-speech processing methods such as binocular-camera tracking or infrared tracking and positioning. It should be appreciated that any reasonable method can be chosen to determine a physical location region, i.e. the target area, for the target speech source; the examples above do not limit the invention.
In the method comprising steps 201 and 202 above, because the time-frequency mask of the current noisy frame is determined from the relative position between the estimated bearing and the target area, it is in essence determined from the size of the error between the estimated bearing and the true position of the target speech source. As stated above, that error is itself influenced by the noise sources, so the size of the time-frequency mask is mainly decided by the noise sources themselves: when the sound emitted by a noise source strongly affects the sound emitted by the target speech source, the time-frequency mask is larger; when the influence is weaker, the time-frequency mask is smaller.
For the specific implementation of step 202, i.e. how to determine the time-frequency mask of the current noisy frame from the relative position between the estimated bearing and the target area, the embodiment provides the following two examples.
In the first embodiment, if the estimated bearing lies inside the target area, the time-frequency mask is determined to be a preset fixed value T1; if the estimated bearing lies outside the target area, the time-frequency mask is determined to be a preset fixed value T2, where 0 ≤ T1 < T2 ≤ 1. It should be appreciated that the first embodiment determines the time-frequency mask by a hard decision.
In the first embodiment, the preset fixed value T1 corresponds to the case where the estimated bearing lies inside the target area: the noise sources influence the target speech source only slightly, and the error between the estimated bearing and the true position of the target speech source is small. The preset fixed value T2 corresponds to the case where the estimated bearing lies outside the target area: the noise sources influence the target speech source strongly, and the error between the estimated bearing and the true position is large. Because T1 and T2 satisfy 0 ≤ T1 < T2 ≤ 1, the mask is larger when the noise influence on the target speech source is stronger, and smaller when the influence is weaker.
As an example, T1 preferably takes the value 0 and T2 preferably takes the value 1. In that case, when the estimated bearing lies inside the target area the time-frequency mask is 0, and the noise sources can be disregarded; when the estimated bearing lies outside the target area the mask is 1, and the noise sources must be considered. The benefit of taking T1 = 0 and T2 = 1 is that values of 0 or 1 further simplify the computation and thus further reduce the computational load. It should be appreciated that the specific values of T1 and T2 are not limited to these examples: T1 could also be 0.05, 0.1, 0.2, etc., and T2 could also be 0.95, 0.9, 0.8, etc.; the invention places no limitation on the specific values of T1 and T2.
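The hard-decision rule of the first embodiment, with the preferred values T1 = 0 and T2 = 1, can be written directly; the target region [center − a, center + a] with a = 30° is the example from the text, and treating the boundary as inside is an implementation choice.

```python
def hard_tf_mask(est_bearing, target_center, half_width=30.0, t1=0.0, t2=1.0):
    """Hard-decision mask of the first embodiment: T1 inside the target
    region [center - a, center + a], T2 outside, with 0 <= T1 < T2 <= 1."""
    lo, hi = target_center - half_width, target_center + half_width
    return t1 if lo <= est_bearing <= hi else t2
```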
In the second embodiment, if the estimated bearing lies outside the target area, the time-frequency mask is determined to be a preset fixed value T3, where 0 < T3 ≤ 1; if the estimated bearing lies inside the target area, the time-frequency mask is determined to be T4 according to the specific relative position of the estimated bearing within the target area, where 0 ≤ T4 < T3. It should be appreciated that the second embodiment determines the time-frequency mask by a soft decision.
In the second embodiment, the preset fixed value T3 corresponds to the case where the estimated bearing lies outside the target area: the noise sources influence the target speech source strongly, and the error between the estimated bearing and the true position of the target speech source is large. T4 corresponds to the case where the estimated bearing lies inside the target area: the noise influence on the target speech source is weaker, and the error between the estimated bearing and the true position is smaller. Because T3 and T4 satisfy 0 ≤ T4 < T3, the mask is again larger when the noise influence on the target speech source is stronger, and smaller when the influence is weaker.
As an example, T3 preferably takes the value 1; the benefit of taking T3 = 1 is that a value of 1 further simplifies the computation and thus further reduces the computational load. It should be appreciated that the specific value of T3 is not limited to this example: T3 could also be 0.95, 0.9, 0.8, etc.; the invention places no limitation on the specific value of T3.
In the second embodiment, when the time-frequency mask T4 is determined from the specific relative position of the estimated bearing within the target area, as an example, the value of T4 satisfies the following relationship: the closer the estimated bearing is to the center of the target area, the closer T4 is to 0; the closer the estimated bearing is to the edge of the target area, the closer T4 is to T3.
In this example, when the estimated bearing is closer to the center of the target area, the noise influence on the target speech source is smaller, the error between the estimated bearing and the true position of the target speech source is smaller, and the corresponding mask is smaller; when the estimated bearing is closer to the edge of the target area, the noise influence is stronger, the error is larger, and the corresponding mask is larger.
For example, T4 can be computed from a preset mapping function. The mapping function can be set by experience or obtained by machine statistical learning. As an example, the mapping between the specific relative position of the estimated bearing within the target area and T4 can be set as a linear function. Suppose the true bearing of the target speech source relative to the microphone array is 15°, the physical location region is [−15°, 45°], and the preset fixed value T3 is set to 1; then the linear mapping can be set as T4 = R/30 − 0.5 for 15 ≤ R ≤ 45 and T4 = −R/30 + 0.5 for −15 ≤ R ≤ 15, where R is the estimated bearing. As an example, the mapping between the relative position and T4 can also be set as a nonlinear function, with the estimated bearing as the independent variable and the time-frequency mask T4 as the dependent variable; the nonlinear function can likewise be set by experience or obtained by machine statistical learning. It should be appreciated that the invention places no limitation on how the mapping function is set.
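The linear mapping of this example (region [−15°, 45°], center 15°, T3 = 1) can be written compactly with the distance from the region center, which reproduces both branches of the piecewise formula above: 0 at the center, T3 at or beyond the edges.

```python
def soft_tf_mask(est_bearing, center=15.0, half_width=30.0, t3=1.0):
    """Soft-decision mask of the second embodiment (linear mapping example).
    Equals R/30 - 0.5 for 15 <= R <= 45 and -R/30 + 0.5 for -15 <= R <= 15
    when center = 15, half_width = 30, T3 = 1."""
    if abs(est_bearing - center) >= half_width:   # outside [center-a, center+a]
        return t3
    return t3 * abs(est_bearing - center) / half_width
```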
Step 103: using the time-frequency mask, determine the filter coefficients corresponding to the current noisy frame.
Because the time-frequency mask is obtained from the actual conditions of the noisy frame itself, and specifically from the corresponding noise-source conditions, the filter coefficients determined from the mask are the filter coefficients corresponding to the current noisy frame, and the accuracy of these filter coefficients is high.
As an embodiment, the filter coefficients may specifically be determined by the method of steps 1 to 3 below.
Step 1: perform a Fourier transform on the current noisy frame signal to obtain the Fourier transform spectrum of the current noisy frame signal.
Specifically, since the noisy signal output by the microphone array is continuous in time, the present embodiment extracts from the continuous noisy signal the frame corresponding to the current moment, names this frame the current noisy frame signal, and then performs a Fourier transform on it. It should be appreciated that, for a long continuous noisy signal, the present embodiment extracts data frame by frame for Fourier transformation; as time passes, the continuous noisy signal is cut into multiple frames that are transformed in turn. For the continuous signal as a whole this amounts to a short-time Fourier transform, each frame corresponding to the data within one time window.
Specifically, one noisy frame may be extracted from the noisy signal output by each element of the microphone array, as that element's current noisy frame signal; a Fourier transform is then applied to each element's current noisy frame signal, yielding the per-element Fourier transform spectra y_1, y_2, y_3, ..., y_m, where the subscript m is the number of array elements; finally the per-element spectra are combined into the Fourier transform spectrum of the current noisy frame signal, y = [y_1, y_2, y_3, ..., y_m]^T, so that the Fourier transform spectrum is characterized by the matrix y.
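A minimal sketch of assembling y from the per-element frames follows. The naive DFT here is a dependency-free stand-in for the FFT an implementation would actually use.

```python
import cmath

def dft(frame):
    """Naive DFT of one real-valued frame (illustrative stand-in for an FFT)."""
    N = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, x in enumerate(frame))
            for k in range(N)]

def stack_spectra(element_frames):
    """Given the current frame from each of the m array elements, return
    y[f] = [y_1(f), ..., y_m(f)]^T: one channel vector per frequency bin."""
    spectra = [dft(fr) for fr in element_frames]   # m spectra of length N
    N = len(spectra[0])
    return [[s[k] for s in spectra] for k in range(N)]
```

Each entry `y[f]` is the column vector the noise covariance computation below operates on, one per frequency bin.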
Step 2: calculate the noise covariance matrix according to the following formula:
Φ̂_NN(f) = Σ_t T(t,f) y(t,f) y^H(t,f) / Σ_t T(t,f)
where Φ̂_NN is the noise covariance matrix, T is the time-frequency mask, y(t,f) is the matrix characterizing the Fourier transform spectrum, and y^H(t,f) is the conjugate transpose of y(t,f).
Specifically, y(t,f) carries array-element, time, and frequency attributes, and y^H(t,f) is obtained by taking the conjugate transpose of y(t,f).
Introducing the time-frequency mask T into the above formula embodies the improvement of the present invention. The prior art also uses a noise covariance matrix when computing filter coefficients. In the first existing method described above, the covariance matrix of the noisy signal is used as an approximation of the noise covariance matrix, which is equivalent to fixing the time-frequency mask T at 1. In the second existing method described above, the noise is assumed to be a diffuse noise field, and a preset diffuse-noise covariance matrix is used in place of the true noise covariance matrix, which is equivalent to fixing T at some value between 0 and 1, for example 0.6.
In both existing methods, the time-frequency mask corresponding to the current noisy frame signal cannot be determined from that frame's noise-source situation; the mask is therefore not accurate enough, the filter coefficients are less accurate, and the final speech enhancement is poor. In the present invention, the time-frequency mask corresponding to the current noisy frame signal is determined from that frame's noise-source situation, so the mask is more accurate, the filter coefficients are more accurate, and the speech enhancement effect is better.
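The mask-weighted covariance of step 2 can be sketched per frequency bin as below. Normalizing by the mask sum follows the formula as reconstructed above; note that the MVDR weights of step 3 are invariant to this overall scale in any case.

```python
def noise_covariance(Y, T):
    """Mask-weighted noise covariance for one frequency bin f.
    Y: list over time frames t of channel vectors y(t,f) (complex lists).
    T: list over t of mask values T(t,f).
    Returns sum_t T * y * y^H / sum_t T as an m-by-m nested list."""
    m = len(Y[0])
    phi = [[0j] * m for _ in range(m)]
    for y, t in zip(Y, T):
        for i in range(m):
            for j in range(m):
                phi[i][j] += t * y[i] * y[j].conjugate()
    s = sum(T)
    return [[phi[i][j] / s for j in range(m)] for i in range(m)]
```

With T fixed at 1 this reduces to the plain covariance of the noisy signal, which is exactly the first prior-art approximation discussed above.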
Step 3: calculate the filter coefficients according to the following formula:
w(f) = Φ̂_NN(f)^(-1) r̂(f) / (r̂^H(f) Φ̂_NN(f)^(-1) r̂(f))
where w(f) is the filter coefficient vector, Φ̂_NN is the noise covariance matrix, r̂(f) is the estimated steering vector corresponding to the current noisy frame signal, and r̂^H(f) is the conjugate transpose of r̂(f).
The steering vector r̂ may be estimated by existing estimation methods; for example, in the prior art it can be estimated from the phase of the audio signals across array elements. It should be appreciated that the present invention places no limitation on how the steering vector r̂ is estimated.
In this step, since the noise covariance matrix Φ̂_NN is determined according to the concrete situation of the noise source in the current noisy frame signal, the filter coefficients determined from it also match the current noisy frame signal; the accuracy of the filter coefficients is higher, and using them for speech enhancement processing improves the speech enhancement effect.
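Step 3's formula can be sketched for a two-element array, where the 2x2 matrix inverse can be written out explicitly to stay dependency-free. The steering vector r is assumed to be supplied by whatever estimator is used.

```python
def mvdr_weights(phi, r):
    """Filter w(f) = Phi^-1 r / (r^H Phi^-1 r) for one frequency bin,
    written out for a 2-element array (2x2 Phi)."""
    (a, b), (c, d) = phi
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]        # Phi^-1
    num = [inv[0][0] * r[0] + inv[0][1] * r[1],
           inv[1][0] * r[0] + inv[1][1] * r[1]]             # Phi^-1 r
    den = r[0].conjugate() * num[0] + r[1].conjugate() * num[1]  # r^H Phi^-1 r
    return [num[0] / den, num[1] / den]
```

The denominator normalizes the weights so that w^H(f) r̂(f) = 1, i.e. the signal arriving from the estimated steering direction passes through undistorted.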
Step 104: using the filter coefficients, perform speech enhancement processing on the noisy signal.
As an example, that noisy signal may be the current noisy frame signal, or one or a few noisy frames before or after it. For instance, the digital signal processor may use the filter coefficients at the next moment to enhance the frame or frames corresponding to that moment. As another instance, the digital signal processor may use the filter coefficients to enhance the frame or frames of a previous moment and then output the enhanced signal of that moment; although this introduces an output delay equal to the time difference between the previous moment and the current moment, that difference corresponds to only one or a few frames, and since the duration of each noisy frame is very short, the output delay is very small.
As an example, the present embodiment may also adopt the general speech-enhancement principle described in the Background to enhance the current noisy frame signal. Specifically, the filter coefficients are used to filter the noisy signal after its Fourier transform or short-time Fourier transform, yielding the enhanced frequency-domain signal; an inverse Fourier transform of that signal then yields the enhanced time-domain signal for output. More specifically, the enhanced frequency-domain signal is first computed as
Ŝ(t,f) = w^H(f) y(t,f)
and the enhanced time-domain signal is then computed as
ŝ(t) = IFFT(Ŝ(t,f))
where Ŝ(t,f) is the enhanced frequency-domain signal, w^H(f) is the conjugate transpose of the filter coefficient vector, y(t,f) is the matrix characterizing the Fourier transform spectrum, ŝ(t) is the enhanced time-domain signal, and IFFT denotes the inverse Fourier transform. It should be understood that the above example is illustrative only and is not intended to limit the present invention.
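The two output equations can be sketched as follows; again a naive inverse DFT stands in for the IFFT, and overlap-add reconstruction across frames is omitted for brevity.

```python
import cmath

def apply_filter(W, Y):
    """S_hat(t,f) = w(f)^H y(t,f) for each frequency bin of one frame.
    W: per-bin filter weight vectors; Y: per-bin channel vectors."""
    return [sum(wc.conjugate() * yc for wc, yc in zip(w, y))
            for w, y in zip(W, Y)]

def idft(S):
    """Naive inverse DFT returning the enhanced time-domain frame
    (illustrative stand-in for an IFFT)."""
    N = len(S)
    return [sum(s * cmath.exp(2j * cmath.pi * k * n / N)
                for k, s in enumerate(S)) / N
            for n in range(N)]
```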
With the speech enhancement method comprising steps 101 to 104 above, the current noisy frame signal acquired by the microphone array is obtained; the corresponding time-frequency mask is determined using the current noisy frame signal; the corresponding filter coefficients are determined using the time-frequency mask; and speech enhancement processing is performed on the current noisy frame signal using the filter coefficients. On the one hand, since the above method computes a time-frequency mask for each noisy frame, the mask is not a fixed value but changes with the concrete situation of every frame, and the resulting filter coefficients change accordingly, so performing the final speech enhancement with these filter coefficients improves the speech enhancement effect. On the other hand, since computing the time-frequency mask requires processing only a single frame, the computation load of the invention is small and the real-time requirement is met.
Referring to Fig. 4, an embodiment provides a speech enhancement device, the speech enhancement device comprising:
an obtaining module 501, configured to obtain the current noisy frame signal acquired by a microphone array, the current noisy frame signal including at least the speech signals respectively emitted by a target speech sound source and other sound sources;
a time-frequency mask determining module 502, configured to determine, using the current noisy frame signal, the time-frequency mask corresponding to the current noisy frame signal;
a filter coefficient determining module 503, configured to determine, using the time-frequency mask, the filter coefficients corresponding to the current noisy frame signal; and
a speech enhancement module 504, configured to perform, using the filter coefficients, speech enhancement processing on the current noisy frame signal.
Optionally, referring to Fig. 5, on the basis of Fig. 4 above, the time-frequency mask determining module 502 in the speech enhancement device includes:
an estimated-azimuth determining submodule 5021, configured to determine, according to the current noisy frame signal, the estimated azimuth of the target speech sound source relative to the microphone array; and
a time-frequency mask determining submodule 5022, configured to determine the time-frequency mask corresponding to the current noisy frame signal according to the relative positional relationship between the estimated azimuth and a target area, wherein the target area is the actual position area where the target speech sound source is located.
Optionally, on the basis of Fig. 5 above, the time-frequency mask determining submodule 5022 may specifically be configured to: if the estimated azimuth lies within the target area, determine that the time-frequency mask is a predetermined fixed value T1; if the estimated azimuth lies outside the target area, determine that the time-frequency mask is a predetermined fixed value T2; wherein 0 ≤ T1 < T2 ≤ 1.
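This first variant of the submodule reduces to a simple two-valued decision; a sketch follows. The area bounds and the default values T1 = 0 and T2 = 1 are illustrative choices of mine; the method only requires 0 ≤ T1 < T2 ≤ 1.

```python
def binary_mask(R, area=(-15.0, 45.0), T1=0.0, T2=1.0):
    """First mask variant: a small fixed value T1 when the estimated azimuth R
    lies inside the target area (the frame is speech-dominated and should
    contribute little to the noise estimate), and a larger fixed value T2
    outside it (the frame is noise-dominated)."""
    lo, hi = area
    return T1 if lo <= R <= hi else T2
```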
Alternatively, on the basis of Fig. 5 above, the time-frequency mask determining submodule 5022 may specifically be configured to: if the estimated azimuth lies outside the target area, determine that the time-frequency mask is a predetermined fixed value T3, wherein 0 < T3 ≤ 1; if the estimated azimuth lies within the target area, determine that the time-frequency mask is T4 according to the specific relative position of the estimated azimuth within the target area, wherein 0 ≤ T4 < T3. The closer the estimated azimuth is to the center of the target area, the closer the value of T4 is to 0; the closer the estimated azimuth is to the edge of the target area, the closer the value of T4 is to T3.
Optionally, on the basis of Fig. 4 above, the filter coefficient determining module 503 in the speech enhancement device includes:
a Fourier transform submodule, configured to perform a Fourier transform on the current noisy frame signal to obtain the Fourier transform spectrum of the current noisy frame signal;
a noise covariance matrix computing submodule, configured to compute the noise covariance matrix according to the following formula: Φ̂_NN(f) = Σ_t T(t,f) y(t,f) y^H(t,f) / Σ_t T(t,f); and
a filter coefficient computing submodule, configured to compute the filter coefficients according to the following formula: w(f) = Φ̂_NN(f)^(-1) r̂(f) / (r̂^H(f) Φ̂_NN(f)^(-1) r̂(f));
wherein Φ̂_NN is the noise covariance matrix, T is the time-frequency mask, y(t,f) is the matrix characterizing the Fourier transform spectrum, y^H(t,f) is the conjugate transpose of y(t,f), w(f) is the filter coefficient vector, r̂(f) is the estimated steering vector corresponding to the current noisy frame signal, and r̂^H(f) is the conjugate transpose of r̂(f).
In addition, an embodiment further provides speech enhancement equipment, the speech enhancement equipment comprising: a microphone array, a processor, a memory, and a computer program stored in the memory and runnable on the processor, the processor being connected to the microphone array; when the processor executes the computer program, the speech enhancement method of any of the above method embodiments is realized.
In the above speech enhancement equipment, as an example, the microphone array may be a digital microphone array and the processor may be a digital signal processor. As another example, the microphone array may be a non-digital microphone array while the processor is still a digital signal processor; in that case the microphone array and the processor may be connected through an audio decoder, which performs digitized sampling conversion on the noisy signal generated by the microphone array and feeds the converted data into the digital signal processor.
In addition, an embodiment further provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the speech enhancement method of any of the above method embodiments is realized.
Numerous specific details are set forth in the description provided here. It is to be appreciated, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments above. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are hereby expressly incorporated into that description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
Claims (10)
1. A speech enhancement method, characterized by comprising:
obtaining a current noisy frame signal acquired by a microphone array, the current noisy frame signal including at least the speech signals respectively emitted by a target speech sound source and other sound sources;
determining, using the current noisy frame signal, a time-frequency mask corresponding to the current noisy frame signal;
determining, using the time-frequency mask, filter coefficients corresponding to the current noisy frame signal; and
performing, using the filter coefficients, speech enhancement processing on a noisy signal.
2. The speech enhancement method according to claim 1, characterized in that determining, using the current noisy frame signal, the time-frequency mask corresponding to the current noisy frame signal comprises:
determining, according to the current noisy frame signal, an estimated azimuth of the target speech sound source relative to the microphone array; and
determining the time-frequency mask corresponding to the current noisy frame signal according to the relative positional relationship between the estimated azimuth and a target area, wherein the target area is the actual position area where the target speech sound source is located.
3. The speech enhancement method according to claim 2, characterized in that determining the time-frequency mask corresponding to the current noisy frame signal according to the relative positional relationship between the estimated azimuth and the target area comprises:
if the estimated azimuth lies within the target area, determining that the time-frequency mask is a predetermined fixed value T1;
if the estimated azimuth lies outside the target area, determining that the time-frequency mask is a predetermined fixed value T2;
wherein 0 ≤ T1 < T2 ≤ 1.
4. The speech enhancement method according to claim 2, characterized in that determining the time-frequency mask corresponding to the current noisy frame signal according to the relative positional relationship between the estimated azimuth and the target area comprises:
if the estimated azimuth lies outside the target area, determining that the time-frequency mask is a predetermined fixed value T3, wherein 0 < T3 ≤ 1;
if the estimated azimuth lies within the target area, determining that the time-frequency mask is T4 according to the specific relative position of the estimated azimuth within the target area;
wherein 0 ≤ T4 < T3.
5. The speech enhancement method according to claim 4, characterized in that the value of T4 satisfies the following relationship:
the closer the estimated azimuth is to the center of the target area, the closer the value of T4 is to 0; the closer the estimated azimuth is to the edge of the target area, the closer the value of T4 is to T3.
6. The speech enhancement method according to claim 1, characterized in that determining, using the time-frequency mask, the filter coefficients corresponding to the current noisy frame signal comprises:
performing a Fourier transform on the current noisy frame signal to obtain a Fourier transform spectrum of the current noisy frame signal;
calculating a noise covariance matrix according to the following formula:
Φ̂_NN(f) = Σ_t T(t,f) y(t,f) y^H(t,f) / Σ_t T(t,f);
calculating the filter coefficients according to the following formula:
w(f) = Φ̂_NN(f)^(-1) r̂(f) / (r̂^H(f) Φ̂_NN(f)^(-1) r̂(f));
wherein Φ̂_NN is the noise covariance matrix, T is the time-frequency mask, y(t,f) is the matrix characterizing the Fourier transform spectrum, y^H(t,f) is the conjugate transpose of y(t,f), w(f) is the filter coefficient vector, r̂(f) is the estimated steering vector corresponding to the current noisy frame signal, and r̂^H(f) is the conjugate transpose of r̂(f).
7. A speech enhancement device, characterized by comprising:
an obtaining module, configured to obtain a current noisy frame signal acquired by a microphone array, the current noisy frame signal including at least the speech signals respectively emitted by a target speech sound source and other sound sources;
a time-frequency mask determining module, configured to determine, using the current noisy frame signal, a time-frequency mask corresponding to the current noisy frame signal;
a filter coefficient determining module, configured to determine, using the time-frequency mask, filter coefficients corresponding to the current noisy frame signal; and
a speech enhancement module, configured to perform, using the filter coefficients, speech enhancement processing on a noisy signal.
8. The speech enhancement device according to claim 7, characterized in that the time-frequency mask determining module comprises:
an estimated-azimuth determining submodule, configured to determine, according to the current noisy frame signal, an estimated azimuth of the target speech sound source relative to the microphone array; and
a time-frequency mask determining submodule, configured to determine the time-frequency mask corresponding to the current noisy frame signal according to the relative positional relationship between the estimated azimuth and a target area, wherein the target area is the actual position area where the target speech sound source is located.
9. Speech enhancement equipment, comprising: a microphone array, a processor, a memory, and a computer program stored in the memory and runnable on the processor, the processor being connected to the microphone array, characterized in that, when the processor executes the computer program, the speech enhancement method according to any one of claims 1 to 6 is realized.
10. A storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the speech enhancement method according to any one of claims 1 to 6 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233376.XA CN110085246A (en) | 2019-03-26 | 2019-03-26 | Sound enhancement method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233376.XA CN110085246A (en) | 2019-03-26 | 2019-03-26 | Sound enhancement method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110085246A true CN110085246A (en) | 2019-08-02 |
Family
ID=67413667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910233376.XA Pending CN110085246A (en) | 2019-03-26 | 2019-03-26 | Sound enhancement method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110085246A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110600050A (en) * | 2019-09-12 | 2019-12-20 | 深圳市华创技术有限公司 | Microphone array voice enhancement method and system based on deep neural network |
CN111128221A (en) * | 2019-12-17 | 2020-05-08 | 北京小米智能科技有限公司 | Audio signal processing method and device, terminal and storage medium |
CN111276150A (en) * | 2020-01-20 | 2020-06-12 | 杭州耳青聪科技有限公司 | Intelligent voice-to-character and simultaneous interpretation system based on microphone array |
CN111429934A (en) * | 2020-03-13 | 2020-07-17 | 北京松果电子有限公司 | Audio signal processing method and device and storage medium |
CN111862989A (en) * | 2020-06-01 | 2020-10-30 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN112533120A (en) * | 2020-11-23 | 2021-03-19 | 北京声加科技有限公司 | Beam forming method and device based on dynamic compression of noisy speech signal magnitude spectrum |
CN112785997A (en) * | 2020-12-29 | 2021-05-11 | 紫光展锐(重庆)科技有限公司 | Noise estimation method and device, electronic equipment and readable storage medium |
CN113030862A (en) * | 2021-03-12 | 2021-06-25 | 中国科学院声学研究所 | Multi-channel speech enhancement method and device |
TWI818493B (en) * | 2021-04-01 | 2023-10-11 | 大陸商深圳市韶音科技有限公司 | Methods, systems, and devices for speech enhancement |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976565A (en) * | 2010-07-09 | 2011-02-16 | 瑞声声学科技(深圳)有限公司 | Dual-microphone-based speech enhancement device and method |
CN103594093A (en) * | 2012-08-15 | 2014-02-19 | 王景芳 | Method for enhancing voice based on signal to noise ratio soft masking |
CN104103277A (en) * | 2013-04-15 | 2014-10-15 | 北京大学深圳研究生院 | Time frequency mask-based single acoustic vector sensor (AVS) target voice enhancement method |
CN105788607A (en) * | 2016-05-20 | 2016-07-20 | 中国科学技术大学 | Speech enhancement method applied to dual-microphone array |
CN108831495A (en) * | 2018-06-04 | 2018-11-16 | 桂林电子科技大学 | A kind of sound enhancement method applied to speech recognition under noise circumstance |
CN109036411A (en) * | 2018-09-05 | 2018-12-18 | 深圳市友杰智新科技有限公司 | A kind of intelligent terminal interactive voice control method and device |
CN109308904A (en) * | 2018-10-22 | 2019-02-05 | 上海声瀚信息科技有限公司 | A kind of array voice enhancement algorithm |
2019-03-26: Application CN201910233376.XA filed (publication CN110085246A); legal status: Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976565A (en) * | 2010-07-09 | 2011-02-16 | 瑞声声学科技(深圳)有限公司 | Dual-microphone-based speech enhancement device and method |
CN103594093A (en) * | 2012-08-15 | 2014-02-19 | 王景芳 | Method for enhancing voice based on signal to noise ratio soft masking |
CN104103277A (en) * | 2013-04-15 | 2014-10-15 | 北京大学深圳研究生院 | Time frequency mask-based single acoustic vector sensor (AVS) target voice enhancement method |
CN105788607A (en) * | 2016-05-20 | 2016-07-20 | 中国科学技术大学 | Speech enhancement method applied to dual-microphone array |
CN108831495A (en) * | 2018-06-04 | 2018-11-16 | 桂林电子科技大学 | A kind of sound enhancement method applied to speech recognition under noise circumstance |
CN109036411A (en) * | 2018-09-05 | 2018-12-18 | 深圳市友杰智新科技有限公司 | A kind of intelligent terminal interactive voice control method and device |
CN109308904A (en) * | 2018-10-22 | 2019-02-05 | 上海声瀚信息科技有限公司 | A kind of array voice enhancement algorithm |
Non-Patent Citations (3)
Title |
---|
JEONG S Y: ""Dominant speech enhancement based on SNR-adaptive soft mask filtering"", 《IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS》 * |
T. HIGUCHI: ""robust MVDR beamforming using time-frequency masks for online/offline ASR in noise"", 《ICASSP》 * |
王智国: ""基于掩码迭代估计的多通道语音识别算法"", 《信息技术与标准化》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110600050A (en) * | 2019-09-12 | 2019-12-20 | 深圳市华创技术有限公司 | Microphone array voice enhancement method and system based on deep neural network |
CN110600050B (en) * | 2019-09-12 | 2022-04-15 | 深圳市华创技术有限公司 | Microphone array voice enhancement method and system based on deep neural network |
CN111128221A (en) * | 2019-12-17 | 2020-05-08 | 北京小米智能科技有限公司 | Audio signal processing method and device, terminal and storage medium |
CN111128221B (en) * | 2019-12-17 | 2022-09-02 | 北京小米智能科技有限公司 | Audio signal processing method and device, terminal and storage medium |
CN111276150A (en) * | 2020-01-20 | 2020-06-12 | 杭州耳青聪科技有限公司 | Intelligent voice-to-character and simultaneous interpretation system based on microphone array |
CN111429934A (en) * | 2020-03-13 | 2020-07-17 | 北京松果电子有限公司 | Audio signal processing method and device and storage medium |
CN111429934B (en) * | 2020-03-13 | 2023-02-28 | 北京小米松果电子有限公司 | Audio signal processing method and device and storage medium |
CN111862989A (en) * | 2020-06-01 | 2020-10-30 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN111862989B (en) * | 2020-06-01 | 2024-03-08 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN112533120B (en) * | 2020-11-23 | 2022-04-22 | 北京声加科技有限公司 | Beam forming method and device based on dynamic compression of noisy speech signal magnitude spectrum |
CN112533120A (en) * | 2020-11-23 | 2021-03-19 | 北京声加科技有限公司 | Beam forming method and device based on dynamic compression of noisy speech signal magnitude spectrum |
CN112785997B (en) * | 2020-12-29 | 2022-11-01 | 紫光展锐(重庆)科技有限公司 | Noise estimation method and device, electronic equipment and readable storage medium |
CN112785997A (en) * | 2020-12-29 | 2021-05-11 | 紫光展锐(重庆)科技有限公司 | Noise estimation method and device, electronic equipment and readable storage medium |
CN113030862A (en) * | 2021-03-12 | 2021-06-25 | 中国科学院声学研究所 | Multi-channel speech enhancement method and device |
TWI818493B (en) * | 2021-04-01 | 2023-10-11 | 大陸商深圳市韶音科技有限公司 | Methods, systems, and devices for speech enhancement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110085246A (en) | Sound enhancement method, device, equipment and storage medium | |
JP7434137B2 (en) | Speech recognition method, device, equipment and computer readable storage medium | |
CN109643554A (en) | Adaptive voice Enhancement Method and electronic equipment | |
CN104158990B (en) | Method and audio receiving circuit for processing audio signal | |
Nakashima et al. | Frequency domain binaural model based on interaural phase and level differences | |
CN103632677B (en) | Noisy Speech Signal processing method, device and server | |
US20090281804A1 (en) | Processing unit, speech recognition apparatus, speech recognition system, speech recognition method, storage medium storing speech recognition program | |
JP2019191558A (en) | Method and apparatus for amplifying speech | |
CN109979476A (en) | A kind of method and device of speech dereverbcration | |
CN110148420A (en) | A kind of audio recognition method suitable under noise circumstance | |
CN107408394A (en) | It is determined that the noise power between main channel and reference channel is differential and sound power stage is poor | |
US20100316228A1 (en) | Methods and systems for blind dereverberation | |
CN102881289A (en) | Hearing perception characteristic-based objective voice quality evaluation method | |
CN111429932A (en) | Voice noise reduction method, device, equipment and medium | |
CN113077806B (en) | Audio processing method and device, model training method and device, medium and equipment | |
CN104778948B (en) | A kind of anti-noise audio recognition method based on bending cepstrum feature | |
CN112820315A (en) | Audio signal processing method, audio signal processing device, computer equipment and storage medium | |
CN105702262A (en) | Headset double-microphone voice enhancement method | |
CN111883154B (en) | Echo cancellation method and device, computer-readable storage medium, and electronic device | |
WO2022256577A1 (en) | A method of speech enhancement and a mobile computing device implementing the method | |
Wake et al. | Enhancing listening capability of humanoid robot by reduction of stationary ego‐noise | |
US7646912B2 (en) | Method and device for ascertaining feature vectors from a signal | |
CN109215635B (en) | Broadband voice frequency spectrum gradient characteristic parameter reconstruction method for voice definition enhancement | |
CN110875037A (en) | Voice data processing method and device and electronic equipment | |
JP6815956B2 (en) | Filter coefficient calculator, its method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190802 |