CN113393850A - Parameterized auditory filter bank for end-to-end time domain sound source separation system - Google Patents
- Publication number
- CN113393850A (Application CN202110569382.XA)
- Authority
- CN
- China
- Prior art keywords
- sound source
- network
- filter bank
- auditory
- filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000926 separation method Methods 0.000 title claims abstract description 63
- 238000012549 training Methods 0.000 claims abstract description 23
- 238000000034 method Methods 0.000 claims abstract description 11
- 230000004044 response Effects 0.000 claims description 7
- 238000010606 normalization Methods 0.000 claims description 3
- 230000008569 process Effects 0.000 claims description 3
- 230000002269 spontaneous effect Effects 0.000 abstract description 4
- 230000000694 effects Effects 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 210000003477 cochlea Anatomy 0.000 description 1
- 238000002790 cross-validation Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000010183 spectrum analysis Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000017105 transposition Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The invention provides a parameterized auditory filter bank for an end-to-end time-domain sound source separation system. The parameterized auditory filter bank is introduced into the end-to-end time-domain separation system to establish a separation model with better auditory plausibility and improve the separation performance of the network. Compared with a fixed filter bank, the parameters of the parameterized auditory filter bank are obtained through network training, so it is more flexible and can adapt spontaneously to the characteristics of the network and the data to achieve better separation performance. Compared with a free-form filter bank, the parameterized auditory filter bank provides prior information to the network in the form of Gammatone filters, so the network can better simulate the human auditory system, improving its separation ability in real scenes and its interpretability. Furthermore, only 4 parameters per filter need to be trained, which significantly reduces the number of network parameters compared with a free-form filter bank, in which all filter coefficients are trained.
Description
Technical Field
The invention belongs to the field of sound source separation, and particularly relates to a parameterized auditory filter bank for improving the performance of an end-to-end time domain sound source separation system.
Background
In real sound scenes, multiple sound sources usually sound simultaneously, and sound source separation has long been an important topic in computational auditory scene analysis. With the rapid development of deep learning, sound source separation systems have made breakthrough progress. As shown in Fig. 1, most current end-to-end time-domain sound source separation systems follow the encoder-separator-decoder framework. The encoder converts the time-domain mixture into an intermediate representation, the separator estimates a weighting function (mask) for each sound source, and each mask is then multiplied with the intermediate representation of the mixture and passed through the decoder to obtain the separated source.
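As a concrete illustration, the mask-then-decode flow above can be sketched in NumPy with a toy linear encoder/decoder (all sizes, function names, and the random mask are illustrative, not the patent's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

N, L = 16, 8                        # toy sizes: 16 filters of length 8 (illustrative)
enc = rng.standard_normal((N, L))   # encoder filter bank
dec = rng.standard_normal((N, L))   # decoder (transposed-convolution) bank

def encode(x):
    # frame the signal with 50% overlap and correlate with each filter
    hop = L // 2
    frames = np.stack([x[i:i + L] for i in range(0, len(x) - L + 1, hop)])
    return enc @ frames.T                       # (N, n_frames) representation

def decode(rep):
    # overlap-add synthesis from the masked representation
    hop = L // 2
    frames = dec.T @ rep                        # (L, n_frames)
    out = np.zeros((rep.shape[1] - 1) * hop + L)
    for t in range(frames.shape[1]):
        out[t * hop:t * hop + L] += frames[:, t]
    return out

x = rng.standard_normal(64)
rep = encode(x)                                 # intermediate representation
mask = rng.uniform(size=rep.shape)              # a separator would estimate this
source = decode(mask * rep)                     # masked representation -> waveform
```

The separator in the actual system is a deep network; here a random mask merely demonstrates the data flow.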
The encoder is a set of filters that convolve the time-domain signal. It may be fixed (referred to as a fixed filter bank in this invention), such as a short-time Fourier transform (STFT), a constant-Q transform, or an auditory filter bank with fixed values. Alternatively, the filter bank may be a set of one-dimensional convolutional layers with arbitrary initialization whose parameters are learned spontaneously during network training; this is referred to as a free-form filter bank in this invention.
A fixed filter bank has an intuitive interpretation and embeds prior knowledge, so it is less prone to overfitting, but its performance is hard to improve. In contrast, free-form filter banks have high degrees of freedom and generally perform better, but their training is susceptible to noisy data.
Disclosure of Invention
Technical problem to be solved
The invention addresses the problems that the fixed filter bank adopted by the encoder in existing sound source separation systems is hard to improve in performance, while the training of a free-form filter bank is easily affected by noisy data. Compromising between flexibility and prior information, it provides a parameterized auditory filter bank for an end-to-end time-domain sound source separation system and improves the separation performance of the network by improving the encoder.
The excellent performance of the human ear in auditory scene analysis inspires us to introduce an auditory filter bank with physiological and psychoacoustic plausibility into the sound source separation system. In auditory models, the spectral analysis of the cochlea is typically modeled by a Gammatone filter bank. The parameterized auditory filter bank proposed in the invention is a set of filters with the Gammatone functional form whose parameters are obtained by network learning, so it achieves better separation performance than a fixed filter bank and better auditory plausibility and interpretability than a free-form filter bank.
The technical scheme of the invention is as follows:
the parameterized auditory filter bank for the end-to-end time-domain sound source separation system adopts Gammatone filters, and the number of filters N is not less than 32; the time-domain impulse response of each filter is a pure tone modulated by a Gamma-distribution envelope:
g(t) = A t^(p-1) e^(-2πbt) cos(2πf_c t + φ)
wherein p is the order, f_c is the center frequency, b is the bandwidth, φ is the phase, and A is the amplitude, determined by the order p and the bandwidth b.
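The impulse response above can be evaluated directly. The sketch below assumes a 16 kHz sampling rate (matching the embodiment) and illustrative values for f_c and b:

```python
import numpy as np

def gammatone_ir(t, p, fc, b, phi, A=1.0):
    """g(t) = A * t^(p-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phi)."""
    return A * t**(p - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phi)

fs = 16000                              # sampling rate (assumed)
t = np.arange(int(0.002 * fs)) / fs     # 2 ms support, as in the embodiment
g = gammatone_ir(t, p=4, fc=1000.0, b=125.0, phi=0.0)   # fc and b are illustrative
```

With order p = 4 the response starts at zero (t^(p-1) = 0 at t = 0) and rises with the Gamma envelope before decaying.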
The method for constructing the end-to-end time domain sound source separation system by using the parameterized auditory filter bank comprises the following steps:
step 1: creating a time-domain separation network according to the encoder-separator-decoder framework; the encoder is realized by a one-dimensional convolutional layer, and the filter bank of the encoder is the parameterized auditory filter bank; the separator estimates a mask for each sound source; the decoder is a one-dimensional deconvolution layer; the mask estimate of each sound source from the separator is multiplied by the two-dimensional representation of the mixed sound from the encoder, after which the time-domain signals of the separated sound sources can be synthesized by the decoder;
step 2: parameter sets for individual filters based on a priori knowledge of the auditory system of the human earInitializing, parameterizing parameter sets in an auditory filter bankIn the network training processThe variable:
(1) the order p_i of each filter is initialized to 4, corresponding to the mean fitted value of the filter order in the human auditory system;
(2) the center frequencies f_c,i of the filters are initialized to be uniformly distributed on the equivalent rectangular bandwidth (ERB) scale, according to the mapping from linear frequency to the ERB scale;
(4) the initial phase φ_i of each filter is set so that the peak of the tone is aligned with the peak of the Gamma envelope;
step 3: selecting different sound sources to create a data set according to the separation task, and training the time-domain separation network with the data set to obtain the end-to-end time-domain sound source separation system.
Further, the separator adopts a network structure based on depthwise convolution, comprising a plurality of dilated convolution modules with different dilation factors, wherein each module comprises a convolution layer, a rectification layer, a normalization layer, a depthwise convolution layer, and residual and skip connections.
Further, when training the time-domain separation network, the scale-invariant signal-to-distortion ratio (SI-SDR) between the real and estimated sound sources is taken as the training target (its negative is minimized), and the network is trained with the Adam optimizer until the separation performance no longer improves, yielding the end-to-end time-domain sound source separation system.
Advantageous effects
The invention introduces a parameterized auditory filter bank into an end-to-end time-domain separation system, establishing a separation model with better auditory plausibility and improving the separation performance of the network. Compared with a fixed filter bank, the parameters of the parameterized auditory filter bank are obtained through network training, so it is more flexible and can adapt spontaneously to the characteristics of the network and the data to achieve better separation performance. Compared with a free-form filter bank, the parameterized auditory filter bank provides prior information to the network in the form of Gammatone filters, so the network can better simulate the human auditory system, improving its separation ability in real scenes and its interpretability. Furthermore, only 4 parameters per filter need to be trained, which significantly reduces the number of network parameters compared with a free-form filter bank, in which all filter coefficients are trained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is the general framework of a sound source separation system;
FIG. 2 is a diagram of an end-to-end time domain convolution separation network architecture;
FIG. 3(a) shows the frequency responses of the parameterized Gammatone filter bank, arranged by center frequency; (b) shows the frequency responses of 4 Gammatone filters centered at 1.125 kHz.
Detailed Description
The present invention provides a parameterized auditory filter bank, used as the encoder of an end-to-end time-domain sound source separation network to improve its performance, and creates a more auditorily plausible end-to-end time-domain sound source separation network based on the encoder-separator-decoder framework. The encoder takes the form of a set of Gammatone auditory filters whose parameters are learned during network training, which improves the performance of the separation network and lays a foundation for the application of machine selective listening in real scenes.
In this embodiment, an end-to-end time-domain separation network for separating arbitrary sound sources is trained, where the encoder consists of a set of Gammatone filters whose parameter sets are obtained through network learning. The procedure comprises the following steps:
step 1: and constructing an end-to-end time domain convolutional network. A network is built according to the framework of encoder-splitter-decoder. The encoder is implemented by means of one-dimensional convolutional layers, the form of which filter bank is given in step 2. The mask used by the splitter to estimate the sound source may have a variety of network forms. The invention provides a network structure based on deep convolution, as shown in fig. 2, which comprises a plurality of hole convolution modules with different expansion factors, wherein each module comprises a convolution layer, a rectifying layer, a normalization layer, a deep convolution layer and a residue and jump structure. In this embodiment, the separator is composed of 3 convolution modules, and each convolution module is implemented by 8 pieces of hole-carrying convolution blocks with an exponent having a spreading factor of 2. The mask estimate from each sound source of the splitter is multiplied by a two-dimensional representation of the mixed sound from the encoder. Finally, the time domain signals of the separated sound sources are synthesized through a decoder (one-dimensional transposition convolution layer).
Step 2: create and initialize the Gammatone filter bank.
The Gammatone filter models the auditory system well; its time-domain impulse response can be expressed as a pure tone modulated by a Gamma-distribution envelope:
g(t) = A t^(p-1) e^(-2πbt) cos(2πf_c t + φ)
wherein p is the order, f_c is the center frequency, b is the bandwidth, φ is the phase, and A is the amplitude.
The encoder in the invention is a set of Gammatone filters; the number of filters N is not less than 32. The parameter set {p_i, f_c,i, b_i, φ_i} of each filter is variable during network training.
Suitable initial values facilitate network training; to this end, the parameter set of each filter is initialized according to prior knowledge of the human auditory system.
(1) The order p_i of each filter is initialized to 4, corresponding to the mean fitted value of the filter order in the human auditory system.
(2) The center frequencies f_c,i of the filters are initialized to be uniformly distributed on the Equivalent Rectangular Bandwidth (ERB) scale, according to the mapping from linear frequency to the ERB scale.
(4) The initial phase φ_i of each filter is set so that the peak of the tone aligns with the peak of the Gamma envelope.
The encoder in this embodiment consists of 512 Gammatone filters of 2 ms length. The parameter set of each filter is initialized as follows: the order p_i is 4; the center frequency f_c,i is one of 512 frequency points evenly distributed on the ERB scale; the bandwidth b_i and phase φ_i are computed from the corresponding f_c,i and p_i.
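Uniform spacing of center frequencies on the ERB scale can be sketched as below. The ERB mapping formula itself is not reproduced in this text, so the sketch assumes the widely used Glasberg and Moore (1990) ERB-number mapping, and the frequency range of 50 Hz to 8 kHz is also an assumption:

```python
import numpy as np

def hz_to_erb(f):
    # ERB-number scale of Glasberg & Moore (1990); assumed, since the
    # patent's own mapping formula did not survive extraction
    return 21.4 * np.log10(1 + 0.00437 * f)

def erb_to_hz(e):
    # inverse of the mapping above
    return (10**(e / 21.4) - 1) / 0.00437

def init_center_freqs(n_filters=512, f_min=50.0, f_max=8000.0):
    """Center frequencies uniformly spaced on the ERB scale (range assumed)."""
    return erb_to_hz(np.linspace(hz_to_erb(f_min), hz_to_erb(f_max), n_filters))

fc = init_center_freqs()
```

ERB spacing places filters densely at low frequencies and sparsely at high frequencies, mirroring cochlear frequency resolution.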
Step 3: create the dataset and train the network. Different sound sources are selected to create the dataset according to the separation task. Taking the scale-invariant source-to-distortion ratio (SI-SDR) between the real and estimated sound sources as the training target (its negative is minimized), the network is trained with the Adam optimizer until the separation performance no longer improves, yielding the sound source separation model.
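The SI-SDR training target can be sketched as follows (the zero-mean convention and the eps stabilizer are common implementation choices, not specified by the text):

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB; the training loss
    is its negative."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # project the estimate onto the reference: the scaled target
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps))
```

Because the target is the projection onto the reference, multiplying the estimate by any nonzero gain leaves the score unchanged, which prevents the network from gaming the loss by rescaling its output.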
To simulate the separation of arbitrary sound sources in a real sound scene, this embodiment creates a large multi-class dataset containing ambient sounds, speech, and music. Ambient sounds (including traffic noise, sirens, dog barking, etc.) are selected from the BBC sound effects dataset, speech from the LibriSpeech dataset, and musical tones from the MUSAN dataset. Each source is downsampled to 16 kHz. Two different sound sources are randomly chosen from the dataset and mixed at a random signal-to-noise ratio between -5 and 5 dB. The dataset contains 37.5 hours of sound samples: 70% for training, 20% for cross-validation, and 10% for testing.
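Mixing two sources at a random SNR in [-5, 5] dB can be sketched as follows (the function name and the random stand-in signals are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that 10*log10(P(s1)/P(s2_scaled)) equals snr_db, then sum."""
    p1, p2 = np.mean(s1**2), np.mean(s2**2)
    gain = np.sqrt(p1 / (p2 * 10**(snr_db / 10)))
    return s1 + gain * s2, gain

fs = 16000
s1 = rng.standard_normal(fs)       # stand-ins for two 1 s sources at 16 kHz
s2 = rng.standard_normal(fs)
snr = rng.uniform(-5.0, 5.0)       # random SNR in [-5, 5] dB, as in the text
mixture, g = mix_at_snr(s1, s2, snr)
```

The clean s1 and the scaled g * s2 serve as the separation targets for the mixture.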
The SI-SDR improvement values (dB) of the networks on the test set are shown in Table 1. Compared with the fixed filter bank, the parameterized Gammatone filter bank improves separation performance by 2.31 dB, demonstrating that the network can learn a parameter set better suited to the separation system, with higher flexibility and better separation performance. Compared with the free-form filter bank, the parameterized Gammatone filter bank not only has better interpretability but also achieves a performance gain, demonstrating that an auditorily plausible Gammatone filter bank introduces prior information beneficial to the separation network.
Table 1. SI-SDR improvement values (dB) on the test set for sound source separation networks with different encoders
Fig. 3(a) shows the frequency responses of the 512 filters obtained after network training, arranged by center frequency; Fig. 3(b) shows the frequency responses of 4 Gammatone filters centered at 1.125 kHz. The results show that the center frequencies learned by the network remain distributed according to the ERB scale, but richer orders p and bandwidths b are learned, indicating that the network is sensitive to the filter parameters. It is difficult to manually determine suitable fixed parameter values for a filter bank; learning the parameter values spontaneously through network training is a better way to improve performance.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (6)
1. A parameterized auditory filter bank for an end-to-end time-domain sound source separation system, characterized in that: Gammatone filters are adopted, and the number of filters N is not less than 32; the time-domain impulse response of each filter is a pure tone modulated by a Gamma-distribution envelope:
g(t) = A t^(p-1) e^(-2πbt) cos(2πf_c t + φ)
wherein p is the order, f_c is the center frequency, b is the bandwidth, φ is the phase, and A is the amplitude, determined by the order p and the bandwidth b.
3. A method of constructing an end-to-end time-domain sound source separation system using the parameterized auditory filter bank of claim 1, characterized in that the method comprises the following steps:
step 1: creating a time-domain separation network according to the encoder-separator-decoder framework; wherein the encoder is implemented by a one-dimensional convolutional layer, and the filter bank of the encoder is the parameterized auditory filter bank; the separator estimates a mask for each sound source; the decoder is a one-dimensional deconvolution layer; the mask estimate of each sound source from the separator is multiplied by the two-dimensional representation of the mixed sound from the encoder, after which the time-domain signals of the separated sound sources can be synthesized by the decoder;
step 2: parameter sets for individual filters based on a priori knowledge of the auditory system of the human earInitializing, parameterizing parameter sets in an auditory filter bankDuring the network training process, variable:
(1) the order p_i of each filter is initialized to 4, corresponding to the mean fitted value of the filter order in the human auditory system;
(2) the center frequencies f_c,i of the filters are initialized to be uniformly distributed on the equivalent rectangular bandwidth (ERB) scale, according to the mapping from linear frequency to the ERB scale;
(4) the initial phase φ_i of each filter is set so that the peak of the tone is aligned with the peak of the Gamma envelope;
step 3: selecting different sound sources to create a data set according to the separation task, and training the time-domain separation network with the data set to obtain the end-to-end time-domain sound source separation system.
4. A method of constructing an end-to-end time-domain sound source separation system according to claim 3, characterized in that: the separator adopts a network structure based on depthwise convolution, comprising a plurality of dilated convolution modules with different dilation factors, each dilated convolution module comprising a convolution layer, a rectification layer, a normalization layer, a depthwise convolution layer, and residual and skip connections.
5. A method of constructing an end-to-end time-domain sound source separation system according to claim 3, characterized in that: when training the time-domain separation network, the scale-invariant signal-to-distortion ratio between the real and estimated sound sources is taken as the training target (its negative is minimized), and the network is trained with the Adam optimizer until the separation performance no longer improves, yielding the end-to-end time-domain sound source separation system.
6. An end-to-end time domain sound source separation system, characterized by: constructed by the method of claim 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110569382.XA CN113393850B (en) | 2021-05-25 | 2021-05-25 | Parameterized auditory filter bank for end-to-end time domain sound source separation system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110569382.XA CN113393850B (en) | 2021-05-25 | 2021-05-25 | Parameterized auditory filter bank for end-to-end time domain sound source separation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113393850A true CN113393850A (en) | 2021-09-14 |
CN113393850B CN113393850B (en) | 2024-01-19 |
Family
ID=77618982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110569382.XA Active CN113393850B (en) | 2021-05-25 | 2021-05-25 | Parameterized auditory filter bank for end-to-end time domain sound source separation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393850B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117711423A (en) * | 2024-02-05 | 2024-03-15 | 西北工业大学 | Mixed underwater sound signal separation method combining auditory scene analysis and deep learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456312A (en) * | 2013-08-29 | 2013-12-18 | 太原理工大学 | Single channel voice blind separation method based on computational auditory scene analysis |
US20140044279A1 (en) * | 2012-08-07 | 2014-02-13 | Microsoft Corporation | Multi-microphone audio source separation based on combined statistical angle distributions |
CN103985390A (en) * | 2014-05-20 | 2014-08-13 | 北京安慧音通科技有限责任公司 | Method for extracting phonetic feature parameters based on gammatone relevant images |
CN109256127A (en) * | 2018-11-15 | 2019-01-22 | 江南大学 | A kind of Robust feature extracting method based on non-linear power transformation Gammachirp filter |
CN110010150A (en) * | 2019-04-15 | 2019-07-12 | 吉林大学 | Auditory Perception speech characteristic parameter extracting method based on multiresolution |
US20190394568A1 (en) * | 2018-06-21 | 2019-12-26 | Trustees Of Boston University | Auditory signal processor using spiking neural network and stimulus reconstruction with top-down attention control |
- 2021-05-25: CN CN202110569382.XA patent/CN113393850B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140044279A1 (en) * | 2012-08-07 | 2014-02-13 | Microsoft Corporation | Multi-microphone audio source separation based on combined statistical angle distributions |
CN103456312A (en) * | 2013-08-29 | 2013-12-18 | 太原理工大学 | Single channel voice blind separation method based on computational auditory scene analysis |
CN103985390A (en) * | 2014-05-20 | 2014-08-13 | 北京安慧音通科技有限责任公司 | Method for extracting phonetic feature parameters based on gammatone relevant images |
US20190394568A1 (en) * | 2018-06-21 | 2019-12-26 | Trustees Of Boston University | Auditory signal processor using spiking neural network and stimulus reconstruction with top-down attention control |
CN109256127A (en) * | 2018-11-15 | 2019-01-22 | 江南大学 | A kind of Robust feature extracting method based on non-linear power transformation Gammachirp filter |
CN110010150A (en) * | 2019-04-15 | 2019-07-12 | 吉林大学 | Auditory Perception speech characteristic parameter extracting method based on multiresolution |
Non-Patent Citations (1)
Title |
---|
Wang Yu; Lin Jiajun; Yuan Wenhao; Chen Ning: "Single-channel speech separation based on an improved pitch-tracking algorithm", Journal of East China University of Science and Technology (Natural Science Edition), no. 03 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117711423A (en) * | 2024-02-05 | 2024-03-15 | 西北工业大学 | Mixed underwater sound signal separation method combining auditory scene analysis and deep learning |
CN117711423B (en) * | 2024-02-05 | 2024-05-10 | 西北工业大学 | Mixed underwater sound signal separation method and system combining auditory scene analysis and deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113393850B (en) | 2024-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5124014B2 (en) | Signal enhancement apparatus, method, program and recording medium | |
CN107452389A (en) | A kind of general monophonic real-time noise-reducing method | |
CN107845389A (en) | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks | |
CN110473567A (en) | Audio-frequency processing method, device and storage medium based on deep neural network | |
US20080228470A1 (en) | Signal separating device, signal separating method, and computer program | |
Lee et al. | Differentiable artificial reverberation | |
JP6485711B2 (en) | Sound field reproduction apparatus and method, and program | |
Ramírez et al. | A general-purpose deep learning approach to model time-varying audio effects | |
CN110660406A (en) | Real-time voice noise reduction method of double-microphone mobile phone in close-range conversation scene | |
CN115424627A (en) | Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm | |
CN113823316B (en) | Voice signal separation method for sound source close to position | |
CN113393850B (en) | Parameterized auditory filter bank for end-to-end time domain sound source separation system | |
CN112201276B (en) | TC-ResNet network-based microphone array voice separation method | |
US7280943B2 (en) | Systems and methods for separating multiple sources using directional filtering | |
CN113327624B (en) | Method for intelligent monitoring of environmental noise by adopting end-to-end time domain sound source separation system | |
Mu et al. | An objective analysis method for perceptual quality of a virtual bass system | |
CN113921007B (en) | Method for improving far-field voice interaction performance and far-field voice interaction system | |
CN111091847A (en) | Deep clustering voice separation method based on improvement | |
CN115019818A (en) | Voice dereverberation method, device and equipment based on progressive reverberation perception network | |
Kim et al. | Hd-demucs: General speech restoration with heterogeneous decoders | |
CN116935879A (en) | Two-stage network noise reduction and dereverberation method based on deep learning | |
Miyazaki et al. | Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction | |
Drake et al. | A computational auditory scene analysis-enhanced beamforming approach for sound source separation | |
Douglas et al. | Blind separation of acoustical mixtures without time-domain deconvolution or decorrelation | |
CN113689869A (en) | Speech enhancement method, electronic device, and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||