CN111863008A - Audio noise reduction method and device and storage medium - Google Patents
- Publication number
- CN111863008A (application number CN202010645246.XA)
- Authority
- CN
- China
- Prior art keywords
- signal
- noise
- processed
- audio signal
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The present disclosure provides an audio noise reduction method, device and storage medium, relating to the technical field of audio processing, which improve the accuracy of the extracted audio signal. The method comprises the following steps: acquiring an audio signal to be processed; inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed; performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed, the signal-to-noise ratio of which is higher than that of the audio signal to be processed; and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed. Extracting the noise first raises the signal-to-noise ratio of the audio signal to be processed, so that the subsequent signal extraction can recover the audio signal more accurately, which improves the accuracy of the extracted audio signal.
Description
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to an audio denoising method and apparatus, and a storage medium.
Background
Audio noise reduction generally refers to removing or attenuating the noise portion of an audio signal in some way to obtain the desired audio signal. In the related art, when the audio signal has a low signal-to-noise ratio, existing noise reduction methods cannot accurately remove the noise signal, so the accuracy of the extracted audio signal is low.
Disclosure of Invention
The embodiment of the disclosure provides an audio noise reduction method, an audio noise reduction device and a storage medium, so as to improve the accuracy of an extracted target speech signal.
According to a first aspect of the embodiments of the present disclosure, there is provided an audio noise reduction method, including:
acquiring an audio signal to be processed;
inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed.
In one possible implementation, the noise extraction model is trained by:
acquiring sample data, wherein each sample data comprises a first noisy audio signal and a first noise signal;
inputting the first noisy audio signal into a noise extraction model to obtain a second noise signal;
calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and adjusting the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and taking the noise extraction model with the corresponding parameters as the trained model.
In one possible implementation, the signal extraction model is trained by:
obtaining sample data, wherein each sample data comprises a second noisy audio signal and a first voice signal, the first voice signal is a noiseless voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by the following method:
inputting the first noisy audio signal into a noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
inputting the second noisy audio signal into a signal extraction model to obtain a second voice signal;
calculating a loss function of the signal extraction model from the first voice signal and the second voice signal;
and adjusting the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and taking the signal extraction model with the corresponding parameters as the trained model.
In one possible implementation manner, the acquiring an audio signal to be processed includes:
performing time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
performing time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed comprises:
inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain; or
performing a short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain; and inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed comprises:
inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain; or
performing an inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain; and inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio noise reduction apparatus comprising:
a signal acquisition unit configured to acquire an audio signal to be processed;
a first noise extraction unit configured to input the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
a noise reduction unit configured to perform noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed, the signal-to-noise ratio of the noise-reduced audio signal to be processed being higher than that of the audio signal to be processed;
and a signal extraction unit configured to input the noise-reduced audio signal to be processed into a signal extraction model to obtain a target voice signal of the audio signal to be processed.
In one possible implementation, the noise extraction model is trained by the following units:
a first sample acquisition unit configured to acquire sample data, each sample data including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to input the first noisy audio signal into a noise extraction model to obtain a second noise signal;
a first calculation unit configured to calculate a loss function of the noise extraction model from the first noise signal and the second noise signal;
and a first model determination unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model with the corresponding parameters as the trained model.
In one possible implementation, the signal extraction model is trained by the following units:
a second sample acquisition unit configured to acquire sample data, each sample data including a second noisy audio signal and a first voice signal, wherein the first voice signal is a noiseless voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by: inputting the first noisy audio signal into a noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
a third noise extraction unit configured to input the second noisy audio signal into a signal extraction model to obtain a second voice signal;
a second calculation unit configured to calculate a loss function of the signal extraction model from the first voice signal and the second voice signal;
and a second model determination unit configured to adjust the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and to take the signal extraction model with the corresponding parameters as the trained model.
In one possible implementation, the signal acquisition unit includes:
a first signal acquisition subunit configured to perform time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
a second signal acquisition subunit configured to perform time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; the signal extraction unit includes:
a first signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain; or
a second signal extraction subunit configured to perform a short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain, and to input the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; the signal extraction unit includes:
a third signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain; or
a fourth signal extraction subunit configured to perform an inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, and to input the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement an audio noise reduction method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform an audio noise reduction method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio noise reduction method provided by the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
First, noise extraction is performed on the audio signal to be processed to obtain a noise signal, and the audio signal to be processed is noise-reduced according to the noise signal to obtain a noise-reduced audio signal to be processed; then, audio signal extraction is performed on the noise-reduced audio signal to be processed to obtain a target audio signal. Extracting the noise first raises the signal-to-noise ratio of the audio signal to be processed, so that the subsequent extraction can recover the audio signal more accurately, which improves the accuracy of the extracted audio signal.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
FIG. 1 is a schematic flow chart of an audio denoising method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a first audio denoising method in an embodiment of the present disclosure;
FIG. 3 is a flow chart of a second audio denoising method in an embodiment of the present disclosure;
FIG. 4 is a flow chart of a third audio denoising method in an embodiment of the present disclosure;
FIG. 5 is a flow chart of a fourth method of audio noise reduction in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an audio noise reduction device in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a terminal device in an embodiment of the present disclosure.
Detailed Description
In order to improve the accuracy of an extracted target speech signal, the embodiments of the present disclosure provide an audio noise reduction method, an audio noise reduction device, and a storage medium. In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The technical scheme provided by the embodiment of the disclosure is described below with reference to the accompanying drawings.
Audio noise reduction generally refers to removing or attenuating the noise portion of an audio signal in some way to obtain the desired audio signal. At present, audio noise reduction approaches fall mainly into traditional noise reduction algorithms and neural-network-based noise reduction algorithms. Traditional noise reduction algorithms, such as spectral subtraction and Wiener filtering, often rely on the additive property of background noise or on the statistical properties of the audio and noise signals, and their performance cannot meet practical requirements for unexpected noise types, such as sudden noise, in real environments. Given the complexity of real noise, neural-network-based noise reduction algorithms have therefore developed rapidly and show clear advantages in environments with low signal-to-noise ratio, non-stationary noise and the like.
In such algorithms, a neural network model is trained with noise-free audio signals as samples, so that when the trained model receives a noisy signal it extracts the audio signal from it, completing the audio noise reduction function. However, at a low signal-to-noise ratio (SNR), for example an overall SNR of 0 dB, the local SNR of the low-energy parts of the audio signal is lower than 0 dB. At such extremely low SNRs, the energy of the noise signal is far greater than that of the audio signal, so it is extremely difficult for a deep neural network to learn these parts of the signal accurately, and the audio signal cannot be accurately extracted.
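To make the point about local SNR concrete, the following minimal sketch computes the overall SNR and the SNR of a loud and a quiet segment of the same recording. The signals, segment boundaries and helper names (`power`, `snr_db`) are illustrative assumptions, not taken from the disclosure.

```python
import math

def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(clean, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(clean) / power(noise))

# Hypothetical clean signal: a loud first half and a quiet second half.
n = 800
clean = [math.sin(2 * math.pi * 5 * i / n) * (1.0 if i < n // 2 else 0.1)
         for i in range(n)]

# Hypothetical stationary noise, scaled so that the overall SNR is 0 dB.
raw_noise = [math.sin(2 * math.pi * 37 * i / n) for i in range(n)]
scale = math.sqrt(power(clean) / power(raw_noise))
noise = [scale * v for v in raw_noise]

overall = snr_db(clean, noise)
loud = snr_db(clean[: n // 2], noise[: n // 2])    # high-energy segment
quiet = snr_db(clean[n // 2:], noise[n // 2:])     # low-energy segment
print(f"overall {overall:.2f} dB, loud {loud:.2f} dB, quiet {quiet:.2f} dB")
```

With these toy signals the overall SNR is 0 dB, the loud segment sits a few dB above it, and the quiet segment falls far below 0 dB, which is the regime the text describes as hard for a network to learn.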
In view of the above, the present disclosure provides an audio noise reduction method for solving the above problems, which includes performing noise extraction processing on an audio signal to be processed to obtain a noise signal, and performing noise reduction on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; and then, carrying out audio signal extraction processing on the audio signal to be processed after noise reduction to obtain a target audio signal. Therefore, the signal-to-noise ratio of the audio signal to be processed can be improved by extracting the noise, and the audio signal can be accurately extracted by extracting the audio signal, so that the accuracy of the extracted audio signal is improved.
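The two-stage flow described above can be sketched as a small function that takes the two models as callables. This is a minimal illustration under assumptions: the names `two_stage_denoise`, `toy_noise_model` and `toy_signal_model` are hypothetical, the placeholder models are not neural networks, and time-domain subtraction is used as one possible way to remove the estimated noise.

```python
from typing import Callable, List

Signal = List[float]
Model = Callable[[Signal], Signal]

def two_stage_denoise(noisy: Signal, noise_model: Model, signal_model: Model) -> Signal:
    # Stage 1: estimate the noise contained in the signal to be processed.
    noise_est = noise_model(noisy)
    # Remove the estimated noise (sample-wise subtraction, an assumed choice).
    denoised = [x - n for x, n in zip(noisy, noise_est)]
    # Stage 2: extract the target audio from the higher-SNR intermediate signal.
    return signal_model(denoised)

# Placeholder models for illustration only.
def toy_noise_model(x: Signal) -> Signal:
    mean = sum(x) / len(x)
    return [mean] * len(x)      # treats a constant offset as the "noise"

def toy_signal_model(x: Signal) -> Signal:
    return list(x)              # identity: passes the denoised signal through

out = two_stage_denoise([1.5, 0.5, 1.5, 0.5], toy_noise_model, toy_signal_model)
print(out)  # [0.5, -0.5, 0.5, -0.5]
```

Passing the models in as arguments keeps the pipeline independent of whether each stage operates on time-domain or time-frequency-domain representations.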
For the convenience of understanding, the technical solutions provided by the present disclosure are further described below with reference to the accompanying drawings.
Fig. 1 is a flow chart illustrating a method of audio noise reduction according to an exemplary embodiment, as shown in fig. 1, including the following steps.
In step S11, an audio signal to be processed is acquired.
The audio signal to be processed is a noisy audio signal.
In step S12, the audio signal to be processed is input into a noise extraction model, so as to obtain a noise signal in the audio signal to be processed.
Noise signals can be of various types, which are classified according to their time domain waveforms.
In the embodiment of the present disclosure, the noise extraction processing may be performed on the audio signal to be processed through a neural network model. The neural network model for extracting noise can be trained by the following method, which specifically comprises steps A1-A4:
Step A1: acquiring sample data, each sample data including a first noisy audio signal and a first noise signal.
The first noisy audio signal and the first noise signal in each sample correspond to each other.
Step A2: inputting the first noisy audio signal into a noise extraction model to obtain a second noise signal.
Step A3: calculating a loss function of the noise extraction model from the first noise signal and the second noise signal.
Step A4: adjusting the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and taking the noise extraction model with the corresponding parameters as the trained model.
In the embodiment of the present disclosure, the first noisy audio signal is input to the noise extraction model for training, and the parameters of the noise extraction model are adjusted according to the loss function between the output result and the first noise signal until the loss function falls within a preset range. In this way, the neural network model that extracts the noise is trained. When noise extraction is carried out, the audio signal to be processed is input into the trained noise extraction model to obtain the noise signal of the audio signal to be processed.
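Steps A1-A4 can be sketched as a plain gradient-descent loop. The example below is a toy under stated assumptions: the "model" is a single scalar gain `w` (so the estimated noise is `w * noisy`) rather than a neural network, the loss is mean squared error, and `preset`, `lr` and `max_steps` are hypothetical hyperparameters. The disclosure does not specify the loss or the optimizer.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_noise_model(samples, preset=1e-6, lr=0.1, max_steps=10_000):
    w = 0.0  # single parameter of the toy model: noise_est = w * noisy
    loss = float("inf")
    for _ in range(max_steps):
        loss, grad = 0.0, 0.0
        for noisy, noise in samples:                 # A1: (noisy, noise) pairs
            est = [w * x for x in noisy]             # A2: second noise signal
            loss += mse(noise, est)                  # A3: loss function
            grad += sum(2 * (e - t) * x
                        for e, t, x in zip(est, noise, noisy)) / len(noisy)
        loss /= len(samples)
        grad /= len(samples)
        if loss < preset:                            # A4: stop below the preset value
            break
        w -= lr * grad                               # A4: adjust the parameter
    return w, loss

# Synthetic samples in which the noise is exactly 30% of the noisy signal.
noisy_signals = [[1.0, -2.0, 0.5, 1.5], [0.5, 0.5, -1.0, 2.0]]
samples = [(x, [0.3 * v for v in x]) for x in noisy_signals]
w, final_loss = train_noise_model(samples)
print(round(w, 2), final_loss < 1e-6)
```

The `max_steps` guard is an addition for safety: a real loss may never fall below the preset value, and the loop should still terminate.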
In the embodiment of the present disclosure, neural-network-based noise reduction algorithms can be divided, according to the estimation target domain, into time domain algorithms, which directly estimate a time domain waveform as the estimated signal, and time-frequency domain algorithms, which apply a short-time Fourier transform (STFT) to transform the signal to the time-frequency domain, perform the estimation there, and then apply an inverse short-time Fourier transform (ISTFT) to return to the time domain and obtain the estimated signal.
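The round trip between the two domains can be illustrated with a small STFT/ISTFT pair. This is a sketch under assumptions, not the transform used in the disclosure: it uses a naive O(n²) DFT instead of an FFT, a periodic Hann window with 50% overlap (whose shifted copies sum to 1, so plain overlap-add reconstructs the signal), and zero padding of half a frame at each end. The frame length `n=16` and all function names are hypothetical.

```python
import cmath
import math

def hann(n):
    # Periodic Hann window; with a hop of n // 2 the shifted windows sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft(x, n=16):
    hop = n // 2
    tail = (-len(x)) % hop                 # pad up to a multiple of the hop
    padded = [0.0] * hop + list(x) + [0.0] * (tail + hop)
    w = hann(n)
    return [dft([w[i] * padded[s + i] for i in range(n)])
            for s in range(0, len(padded) - n + 1, hop)]

def istft(frames, length, n=16):
    hop = n // 2
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for j, spec in enumerate(frames):      # overlap-add the inverse-transformed frames
        for i, v in enumerate(idft(spec)):
            out[j * hop + i] += v
    return out[hop: hop + length]          # drop the padding added by stft

x = [math.sin(0.3 * i) + 0.1 * i for i in range(37)]
frames = stft(x)                           # time domain -> time-frequency domain
y = istft(frames, len(x))                  # time-frequency domain -> time domain
max_err = max(abs(a - b) for a, b in zip(x, y))
print(len(frames), len(frames[0]), max_err < 1e-9)
```

In a time-frequency domain algorithm, the estimation step would operate on `frames` between the two calls; here nothing is modified, so the output reproduces the input up to floating-point error.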
In the embodiment of the present disclosure, the audio signal to be processed is obtained by performing either time domain processing or time-frequency domain processing on the noisy audio signal to be processed. If the obtained audio signal to be processed is a signal in the time domain, it is input into a time domain noise extraction model to obtain the noise signal in the audio signal to be processed; the time domain noise extraction model is obtained by training with noise signals as samples. If the obtained audio signal to be processed is a signal in the time-frequency domain, it is input into a time-frequency domain noise extraction model to obtain the noise signal in the audio signal to be processed; the time-frequency domain noise extraction model is likewise obtained by training with noise signals as samples.
In the time domain, the Wave-Unet (Wave) and Conv-TasNet (time domain separation network) algorithms may be used as the algorithms for extracting the signal in the time domain noise extraction model; other algorithms may of course also be used, which is not limited in this disclosure.
In the time-frequency domain, the Unet (segmentation) and phanen (number of phases) algorithms may be used as the algorithms for extracting the signal in the time-frequency domain noise extraction model; other algorithms may of course also be used, which is not limited in this disclosure.
Therefore, whether the audio signal to be processed is a signal in the time domain or in the time-frequency domain, its noise can be extracted by the method of the present disclosure, which makes the noise extraction more flexible.
Of course, if the obtained audio signal to be processed is a signal in the time domain, noise extraction may also be performed on it in the time-frequency domain: performing an STFT on the audio signal to be processed in the time domain yields the audio signal to be processed in the time-frequency domain, which completes the conversion.
In step S13, performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; and the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed.
In the embodiment of the present disclosure, after the noise signal is extracted, it needs to be removed from the audio signal to be processed; that is, the audio signal to be processed undergoes a preliminary noise reduction to obtain the noise-reduced audio signal to be processed. This raises the signal-to-noise ratio of the signal, which facilitates the subsequent extraction of the audio signal.
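The effect of this preliminary noise reduction on the SNR can be checked numerically. The sketch below assumes sample-wise subtraction in the time domain and stands in for the noise extraction model with an imperfect estimate that recovers 80% of the true noise; the signals and the 0.8 factor are illustrative assumptions only.

```python
import math

def power(x):
    return sum(v * v for v in x) / len(x)

def snr_db(clean, observed):
    """SNR of `observed` measured against the known clean reference, in dB."""
    residual = [o - c for o, c in zip(observed, clean)]
    return 10.0 * math.log10(power(clean) / power(residual))

n = 400
clean = [0.2 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
noise = [math.sin(2 * math.pi * 41 * i / n) for i in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]           # signal to be processed

# Stand-in for the noise extraction model output (assumed 80% accurate).
noise_est = [0.8 * v for v in noise]

# Preliminary noise reduction: subtract the estimated noise.
denoised = [s - e for s, e in zip(noisy, noise_est)]

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, denoised)
print(f"before {snr_before:.2f} dB, after {snr_after:.2f} dB")
```

Even an imperfect noise estimate leaves only a fraction of the noise in the signal, so the second-stage model receives an input with a markedly higher SNR than the original.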
In step S14, the noise-reduced audio signal to be processed is input to a signal extraction model, so as to obtain a target audio signal of the audio signal to be processed.
In the embodiment of the present disclosure, the audio signal may also be extracted from the noise-reduced audio signal to be processed through a neural network model. The signal extraction model and the noise extraction model may share the same structure and differ only in their training samples. The neural network model that extracts the audio signal can be trained by the following method, which specifically comprises steps B1-B4:
Step B1: obtain sample data, where each sample includes a second noisy audio signal and a first voice signal.
The first voice signal is the noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained as follows:
input the first noisy audio signal into the noise extraction model to obtain a third noise signal, and perform noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal.
Step B2: input the second noisy audio signal into the signal extraction model to obtain a second voice signal.
Step B3: calculate a loss function of the signal extraction model from the first voice signal and the second voice signal.
Step B4: adjust the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and take the signal extraction model with the resulting parameters as the trained model.
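Steps B1 to B4 can be sketched as a small training loop. Everything concrete here is an assumption made for illustration: the "signal extraction model" is reduced to a single gain parameter `w`, the loss is mean squared error, the update is plain gradient descent, and `hum_model` is a hypothetical stand-in for an already trained noise extraction model. The source does not specify any of these choices.

```python
def mse(pred, target):
    """Mean squared error, used here as the loss function (an assumed choice)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def make_sample(first_noisy, first_speech, noise_model):
    """Step B1: build one (second noisy signal, first voice signal) pair."""
    third_noise = noise_model(first_noisy)
    second_noisy = [s - n for s, n in zip(first_noisy, third_noise)]
    return second_noisy, first_speech


def train(samples, lr=0.05, preset=1e-6, max_steps=10_000):
    """Steps B2-B4 for a stand-in model with one parameter: output = w * input."""
    w, loss = 0.0, float("inf")
    for _ in range(max_steps):
        loss, grad = 0.0, 0.0
        for second_noisy, first_speech in samples:
            second_speech = [w * x for x in second_noisy]      # step B2
            loss += mse(second_speech, first_speech)           # step B3
            grad += sum(2 * (p - y) * x for p, x, y in
                        zip(second_speech, second_noisy, first_speech)) / len(first_speech)
        loss /= len(samples)
        if loss < preset:              # step B4: stop once the loss is small enough
            break
        w -= lr * grad / len(samples)  # step B4: adjust the parameter
    return w, loss


def hum_model(signal):
    """Hypothetical trained noise model: always reports a constant 0.3 hum."""
    return [0.3] * len(signal)


first_speech = [1.0, -1.0, 0.5, -0.5]
first_noisy = [2 * x + 0.3 for x in first_speech]  # toy distortion plus hum
samples = [make_sample(first_noisy, first_speech, hum_model)]
w, final_loss = train(samples)
```

The point of the sketch is the data flow: the signal extraction model never sees the raw noisy input, only the output of the first noise reduction stage, so it learns to correct whatever that stage leaves behind (here, a gain error of 2).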
Therefore, inputting the noise-reduced audio signal to be processed into the neural network model yields the target audio signal.
Also, in the embodiment of the present disclosure, depending on the target domain estimated by the neural-network-based noise reduction algorithm, the audio signal may likewise be extracted in either the time domain or the time-frequency domain. Specifically:
If the audio signal to be processed is a time-domain signal, the noise-reduced audio signal to be processed is also a time-domain signal; the noise-reduced audio signal to be processed in the time domain is input into the signal extraction model to obtain the target speech signal in the time domain.
Fig. 2 shows the overall flow of the first audio noise reduction scheme, where $S(t)$ represents the audio signal to be processed in the time domain, $S_{\mathrm{noise}}(t)$ represents the noise signal in the time domain, $S_{\mathrm{denoise}}(t)$ represents the noise-reduced audio signal to be processed in the time domain, and $S_{\mathrm{audio}}(t)$ represents the audio signal in the time domain. $S_{\mathrm{denoise}}(t) = S(t) - S_{\mathrm{noise}}(t)$, where $t$ represents time.
If the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; then, performing short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
Fig. 3 shows the overall flow of the second audio noise reduction scheme, where $S_{\mathrm{denoise}}(n, k)$ represents the noise-reduced audio signal to be processed in the time-frequency domain and $S_{\mathrm{audio}}(n, k)$ represents the audio signal in the time-frequency domain. Here $n$ denotes the frame and $k$ denotes the frequency.
If the audio signal to be processed is a time-frequency-domain signal, the noise-reduced audio signal to be processed is also a time-frequency-domain signal; an inverse short-time Fourier transform is performed on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, which is then input into the signal extraction model to obtain the target voice signal in the time domain.
Fig. 4 shows the overall flow of the third audio noise reduction scheme, where $S(n, k)$ represents the audio signal to be processed in the time-frequency domain and $S_{\mathrm{noise}}(n, k)$ represents the noise signal in the time-frequency domain. $S_{\mathrm{denoise}}(n, k) = S(n, k) - S_{\mathrm{noise}}(n, k)$.
If the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
Fig. 5 shows the overall flow of the fourth audio noise reduction scheme, where $S(n, k)$ represents the audio signal to be processed in the time-frequency domain, $S_{\mathrm{noise}}(n, k)$ represents the noise signal in the time-frequency domain, $S_{\mathrm{denoise}}(n, k)$ represents the noise-reduced audio signal to be processed in the time-frequency domain, and $S_{\mathrm{audio}}(n, k)$ represents the audio signal in the time-frequency domain.
Therefore, even if the two successive neural network models (noise extraction followed by signal extraction) operate in different target domains, the audio signal to be processed can still be processed and the audio signal extracted.
In summary, extracting the noise first raises the signal-to-noise ratio of the audio signal to be processed, and the subsequent signal extraction then recovers the audio signal more accurately.
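The domain bridging between the two models can be illustrated with an STFT and inverse STFT pair. This is a sketch under stated assumptions: the names `stft`, `istft`, and `bridge`, the frame length of 16, the hop of 8, and the periodic Hann window are illustrative choices, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

N, HOP = 16, 8  # illustrative frame length and hop, not taken from the source
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]


def stft(x):
    """Time domain -> time-frequency domain, frames[n][k]."""
    frames = []
    for start in range(0, len(x) - N + 1, HOP):
        chunk = [x[start + i] * WINDOW[i] for i in range(N)]
        frames.append([sum(chunk[i] * cmath.exp(-2j * math.pi * k * i / N)
                           for i in range(N)) for k in range(N)])
    return frames


def istft(frames):
    """Time-frequency domain -> time domain by inverse DFT and overlap-add."""
    out = [0.0] * (HOP * (len(frames) - 1) + N)
    for n, bins in enumerate(frames):
        for i in range(N):
            sample = sum(bins[k] * cmath.exp(2j * math.pi * k * i / N)
                         for k in range(N)) / N
            out[n * HOP + i] += sample.real
    return out


def bridge(signal, have, want):
    """Convert between 'time' and 'tf' only when the two models disagree."""
    if have == want:
        return signal
    return stft(signal) if want == "tf" else istft(signal)


x = [math.sin(2 * math.pi * t / 10) for t in range(64)]
round_trip = bridge(bridge(x, "time", "tf"), "tf", "time")
interior_error = max(abs(round_trip[i] - x[i]) for i in range(HOP, 64 - HOP))
```

With a half-overlapping periodic Hann window the shifted windows sum to one, so samples covered by two frames are reconstructed up to rounding error; only the first and last `HOP` samples, which a single frame covers, are attenuated. The four flows in figs. 2 to 5 correspond to the four combinations of `have` and `want`.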
Based on the same inventive concept, the present disclosure also provides an audio noise reduction apparatus. Fig. 6 is a schematic diagram of the audio noise reduction apparatus provided by the present disclosure. The apparatus includes:
an acquisition signal unit 601 configured to acquire an audio signal to be processed;
a first noise extraction unit 602 configured to input the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
a noise reduction unit 603 configured to perform noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed, where the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
a signal extraction unit 604 configured to input the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed.
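One way to picture how units 601 to 604 compose is a small class whose fields play the roles of the units. The class name `AudioNoiseReducer`, its field names, and the stand-in callables are hypothetical; real units would wrap trained models and an audio input source.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]


@dataclass
class AudioNoiseReducer:
    acquire: Callable[[], Signal]               # role of unit 601
    extract_noise: Callable[[Signal], Signal]   # role of unit 602 (noise extraction model)
    extract_signal: Callable[[Signal], Signal]  # role of unit 604 (signal extraction model)

    def reduce_noise(self, s: Signal, noise: Signal) -> Signal:
        """Role of unit 603: subtract the extracted noise from the input."""
        return [a - b for a, b in zip(s, noise)]

    def run(self) -> Signal:
        s = self.acquire()
        noise = self.extract_noise(s)
        return self.extract_signal(self.reduce_noise(s, noise))


# Stand-in callables, for illustration only.
reducer = AudioNoiseReducer(
    acquire=lambda: [1.5, -0.5, 1.5, -0.5],
    extract_noise=lambda s: [sum(s) / len(s)] * len(s),  # treats the mean as hum
    extract_signal=lambda s: list(s),                    # identity placeholder
)
target = reducer.run()
```

Keeping the two models as injected callables mirrors the description's point that they are trained separately and can be swapped for time-domain or time-frequency-domain variants.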
In one possible implementation, the noise extraction model is obtained by training through the following units:
a first acquisition sample unit configured to acquire sample data, each sample including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to input the first noisy audio signal into the noise extraction model to obtain a second noise signal;
a first calculation unit configured to calculate a loss function of the noise extraction model from the first noise signal and the second noise signal;
a first determination model unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model with the resulting parameters as the trained model.
In one possible implementation, the signal extraction model is obtained by training through the following units:
a second acquisition sample unit configured to acquire sample data, each sample including a second noisy audio signal and a first voice signal, where the first voice signal is the noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by inputting the first noisy audio signal into the noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal;
a third noise extraction unit configured to input the second noisy audio signal into the signal extraction model to obtain a second voice signal;
a second calculation unit configured to calculate a loss function of the signal extraction model from the first voice signal and the second voice signal;
a second determination model unit configured to adjust the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and to take the signal extraction model with the resulting parameters as the trained model.
In one possible implementation, the acquisition signal unit 601 includes:
a first acquisition signal subunit configured to perform time-domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
a second acquisition signal subunit configured to perform time-frequency-domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; the signal extraction unit 604 includes:
a first signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time domain into the signal extraction model to obtain the target speech signal in the time domain; or
a second signal extraction subunit configured to perform a short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain, and to input the latter into the signal extraction model to obtain the target speech signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; the signal extraction unit 604 includes:
a third signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time-frequency domain into the signal extraction model to obtain the target speech signal in the time-frequency domain; or
a fourth signal extraction subunit configured to perform an inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, and to input the latter into the signal extraction model to obtain the target voice signal in the time domain.
As shown in fig. 7, based on the same technical concept, the embodiment of the present disclosure also provides an electronic device 70, which may include a memory 701 and a processor 702.
The memory 701 is used for storing a computer program executed by the processor 702. The memory 701 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the task management device, and the like. The processor 702 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 701 and the processor 702 is not limited in the embodiments of the present disclosure. In fig. 7, the memory 701 and the processor 702 are connected by a bus 703, the bus 703 is represented by a thick line in fig. 7, and the connection manner between other components is merely illustrative and not limited. The bus 703 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 701 may be a volatile memory, such as a random-access memory (RAM); the memory 701 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 701 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 701 may also be a combination of the above.
The processor 702 is configured to execute the method of the embodiment shown in fig. 1 when invoking the computer program stored in the memory 701.
In some possible embodiments, various aspects of the methods provided by the present disclosure may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the methods according to various exemplary embodiments of the present disclosure described above in this specification when the program product is run on the computer device, for example, the computer device may perform the methods performed by the devices in the embodiments shown in fig. 1-5.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. A method for audio noise reduction, the method comprising:
acquiring an audio signal to be processed;
inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed.
2. The audio noise reduction method according to claim 1, wherein the noise extraction model is trained by:
acquiring sample data, wherein each sample data comprises a first noisy audio signal and a first noise signal;
inputting the first noisy audio signal into a noise extraction model to obtain a second noise signal;
calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and adjusting the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and taking the noise extraction model corresponding to the parameter as a trained model.
3. The audio noise reduction method according to claim 2, wherein the signal extraction model is trained by:
obtaining sample data, wherein each sample data comprises a second noisy audio signal and a first voice signal, the first voice signal is a noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by the following method:
inputting the first noisy audio signal into a noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
inputting the second noisy audio signal into a signal extraction model to obtain a second voice signal;
calculating a loss function of the signal extraction model from the first voice signal and the second voice signal;
and adjusting the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and taking the signal extraction model corresponding to the parameter as a trained model.
4. The audio noise reduction method according to any of claims 1 to 3, wherein the obtaining the audio signal to be processed comprises:
performing time-domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
performing time-frequency-domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
5. The audio noise reduction method according to claim 4, wherein if the audio signal to be processed is a time-domain signal, the noise-reduced audio signal to be processed is a time-domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed, wherein the method comprises the following steps:
inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain; or
performing a short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain, and inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
6. The audio noise reduction method according to claim 4, wherein if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed, wherein the method comprises the following steps:
inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain; or
performing an inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, and inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
7. An audio noise reduction apparatus, characterized in that the apparatus comprises:
an acquisition signal unit configured to perform acquisition of an audio signal to be processed;
a first noise extraction unit configured to perform input of the audio signal to be processed into a noise extraction model, resulting in a noise signal in the audio signal to be processed;
a noise reduction unit configured to perform noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and the signal extraction unit is configured to input the noise-reduced audio signal to be processed into a signal extraction model to obtain a target voice signal of the audio signal to be processed.
8. The audio noise reduction device of claim 7, wherein the noise extraction model is trained by:
a first acquisition sample unit configured to perform acquisition of sample data, each of the sample data including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to perform inputting the first noisy audio signal into a noise extraction model, resulting in a second noise signal;
a first calculation unit configured to perform calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and a first determination model unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model corresponding to the parameter as a trained model.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio noise reduction method of any of claims 1 to 6.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the audio noise reduction method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645246.XA CN111863008A (en) | 2020-07-07 | 2020-07-07 | Audio noise reduction method and device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111863008A (en) | 2020-10-30 |
Family
ID=73153442
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645246.XA Pending CN111863008A (en) | 2020-07-07 | 2020-07-07 | Audio noise reduction method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111863008A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112447183A (en) * | 2020-11-16 | 2021-03-05 | 北京达佳互联信息技术有限公司 | Training method and device for audio processing model, audio denoising method and device, and electronic equipment |
CN112786066A (en) * | 2020-12-24 | 2021-05-11 | 北京猿力未来科技有限公司 | Audio signal screening method and device and electronic equipment |
CN113259801A (en) * | 2021-05-08 | 2021-08-13 | 深圳市睿耳电子有限公司 | Loudspeaker noise reduction method of intelligent earphone and related device |
CN113593598A (en) * | 2021-08-09 | 2021-11-02 | 深圳远虑科技有限公司 | Noise reduction method and device of audio amplifier in standby state and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1841500A (en) * | 2005-03-30 | 2006-10-04 | 松下电器产业株式会社 | Method and apparatus for resisting noise based on adaptive nonlinear spectral subtraction |
US8983844B1 (en) * | 2012-07-31 | 2015-03-17 | Amazon Technologies, Inc. | Transmission of noise parameters for improving automatic speech recognition |
CN109215665A (en) * | 2018-07-20 | 2019-01-15 | 广东工业大学 | A kind of method for recognizing sound-groove based on 3D convolutional neural networks |
CN110491404A (en) * | 2019-08-15 | 2019-11-22 | 广州华多网络科技有限公司 | Method of speech processing, device, terminal device and storage medium |
CN110827847A (en) * | 2019-11-27 | 2020-02-21 | 高小翎 | Microphone array voice denoising and enhancing method with low signal-to-noise ratio and remarkable growth |
US20200211580A1 (en) * | 2018-12-27 | 2020-07-02 | Lg Electronics Inc. | Apparatus for noise canceling and method for the same |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112447183A (en) * | 2020-11-16 | 2021-03-05 | 北京达佳互联信息技术有限公司 | Training method and device for audio processing model, audio denoising method and device, and electronic equipment |
CN112786066A (en) * | 2020-12-24 | 2021-05-11 | 北京猿力未来科技有限公司 | Audio signal screening method and device and electronic equipment |
CN112786066B (en) * | 2020-12-24 | 2023-03-14 | 北京猿力未来科技有限公司 | Audio signal screening method and device and electronic equipment |
CN113259801A (en) * | 2021-05-08 | 2021-08-13 | 深圳市睿耳电子有限公司 | Loudspeaker noise reduction method of intelligent earphone and related device |
CN113593598A (en) * | 2021-08-09 | 2021-11-02 | 深圳远虑科技有限公司 | Noise reduction method and device of audio amplifier in standby state and electronic equipment |
CN113593598B (en) * | 2021-08-09 | 2024-04-12 | 深圳远虑科技有限公司 | Noise reduction method and device for audio amplifier in standby state and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109767783B (en) | Voice enhancement method, device, equipment and storage medium | |
JP5666444B2 (en) | Apparatus and method for processing an audio signal for speech enhancement using feature extraction | |
Abd El-Fattah et al. | Speech enhancement with an adaptive Wiener filter | |
CN111863008A (en) | Audio noise reduction method and device and storage medium | |
KR101141033B1 (en) | Noise variance estimator for speech enhancement | |
EP2031583B1 (en) | Fast estimation of spectral noise power density for speech signal enhancement | |
CN101031963B (en) | Method of processing a noisy sound signal and device for implementing said method | |
CN112700786B (en) | Speech enhancement method, device, electronic equipment and storage medium | |
JP5752324B2 (en) | Single channel suppression of impulsive interference in noisy speech signals. | |
EP4189677B1 (en) | Noise reduction using machine learning | |
JP6748304B2 (en) | Signal processing device using neural network, signal processing method using neural network, and signal processing program | |
CN112599148A (en) | Voice recognition method and device | |
CN113571076A (en) | Signal processing method, signal processing device, electronic equipment and storage medium | |
CN113035216B (en) | Microphone array voice enhancement method and related equipment | |
Hammam et al. | Blind signal separation with noise reduction for efficient speaker identification | |
CN113314147B (en) | Training method and device of audio processing model, audio processing method and device | |
CN115497492A (en) | Real-time voice enhancement method based on full convolution neural network | |
CN110648681B (en) | Speech enhancement method, device, electronic equipment and computer readable storage medium | |
CN114360572A (en) | Voice denoising method and device, electronic equipment and storage medium | |
CN113593599A (en) | Method for removing noise signal in voice signal | |
Funaki | Speech enhancement based on iterative wiener filter using complex speech analysis | |
Seyedin et al. | New features using robust MVDR spectrum of filtered autocorrelation sequence for robust speech recognition | |
Lu et al. | Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition | |
CN115985337B (en) | Transient noise detection and suppression method and device based on single microphone | |
Islam et al. | Enhancement of noisy speech based on decision-directed Wiener approach in perceptual wavelet packet domain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||