CN111863008A - Audio noise reduction method and device and storage medium - Google Patents


Info

Publication number
CN111863008A
Authority
CN
China
Prior art keywords
signal
noise
processed
audio signal
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010645246.XA
Other languages
Chinese (zh)
Inventor
郑羲光
张晨
郭亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010645246.XA priority Critical patent/CN111863008A/en
Publication of CN111863008A publication Critical patent/CN111863008A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The disclosure provides an audio noise reduction method, device and storage medium, relating to the technical field of audio processing, which improve the accuracy of the extracted audio signal. The method comprises the following steps: acquiring an audio signal to be processed; inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed; performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed, wherein the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed; and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed. In this way, extracting the noise first raises the signal-to-noise ratio of the audio signal to be processed, and the subsequent signal extraction can then extract the audio signal accurately, which improves the accuracy of the extracted audio signal.

Description

Audio noise reduction method and device and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to an audio denoising method and apparatus, and a storage medium.
Background
Audio noise reduction generally refers to removing or attenuating, in some way, the noise portion of a segment of an audio signal to obtain the desired audio signal. In the related art, when an audio signal has a low signal-to-noise ratio, existing noise reduction methods cannot accurately remove the noise signal, so the accuracy of the extracted audio signal is low.
Disclosure of Invention
The embodiment of the disclosure provides an audio noise reduction method, an audio noise reduction device and a storage medium, so as to improve the accuracy of an extracted target speech signal.
According to a first aspect of the embodiments of the present disclosure, there is provided an audio noise reduction method, including:
acquiring an audio signal to be processed;
inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed.
In one possible implementation, the noise extraction model is trained by:
acquiring sample data, wherein each sample data comprises a first noisy audio signal and a first noise signal;
inputting the first noisy audio signal into a noise extraction model to obtain a second noise signal;
calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and adjusting the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and taking the noise extraction model corresponding to the parameter as a trained model.
In one possible implementation, the signal extraction model is trained by:
obtaining sample data, wherein each sample data comprises a second noisy audio signal and a first voice signal, the first voice signal is a noiseless voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by the following method:
inputting the first noisy audio signal into a noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
inputting the second noisy audio signal into a signal extraction model to obtain a second voice signal;
calculating a loss function of the signal extraction model from the first speech signal and the second speech signal;
and adjusting the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and taking the signal extraction model corresponding to the parameter as a trained model.
In one possible implementation manner, the acquiring an audio signal to be processed includes:
performing time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
performing time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal;
the inputting of the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed comprises:
inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain; or
performing short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain; and inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal;
the inputting of the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed comprises:
inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain; or
performing inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain; and inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio noise reduction apparatus comprising:
a signal acquisition unit configured to acquire an audio signal to be processed;
a first noise extraction unit configured to input the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
a noise reduction unit configured to perform noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed, wherein the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and a signal extraction unit configured to input the noise-reduced audio signal to be processed into a signal extraction model to obtain a target voice signal of the audio signal to be processed.
In one possible implementation, the noise extraction model is trained by the following units:
a first sample acquisition unit configured to acquire sample data, each sample data including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to input the first noisy audio signal into a noise extraction model to obtain a second noise signal;
a first calculation unit configured to calculate a loss function of the noise extraction model from the first noise signal and the second noise signal;
and a first model determination unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model corresponding to the parameter as a trained model.
In one possible implementation, the signal extraction model is trained by the following units:
a second sample acquisition unit configured to obtain sample data, each sample data including a second noisy audio signal and a first voice signal, wherein the first voice signal is a noiseless voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by: inputting the first noisy audio signal into a noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
a third noise extraction unit configured to input the second noisy audio signal into a signal extraction model to obtain a second speech signal;
a second calculation unit configured to calculate a loss function of the signal extraction model from the first speech signal and the second speech signal;
and a second model determination unit configured to adjust the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and to take the signal extraction model corresponding to the parameter as a trained model.
In one possible implementation, the signal acquiring unit includes:
a first signal acquisition subunit configured to perform time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
a second signal acquisition subunit configured to perform time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; the signal extraction unit includes:
a first signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target speech signal in the time domain; or
a second signal extraction subunit configured to perform short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain, and to input the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; the signal extraction unit includes:
a third signal extraction subunit configured to input the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain; or
a fourth signal extraction subunit configured to perform inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, and to input the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio noise reduction method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the audio noise reduction method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio noise reduction method provided by the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
First, noise extraction is performed on the audio signal to be processed to obtain a noise signal, and noise reduction is performed on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed. Then, audio signal extraction is performed on the noise-reduced audio signal to be processed to obtain a target audio signal. In this way, extracting the noise first raises the signal-to-noise ratio of the audio signal to be processed, and the subsequent signal extraction can then extract the audio signal accurately, which improves the accuracy of the extracted audio signal.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
FIG. 1 is a schematic flow chart of an audio denoising method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a first audio denoising method in an embodiment of the present disclosure;
FIG. 3 is a flow chart of a second audio denoising method in an embodiment of the present disclosure;
FIG. 4 is a flow chart of a third audio denoising method in an embodiment of the present disclosure;
FIG. 5 is a flow chart of a fourth method of audio noise reduction in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an audio noise reduction device in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a terminal device in an embodiment of the present disclosure.
Detailed Description
In order to improve the accuracy of an extracted target speech signal, the embodiments of the present disclosure provide an audio noise reduction method, an audio noise reduction device, and a storage medium. In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The technical scheme provided by the embodiment of the disclosure is described below with reference to the accompanying drawings.
Audio noise reduction generally refers to removing or attenuating, in some way, the noise portion of a segment of an audio signal to obtain the desired audio signal. At present, audio noise reduction methods fall mainly into traditional noise reduction algorithms and neural-network-based noise reduction algorithms. Traditional algorithms, such as spectral subtraction and Wiener filtering, often rely on the additive nature of background noise or on the statistical properties of the audio and noise signals, and their performance cannot meet practical requirements for unexpected noise types in real environments, such as sudden noise. Given the complexity of real noise, neural-network-based noise reduction algorithms have therefore developed rapidly, and they show clear advantages in low signal-to-noise ratio and non-stationary noise environments.
Such algorithms train a neural network model with noise-free audio signals as samples, so that when the trained model receives a noisy audio signal, it extracts the audio signal from it and thereby performs audio noise reduction. However, at a low signal-to-noise ratio (SNR), for example an overall SNR of 0 dB, the local SNR of the low-energy parts of the audio signal is lower than 0 dB. At such an extremely low SNR, because the energy of the noise signal is far greater than that of the audio signal, it is extremely difficult for a deep neural network to learn that part of the signal accurately, and the audio signal cannot be extracted accurately.
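The claim about local SNR can be checked with a small numerical sketch. The signal below is synthetic, and the amplitudes, sample rate and the helper names `power` and `snr_db` are assumptions chosen for illustration; they do not come from the disclosure.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(signal) / power(noise))

random.seed(0)
n = 8000
half = n // 2
# Synthetic "audio": loud first half, quiet second half (assumed amplitudes).
audio = [(1.0 if i < half else 0.1) * math.sin(2 * math.pi * 440 * i / 8000)
         for i in range(n)]
raw_noise = [random.gauss(0.0, 1.0) for _ in range(n)]
# Scale the noise so that the overall SNR is 0 dB.
scale = math.sqrt(power(audio) / power(raw_noise))
noise = [scale * v for v in raw_noise]

overall = snr_db(audio, noise)
loud = snr_db(audio[:half], noise[:half])
quiet = snr_db(audio[half:], noise[half:])
print(f"overall {overall:.1f} dB, loud part {loud:.1f} dB, quiet part {quiet:.1f} dB")
```

With the noise power roughly constant over time, the quiet half ends up well below 0 dB even though the overall SNR is 0 dB, which is the situation the paragraph above describes.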
In view of the above, the present disclosure provides an audio noise reduction method for solving the above problems, which includes performing noise extraction processing on an audio signal to be processed to obtain a noise signal, and performing noise reduction on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; and then, carrying out audio signal extraction processing on the audio signal to be processed after noise reduction to obtain a target audio signal. Therefore, the signal-to-noise ratio of the audio signal to be processed can be improved by extracting the noise, and the audio signal can be accurately extracted by extracting the audio signal, so that the accuracy of the extracted audio signal is improved.
For the convenience of understanding, the technical solutions provided by the present disclosure are further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating an audio noise reduction method according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps.
In step S11, an audio signal to be processed is acquired.
The audio signal to be processed is a noisy audio signal.
In step S12, the audio signal to be processed is input into a noise extraction model, so as to obtain a noise signal in the audio signal to be processed.
Noise signals may be classified into various types according to their time domain waveforms.
In the embodiment of the present disclosure, the noise extraction processing may be performed on the audio signal to be processed through a neural network model. The neural network model for extracting noise can be trained by the following steps A1 to A4:
Step A1: acquire sample data, each sample data including a first noisy audio signal and a first noise signal.
The first noisy audio signal and the first noise signal in each sample correspond to each other.
Step A2: input the first noisy audio signal into a noise extraction model to obtain a second noise signal.
Step A3: calculate a loss function of the noise extraction model from the first noise signal and the second noise signal.
Step A4: adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and take the noise extraction model corresponding to the parameter as a trained model.
In the embodiment of the present disclosure, the first noisy audio signal is input into the noise extraction model for training, and the parameters of the noise extraction model are adjusted according to the loss function between the model output (the second noise signal) and the first noise signal until the loss function falls within a predetermined range. In this way, the neural network model that extracts the noise is trained. When noise extraction is carried out, the audio signal to be processed is input into the trained noise extraction model to obtain the noise signal of the audio signal to be processed.
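Steps A1 to A4 can be sketched as a plain training loop. This is a minimal illustration under strong assumptions: the "model" is a two-tap linear filter rather than a neural network, the loss is a mean squared error, and the toy data (a constant level plus alternating noise) is chosen so that the preset loss value is reachable. The names `predict`, `make_sample` and `train` are hypothetical.

```python
import random

def predict(weights, noisy):
    """Stand-in noise extraction model: a 2-tap linear filter (not a neural network)."""
    w0, w1 = weights
    return [w0 * noisy[i] + w1 * noisy[i - 1] for i in range(1, len(noisy))]

def make_sample(rng, length=16):
    """Step A1: one (first noisy audio signal, first noise signal) pair of toy data."""
    level = rng.uniform(-1.0, 1.0)                    # toy "clean" content
    amp = rng.uniform(0.5, 1.5)
    noise = [amp * (-1) ** i for i in range(length)]  # toy alternating noise
    noisy = [level + v for v in noise]
    return noisy, noise

def train(samples, preset=1e-4, lr=0.1, max_epochs=1000):
    weights = [0.0, 0.0]
    loss = float("inf")
    for _ in range(max_epochs):
        grad = [0.0, 0.0]
        total, count = 0.0, 0
        for noisy, noise in samples:
            estimate = predict(weights, noisy)        # Step A2: second noise signal
            for i, est in enumerate(estimate, start=1):
                err = est - noise[i]
                total += err * err
                grad[0] += 2.0 * err * noisy[i]
                grad[1] += 2.0 * err * noisy[i - 1]
                count += 1
        loss = total / count                          # Step A3: mean squared error
        if loss < preset:                             # Step A4: stop below the preset value
            break
        weights = [w - lr * g / count for w, g in zip(weights, grad)]
    return weights, loss

rng = random.Random(0)
samples = [make_sample(rng) for _ in range(32)]
weights, final_loss = train(samples)
print(weights, final_loss)
```

For this toy data the filter that recovers the noise exactly is (0.5, -0.5), so the learned weights should approach those values. The `max_epochs` guard is an added safety measure, since a real model may never reach the preset value.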
In the embodiment of the present disclosure, according to the estimation target domain, neural-network-based noise reduction algorithms may be divided into time domain algorithms, which directly estimate the time domain waveform as the estimated signal, and time-frequency domain algorithms, which apply STFT (short-time Fourier transform) to transform the signal to the time-frequency domain, perform the estimation there, and then apply ISTFT (inverse short-time Fourier transform) to return the estimated signal to the time domain.
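The STFT and ISTFT round trip used by time-frequency domain algorithms can be sketched with the standard library alone. The frame length, hop size, Hann window and the function names `stft` and `istft` are assumptions for illustration; the estimation step is left as an identity placeholder, and a practical system would use an FFT library instead of this naive DFT.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(x, frame=16, hop=8):
    """Windowed DFT of each frame; returns a list of complex spectra."""
    win = hann(frame)
    spectra = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = [x[start + t] * win[t] for t in range(frame)]
        spectra.append([
            sum(seg[t] * cmath.exp(-2j * math.pi * k * t / frame) for t in range(frame))
            for k in range(frame)
        ])
    return spectra

def istft(spectra, length, frame=16, hop=8):
    """Inverse DFT of each frame, then windowed overlap-add with normalization."""
    win = hann(frame)
    out = [0.0] * length
    norm = [0.0] * length
    for m, spec in enumerate(spectra):
        for t in range(frame):
            sample = sum(
                spec[k] * cmath.exp(2j * math.pi * k * t / frame) for k in range(frame)
            ).real / frame
            out[m * hop + t] += sample * win[t]
            norm[m * hop + t] += win[t] ** 2
    # Samples covered only by a zero window value cannot be recovered; emit 0 there.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

signal = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
spectra = stft(signal)                   # time domain -> time-frequency domain
estimate = spectra                       # placeholder for the estimation step
restored = istft(estimate, len(signal))  # time-frequency domain -> time domain
max_error = max(abs(a - b) for a, b in zip(signal[1:], restored[1:]))
print(len(spectra), max_error)
```

The first sample is excluded from the comparison because the Hann window is zero there, so only one frame covers it and it carries no information after windowing.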
In the embodiment of the present disclosure, the audio signal to be processed may be obtained by performing time domain processing on the noisy audio signal to be processed, or by performing time-frequency domain processing on it. If the obtained audio signal to be processed is a time domain signal, it is input into a time domain noise extraction model to obtain the noise signal in the audio signal to be processed; the time domain noise extraction model is obtained by training with noise signals as samples. If the obtained audio signal to be processed is a time-frequency domain signal, it is input into a time-frequency domain noise extraction model to obtain the noise signal in the audio signal to be processed; the time-frequency domain noise extraction model is obtained by training with noise signals as samples.
In the time domain, algorithms such as Wave-U-Net and Conv-TasNet (a time domain audio separation network) may be used to extract the signal in the time domain noise extraction model; other algorithms may also be used, which is not limited in this disclosure.
In the time-frequency domain, algorithms such as U-Net (a segmentation network) and PHASEN (a phase-aware network) may be used to extract the signal in the time-frequency domain noise extraction model; other algorithms may also be used, which is not limited in this disclosure.
Therefore, whether the audio signal to be processed is a time domain signal or a time-frequency domain signal, its noise can be extracted by the method of the present disclosure, which makes the noise extraction more flexible.
Of course, if the obtained audio signal to be processed is a time domain signal, noise extraction may also be performed on it in the time-frequency domain. The audio signal to be processed in the time-frequency domain is obtained simply by performing STFT on the audio signal to be processed in the time domain, which completes the conversion.
In step S13, performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; and the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed.
In the embodiment of the present disclosure, after the noise signal is extracted, it needs to be removed from the audio signal to be processed. That is, the audio signal to be processed undergoes a preliminary denoising step that yields the noise-reduced audio signal to be processed. This raises the signal-to-noise ratio of the signal and makes the subsequent extraction of the audio signal easier.
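The effect of this step on the signal-to-noise ratio can be illustrated with a small sketch. It assumes the noise is removed by sample-wise subtraction in the time domain and that the noise extraction model recovers 70% of the true noise; both the 70% figure and the helper names are assumptions for illustration only.

```python
import math

def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(signal) / power(noise))

def reduce_noise(noisy, estimated_noise):
    """Subtract the estimated noise from the noisy signal, sample by sample."""
    return [s - n for s, n in zip(noisy, estimated_noise)]

# Toy signals: a low-frequency tone as audio and a higher-frequency tone as noise.
clean = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]
noise = [0.8 * math.sin(2 * math.pi * 37 * i / 200 + 1.0) for i in range(200)]
noisy = [c + n for c, n in zip(clean, noise)]

# Assumption: the noise extraction model recovers 70% of the true noise.
estimated_noise = [0.7 * n for n in noise]
denoised = reduce_noise(noisy, estimated_noise)

residual = [d - c for d, c in zip(denoised, clean)]  # noise left after subtraction
snr_before = snr_db(clean, noise)
snr_after = snr_db(clean, residual)
print(f"before {snr_before:.2f} dB, after {snr_after:.2f} dB")
```

Even an imperfect noise estimate raises the SNR here, because the residual noise is only 30% of the original amplitude. The second stage then works on a signal that is easier to learn from.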
In step S14, the noise-reduced audio signal to be processed is input to a signal extraction model, so as to obtain a target audio signal of the audio signal to be processed.
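Steps S11 to S14 can be summarized as a two-stage pipeline. The sketch below treats both models as opaque callables and assumes the noise reduction in S13 is a sample-wise subtraction; `denoise_pipeline`, `toy_noise_model` and `toy_signal_model` are hypothetical stand-ins, not the trained networks of the disclosure.

```python
from typing import Callable, List

Signal = List[float]
Model = Callable[[Signal], Signal]

def denoise_pipeline(to_process: Signal, noise_model: Model, signal_model: Model) -> Signal:
    """Two-stage noise reduction: extract noise, remove it, then extract the target."""
    noise = noise_model(to_process)                        # S12: noise extraction
    denoised = [s - n for s, n in zip(to_process, noise)]  # S13: noise reduction
    return signal_model(denoised)                          # S14: signal extraction

def toy_noise_model(x: Signal) -> Signal:
    """Stand-in model (assumption): treats the constant offset as the noise."""
    mean = sum(x) / len(x)
    return [mean] * len(x)

def toy_signal_model(x: Signal) -> Signal:
    """Stand-in model (assumption): passes the signal through unchanged."""
    return list(x)

# S11: acquire the audio signal to be processed (zero-mean content plus a 0.5 offset).
to_process = [1.5, -0.5, 2.5, -1.5]
target = denoise_pipeline(to_process, toy_noise_model, toy_signal_model)
print(target)
```

Passing the models in as parameters keeps the pipeline independent of whether they operate on time domain or time-frequency domain representations.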
In the embodiment of the present disclosure, the noise-reduced audio signal to be processed may also be processed by extracting the audio signal through the neural network model. The signal extraction model and the noise extraction model are different in structure only in training samples. The neural network model that extracts noise can be trained by the following method. The method specifically comprises the following steps B1-B4:
Step B1: sample data are obtained, where each sample comprises a second noisy audio signal and a first voice signal.
The first voice signal is the noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained as follows:
the first noisy audio signal is input into the noise extraction model to obtain a third noise signal, and noise reduction processing is performed on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal.
Step B2: the second noisy audio signal is input into the signal extraction model to obtain a second voice signal.
Step B3: a loss function of the signal extraction model is calculated from the first voice signal and the second voice signal.
Step B4: the parameter information of the signal extraction model is adjusted according to the loss function until the loss function is smaller than a preset value, and the signal extraction model with the resulting parameters is taken as the trained model.
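Steps B1-B4 can be sketched as a small training loop. The sketch below replaces the neural network with a one-parameter gain model trained by gradient descent on a mean-squared-error loss, purely to make the control flow visible; the model, the loss choice, the learning rate, the preset value, and the names `make_second_noisy`, `train_gain`, and `toy_noise_model` are all assumptions for illustration and are not specified by the disclosure.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def make_second_noisy(first_noisy, noise_model):
    """Step B1: denoise the first noisy signal with the noise extraction model."""
    third_noise = noise_model(first_noisy)
    return [x - n for x, n in zip(first_noisy, third_noise)]

def train_gain(samples, preset=0.01, lr=0.5, max_steps=1000):
    """Steps B2-B4 for a hypothetical one-parameter model y = w * x."""
    w = 0.0
    loss = float("inf")
    for step in range(max_steps):
        loss, grad = 0.0, 0.0
        for second_noisy, first_speech in samples:
            second_speech = [w * x for x in second_noisy]      # step B2
            loss += mse(first_speech, second_speech)           # step B3
            grad += 2 * sum(
                (p - t) * x
                for p, t, x in zip(second_speech, first_speech, second_noisy)
            ) / len(second_noisy)
        loss /= len(samples)
        grad /= len(samples)
        if loss < preset:                                      # step B4: stop
            return w, loss, step
        w -= lr * grad                                         # step B4: adjust
    return w, loss, max_steps

# Toy data (hypothetical): clean tone, interferer, and a 90% accurate
# stand-in for the already trained noise extraction model.
clean = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
noise = [0.5 * math.sin(2 * math.pi * 23 * i / 100) for i in range(100)]
first_noisy = [c + n for c, n in zip(clean, noise)]
toy_noise_model = lambda _x: [0.9 * n for n in noise]

second_noisy = make_second_noisy(first_noisy, toy_noise_model)
w, final_loss, steps = train_gain([(second_noisy, clean)])
```

The point of the sketch is the data flow: the signal extraction model never sees the raw noisy input, only the output of the first stage, so it learns to clean up whatever residual the noise extraction model leaves behind.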
Therefore, inputting the noise-reduced audio signal to be processed into this neural network model yields the target audio signal.
Also, in the embodiment of the present disclosure, because neural-network-based noise reduction algorithms differ in their estimation target domain, the audio signal may be extracted either in the time domain or in the time-frequency domain. Specifically:
If the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is also a time domain signal; the noise-reduced audio signal to be processed in the time domain is input into the signal extraction model to obtain the target speech signal in the time domain.
Fig. 2 is an overall flow chart of the first audio noise reduction method, in which $S(t)$ represents the audio signal to be processed in the time domain, $S_{\mathrm{noise}}(t)$ represents the noise signal in the time domain, $S_{\mathrm{denoise}}(t)$ represents the noise-reduced audio signal to be processed in the time domain, and $S_{\mathrm{audio}}(t)$ represents the audio signal in the time domain. Here $S_{\mathrm{denoise}}(t) = S(t) - S_{\mathrm{noise}}(t)$, where $t$ represents time.
If the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; then, performing short-time Fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
Fig. 3 is an overall flow chart of the second audio noise reduction method, in which $S_{\mathrm{denoise}}(n, k)$ represents the noise-reduced audio signal to be processed in the time-frequency domain and $S_{\mathrm{audio}}(n, k)$ represents the audio signal in the time-frequency domain, where $n$ denotes the frame and $k$ denotes the frequency.
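The time-domain to time-frequency-domain conversion used in the second method can be illustrated with a naive short-time Fourier transform. This is a minimal sketch, assuming a periodic Hann window, an 8-sample frame, and a hop of 4 samples; none of these parameters are given in the disclosure, and a practical system would use an FFT-based library routine rather than the direct DFT sum shown here.

```python
import cmath
import math

def stft(signal, frame_len=8, hop=4):
    """Return frames[n][k]: Hann-windowed DFT of frame n at frequency bin k."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
        for i in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # Keep only the non-negative frequency bins of the real input.
        spectrum = [
            sum(
                chunk[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

# Toy denoised signal (hypothetical): exactly 2 cycles per 8-sample frame.
s_denoise_t = [math.sin(2 * math.pi * 2 * i / 8) for i in range(32)]

# s_denoise_nk[n][k] plays the role of S_denoise(n, k) in the text.
s_denoise_nk = stft(s_denoise_t)
```

Each row of `s_denoise_nk` is one frame `n`, and each column is one frequency bin `k`; this two-index array is what a time-frequency-domain signal extraction model would consume.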
If the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is also a time-frequency domain signal; an inverse short-time Fourier transform is then performed on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain, which is input into the signal extraction model to obtain the target voice signal in the time domain.
Fig. 4 is an overall flow chart of the third audio noise reduction method, in which $S(n, k)$ represents the audio signal to be processed in the time-frequency domain and $S_{\mathrm{noise}}(n, k)$ represents the noise signal in the time-frequency domain. Here $S_{\mathrm{denoise}}(n, k) = S(n, k) - S_{\mathrm{noise}}(n, k)$.
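The third method subtracts the noise in the time-frequency domain and then returns to the time domain. The sketch below shows that chain under a deliberate simplification: non-overlapping rectangular frames, so that the inverse transform is just a per-frame inverse DFT. The framing choice and the helper names (`stft_rect`, `istft_rect`, `subtract_tf`) are assumptions for this example; the disclosure does not fix a window or hop size.

```python
import cmath
import math

def dft(frame):
    """Direct DFT of one frame (all bins)."""
    n = len(frame)
    return [
        sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)) / n).real
        for i in range(n)
    ]

def stft_rect(signal, frame_len):
    """Simplified STFT: non-overlapping rectangular frames."""
    return [
        dft(signal[s:s + frame_len])
        for s in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def istft_rect(frames):
    """Inverse of stft_rect: invert each frame and concatenate."""
    return [x for spectrum in frames for x in idft(spectrum)]

def subtract_tf(s_nk, s_noise_nk):
    """S_denoise(n, k) = S(n, k) - S_noise(n, k), bin by bin."""
    return [[a - b for a, b in zip(fa, fb)] for fa, fb in zip(s_nk, s_noise_nk)]

# Toy data (hypothetical): the noise estimate is exact here for clarity.
s_t = [math.sin(0.3 * i) + 0.4 * math.cos(1.1 * i) for i in range(32)]
noise_t = [0.4 * math.cos(1.1 * i) for i in range(32)]

s_denoise_nk = subtract_tf(stft_rect(s_t, 8), stft_rect(noise_t, 8))
s_denoise_t = istft_rect(s_denoise_nk)  # input for a time-domain extraction model
```

Since the transform is linear, subtracting complex spectra and inverting gives the same result as subtracting in the time domain, so the choice of domain is driven by what each model expects rather than by the subtraction itself.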
If the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
Fig. 5 is an overall flow chart of the fourth audio noise reduction method, in which $S(n, k)$ represents the audio signal to be processed in the time-frequency domain, $S_{\mathrm{noise}}(n, k)$ represents the noise signal in the time-frequency domain, $S_{\mathrm{denoise}}(n, k)$ represents the noise-reduced audio signal to be processed in the time-frequency domain, and $S_{\mathrm{audio}}(n, k)$ represents the audio signal in the time-frequency domain.
Therefore, even if the two cascaded neural network models operate in different target domains, the audio signal to be processed can still be processed and the target audio signal extracted.
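The four flows of figs. 2 to 5 differ only in whether a transform is inserted between the two models. A minimal routing sketch, assuming the domains are labelled with the strings `"time"` and `"time-frequency"` and that the model and transforms are passed in as callables (the function name `extract_target` and the tracing placeholders are hypothetical):

```python
def extract_target(denoised, signal_domain, model_domain, model, stft, istft):
    """Feed the denoised signal to the extraction model, converting domains if needed."""
    domains = ("time", "time-frequency")
    if signal_domain not in domains or model_domain not in domains:
        raise ValueError("domain must be 'time' or 'time-frequency'")
    if signal_domain == model_domain:
        return model(denoised)            # figs. 2 and 5: no conversion
    if signal_domain == "time":
        return model(stft(denoised))      # fig. 3: time -> time-frequency
    return model(istft(denoised))         # fig. 4: time-frequency -> time

# Placeholder callables that only record which path was taken.
trace_stft = lambda x: x + ["stft"]
trace_istft = lambda x: x + ["istft"]
trace_model = lambda x: x + ["model"]

path_fig3 = extract_target([], "time", "time-frequency",
                           trace_model, trace_stft, trace_istft)
path_fig4 = extract_target([], "time-frequency", "time",
                           trace_model, trace_stft, trace_istft)
```

Keeping the conversion outside both models means either model can be swapped for one trained in the other domain without retraining its neighbour.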
In summary, extracting and removing the noise first improves the signal-to-noise ratio of the audio signal to be processed, and the subsequent signal extraction then operates on this cleaner input, which improves the accuracy of the extracted audio signal.
Based on the same inventive concept, the present disclosure also provides an audio noise reduction device. Fig. 6 is a schematic diagram of the audio noise reduction device provided by the present disclosure. The device includes:
an acquisition signal unit 601 configured to perform acquisition of an audio signal to be processed;
a first noise extraction unit 602 configured to perform inputting the audio signal to be processed into a noise extraction model, resulting in a noise signal in the audio signal to be processed;
A noise reduction unit 603 configured to perform noise reduction processing on the audio signal to be processed according to the noise signal, so as to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
a signal extraction unit 604 configured to perform inputting the noise-reduced audio signal to be processed into a signal extraction model, so as to obtain a target audio signal of the audio signal to be processed.
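The four units 601 to 604 map naturally onto a small class whose methods are called in sequence. The following is a sketch only: the class name `AudioNoiseReducer`, the list-of-floats signal type, and the toy models (a constant-offset noise estimate and an identity extraction model) are assumptions for illustration, not the structure claimed in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]

@dataclass
class AudioNoiseReducer:
    noise_model: Callable[[Signal], Signal]    # stand-in noise extraction model
    signal_model: Callable[[Signal], Signal]   # stand-in signal extraction model

    def acquire(self, raw: Signal) -> Signal:
        """Unit 601: obtain the audio signal to be processed."""
        return list(raw)

    def extract_noise(self, signal: Signal) -> Signal:
        """Unit 602: estimate the noise in the signal."""
        return self.noise_model(signal)

    def reduce_noise(self, signal: Signal, noise: Signal) -> Signal:
        """Unit 603: remove the estimated noise."""
        return [s - n for s, n in zip(signal, noise)]

    def extract_signal(self, denoised: Signal) -> Signal:
        """Unit 604: obtain the target audio signal."""
        return self.signal_model(denoised)

    def process(self, raw: Signal) -> Signal:
        signal = self.acquire(raw)
        noise = self.extract_noise(signal)
        denoised = self.reduce_noise(signal, noise)
        return self.extract_signal(denoised)

# Toy models (hypothetical): the noise is a constant 0.5 offset.
reducer = AudioNoiseReducer(
    noise_model=lambda s: [0.5] * len(s),
    signal_model=lambda s: list(s),
)
target = reducer.process([1.5, 2.5, 3.5])
```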
In one possible implementation, the noise extraction model is trained by means of the following units:
a first acquisition sample unit configured to perform acquisition of sample data, each of the sample data including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to perform inputting the first noisy audio signal into a noise extraction model, resulting in a second noise signal;
a first calculation unit configured to perform calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and a first determination model unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model with the resulting parameters as the trained model.
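The noise extraction model is supervised by the noise itself rather than by the clean speech. A minimal sketch of how one training pair and its loss could be formed, assuming the noisy signal is an additive mixture and the loss is a mean squared error (both are assumptions; the disclosure names neither the mixing model nor the loss, and the function names are hypothetical):

```python
def make_noise_training_pair(clean, noise):
    """Build (first noisy audio signal, first noise signal) from clean speech and noise."""
    first_noisy = [c + n for c, n in zip(clean, noise)]
    first_noise = list(noise)  # training target is the noise, not the speech
    return first_noisy, first_noise

def noise_loss(first_noise, second_noise):
    """Mean squared error between the reference noise and the model's estimate."""
    return sum((a - b) ** 2 for a, b in zip(first_noise, second_noise)) / len(first_noise)

def is_trained(loss, preset):
    """Stopping rule: training ends once the loss drops below the preset value."""
    return loss < preset

# Toy data (hypothetical values chosen to be exact in floating point).
clean = [0.0, 1.0, 0.0, -1.0]
noise = [0.25, -0.25, 0.25, -0.25]
first_noisy, first_noise = make_noise_training_pair(clean, noise)

untrained_loss = noise_loss(first_noise, [0.0] * 4)   # model that predicts silence
perfect_loss = noise_loss(first_noise, first_noise)   # model that predicts the noise
```

Swapping the target from `first_noise` to the clean speech, and the input from the raw mixture to the first-stage output, gives the training pairs for the signal extraction model.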
In one possible implementation, the signal extraction model is trained by means of the following units:
a second acquisition sample unit configured to perform obtaining sample data, each sample data including a second noisy audio signal and a first voice signal, wherein the first voice signal is the noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by: inputting the first noisy audio signal into the noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
a third noise extraction unit configured to perform inputting the second noisy audio signal into a signal extraction model, resulting in a second voice signal;
a second calculation unit configured to perform calculating a loss function of the signal extraction model from the first voice signal and the second voice signal;
and a second determination model unit configured to adjust the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and to take the signal extraction model with the resulting parameters as the trained model.
In one possible implementation, the acquisition signal unit 601 includes:
a first acquisition signal subunit configured to perform time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
a second acquisition signal subunit configured to perform time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
In a possible implementation manner, if the audio signal to be processed is a time domain signal, the noise-reduced audio signal to be processed is a time domain signal; the signal extraction unit 604 includes:
a first signal extraction subunit configured to perform input of the noise-reduced audio signal to be processed in the time domain to a signal extraction model, resulting in the target speech signal in the time domain; or
the second signal extraction subunit is configured to perform short-time fourier transform on the noise-reduced audio signal to be processed in the time domain to obtain the noise-reduced audio signal to be processed in the time-frequency domain; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
In a possible implementation manner, if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal; the signal extraction unit 604 includes:
a third signal extraction subunit configured to perform input of the noise-reduced audio signal to be processed in the time-frequency domain to a signal extraction model, so as to obtain the target speech signal in the time-frequency domain; or
the fourth signal extraction subunit is configured to perform inverse short-time fourier transform on the noise-reduced audio signal to be processed in a time-frequency domain to obtain the noise-reduced audio signal to be processed in a time domain; and inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
As shown in fig. 7, based on the same technical concept, the embodiment of the present disclosure also provides an electronic device 70, which may include a memory 701 and a processor 702.
The memory 701 is used for storing a computer program executed by the processor 702. The memory 701 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the task management device, and the like. The processor 702 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 701 and the processor 702 is not limited in the embodiments of the present disclosure. In fig. 7, the memory 701 and the processor 702 are connected by a bus 703, the bus 703 is represented by a thick line in fig. 7, and the connection manner between other components is merely illustrative and not limited. The bus 703 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 701 may be a volatile memory, such as a random-access memory (RAM); the memory 701 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 701 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto. The memory 701 may also be a combination of the above memories.
The processor 702 is configured to execute the method performed by the apparatus in the embodiment shown in fig. 1 when invoking the computer program stored in the memory 701.
In some possible embodiments, various aspects of the methods provided by the present disclosure may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the methods according to various exemplary embodiments of the present disclosure described above in this specification when the program product is run on the computer device, for example, the computer device may perform the methods performed by the devices in the embodiments shown in fig. 1-5.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for audio noise reduction, the method comprising:
acquiring an audio signal to be processed;
inputting the audio signal to be processed into a noise extraction model to obtain a noise signal in the audio signal to be processed;
performing noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed.
2. The audio denoising method of claim 1, wherein the noise extraction model is trained by:
acquiring sample data, wherein each sample data comprises a first noisy audio signal and a first noise signal;
inputting the first noisy audio signal into a noise extraction model to obtain a second noise signal;
calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and adjusting the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and taking the noise extraction model with the resulting parameters as the trained model.
3. The audio noise reduction method according to claim 2, wherein the signal extraction model is trained by:
obtaining sample data, wherein each sample data comprises a second noisy audio signal and a first voice signal, the first voice signal is the noise-free voice signal corresponding to the first noisy audio signal, and the second noisy audio signal is obtained by the following method:
inputting the first noisy audio signal into the noise extraction model to obtain a third noise signal, and performing noise reduction processing on the first noisy audio signal according to the third noise signal to obtain the second noisy audio signal;
inputting the second noisy audio signal into a signal extraction model to obtain a second voice signal;
calculating a loss function of the signal extraction model from the first voice signal and the second voice signal;
and adjusting the parameter information of the signal extraction model according to the loss function until the loss function is smaller than a preset value, and taking the signal extraction model with the resulting parameters as the trained model.
4. The audio noise reduction method according to any of claims 1 to 3, wherein the obtaining the audio signal to be processed comprises:
carrying out time domain processing on a noisy audio signal to be processed to obtain the audio signal to be processed; or
carrying out time-frequency domain processing on the noisy audio signal to be processed to obtain the audio signal to be processed.
5. The audio noise reduction method according to claim 4, wherein if the audio signal to be processed is a time-domain signal, the noise-reduced audio signal to be processed is a time-domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed, wherein the method comprises the following steps:
inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain; or
carrying out short-time Fourier transform on the noise-reduced audio signal to be processed in a time domain to obtain the noise-reduced audio signal to be processed in a time-frequency domain; and inputting the audio signal to be processed after the noise reduction in the time-frequency domain into a signal extraction model to obtain the target speech signal in the time-frequency domain.
6. The audio noise reduction method according to claim 4, wherein if the audio signal to be processed is a time-frequency domain signal, the noise-reduced audio signal to be processed is a time-frequency domain signal;
inputting the noise-reduced audio signal to be processed into a signal extraction model to obtain a target audio signal of the audio signal to be processed, wherein the method comprises the following steps:
inputting the noise-reduced audio signal to be processed in the time-frequency domain into a signal extraction model to obtain the target voice signal in the time-frequency domain; or
performing inverse short-time Fourier transform on the noise-reduced audio signal to be processed in the time-frequency domain to obtain the noise-reduced audio signal to be processed in the time domain; and inputting the noise-reduced audio signal to be processed in the time domain into a signal extraction model to obtain the target voice signal in the time domain.
7. An audio noise reduction apparatus, characterized in that the apparatus comprises:
an acquisition signal unit configured to perform acquisition of an audio signal to be processed;
a first noise extraction unit configured to perform input of the audio signal to be processed into a noise extraction model, resulting in a noise signal in the audio signal to be processed;
The noise reduction unit is configured to perform noise reduction processing on the audio signal to be processed according to the noise signal to obtain a noise-reduced audio signal to be processed; the signal-to-noise ratio of the noise-reduced audio signal to be processed is higher than that of the audio signal to be processed;
and the signal extraction unit is configured to input the noise-reduced audio signal to be processed into a signal extraction model to obtain a target voice signal of the audio signal to be processed.
8. The audio noise reduction device of claim 7, wherein the noise extraction model is trained by:
a first acquisition sample unit configured to perform acquisition of sample data, each of the sample data including a first noisy audio signal and a first noise signal;
a second noise extraction unit configured to perform inputting the first noisy audio signal into a noise extraction model, resulting in a second noise signal;
a first calculation unit configured to perform calculating a loss function of the noise extraction model from the first noise signal and the second noise signal;
and a first determination model unit configured to adjust the parameter information of the noise extraction model according to the loss function until the loss function is smaller than a preset value, and to take the noise extraction model with the resulting parameters as the trained model.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio noise reduction method of any of claims 1 to 6.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the audio noise reduction method of any of claims 1 to 6.
CN202010645246.XA 2020-07-07 2020-07-07 Audio noise reduction method and device and storage medium Pending CN111863008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010645246.XA CN111863008A (en) 2020-07-07 2020-07-07 Audio noise reduction method and device and storage medium

Publications (1)

Publication Number Publication Date
CN111863008A true CN111863008A (en) 2020-10-30

Family

ID=73153442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010645246.XA Pending CN111863008A (en) 2020-07-07 2020-07-07 Audio noise reduction method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111863008A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112447183A (en) * 2020-11-16 2021-03-05 北京达佳互联信息技术有限公司 Training method and device for audio processing model, audio denoising method and device, and electronic equipment
CN112786066A (en) * 2020-12-24 2021-05-11 北京猿力未来科技有限公司 Audio signal screening method and device and electronic equipment
CN113259801A (en) * 2021-05-08 2021-08-13 深圳市睿耳电子有限公司 Loudspeaker noise reduction method of intelligent earphone and related device
CN113593598A (en) * 2021-08-09 2021-11-02 深圳远虑科技有限公司 Noise reduction method and device of audio amplifier in standby state and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1841500A (en) * 2005-03-30 2006-10-04 松下电器产业株式会社 Method and apparatus for resisting noise based on adaptive nonlinear spectral subtraction
US8983844B1 (en) * 2012-07-31 2015-03-17 Amazon Technologies, Inc. Transmission of noise parameters for improving automatic speech recognition
CN109215665A (en) * 2018-07-20 2019-01-15 广东工业大学 A kind of method for recognizing sound-groove based on 3D convolutional neural networks
CN110491404A (en) * 2019-08-15 2019-11-22 广州华多网络科技有限公司 Method of speech processing, device, terminal device and storage medium
CN110827847A (en) * 2019-11-27 2020-02-21 高小翎 Microphone array voice denoising and enhancing method with low signal-to-noise ratio and remarkable growth
US20200211580A1 (en) * 2018-12-27 2020-07-02 Lg Electronics Inc. Apparatus for noise canceling and method for the same

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112447183A (en) * 2020-11-16 2021-03-05 北京达佳互联信息技术有限公司 Training method and device for audio processing model, audio denoising method and device, and electronic equipment
CN112786066A (en) * 2020-12-24 2021-05-11 北京猿力未来科技有限公司 Audio signal screening method and device and electronic equipment
CN112786066B (en) * 2020-12-24 2023-03-14 北京猿力未来科技有限公司 Audio signal screening method and device and electronic equipment
CN113259801A (en) * 2021-05-08 2021-08-13 深圳市睿耳电子有限公司 Loudspeaker noise reduction method of intelligent earphone and related device
CN113593598A (en) * 2021-08-09 2021-11-02 深圳远虑科技有限公司 Noise reduction method and device of audio amplifier in standby state and electronic equipment
CN113593598B (en) * 2021-08-09 2024-04-12 深圳远虑科技有限公司 Noise reduction method and device for audio amplifier in standby state and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination