CN113470684B - Audio noise reduction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113470684B
Authority
CN
China
Prior art keywords
audio
information
frequency
time
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110837937.4A
Other languages
Chinese (zh)
Other versions
CN113470684A (en
Inventor
张之勇
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110837937.4A priority Critical patent/CN113470684B/en
Publication of CN113470684A publication Critical patent/CN113470684A/en
Application granted granted Critical
Publication of CN113470684B publication Critical patent/CN113470684B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0224Processing in the time domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention relates to artificial intelligence and provides an audio noise reduction method, device, equipment and storage medium. The method preprocesses noisy audio to obtain spectrum information; processes the spectrum information with a frequency domain signal processing network to obtain spectrum mask features; obtains time-frequency features from the spectrum information and the spectrum mask features; processes the time-frequency features with a time domain signal processing network to obtain time-frequency mask features; generates predicted audio from the time-frequency features and the time-frequency mask features; adjusts the network parameters of a preset learner based on the predicted audio and the clean audio to obtain a noise reduction model; and acquires requested audio and performs noise reduction on it with the noise reduction model to obtain target audio. The invention can improve both the accuracy and the real-time performance of noise reduction for the requested audio. In addition, the invention relates to blockchain technology: the target audio may be stored in a blockchain.

Description

Audio noise reduction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an audio noise reduction method, apparatus, device, and storage medium.
Background
In remote meetings such as teleworking calls, audio noise reduction must be both real-time and accurate. Current noise reduction methods, however, usually process frame-level information over the complete speech sequence, which results in low noise reduction efficiency.
Therefore, how to improve the real-time performance and accuracy of the audio noise reduction is a technical problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an audio noise reduction method, apparatus, device, and storage medium that can improve both the accuracy and the real-time performance of noise reduction for the requested audio.
In one aspect, the present invention proposes an audio noise reduction method, including:
acquiring an audio sample, and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
preprocessing the noisy audio to obtain spectrum information;
processing the spectrum information based on the frequency domain signal processing network to obtain spectrum mask characteristics corresponding to the spectrum information;
acquiring the time-frequency characteristics of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristics;
processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
generating predicted audio according to the time-frequency characteristics and the time-frequency mask characteristics;
adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model;
and acquiring requested audio, and performing noise reduction processing on the requested audio based on the noise reduction model to obtain target audio.
According to a preferred embodiment of the present invention, the acquiring an audio sample includes:
measuring the audio duration of the clean audio;
acquiring, from a recording library, audio whose duration is less than or equal to the audio duration, to obtain a plurality of recorded audio;
performing arbitrary synthesis processing on the clean audio and each recorded audio to obtain a plurality of noisy audio;
determining the plurality of noisy audio and the clean audio as the audio samples.
According to a preferred embodiment of the present invention, the preprocessing the noisy audio to obtain the spectrum information includes:
acquiring a preset moving window function;
performing Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;
acquiring a preset processing time length, and calculating the ratio of the audio time length to the preset processing time length;
and carrying out segmentation processing on the spectrogram according to the preset processing time length to obtain the spectrum information, wherein the number of pieces of spectrum information is the same as the ratio.
According to a preferred embodiment of the present invention, the frequency domain signal processing network includes a gating neural network, a fully connected network, and an activation function, the gating neural network includes a reset gate and an update gate, and the processing the spectrum information based on the frequency domain signal processing network to obtain spectrum mask features corresponding to the spectrum information includes:
acquiring time sequence information of the frequency spectrum information, wherein the time sequence information comprises a first frequency spectrum at a first moment and a second frequency spectrum at a second moment;
analyzing the first frequency spectrum and the second frequency spectrum based on the reset parameters of the reset gate to obtain candidate information of the second moment;
calculating the information amount of the first frequency spectrum based on the update parameters in the update gate, the first frequency spectrum and the second frequency spectrum;
generating output information of the second moment according to the first frequency spectrum, the candidate information and the information quantity, and taking the output information as the first frequency spectrum for the next moment, until all of the time sequence information has participated in training, to obtain a first network output of the gating neural network;
analyzing the first network output according to the weight matrix and the bias value in the fully connected network to obtain a second network output;
and processing the second network output based on the activation function to obtain the spectrum mask feature.
According to a preferred embodiment of the present invention, the obtaining the time-frequency characteristic of the noisy audio according to the spectral information and the spectral mask characteristic includes:
calculating amplitude information in the frequency spectrum information, and extracting phase information from the frequency spectrum information;
calculating the product of the amplitude information, the phase information and the spectrum mask characteristics to obtain a predicted spectrum;
performing inverse Fourier transform processing on the predicted spectrum to obtain a predicted time frequency;
and extracting the characteristics in the predicted time frequency based on a first preset convolution layer to obtain the time frequency characteristics.
According to a preferred embodiment of the present invention, the generating the predicted audio according to the time-frequency features and the time-frequency mask features includes:
calculating the product of the time-frequency characteristic and the time-frequency mask characteristic to obtain an enhanced characteristic;
performing up-sampling processing on the enhancement features based on a second preset convolution layer to obtain a restored signal;
acquiring initial information of the restored signal on each time sequence;
if there are multiple pieces of initial information on any time sequence, calculating the average value of the initial information on that time sequence to obtain overlapped information on that time sequence;
generating prediction information according to the initial information and the overlapping information;
and converting the prediction information to obtain the prediction audio.
According to a preferred embodiment of the present invention, the adjusting the network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model includes:
acquiring first time domain information of the clean audio and second time domain information of the predicted audio;
calculating the loss value of the preset learner according to the following formula:
wherein loss refers to the loss value, $y_t$ refers to the first time domain information, and $\hat{y}_t$ refers to the second time domain information;
and adjusting the network parameters according to the loss value until the loss value no longer decreases, so as to obtain the noise reduction model.
In another aspect, the present invention also provides an audio noise reduction device, including:
the acquisition unit is used for acquiring an audio sample and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
the preprocessing unit is used for preprocessing the noisy audio to obtain frequency spectrum information;
the processing unit is used for processing the frequency spectrum information based on the frequency domain signal processing network to obtain frequency spectrum mask characteristics corresponding to the frequency spectrum information;
the acquisition unit is further used for acquiring the time-frequency characteristics of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristics;
the processing unit is further used for processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
the generating unit is used for generating prediction audio according to the time-frequency characteristics and the time-frequency mask characteristics;
the adjusting unit is used for adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model;
the acquisition unit is further used for acquiring requested audio, and performing noise reduction processing on the requested audio based on the noise reduction model to obtain target audio.
In another aspect, the present invention also proposes an electronic device, including:
a memory storing computer readable instructions; and
a processor executing the computer readable instructions stored in the memory to implement the audio noise reduction method.
In another aspect, the present invention also proposes a computer readable storage medium having stored therein computer readable instructions that are executed by a processor in an electronic device to implement the audio noise reduction method.
According to the above technical scheme, preprocessing converts the whole noisy audio into pieces of spectrum information, which improves the processing efficiency of the spectrum information and therefore the noise reduction efficiency for the noisy audio. The frequency domain signal processing network reduces noise in the noisy audio in the frequency domain, and the time domain signal processing network enhances the phase information of the target sound source in the time domain. This dual noise reduction in the frequency domain and the time domain improves the noise reduction accuracy of the noise reduction model and, in turn, the speech enhancement effect of the target audio.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the audio noise reduction method of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the audio noise reduction device of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing an audio noise reduction method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the audio noise reduction method of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The audio noise reduction method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored computer readable instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.
The electronic device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The electronic device may comprise a network device and/or a user device. The network device includes, but is not limited to, a single network electronic device, a group of electronic devices made up of multiple network electronic devices, or a cloud, based on cloud computing (Cloud Computing), made up of a large number of hosts or network electronic devices.
The network on which the electronic device is located includes, but is not limited to: the internet, wide area networks, metropolitan area networks, local area networks, virtual private networks (Virtual Private Network, VPN), etc.
S10, acquiring an audio sample, and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network.
In at least one embodiment of the present invention, the noisy audio refers to audio containing noise information, and the noisy audio is synthesized from the clean audio and the recorded audio.
The clean audio refers to audio that does not contain noise information.
The frequency domain signal processing network is a network for removing noise information from the frequency domain of the noisy audio.
The time domain signal processing network is a network for removing noise information from the time domain of the noisy audio.
In at least one embodiment of the invention, the electronic device obtaining an audio sample comprises:
measuring the audio duration of the clean audio;
acquiring, from a recording library, audio whose duration is less than or equal to the audio duration, to obtain a plurality of recorded audio;
performing arbitrary synthesis processing on the clean audio and each recorded audio to obtain a plurality of noisy audio;
determining the plurality of noisy audio and the clean audio as the audio samples.
Wherein, the audio duration refers to the total duration of the clean audio.
The recording library stores a mapping between a plurality of audio recordings and their durations.
The duration of each of the plurality of recorded audio is less than or equal to the audio duration, and the recorded audio can be background sounds such as whistling sounds.
Selecting the plurality of recorded audio by the audio duration ensures that the duration of each synthesized noisy audio is the same as the audio duration of the clean audio.
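As a minimal sketch of the synthesis step in S10, the following NumPy code overlays each recorded clip on the clean audio so that every noisy result keeps the clean audio's length. The function name `synthesize_noisy`, the random offset, and the plain additive mix are assumptions made for illustration; the text only requires that the recorded audio be no longer than the clean audio and that the synthesis be arbitrary.

```python
import numpy as np

def synthesize_noisy(clean, recorded, rng):
    """Overlay one recorded clip on the clean audio at a random offset (assumed mixing rule)."""
    if len(recorded) > len(clean):
        raise ValueError("recorded audio must not be longer than the clean audio")
    # Random start position chosen so that the recorded clip fits inside the clean audio.
    offset = int(rng.integers(0, len(clean) - len(recorded) + 1))
    noisy = clean.astype(np.float64).copy()
    noisy[offset:offset + len(recorded)] += recorded
    return noisy

rng = np.random.default_rng(0)
# One second of a 440 Hz tone stands in for the clean audio.
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
# Two noise clips of different lengths stand in for the recording library.
recordings = [0.1 * rng.standard_normal(n) for n in (4000, 16000)]
noisy_samples = [synthesize_noisy(clean, r, rng) for r in recordings]
# Each audio sample pairs one noisy audio with the clean audio.
audio_samples = [(noisy, clean) for noisy in noisy_samples]
```

Because the recorded clip is added inside the clean audio's time span, the noisy audio and the clean audio always have the same number of samples, which is the property the paragraph above relies on.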
S11, preprocessing the noisy audio to obtain spectrum information.
In at least one embodiment of the present invention, the spectral information refers to information of the noisy audio in the frequency domain.
In at least one embodiment of the present invention, the electronic device preprocessing the noisy audio to obtain spectrum information includes:
acquiring a preset moving window function;
performing Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;
acquiring a preset processing time length, and calculating the ratio of the audio time length to the preset processing time length;
and carrying out segmentation processing on the spectrogram according to the preset processing time length to obtain the spectrum information, wherein the number of pieces of spectrum information is the same as the ratio.
The preset moving window function can be set according to requirements; it allows the noisy audio to be treated as a stationary signal within a limited time width.
The spectrogram refers to the mapping of the noisy audio onto time and energy.
The preset processing time length is set according to the noise reduction efficiency requirement.
Performing the Fourier transform on the noisy audio with the preset moving window function makes the generated spectrogram more stable, and segmenting the spectrogram facilitates subsequent parallel processing of the spectrum information, which improves the noise reduction efficiency for the noisy audio.
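The preprocessing in S11 can be sketched as a windowed short-time Fourier transform followed by a split along the time axis. The Hann window, the frame length of 512, the hop of 128, and rounding the ratio up to an integer are all assumptions chosen for illustration; the text leaves the window function and the processing time length as preset values.

```python
import numpy as np

def spectrogram(noisy, frame_len=512, hop=128):
    """Short-time Fourier transform with a sliding Hann window (assumed window choice)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per time frame, one column per frequency bin.
    return np.fft.rfft(frames, axis=1)

def segment(spec, audio_duration, processing_duration):
    """Split the spectrogram into `ratio` pieces along the time axis."""
    ratio = int(np.ceil(audio_duration / processing_duration))
    return np.array_split(spec, ratio, axis=0)

sample_rate = 16000
noisy = np.random.default_rng(0).standard_normal(2 * sample_rate)  # 2 s of stand-in audio
spec = spectrogram(noisy)
pieces = segment(spec, audio_duration=2.0, processing_duration=0.5)
```

With a 2 s signal and a 0.5 s processing time length, the ratio is 4, so `pieces` holds four blocks of spectrum information that could be processed independently.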
S12, processing the frequency spectrum information based on the frequency domain signal processing network to obtain frequency spectrum mask characteristics corresponding to the frequency spectrum information.
In at least one embodiment of the invention, the spectral mask features are used to mask noise information of the noisy audio in the frequency domain. Each piece of spectrum information has its own corresponding spectrum mask feature.
In at least one embodiment of the present invention, the frequency domain signal processing network includes a gating neural network, a fully connected network, and an activation function, the gating neural network includes a reset gate and an update gate, and the electronic device processing the spectrum information based on the frequency domain signal processing network to obtain the spectrum mask feature corresponding to the spectrum information includes:
acquiring time sequence information of the frequency spectrum information, wherein the time sequence information comprises a first frequency spectrum at a first moment and a second frequency spectrum at a second moment;
analyzing the first frequency spectrum and the second frequency spectrum based on the reset parameters of the reset gate to obtain candidate information of the second moment;
calculating the information amount of the first frequency spectrum based on the update parameters in the update gate, the first frequency spectrum and the second frequency spectrum;
generating output information of the second moment according to the first frequency spectrum, the candidate information and the information quantity, and taking the output information as the first frequency spectrum for the next moment, until all of the time sequence information has participated in training, to obtain a first network output of the gating neural network;
analyzing the first network output according to the weight matrix and the bias value in the fully connected network to obtain a second network output;
and processing the second network output based on the activation function to obtain the spectrum mask feature.
The reset parameter, the update parameter, the weight matrix and the bias value are network parameters which are initialized and set in the preset learner.
The information amount refers to the amount of information from the first spectrum that is retained at the second moment.
The activation function is typically set as a sigmoid function.
Analyzing the time sequence information with the gating neural network mitigates the problems of vanishing and exploding gradients, which improves the accuracy of the spectrum mask characteristics.
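The gated step, the fully connected layer, and the sigmoid activation described for S12 can be sketched as follows. This uses the standard gated recurrent unit equations as an assumption about what the reset gate and update gate compute; the parameter names (`W_r`, `U_r`, and so on), the hidden size of 32, and the random initial weights are hypothetical. In this sketch the recurrent state plays the role of the "first frequency spectrum" and the current frame the role of the "second frequency spectrum".

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(prev, cur, p):
    """One gated step: `prev` is the previous output, `cur` the current spectrum frame."""
    reset = sigmoid(cur @ p["W_r"] + prev @ p["U_r"])
    # Candidate information for the current moment, built from the reset-scaled history.
    candidate = np.tanh(cur @ p["W_h"] + (reset * prev) @ p["U_h"])
    # Share of the previous output that is retained (the "information amount").
    keep = sigmoid(cur @ p["W_z"] + prev @ p["U_z"])
    return keep * prev + (1.0 - keep) * candidate

def spectral_mask(magnitudes, p):
    """Run the gated network over time, then a dense layer and a sigmoid."""
    state = np.zeros(p["U_r"].shape[0])
    outputs = []
    for frame in magnitudes:
        state = gru_step(state, frame, p)   # the output becomes the next step's history
        outputs.append(state)
    first_output = np.stack(outputs)
    second_output = first_output @ p["W_fc"] + p["b_fc"]
    return sigmoid(second_output)

rng = np.random.default_rng(0)
n_bins, hidden = 257, 32
shapes = {"W_r": (n_bins, hidden), "U_r": (hidden, hidden),
          "W_z": (n_bins, hidden), "U_z": (hidden, hidden),
          "W_h": (n_bins, hidden), "U_h": (hidden, hidden),
          "W_fc": (hidden, n_bins)}
p = {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}
p["b_fc"] = np.zeros(n_bins)
mask = spectral_mask(np.abs(rng.standard_normal((10, n_bins))), p)
```

The sigmoid keeps every mask value between 0 and 1, so multiplying a spectrum by the mask can only attenuate each frequency bin, never amplify it.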
S13, acquiring the time-frequency characteristic of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristic.
In at least one embodiment of the present invention, the time-frequency characteristic refers to a characteristic of the noisy audio over time-frequency.
In at least one embodiment of the present invention, the electronic device obtaining the time-frequency characteristic of the noisy audio according to the spectral information and the spectral mask characteristic includes:
calculating amplitude information in the frequency spectrum information, and extracting phase information from the frequency spectrum information;
calculating the product of the amplitude information, the phase information and the spectrum mask characteristics to obtain a predicted spectrum;
performing inverse Fourier transform processing on the predicted spectrum to obtain a predicted time frequency;
and extracting the characteristics in the predicted time frequency based on a first preset convolution layer to obtain the time frequency characteristics.
Wherein the convolution kernel size of the first preset convolution layer is typically set to 1×1.
Noise information in the noisy audio can be accurately removed through the spectrum mask features, accuracy of the predicted spectrum is improved, and then the time-frequency features can be accurately extracted according to the convolution layer.
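The four steps of S13 can be sketched in NumPy as below. The random frames and the random mask are stand-ins for real data, the phase is represented as a unit-modulus complex factor, and the 1×1 convolution is written as a matrix product because a kernel of size 1 applies the same linear map to every frame independently. The sizes and the weight values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_len, n_frames, n_filters = 512, 10, 64
frames = rng.standard_normal((n_frames, frame_len))     # stand-in noisy frames
spec = np.fft.rfft(frames, axis=1)                      # spectrum information
mask = rng.uniform(0.0, 1.0, size=spec.shape)           # stand-in spectrum mask feature

magnitude = np.abs(spec)                                # amplitude information
phase = np.exp(1j * np.angle(spec))                     # phase information (unit modulus)
predicted_spec = magnitude * mask * phase               # predicted spectrum
# Inverse Fourier transform back to the time domain, one row per frame.
predicted_time = np.fft.irfft(predicted_spec, n=frame_len, axis=1)

# A 1x1 convolution reduces to a per-frame linear map; the weights are random placeholders.
conv_weights = 0.1 * rng.standard_normal((frame_len, n_filters))
time_features = predicted_time @ conv_weights
```

A quick sanity check on the decomposition: with a mask of all ones, `magnitude * phase` equals the original spectrum, so the inverse transform returns the original frames.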
S14, processing the time-frequency characteristics based on the time-domain signal processing network to obtain time-frequency mask characteristics.
In at least one embodiment of the invention, the time-frequency mask feature is used to mask noise information of the noisy audio in the time domain.
In at least one embodiment of the present invention, the time domain signal processing network includes an instantaneous normalization layer, a gated recurrent unit layer, a fully connected layer, and an activation function. The electronic device processes the time-frequency characteristic based on the instantaneous normalization layer, the gated recurrent unit layer, the fully connected layer and the activation function to obtain the time-frequency mask characteristic.
In at least one embodiment of the present invention, a manner in which the electronic device processes the time-frequency characteristic based on the time-domain signal processing network is similar to a manner in which the electronic device processes the spectrum information based on the frequency-domain signal processing network, which is not described in detail herein.
S15, generating prediction audio according to the time-frequency characteristics and the time-frequency mask characteristics.
In at least one embodiment of the present invention, the predicted audio is audio obtained by the preset learner after noise reduction processing on the noisy audio in a frequency domain and a time domain.
In at least one embodiment of the present invention, the generating, by the electronic device, predicted audio according to the time-frequency features and the time-frequency mask features includes:
calculating the product of the time-frequency characteristic and the time-frequency mask characteristic to obtain an enhanced characteristic;
performing up-sampling processing on the enhancement features based on a second preset convolution layer to obtain a restored signal;
acquiring initial information of the restored signal on each time sequence;
if there are multiple pieces of initial information on any time sequence, calculating the average value of the initial information on that time sequence to obtain overlapped information on that time sequence;
generating prediction information according to the initial information and the overlapping information;
and converting the prediction information to obtain the prediction audio.
Wherein, the prediction information refers to information of the prediction audio in the time domain.
With this embodiment, the generated prediction information is smoother, which improves the noise reduction effect of the predicted audio.
Specifically, the electronic device generates prediction information according to the initial information and the overlapping information.
For example: the initial information at the first time sequence is $n_1$, the initial information at the second time sequence is $n_2$, $n_3$, $n_4$, and the initial information at the third time sequence is $n_5$. If multiple pieces of initial information are detected at the second time sequence, the overlapping information at the second time sequence is calculated as $\frac{n_2+n_3+n_4}{3}$, and the prediction information can be generated as: $n_1$, $\frac{n_2+n_3+n_4}{3}$, $n_5$.
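The averaging of overlapping values can be sketched as follows, assuming the restored signal arrives as equal-length frames that are placed `hop` samples apart. The function name `average_overlaps` and the frame layout are assumptions; the text only specifies that positions holding several values are replaced by their mean.

```python
import numpy as np

def average_overlaps(restored, hop):
    """Place each restored frame at its time position and average where frames overlap."""
    n_frames, frame_len = restored.shape
    total = np.zeros(hop * (n_frames - 1) + frame_len)
    count = np.zeros_like(total)
    for i, frame in enumerate(restored):
        total[i * hop:i * hop + frame_len] += frame
        count[i * hop:i * hop + frame_len] += 1
    # Positions covered once keep their value; positions covered several times get the mean.
    return total / count

restored = np.array([[1.0, 2.0], [4.0, 5.0], [6.0, 9.0]])
prediction = average_overlaps(restored, hop=1)
# prediction holds the values 1.0, 3.0, 5.5, 9.0
```

In this small case the first and last positions are covered by a single frame and keep their initial values, while the two middle positions are each covered by two frames and are replaced by the averages (2 + 4) / 2 and (5 + 6) / 2.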
S16, adjusting network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model.
In at least one embodiment of the invention, the network parameters include initialization configuration parameters of the frequency domain signal processing network and the time domain signal processing network.
The noise reduction model is used for eliminating noise information in the audio.
In at least one embodiment of the present invention, the electronic device adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain the noise reduction model includes:
acquiring first time domain information of the clean audio and second time domain information of the predicted audio;
calculating the loss value of the preset learner according to the following formula:
wherein loss refers to the loss value, $y_t$ refers to the first time domain information, and $\hat{y}_t$ refers to the second time domain information;
and adjusting the network parameters according to the loss value until the loss value no longer decreases, so as to obtain the noise reduction model.
The accuracy of the loss value can be improved through the first time domain information and the second time domain information, so that the noise reduction accuracy of the noise reduction model can be ensured according to the loss value.
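The stopping rule in S16 (adjust the parameters until the loss value no longer decreases) can be sketched with a deliberately tiny learner. Everything specific here is an assumption: the mean squared error is only a placeholder for a time-domain loss and is not the patent's formula, the single `gain` parameter stands in for the network parameters, and the learning rate and iteration cap are arbitrary.

```python
import numpy as np

def loss_fn(clean, predicted):
    """Placeholder time-domain loss (mean squared error); not the patent's formula."""
    return float(np.mean((clean - predicted) ** 2))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 20.0, 1000))            # first time domain information
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

gain = 0.0                      # single stand-in for the network parameters
learning_rate = 0.1
best_loss = loss_fn(clean, gain * noisy)
history = [best_loss]
for _ in range(10000):          # hard cap as a safety net
    predicted = gain * noisy                             # second time domain information
    grad = float(np.mean(2.0 * (predicted - clean) * noisy))
    candidate = gain - learning_rate * grad
    candidate_loss = loss_fn(clean, candidate * noisy)
    if candidate_loss >= best_loss:                      # loss no longer decreases: stop
        break
    gain, best_loss = candidate, candidate_loss
    history.append(best_loss)
```

Each accepted update strictly lowers the loss, so `history` is a decreasing sequence and the final `gain` is the parameter value kept for the noise reduction model in this toy setting.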
S17, acquiring requested audio, and performing noise reduction processing on the requested audio based on the noise reduction model to obtain target audio.
In at least one embodiment of the present invention, the requested audio refers to audio that requires noise reduction. The requested audio may be any audio received in real time.
The target audio is the audio obtained after the requested audio is subjected to noise reduction. If the accuracy of the noise reduction model reaches 100%, the target audio does not contain any noise information.
It is emphasized that to further ensure the privacy and security of the target audio, the target audio may also be stored in a blockchain node.
In at least one embodiment of the present invention, a manner in which the electronic device performs the noise reduction processing on the request audio based on the noise reduction model is similar to a manner in which the electronic device performs the processing on the noisy audio based on the preset learner to obtain the predicted audio, which is not described in detail herein.
According to the above technical scheme, the model loss value of the preset learner can be accurately determined from the pure audio and the decoded audio that the preset learner predicts for the noisy audio, so that the network parameters can be accurately adjusted according to the model loss value, and the enhancement effect of the audio noise reduction model is improved. In addition, the coding network is used for coding the noisy audio, and because the audio coding information contains phase information in each voice time sequence state, the enhancement effect of the audio noise reduction model can be improved, and thus the enhancement effect of the target audio is improved.
FIG. 2 is a functional block diagram of a preferred embodiment of the audio noise reduction device of the present invention. The audio noise reduction device 11 includes an acquisition unit 110, a preprocessing unit 111, a processing unit 112, a generation unit 113, and an adjustment unit 114. The module/unit referred to herein is a series of computer readable instructions capable of being retrieved by the processor 13 and performing a fixed function and stored in the memory 12. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.
The obtaining unit 110 obtains an audio sample, and obtains a preset learner, where the audio sample includes noisy audio and clean audio, and the preset learner includes a frequency domain signal processing network and a time domain signal processing network.
In at least one embodiment of the present invention, the noisy audio refers to audio containing noise information, and the noisy audio is synthesized from the clean audio and the recorded audio.
The clean audio refers to audio that does not contain noise information.
The frequency domain signal processing network is a network for removing noise information from the frequency domain of the noisy audio.
The time domain signal processing network is a network for removing noise information from the time domain of the noisy audio.
In at least one embodiment of the present invention, the obtaining unit 110 obtaining an audio sample includes:
counting the audio duration of the pure audio;
acquiring audio with the duration less than or equal to the audio duration from a recording library to obtain a plurality of recorded audio;
performing arbitrary synthesis processing on the pure audio and each recorded audio to obtain a plurality of noisy audio;
a plurality of the noisy audio and the clean audio are determined as the audio samples.
Wherein, the audio time length refers to the total time length of the pure audio.
And the recording library stores mapping relations of a plurality of audios and time lengths.
The duration of each of the plurality of recorded audio is less than or equal to the audio duration, and the plurality of recorded audio can be background sounds such as whistling sounds.
The plurality of recorded audios can be obtained through the audio duration, so that the duration of the synthesized noisy audio is ensured to be the same as the audio duration of the pure audio.
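The sample construction described above can be sketched as follows. This is an illustration under stated assumptions: durations are measured in samples, the recording library is modelled as a plain dictionary, and "arbitrary synthesis" is interpreted as adding the recorded audio to the pure audio at a random offset. The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def select_recordings(
    library: Dict[str, Sequence[float]], clean_len: int
) -> List[Sequence[float]]:
    """Keep only recordings whose duration does not exceed the pure audio."""
    return [rec for rec in library.values() if len(rec) <= clean_len]


def mix(clean: Sequence[float], noise: Sequence[float], rng: random.Random) -> List[float]:
    """Add the noise to the pure audio at a random offset (assumed synthesis rule).
    The result always has the same length as the pure audio."""
    offset = rng.randint(0, len(clean) - len(noise))
    noisy = list(clean)
    for i, sample in enumerate(noise):
        noisy[offset + i] += sample
    return noisy


def build_samples(
    clean: Sequence[float], library: Dict[str, Sequence[float]], seed: int = 0
) -> List[Tuple[List[float], List[float]]]:
    """Return (noisy audio, pure audio) pairs, one per selected recording."""
    rng = random.Random(seed)
    return [
        (mix(clean, noise, rng), list(clean))
        for noise in select_recordings(library, len(clean))
    ]
```

Because only recordings no longer than the pure audio are selected, every synthesized noisy audio keeps the audio duration of the pure audio, which is the property the paragraph above relies on.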
The preprocessing unit 111 performs preprocessing on the noisy audio to obtain spectrum information.
In at least one embodiment of the present invention, the spectral information refers to information of the noisy audio in the frequency domain.
In at least one embodiment of the present invention, the preprocessing unit 111 preprocessing the noisy audio to obtain the spectrum information includes:
acquiring a preset moving window function;
performing Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;
acquiring a preset processing time length, and calculating the ratio of the audio time length to the preset processing time length;
and carrying out segmentation processing on the spectrogram according to the preset processing time length to obtain the frequency spectrum information, wherein the quantity of the frequency spectrum information is the same as the ratio.
The preset moving window function can be set according to requirements, and it enables the noisy audio to output a stable signal within a limited time width.
The spectrogram refers to the mapping relation of the noisy audio over time and energy.
The preset processing time length is set according to the noise reduction efficiency requirement.
The Fourier transform is carried out on the noisy audio through the preset moving window function, so that the generated spectrogram is more stable, and the segmentation of the spectrogram facilitates subsequent parallel processing of the spectrum information, so that the noise reduction efficiency for the noisy audio is improved.
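The preprocessing steps can be sketched with NumPy as follows. The Hann window, the frame length of 512 samples and the hop of 128 samples are assumptions chosen for illustration; the text only states that the moving window function is preset according to requirements.

```python
import numpy as np


def spectrogram(noisy: np.ndarray, frame_len: int = 512, hop: int = 128) -> np.ndarray:
    """Windowed (short-time) Fourier transform of the noisy audio.
    Returns a complex array of shape (num_frames, frame_len // 2 + 1).
    Assumes len(noisy) >= frame_len."""
    window = np.hanning(frame_len)  # assumed moving window function
    num_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack(
        [noisy[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def split_spectrogram(
    spec: np.ndarray, audio_seconds: float, segment_seconds: float
) -> list:
    """Split the spectrogram along time so that the number of segments equals
    the ratio of the audio duration to the preset processing duration."""
    num_segments = int(round(audio_seconds / segment_seconds))
    return np.array_split(spec, num_segments, axis=0)
```

For a 2-second signal and a preset processing duration of 0.5 seconds, this produces 4 pieces of spectrum information that can be processed in parallel.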
The processing unit 112 processes the spectrum information based on the frequency domain signal processing network, and obtains spectrum mask features corresponding to the spectrum information.
In at least one embodiment of the invention, the spectral mask features are used to mask noise information of the noisy audio in the frequency domain. The spectrum information corresponds to corresponding spectrum mask characteristics.
In at least one embodiment of the present invention, the frequency domain signal processing network includes a gated neural network, a fully connected network, and an activation function, the gated neural network includes a reset gate and an update gate, and the processing unit 112 processes the spectrum information based on the frequency domain signal processing network, and obtaining spectrum mask features corresponding to the spectrum information includes:
acquiring time sequence information of the frequency spectrum information, wherein the time sequence information comprises a first frequency spectrum at a first moment and a second frequency spectrum at a second moment;
analyzing the first frequency spectrum and the second frequency spectrum based on the reset parameters of the reset gate to obtain candidate information of the second moment;
calculating the information amount of the first frequency spectrum based on the update parameters in the update gate, the first frequency spectrum and the second frequency spectrum;
generating output information of the second moment according to the first frequency spectrum, the candidate information and the information quantity, and determining the output information as the first frequency spectrum, until all of the time sequence information has participated in training, so as to obtain a first network output of the gating neural network;
analyzing the first network output according to the weight matrix and the bias value in the fully connected network to obtain a second network output;
and processing the second network output based on the activation function to obtain the spectrum mask feature.
The reset parameter, the update parameter, the weight matrix and the bias value are network parameters which are initialized and set in the preset learner.
The information amount refers to the information amount of the first spectrum reserved in the second time.
The activation function is typically set as a sigmoid function.
The time sequence information is analyzed through the gating neural network, so that the problems of gradient disappearance and gradient explosion can be solved, and the accuracy of the spectrum mask characteristics can be improved.
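The gated network, fully connected network and activation function can be sketched with NumPy as follows. This is a sketch assuming the standard gated recurrent unit equations, a zero initial state and randomly initialised parameters; the parameter names (`W_r`, `U_r`, `W_fc`, and so on) and the hidden size are hypothetical, and the update gate is taken to be the share of the previous state that is retained, in line with the definition of the information amount above.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_bins: int, n_hidden: int, seed: int = 0) -> dict:
    """Randomly initialised network parameters (illustrative values only)."""
    rng = np.random.default_rng(seed)

    def mat(rows: int, cols: int) -> np.ndarray:
        return rng.normal(scale=0.1, size=(rows, cols))

    return {
        "W_r": mat(n_hidden, n_bins), "U_r": mat(n_hidden, n_hidden),  # reset gate
        "W_z": mat(n_hidden, n_bins), "U_z": mat(n_hidden, n_hidden),  # update gate
        "W_h": mat(n_hidden, n_bins), "U_h": mat(n_hidden, n_hidden),  # candidate
        "W_fc": mat(n_bins, n_hidden), "b_fc": np.zeros(n_bins),       # fully connected
    }


def spectral_mask(frames: np.ndarray, p: dict) -> np.ndarray:
    """frames: (T, n_bins) spectrum magnitudes in time order.
    Returns a mask of shape (T, n_bins) with values in (0, 1)."""
    h = np.zeros(p["U_r"].shape[0])  # state carried from the previous moment
    outputs = []
    for x in frames:
        r = sigmoid(p["W_r"] @ x + p["U_r"] @ h)           # reset gate
        z = sigmoid(p["W_z"] @ x + p["U_z"] @ h)           # amount of previous state kept
        cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h))  # candidate information
        h = z * h + (1.0 - z) * cand                       # output of this moment
        outputs.append(h)
    first_output = np.stack(outputs)                        # gated network output
    second_output = first_output @ p["W_fc"].T + p["b_fc"]  # fully connected network
    return sigmoid(second_output)                           # activation function
```

The sigmoid at the end keeps every mask value strictly between 0 and 1, so multiplying a spectrum by the mask can only attenuate each frequency bin.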
The obtaining unit 110 obtains the time-frequency characteristic of the noisy audio according to the spectrum information and the spectrum mask characteristic.
In at least one embodiment of the present invention, the time-frequency characteristic refers to a characteristic of the noisy audio over time-frequency.
In at least one embodiment of the present invention, the obtaining unit 110 obtains the time-frequency characteristic of the noisy audio according to the spectral information and the spectral mask characteristic includes:
calculating amplitude information in the frequency spectrum information, and extracting phase information from the frequency spectrum information;
calculating the product of the amplitude information, the phase information and the spectrum mask characteristics to obtain a predicted spectrum;
performing inverse Fourier transform processing on the predicted spectrum to obtain a predicted time frequency;
and extracting the characteristics in the predicted time frequency based on a first preset convolution layer to obtain the time frequency characteristics.
Wherein the convolution kernel size of the first preset convolution layer is typically set to 1×1.
Noise information in the noisy audio can be accurately removed through the spectrum mask features, accuracy of the predicted spectrum is improved, and then the time-frequency features can be accurately extracted according to the convolution layer.
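The four steps above can be sketched with NumPy as follows. The sketch assumes that the spectrum information is a one-sided complex spectrum per frame and models the convolution layer with kernel size 1 as the same linear map applied to every frame; the `kernel` argument and the function name are hypothetical.

```python
import numpy as np


def time_frequency_features(
    spec: np.ndarray, mask: np.ndarray, kernel: np.ndarray
) -> np.ndarray:
    """spec:   complex spectrum information, shape (T, n_bins)
    mask:   spectrum mask feature, shape (T, n_bins)
    kernel: weights of the kernel-size-1 convolution, shape (frame_len, n_features)
    Returns the time-frequency features, shape (T, n_features)."""
    magnitude = np.abs(spec)                 # amplitude information
    phase = np.exp(1j * np.angle(spec))      # phase information as a unit complex factor
    predicted_spec = magnitude * phase * mask
    frames = np.fft.irfft(predicted_spec, axis=1)  # inverse Fourier transform per frame
    # A convolution with kernel size 1 applies one shared linear map at every time step.
    return frames @ kernel
```

With a mask of all ones and an identity kernel the function returns the original time-domain frames, which is a convenient sanity check that the amplitude and phase are recombined correctly.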
The processing unit 112 processes the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic.
In at least one embodiment of the invention, the time-frequency mask feature is used to mask noise information of the noisy audio in the time domain.
In at least one embodiment of the present invention, the time domain signal processing network includes an instantaneous normalization layer, a gated recurrent unit layer, a fully connected layer, and an activation function. The processing unit 112 processes the time-frequency characteristic based on the instantaneous normalization layer, the gated recurrent unit layer, the fully connected layer and the activation function to obtain the time-frequency mask characteristic.
In at least one embodiment of the present invention, the manner in which the processing unit 112 processes the time-frequency characteristic based on the time-domain signal processing network is similar to the manner in which the processing unit 112 processes the spectrum information based on the frequency-domain signal processing network, which is not described in detail herein.
The generating unit 113 generates predicted audio according to the time-frequency characteristics and the time-frequency mask characteristics.
In at least one embodiment of the present invention, the predicted audio is audio obtained by the preset learner after noise reduction processing on the noisy audio in a frequency domain and a time domain.
In at least one embodiment of the present invention, the generating unit 113 generates predicted audio according to the time-frequency characteristic and the time-frequency mask characteristic includes:
calculating the product of the time-frequency characteristic and the time-frequency mask characteristic to obtain an enhanced characteristic;
performing up-sampling processing on the enhancement features based on a second preset convolution layer to obtain a restored signal;
acquiring initial information of the restoring signal on each time sequence;
if the number of the initial information on any time sequence is multiple, calculating the average value of the initial information on any time sequence to obtain overlapped information on any time sequence;
generating prediction information according to the initial information and the overlapping information;
and converting the prediction information to obtain the prediction audio.
Wherein, the prediction information refers to information of the prediction audio in the time domain.
By the embodiment, the generated prediction information can be more gentle, so that the noise reduction effect of the prediction audio is improved.
Specifically, the generating unit 113 generates the prediction information from the initial information and the overlapping information.
For example: the initial information at the first time sequence is $n_1$, the initial information at the second time sequence is $n_2$, $n_3$, $n_4$, and the initial information at the third time sequence is $n_5$. Since there are multiple pieces of initial information at the second time sequence, the overlapping information at the second time sequence is calculated as $\frac{n_2+n_3+n_4}{3}$. The prediction information can then be generated as: $n_1$, $\frac{n_2+n_3+n_4}{3}$, $n_5$.
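The averaging of overlapping information in this example can be sketched as follows. The dictionary representation (time sequence index mapped to the list of initial information at that index) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def average_overlaps(initial: Dict[int, Sequence[float]]) -> List[float]:
    """For each time sequence index, keep the single value when there is one
    and replace several overlapping values by their average."""
    prediction: List[float] = []
    for t in sorted(initial):
        values = initial[t]
        prediction.append(sum(values) / len(values))
    return prediction


# Mirrors the example: one value at the first and third time sequence,
# three overlapping values at the second.
print(average_overlaps({1: [1.0], 2: [2.0, 3.0, 4.0], 3: [5.0]}))  # [1.0, 3.0, 5.0]
```

Averaging where restored frames overlap avoids abrupt jumps at frame boundaries, which is why the generated prediction information is described as smoother.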
The adjusting unit 114 adjusts the network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model.
In at least one embodiment of the invention, the network parameters include initialization configuration parameters of the frequency domain signal processing network and the time domain signal processing network.
The noise reduction model is used for eliminating noise information in the audio.
In at least one embodiment of the present invention, the adjusting unit 114 adjusts the network parameters of the preset learner based on the predicted audio and the clean audio, and obtaining the noise reduction model includes:
acquiring first time domain information of the pure audio and second time domain information of the predicted audio;
calculating the loss value of the preset learner according to the following formula:
wherein loss refers to the loss value, $y_t$ refers to the first time domain information, and the remaining symbol in the formula refers to the second time domain information;
and adjusting the network parameters according to the loss value until the loss value is not reduced, so as to obtain the noise reduction model.
The accuracy of the loss value can be improved through the first time domain information and the second time domain information, so that the noise reduction accuracy of the noise reduction model can be ensured according to the loss value.
The obtaining unit 110 obtains the request audio, and performs noise reduction processing on the request audio based on the noise reduction model, so as to obtain the target audio.
In at least one embodiment of the present invention, the requested audio refers to audio that requires noise reduction. The requested audio may be any audio received in real time.
The target audio is the audio obtained after the request audio is subjected to noise reduction. If the accuracy of the noise reduction model reaches 100%, the target audio does not contain any noise information.
It is emphasized that to further ensure the privacy and security of the target audio, the target audio may also be stored in a blockchain node.
In at least one embodiment of the present invention, the manner in which the obtaining unit 110 performs the noise reduction processing on the request audio based on the noise reduction model is similar to the manner in which the prediction audio is obtained by processing the noisy audio based on the preset learner, which is not described in detail herein.
According to the above technical scheme, the model loss value of the preset learner can be accurately determined from the pure audio and the decoded audio that the preset learner predicts for the noisy audio, so that the network parameters can be accurately adjusted according to the model loss value, and the enhancement effect of the audio noise reduction model is improved. In addition, the coding network is used for coding the noisy audio, and because the audio coding information contains phase information in each voice time sequence state, the enhancement effect of the audio noise reduction model can be improved, and thus the enhancement effect of the target audio is improved.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing the audio noise reduction method.
In one embodiment of the invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as an audio noise reduction program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of the electronic device 1, and may include more or less components than illustrated, or may combine certain components, or different components, e.g. the electronic device 1 may further include input-output devices, network access devices, buses, etc.
The processor 13 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor 13 is an operation core and a control center of the electronic device 1, connects various parts of the entire electronic device 1 using various interfaces and lines, and executes an operating system of the electronic device 1 and various installed applications, program codes, etc.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instructions capable of performing a specific function, the computer readable instructions describing a process of executing the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be divided into an acquisition unit 110, a preprocessing unit 111, a processing unit 112, a generation unit 113, and an adjustment unit 114.
The memory 12 may be used to store the computer readable instructions and/or modules, and the processor 13 may implement various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, etc. The memory 12 may include non-volatile and volatile memory, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a physical memory, such as a memory bank, a TF Card (Trans-flash Card), or the like.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may also be implemented by implementing all or part of the processes in the methods of the embodiments described above, by instructing the associated hardware by means of computer readable instructions, which may be stored in a computer readable storage medium, the computer readable instructions, when executed by a processor, implementing the steps of the respective method embodiments described above.
Wherein the computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, executable files, or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In connection with fig. 1, the memory 12 in the electronic device 1 stores computer readable instructions for implementing an audio noise reduction method, and the processor 13 can execute the computer readable instructions to implement:
acquiring an audio sample, and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
preprocessing the noisy audio to obtain spectrum information;
processing the spectrum information based on the frequency domain signal processing network to obtain spectrum mask characteristics corresponding to the spectrum information;
acquiring the time-frequency characteristics of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristics;
processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
generating predicted audio according to the time-frequency characteristics and the time-frequency mask characteristics;
adjusting network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model;
and acquiring the request audio, and carrying out noise reduction processing on the request audio based on the noise reduction model to obtain target audio.
In particular, the specific implementation method of the processor 13 on the computer readable instructions may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The computer readable storage medium has stored thereon computer readable instructions, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
acquiring an audio sample, and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
preprocessing the noisy audio to obtain spectrum information;
processing the spectrum information based on the frequency domain signal processing network to obtain spectrum mask characteristics corresponding to the spectrum information;
acquiring the time-frequency characteristics of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristics;
processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
generating predicted audio according to the time-frequency characteristics and the time-frequency mask characteristics;
adjusting network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model;
and acquiring the request audio, and carrying out noise reduction processing on the request audio based on the noise reduction model to obtain target audio.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware plus software functional modules.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The units or means may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (8)

1. An audio noise reduction method, characterized in that the audio noise reduction method comprises:
acquiring an audio sample, and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
preprocessing the noisy audio to obtain spectrum information;
processing the spectrum information based on the frequency domain signal processing network to obtain spectrum mask characteristics corresponding to the spectrum information;
acquiring the time-frequency characteristic of the noisy audio according to the frequency spectrum information and the frequency spectrum mask characteristic comprises the following steps: calculating amplitude information in the frequency spectrum information, and extracting phase information from the frequency spectrum information; calculating the product of the amplitude information, the phase information and the spectrum mask characteristics to obtain a predicted spectrum; performing inverse Fourier transform processing on the predicted spectrum to obtain a predicted time frequency; extracting characteristics in the predicted time frequency based on a first preset convolution layer to obtain the time frequency characteristics;
processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
generating predicted audio according to the time-frequency characteristics and the time-frequency mask characteristics, including: calculating the product of the time-frequency characteristic and the time-frequency mask characteristic to obtain an enhanced characteristic; performing up-sampling processing on the enhancement features based on a second preset convolution layer to obtain a restored signal; acquiring initial information of the restoring signal on each time sequence; if the number of the initial information on any time sequence is multiple, calculating the average value of the initial information on any time sequence to obtain overlapped information on any time sequence; generating prediction information according to the initial information and the overlapping information; converting the prediction information to obtain the prediction audio;
adjusting network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model;
and acquiring the request audio, and carrying out noise reduction processing on the request audio based on the noise reduction model to obtain target audio.
2. The audio noise reduction method of claim 1, wherein the obtaining audio samples comprises:
counting the audio duration of the pure audio;
acquiring audio with the duration less than or equal to the audio duration from a recording library to obtain a plurality of recorded audio;
performing arbitrary synthesis processing on the pure audio and each recorded audio to obtain a plurality of noisy audio;
a plurality of the noisy audio and the clean audio are determined as the audio samples.
3. The method of audio noise reduction according to claim 2, wherein the preprocessing the noisy audio to obtain spectral information includes:
acquiring a preset moving window function;
performing Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;
acquiring a preset processing time length, and calculating the ratio of the audio time length to the preset processing time length;
and carrying out segmentation processing on the spectrogram according to the preset processing time length to obtain the frequency spectrum information, wherein the quantity of the frequency spectrum information is the same as the ratio.
4. The audio noise reduction method of claim 1, wherein the frequency domain signal processing network comprises a gated neural network, a fully connected network, and an activation function, the gated neural network comprising a reset gate and an update gate, the processing the spectral information based on the frequency domain signal processing network to obtain spectral mask features corresponding to the spectral information comprising:
acquiring time sequence information of the frequency spectrum information, wherein the time sequence information comprises a first frequency spectrum at a first moment and a second frequency spectrum at a second moment;
analyzing the first frequency spectrum and the second frequency spectrum based on the reset parameters of the reset gate to obtain candidate information of the second moment;
calculating the information amount of the first frequency spectrum based on the update parameters in the update gate, the first frequency spectrum and the second frequency spectrum;
generating output information of the second moment according to the first frequency spectrum, the candidate information and the information amount, determining the output information as the first frequency spectrum until all of the time sequence information has participated in training, and obtaining a first network output of the gated neural network;
analyzing the first network output according to the weight matrix and the bias value in the fully connected network to obtain a second network output;
and processing the second network output based on the activation function to obtain the spectrum mask feature.
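The recurrence described in this claim resembles a standard gated recurrent unit followed by a fully connected layer and an activation. The sketch below uses the textbook GRU equations, which are an assumption about how the reset and update parameters are applied; the weight names in the dictionary `p`, the sigmoid output activation, and the helper `random_params` are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_mask(frames, p):
    """frames: (T, F) magnitude frames; p: dict of hypothetical weights."""
    h = frames[0]                    # "first spectrum" at the first moment
    states = [h]
    for x in frames[1:]:             # "second spectrum" at the second moment
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])           # reset gate
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])           # update gate
        cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate
        # z is the amount of the previous information that is kept.
        h = z * h + (1.0 - z) * cand
        states.append(h)             # the output becomes the next "first spectrum"
    first_out = np.stack(states)                     # gated network output
    second_out = first_out @ p["Wfc"].T + p["bfc"]   # fully connected layer
    return sigmoid(second_out)                       # mask values in (0, 1)

def random_params(n_bins, seed=0):
    """Random weights for illustration only; a real model would learn them."""
    rng = np.random.default_rng(seed)
    names = ("Wr", "Ur", "Wz", "Uz", "Wh", "Uh", "Wfc")
    p = {k: rng.normal(scale=0.1, size=(n_bins, n_bins)) for k in names}
    p.update({k: np.zeros(n_bins) for k in ("br", "bz", "bh", "bfc")})
    return p

frames = np.abs(np.random.default_rng(1).normal(size=(5, 8)))
mask = gru_mask(frames, random_params(8))
```

In this sketch the hidden state has the same size as one spectrum frame, so that the output of each step can stand in for the first spectrum at the next step, as the claim describes.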
5. The method of audio noise reduction according to claim 1, wherein said adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model comprises:
acquiring first time domain information of the clean audio and second time domain information of the predicted audio;
calculating the loss value of the preset learner according to a formula (not reproduced in this text),
wherein the terms of the formula refer to the loss value, the first time domain information, and the second time domain information, respectively;
and adjusting the network parameters according to the loss value until the loss value is not reduced, so as to obtain the noise reduction model.
6. An audio noise reduction device, characterized in that the audio noise reduction device comprises:
the acquisition unit is used for acquiring an audio sample and acquiring a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency domain signal processing network and a time domain signal processing network;
the preprocessing unit is used for preprocessing the noisy audio to obtain frequency spectrum information;
the processing unit is used for processing the frequency spectrum information based on the frequency domain signal processing network to obtain frequency spectrum mask characteristics corresponding to the frequency spectrum information;
the acquisition unit is further configured to obtain a time-frequency characteristic of the noisy audio according to the spectrum information and the spectrum mask characteristic, including: calculating amplitude information in the frequency spectrum information, and extracting phase information from the frequency spectrum information; calculating the product of the amplitude information, the phase information and the spectrum mask characteristics to obtain a predicted spectrum; performing inverse Fourier transform processing on the predicted spectrum to obtain a predicted time frequency; extracting characteristics in the predicted time frequency based on a first preset convolution layer to obtain the time-frequency characteristic;
the processing unit is further used for processing the time-frequency characteristic based on the time-domain signal processing network to obtain a time-frequency mask characteristic;
the generating unit is configured to generate predicted audio according to the time-frequency characteristic and the time-frequency mask characteristic, including: calculating the product of the time-frequency characteristic and the time-frequency mask characteristic to obtain an enhanced feature; performing up-sampling processing on the enhanced feature based on a second preset convolution layer to obtain a restored signal; acquiring initial information of the restored signal at each time step; if there are multiple pieces of initial information at any time step, calculating the average value of the initial information at that time step to obtain overlapped information at that time step; generating prediction information according to the initial information and the overlapped information; converting the prediction information to obtain the predicted audio;
the adjusting unit is used for adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model;
the acquisition unit is further used for acquiring the request audio, and performing noise reduction processing on the request audio based on the noise reduction model to obtain target audio.
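Two of the steps recited for the device can be illustrated with a short sketch: applying the spectrum mask to the amplitude and phase, and averaging the restored signal wherever frames overlap in time. This is an interpretation under stated assumptions, not the patented implementation: the phase information is taken to be the unit phasor exp(j*phase), the hop size is assumed not to exceed the frame length, and the function names `apply_spectral_mask` and `overlap_average` are hypothetical.

```python
import numpy as np

def apply_spectral_mask(spec, mask):
    """Product of amplitude, phase (as a unit phasor) and mask."""
    amplitude = np.abs(spec)
    phase = np.angle(spec)
    return amplitude * mask * np.exp(1j * phase)

def overlap_average(frames, hop):
    """Average samples that several frames contribute to the same time index.

    frames: (n_frames, frame_len) array; assumes hop <= frame_len so that
    every output index receives at least one sample.
    """
    n_frames, frame_len = frames.shape
    length = hop * (n_frames - 1) + frame_len
    total = np.zeros(length)
    count = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        total[start:start + frame_len] += frame
        count[start:start + frame_len] += 1
    # Indices covered once keep their value; overlapped indices are averaged.
    return total / count

restored = overlap_average(np.array([[1.0, 1.0, 1.0, 1.0],
                                     [3.0, 3.0, 3.0, 3.0]]), hop=2)
spec = np.array([[1 + 1j, 0.0, -2j]])
unchanged = apply_spectral_mask(spec, np.ones(spec.shape))
```

For the two constant frames above, with values 1 and 3 and a hop of 2, the two middle samples are covered by both frames and are averaged to 2, giving 1, 1, 2, 2, 3, 3.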
7. An electronic device, the electronic device comprising:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the audio noise reduction method of any one of claims 1 to 5.
8. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein computer readable instructions that are executed by a processor in an electronic device to implement the audio noise reduction method of any of claims 1 to 5.
CN202110837937.4A 2021-07-23 2021-07-23 Audio noise reduction method, device, equipment and storage medium Active CN113470684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837937.4A CN113470684B (en) 2021-07-23 2021-07-23 Audio noise reduction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110837937.4A CN113470684B (en) 2021-07-23 2021-07-23 Audio noise reduction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113470684A (en) 2021-10-01
CN113470684B (en) 2024-01-12

Family

ID=77882114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837937.4A Active CN113470684B (en) 2021-07-23 2021-07-23 Audio noise reduction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113470684B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102930A1 (en) * 2021-12-10 2023-06-15 清华大学深圳国际研究生院 Speech enhancement method, electronic device, program product, and storage medium
CN113921022B (en) * 2021-12-13 2022-02-25 北京世纪好未来教育科技有限公司 Audio signal separation method, device, storage medium and electronic equipment
CN114267368A (en) * 2021-12-22 2022-04-01 北京百度网讯科技有限公司 Training method of audio noise reduction model, and audio noise reduction method and device
WO2023140488A1 (en) * 2022-01-20 2023-07-27 Samsung Electronics Co., Ltd. Bandwidth extension and speech enhancement of audio
CN115294952A (en) * 2022-05-23 2022-11-04 神盾股份有限公司 Audio processing method and device, and non-transitory computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010239424A (en) * 2009-03-31 2010-10-21 Kddi Corp Method, device and program for suppressing noise
CN104240717A (en) * 2014-09-17 2014-12-24 河海大学常州校区 Voice enhancement method based on combination of sparse code and ideal binary system mask
CN110491407A (en) * 2019-08-15 2019-11-22 广州华多网络科技有限公司 Method, apparatus, electronic equipment and the storage medium of voice de-noising
CN110808063A (en) * 2019-11-29 2020-02-18 北京搜狗科技发展有限公司 Voice processing method and device for processing voice
CN112567458A (en) * 2018-08-16 2021-03-26 三菱电机株式会社 Audio signal processing system, audio signal processing method, and computer-readable storage medium
CN112652321A (en) * 2020-09-30 2021-04-13 北京清微智能科技有限公司 Voice noise reduction system and method based on deep learning phase friendlier

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6260504B2 (en) * 2014-02-27 2018-01-17 株式会社Jvcケンウッド Audio signal processing apparatus, audio signal processing method, and audio signal processing program

Also Published As

Publication number Publication date
CN113470684A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN113470684B (en) Audio noise reduction method, device, equipment and storage medium
WO2021052224A1 (en) Video generation method and apparatus, electronic device, and computer storage medium
CN106683663B (en) Neural network training apparatus and method, and speech recognition apparatus and method
WO2018223727A1 (en) Voiceprint recognition method, apparatus and device, and medium
CN113450822B (en) Voice enhancement method, device, equipment and storage medium
CN109766925B (en) Feature fusion method and device, electronic equipment and storage medium
CN112634920B (en) Training method and device of voice conversion model based on domain separation
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN111696029B (en) Virtual image video generation method, device, computer equipment and storage medium
CN113470664B (en) Voice conversion method, device, equipment and storage medium
Tzinis et al. Compute and memory efficient universal sound source separation
CN112927707A (en) Training method and device of voice enhancement model and voice enhancement method and device
CN113435522A (en) Image classification method, device, equipment and storage medium
CN113470672B (en) Voice enhancement method, device, equipment and storage medium
CN113345431B (en) Cross-language voice conversion method, device, equipment and medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN113268597B (en) Text classification method, device, equipment and storage medium
CN113707167A (en) Training method and training device for residual echo suppression model
CN111858891A (en) Question-answer library construction method and device, electronic equipment and storage medium
WO2021135454A1 (en) Method, device, and computer-readable storage medium for recognizing fake speech
CN113438374A (en) Intelligent outbound call processing method, device, equipment and storage medium
CN113555003B (en) Speech synthesis method, device, electronic equipment and storage medium
CN112071331B (en) Voice file restoration method and device, computer equipment and storage medium
CN112309404B (en) Machine voice authentication method, device, equipment and storage medium
CN113470686B (en) Voice enhancement method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant