CN112259116A - Method and device for reducing noise of audio data, electronic equipment and storage medium - Google Patents

Method and device for reducing noise of audio data, electronic equipment and storage medium

Info

Publication number
CN112259116A
Authority
CN
China
Prior art keywords
sub
noise reduction
band
domain information
bands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011098018.1A
Other languages
Chinese (zh)
Other versions
CN112259116B
Inventor
吴威麒
张金亮
高华
许一峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202011098018.1A
Publication of CN112259116A
Application granted
Publication of CN112259116B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0224: Processing in the time domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the disclosure discloses a method and a device for reducing noise of audio data, electronic equipment and a storage medium, wherein the method comprises the following steps: performing subband processing on audio frame data, wherein frequency intervals of all subbands are different, and a subband corresponding to a frequency interval with the minimum interval maximum value is used as a first subband; inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band; determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands; and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands. The noise reduction of the first sub-band is carried out based on the noise reduction model, and the noise reduction of other sub-bands is carried out through gain mapping, so that the model training efficiency can be improved, and the noise reduction efficiency of audio data can also be improved.

Description

Method and device for reducing noise of audio data, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and in particular relates to a noise reduction method and device for audio data, electronic equipment and a storage medium.
Background
In recent years, noise reduction of audio data using a deep learning model has become one of the mainstream trends in noise reduction. A deep learning model is trained with audio data samples: a model trained with low-sampling-rate samples can be applied to noise reduction of low-sampling-rate audio data, and a model trained with high-sampling-rate samples can be applied to noise reduction of high-sampling-rate audio data.
The existing noise reduction methods have at least the following technical problems: when the sampling rate of the audio data samples is high, model training requires more sample points, more model parameters and a more complex model structure, so the model training efficiency is low; moreover, when noise reduction of the audio data is performed based on such a model, the noise reduction processing also takes a relatively long time, resulting in low noise reduction efficiency of the audio data.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for reducing noise of audio data, an electronic device and a storage medium, which can improve the model training efficiency and the noise reduction efficiency of the audio data.
In a first aspect, an embodiment of the present disclosure provides a method for reducing noise of audio data, including:
performing subband processing on audio frame data, wherein frequency intervals of all subbands are different, and a subband corresponding to a frequency interval with the minimum interval maximum value is used as a first subband;
inputting the frequency domain information of the first sub-band into a noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for reducing noise of audio data, including:
the subband module is used for performing subband processing on the audio frame data, wherein the frequency intervals of all sub-bands are different, and the sub-band corresponding to the frequency interval with the minimum interval maximum value is used as a first sub-band;
a first denoising module, configured to input the frequency domain information of the first subband into a denoising model, so that the denoising model outputs the frequency domain information of the first subband after denoising and a gain of the first subband;
the second noise reduction module is used for determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and the noise reduction data determining module is used for determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of noise reduction for audio data as in any of the embodiments of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the method for denoising audio data according to any one of the embodiments of the present disclosure.
According to the technical scheme of the embodiment of the disclosure, the audio frame data is processed by the sub-band, wherein the frequency intervals of all sub-bands are different, and the sub-band corresponding to the frequency interval with the minimum interval maximum value is used as a first sub-band; inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band; determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands; and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
The frequency range of sound to which the human ear is most sensitive is generally considered to be the range of lower frequencies. Based on this, in the technical solution of the embodiment of the present disclosure, the audio data after being framed is divided into each subband, noise reduction is performed only on the first subband corresponding to the low frequency interval through the noise reduction model, and the gains of other subbands are mapped according to the gain of the first subband output by the noise reduction model, so as to perform noise reduction on other subbands. Compared with the traditional noise reduction method which utilizes a noise reduction model to reduce noise in a full frequency band, the method reduces the whole noise reduction time consumption and improves the noise reduction efficiency.
In addition, the noise reduction model is only applied to noise reduction of the first sub-band, and noise reduction of the full frequency band is not performed, so that the noise reduction model can be trained only through sample data of the first sub-band, and therefore the model training efficiency can be improved. Especially, under the condition of carrying out model training on the audio data with high sampling rate, the model parameters and the model complexity can be greatly reduced and the model training efficiency can be improved by only training the sample data of the first sub-band.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flowchart illustrating a method for denoising audio data according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a method for reducing noise of audio data according to a second embodiment of the disclosure;
fig. 3 is a schematic flowchart of a noise reduction method for audio data according to a third embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a noise reduction apparatus for audio data according to a fifth embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to a sixth embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
Embodiment One
Fig. 1 is a schematic flowchart of a noise reduction method for audio data according to an embodiment of the present disclosure, which is suitable for noise reduction of audio data, especially for real-time noise reduction of audio data with a high sampling rate. The method may be performed by a noise reduction apparatus for audio data, which may be implemented in the form of software and/or hardware, which may be configured in an electronic device, such as a computer.
As shown in fig. 1, the method for reducing noise of audio data provided by this embodiment includes:
and S110, performing subband processing on the audio frame data, wherein the frequency intervals of the subbands are different, and the subband corresponding to the frequency interval with the minimum interval maximum value is used as a first subband.
In the embodiment of the present disclosure, the audio data to be denoised may be an audio data stream acquired in real time, or may be an audio data file after the acquisition.
Generally, before the audio data is denoised, the audio data may be subjected to framing processing to obtain audio frame data. Framing can be understood as segmenting the sampling points of the audio data, and may be based on existing Matlab framing code or other framing methods, which are not enumerated exhaustively here. By framing the audio data, an audio data stream can be processed in real time without waiting for the complete audio data to be acquired before noise reduction, and the audio frames of an audio data file can be denoised at the same time, which improves noise reduction efficiency.
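For illustration only, a minimal framing sketch in Python is given below; the frame length and hop size are illustrative assumptions (10 ms frames at 48 khz, no overlap) and are not fixed by this disclosure.

```python
import numpy as np

def frame_signal(samples, frame_len=480, hop=480):
    """Split a 1-D array of audio samples into frames.

    frame_len and hop are illustrative values only; the signal is assumed
    to be at least one frame long.
    """
    n_frames = max(0, 1 + (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])

# usage: frame_signal(np.random.randn(48000)) has shape (100, 480)
```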
When audio frame data is acquired, a subband process may be performed on each frame of data. The different sub-bands of the audio frame data may be regarded as data portions of different frequency intervals included in the audio frame data. The audio frame data may be parsed into data portions of different frequency intervals based on a time domain and/or frequency domain processing method.
Here, the frequency intervals of the sub-bands of the audio data are different, which can be understood as meaning that the endpoint values of the frequency intervals of the sub-bands are not completely identical. The frequency intervals of the sub-bands may overlap, be sequentially adjacent, be non-adjacent, and so on. For example, for audio frame data with a sampling rate of 48 khz and a frequency range of 0 hz to 24 khz, the frequency intervals corresponding to the sub-bands may overlap, such as 0 hz-8 khz, 7 khz-16 khz and 15 khz-24 khz; may be sequentially adjacent, such as 0 hz-8 khz, 8 khz-16 khz and 16 khz-24 khz; or may be non-adjacent, such as 0 hz-8 khz, 9 khz-16 khz and 17 khz-24 khz.
In some preferred implementations of the embodiments of the present disclosure, the frequency intervals of the sub-bands of the audio frame data are sequentially adjacent, and a right end point value of the frequency interval of a previous sub-band may be considered to be equal to a left end point value of the frequency interval of a next sub-band. In these preferred implementation manners, by adjacently setting the frequency intervals of the sub-bands, not only is the repeated noise reduction processing of the audio frame data in the overlapping frequency regions avoided, but also the audio frame data in some frequency regions are not omitted when the noise is reduced, so that the noise reduction can be completed in the whole band of the audio frame data, and the noise reduction effect can be improved to a certain extent.
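For illustration, the adjacent intervals of the example above can be written down directly; the first sub-band is simply the interval whose maximum value is smallest. A minimal sketch:

```python
# Adjacent sub-band intervals (Hz) for 48 kHz audio, as in the example above.
SUBBANDS = [(0, 8000), (8000, 16000), (16000, 24000)]

# The first sub-band is the interval whose maximum value (upper endpoint) is smallest.
first_subband = min(SUBBANDS, key=lambda interval: interval[1])   # -> (0, 8000)
```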
The frequency interval with the minimum interval maximum value may be regarded as the relatively low-frequency interval among the frequency intervals corresponding to the sub-bands. In addition, the relatively low-frequency interval may also be identified by other criteria, which are not enumerated exhaustively here.
The frequency range of sound to which the human ear is most sensitive is 200hz-800hz, which is generally considered to be a lower frequency range than the range of sound frequencies 20hz-20000 hz. Therefore, after the audio frame data is subjected to subband processing, the subband corresponding to the frequency interval with the minimum interval maximum value is used as the first subband, and the first subband corresponding to the low-frequency interval and containing the frequency range sensitive to human ears is selected from the subbands, so that a foundation is laid for the subsequent noise reduction of the first subband through a noise reduction model.
And S120, inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
In the embodiment of the present disclosure, the Noise reduction model may be, for example, a Recurrent Neural Network Noise suppression (RNNoise) model, a Convolutional Neural Network (CNN) model, or the like, and other audio Noise reduction models may also be applied thereto, which is not exhaustive. The noise reduction model can be obtained through pre-training, noise reduction of the first sub-band of the audio frame data can be achieved based on the noise reduction model, and meanwhile the gain of the first sub-band can be output.
For each frame of audio data, the gain of the first subband may be considered as a set of gains corresponding to the frequency points in the frequency interval of the first subband, with one gain value per frequency point. The gain value may be, for example, the ratio of the frequency domain information after noise reduction to the frequency domain information before noise reduction. The gain is usually in the range of 0-1; the closer the gain is to 0, the closer the corresponding audio frame data is considered to be to noise and the more it needs to be suppressed.
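A minimal sketch of how such per-frequency-point gains can be applied, and of one common way to construct gain targets in the 0-1 range, is given below; the "oracle" ratio target is an assumption for illustration and is not prescribed by this disclosure.

```python
import numpy as np

def apply_gains(noisy_spectrum, gains):
    """Suppress noise by scaling each frequency bin by its gain in [0, 1]."""
    return gains * noisy_spectrum

def oracle_gains(clean_spectrum, noisy_spectrum, eps=1e-8):
    """Illustrative target: ratio of clean to noisy magnitude, clipped to [0, 1].

    This ideal-ratio-style target is an assumption; the disclosure only states
    that gains lie in the range 0-1.
    """
    g = np.abs(clean_spectrum) / (np.abs(noisy_spectrum) + eps)
    return np.clip(g, 0.0, 1.0)
```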
Compared with the traditional noise reduction processing on the full-band data of the audio data, the noise reduction processing is only carried out on the first sub-band by utilizing the noise reduction model, so that the overall noise reduction time consumption of the audio data can be reduced, the noise reduction efficiency is improved, and the advantage is very obvious in the application of real-time noise reduction of the audio data.
In addition, the higher the sampling rate, the more complicated the structure of a conventional noise reduction model and the more computing resources it consumes during noise reduction, which makes it difficult to deploy audio noise reduction on terminals with limited computing resources, such as notebooks or mobile phones, and hinders practical engineering applications. When denoising audio data with a high sampling rate, the noise reduction model provided by the embodiment of the disclosure only needs to denoise the first subband in the low frequency range, so the complexity of the model and the computing resources it consumes are greatly reduced, which facilitates deployment on terminals with limited computing resources and practical engineering applications.
In some optional implementation manners of the embodiment of the present disclosure, a training manner of the noise reduction model includes: acquiring sample frame data which is the same as the frequency interval of the first sub-band and target frequency domain information of the sample frame data; and training the noise reduction model by using the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model is converged.
When the noise reduction model is trained, the frequency range of the required sample data may be the same as or different from that of the audio data to be denoised; it is only required that, after framing and/or subband processing of the sample data, the sample frame data has the same frequency interval as the first sub-band.
Generally, the sampling rate of audio data needs to be twice its maximum frequency to meet the acquisition requirement. Most existing open-source audio data sets have a low sampling rate, for example 16 khz. When the maximum frequency of the audio data to be denoised is higher, its sampling rate is also higher, for example 32 khz or 48 khz. In that case, if traditional noise reduction model training is performed with full-band sample data, there is no open-source audio data set to rely on, collecting sample data at a high sampling rate consumes a large amount of time, and the training process inevitably requires more sample points, more model parameters and a more complex model structure, which results in low model training efficiency.
In these optional embodiments of the present disclosure, the existing open-source audio data set with a lower sampling rate may be utilized to perform framing and/or subband processing on the sample data in the data set, so that the sample frame data and the first subband have the same frequency interval, and then the noise reduction model training may be performed to be applied to noise reduction of the audio data with a high sampling rate. Therefore, not only is time-consuming sample data acquisition at a high sampling rate not needed, but also training is performed only according to the sample frame data of the frequency interval of the first sub-band, so that model parameters and model complexity can be greatly reduced, and model training efficiency is improved.
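As an illustration of this training manner, a hedged PyTorch sketch is given below. The network architecture, feature dimension and loss terms are assumptions (the disclosure mentions RNNoise- and CNN-style models but does not fix an architecture); the model consumes first-subband spectral features only and predicts per-bin gains together with a decision probability.

```python
import torch
import torch.nn as nn

class FirstBandDenoiser(nn.Module):
    """Illustrative stand-in for the noise reduction model (architecture assumed)."""
    def __init__(self, n_bins=257, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_bins, hidden, batch_first=True)
        self.gain_head = nn.Linear(hidden, n_bins)   # per-bin gains of the first sub-band
        self.vad_head = nn.Linear(hidden, 1)         # decision probability (preset category)

    def forward(self, mag):                          # mag: (batch, frames, n_bins)
        h, _ = self.gru(mag)
        return torch.sigmoid(self.gain_head(h)), torch.sigmoid(self.vad_head(h))

def train_step(model, optimizer, noisy_mag, target_gain, target_vad):
    """One training step: fit gains/decision probability of the first sub-band only.

    target_gain has the same shape as the predicted gains; target_vad is a float
    tensor in [0, 1] with shape (batch, frames, 1).
    """
    pred_gain, pred_vad = model(noisy_mag)
    loss = nn.functional.mse_loss(pred_gain, target_gain) \
         + nn.functional.binary_cross_entropy(pred_vad, target_vad)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```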
And S130, determining the gains of the other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands.
In the embodiment of the disclosure, because the frequency intervals of each sub-band are sequentially adjacent and the correlation of gains of adjacent frequency points is strong, the gains of the frequency points in the frequency intervals corresponding to other sub-bands can be determined based on the gain of the first sub-band. The method for determining the gains of the other subbands except the first subband based on the gain of the first subband may be, for example, sequentially determining the gains of the frequency points of the other subbands according to a gain mapping relationship between adjacent frequency points, or determining the gains of the other subbands according to the gain of the frequency point with a larger frequency value in the frequency interval of the first subband.
In some optional implementations of the embodiment of the present disclosure, determining, based on the gain of the first subband, gains of subbands other than the first subband includes: and in the frequency interval of the first sub-band, the average gain from a preset frequency point value to the maximum value of the interval and the decision probability of the first sub-band output by the noise reduction model belonging to a preset category are integrated, and the gains of other sub-bands except the first sub-band are determined.
The average gain from the preset frequency point value to the maximum value of the interval can be regarded as the average value of the gains corresponding to the frequency points in the frequency range from the preset frequency point value to the maximum value of the interval. In the frequency interval corresponding to the first sub-band, the gain of the frequency point with the larger frequency value has higher relevance with the gain of other sub-bands corresponding to the opposite high-frequency interval. Therefore, the average gain from a preset frequency point value to the interval maximum value in the frequency interval of the first subband may be used as one of the factors for determining the gains of the other subbands.
The noise reduction model may perform noise reduction on the frequency information of the first sub-band, and may also perform detection on an effective sound category of the audio frame data, where the effective sound category is, for example, a human voice category or a musical instrument voice category. When the value of the decision probability of the preset category is larger, it can be considered that the smaller the probability that the audio frame data belongs to the noise is, the larger the corresponding gain is. In the same audio frame data, the relevance of the decision probability that each sub-band belongs to the preset category is high. Therefore, the decision probability that the first subband belongs to the preset class can be also used as one of the factors for determining the gains of the other subbands.
In these optional embodiments, by integrating the average gain from the preset frequency point value to the maximum value of the interval in the frequency interval of the first subband and the decision probability that the first subband output by the noise reduction model belongs to the preset category, the gains of other subbands with better noise reduction effect can be determined.
S140, determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing of other sub-bands.
In the embodiment of the disclosure, the time domain information of the audio frame data after noise reduction can be obtained according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands. In addition, the audio frame data after noise reduction can be synthesized to restore the audio data stream or audio data file after noise reduction, so that the audio data stream or audio file in the time domain can be conveniently played.
In some optional implementation manners of the embodiments of the present disclosure, the noise reduction method for audio data is applied to noise reduction of voice data, and accordingly, a frequency interval with a minimum interval maximum value includes a human voice frequency interval.
In these alternative embodiments, the noise reduction method may be applied to noise reduction of voice data, for example, noise reduction may be performed on a voice stream input by a communicator in real time in a communication environment with relatively large noise, or noise reduction may be performed on a recorded voice file, and so on. When the method is applied to noise reduction of voice data, a frequency interval with the minimum interval maximum value, namely a relatively low-frequency interval in the voice data needs to contain a human voice frequency interval, so that the noise reduction effect of the voice data is improved.
According to the technical scheme of the embodiment of the disclosure, audio data after framing is divided into sub-bands, noise reduction is only performed on the first sub-band corresponding to the low-frequency interval through the noise reduction model, and the gains of other sub-bands are mapped according to the gain of the first sub-band output by the noise reduction model so as to perform noise reduction on other sub-bands. Compared with the traditional noise reduction method which utilizes a noise reduction model to reduce noise in a full frequency band, the method reduces the whole noise reduction time consumption and improves the noise reduction efficiency.
In addition, the noise reduction model is only applied to noise reduction of the first sub-band, and noise reduction of the full frequency band is not performed, so that the noise reduction model can be trained only through sample data of the first sub-band, and therefore the model training efficiency can be improved. Especially, under the condition of carrying out model training on the audio data with high sampling rate, the model parameters and the model complexity can be greatly reduced and the model training efficiency can be improved by only training the sample data of the first sub-band.
Embodiment Two
This embodiment may be combined with the various alternatives of the noise reduction method for audio data provided in the above embodiment. The noise reduction method for audio data provided by this embodiment details the steps of performing the subband processing in the time domain and correspondingly synthesizing the noise-reduced audio frame data in the time domain, enriching the subband processing modes. Moreover, downsampling reduces the amount of computation of the noise reduction model and improves noise reduction efficiency to a certain extent, while upsampling preserves the sound quality of the audio frame data.
Fig. 2 is a schematic flowchart of a noise reduction method for audio data according to a second embodiment of the disclosure. Referring to fig. 2, the method for reducing noise of audio data provided in this embodiment includes:
s210, performing subband processing on the audio frame data in a time domain, wherein the frequency intervals of the subbands are different, and the subband corresponding to the frequency interval with the minimum interval maximum value is used as a first subband.
The performing the subband processing on the audio frame data in the time domain may include: and performing subband processing on the audio frame data through an analysis filter bank, wherein the passband of each filter in the analysis filter bank is different.
The filters in the analysis filter bank may be quadrature mirror filters, discrete cosine modulated filters, or other filters applicable to subband processing, which are not enumerated exhaustively here. The number of filters in the filter bank may be equal to the number of subbands required by the user, and the passband of each filter may correspond to the frequency interval required for each subband, for example, may be equal to it. By convolving the time-domain audio frame data with each filter respectively, the time domain information of the different sub-bands can be obtained, realizing the division into sub-bands.
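A minimal sketch of such an analysis filter bank split is given below; the plain FIR band filters are stand-ins chosen for illustration (the disclosure permits quadrature mirror or discrete cosine modulated filters, among others), and the filter length is an assumption.

```python
import numpy as np
from scipy.signal import firwin

FS = 48000
# Stand-in analysis filters covering 0-8, 8-16 and 16-24 kHz.
analysis_bank = [
    firwin(129, 8000, fs=FS),                            # 0-8 kHz (first sub-band)
    firwin(129, [8000, 16000], pass_zero=False, fs=FS),  # 8-16 kHz
    firwin(129, 16000, pass_zero=False, fs=FS),          # 16-24 kHz (high-pass)
]

def split_subbands(frame):
    """Convolve the time-domain frame with each analysis filter."""
    return [np.convolve(frame, h, mode="same") for h in analysis_bank]
```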
S220, inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
In the embodiment of the disclosure, after the time domain information of each sub-band is obtained, only the time domain information of the first sub-band needs to be subjected to time-frequency transformation; the time domain data of the other sub-bands does not need to be transformed. The time domain information of the first sub-band may be Fourier transformed to obtain the frequency domain information of the first sub-band. By using the noise reduction model, noise reduction of the frequency domain information of the first sub-band can be realized, and the gain corresponding to each frequency point in the frequency interval of the first sub-band is output.
And S230, determining the gains of the other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively performing noise reduction processing on the time domain information of the other sub-bands according to the gains of the other sub-bands.
After the gains of other subbands are obtained, the time domain information of other subbands may be directly multiplied by the corresponding gains to implement noise reduction processing on the time domain information of other subbands.
S240, converting the frequency domain information of the first sub-band after noise reduction into time domain information, and synthesizing the time domain information of the first sub-band after noise reduction and the time domain information of other sub-bands after noise reduction to obtain the time domain information of the audio frame data after noise reduction.
The frequency domain information of the first sub-band after noise reduction can be converted into time domain information by using an inverse Fourier transform. The noise-reduced time domain information of each sub-band is then synthesized to obtain the time domain information of the noise-reduced audio frame data.
Synthesizing the noise-reduced time domain information of the first sub-band and the noise-reduced time domain information of the other sub-bands may include: synthesizing them by using a synthesis filter bank. Each filter in the synthesis filter bank needs to correspond to a filter in the analysis filter bank, so that the audio frame data divided into sub-bands can be accurately synthesized.
In some further implementation manners of the embodiments of the present disclosure, the method further includes: performing downsampling on the audio frame data before performing the subband processing on the audio frame data through the analysis filter bank; or performing downsampling on the time domain information of the first subband after the audio frame data is subjected to the subband processing through the analysis filter bank;
correspondingly, performing upsampling on the noise-reduced time domain information of the first sub-band and of the other sub-bands before synthesizing them through the synthesis filter bank; or performing upsampling on the time domain information of the noise-reduced audio frame data after the noise-reduced time domain information of the first sub-band and of the other sub-bands has been synthesized through the synthesis filter bank.
According to the noble identities of multirate signal processing, downsampling the audio frame data first and then performing subband processing through the analysis filter bank can be made equivalent to performing subband processing through the analysis filter bank first and then downsampling the time domain information of the first subband. Similarly, the noise-reduced time domain information of the first sub-band and of the other sub-bands may be upsampled, and the upsampled time domain information is then synthesized through the synthesis filter bank.
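The noble identity referred to here can be checked numerically: downsampling by M followed by filtering with h equals filtering with h expanded by M (zeros inserted between taps) followed by downsampling by M. A minimal sketch with an arbitrary signal and filter (both chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1200)          # arbitrary signal
h = rng.standard_normal(16)            # arbitrary FIR filter
M = 3                                  # downsampling factor

# Path A: downsample by M, then filter with h.
a = np.convolve(x[::M], h)

# Path B: filter with h expanded by M (zeros inserted), then downsample by M.
h_up = np.zeros(len(h) * M)
h_up[::M] = h
b = np.convolve(x, h_up)[::M]

n = min(len(a), len(b))
assert np.allclose(a[:n], b[:n])       # the two paths coincide
```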
In these further implementation manners, the amount of data to be processed by the noise reduction model can be reduced through the downsampling operation, so the filtering efficiency of the noise reduction model can be improved to a certain extent. The upsampling operation restores the sound quality of the audio frame data to a certain extent, ensuring the user's listening experience when the audio frame data is played.
For example, when the number of subbands required by the user is 3, the denoising process performed on the audio frame data x(n), where n = 1, 2, ..., L (L being the total number of sampling points), may be as follows.
First, the filter bank design may be done for the number of subbands, as well as the frequency range of the first subband.
For example, the audio frame data may be processed by 3 discrete cosine modulated filters, each obtained by cosine modulation of a prototype low-pass filter h_0(n); h_1(n), h_2(n) and h_3(n) may represent the 3 discrete cosine modulated filters, respectively, where N may equal the number of filters and len may denote the filter length.
Meanwhile, audio frame data may be down-sampled.
For example, the audio frame data may be downsampled by a factor of M, where dec(m, n) may represent the time domain information of the n-th sampling point of the m-th group after downsampling, and M may be equal to the number of subbands.
Secondly, the filters h_1(n), h_2(n) and h_3(n) may be used to perform subband processing on each group of downsampled audio frame data, which may be expressed as:
x(m, n) = dec(m, n) × h_m(n);
where x(m, n) may represent the time domain information of the n-th point of the m-th subband. Each group of downsampled time domain information is convolved in the time domain with the corresponding filter in the analysis filter bank to obtain the time domain information of each sub-band.
Again, a Fast Fourier Transform (FFT) may be performed only on the time domain information of the first sub-band; the other subbands do not undergo this processing. The FFT of the time domain information of the first sub-band can be simply expressed as:

S(1, k) = Σ_n x(1, n)·e^(-j2πkn/N_fft);

where S(1, k) may represent the frequency domain information corresponding to the k-th point of the first subband, x(1, n) may represent the time domain information of the n-th point of the 1st subband (i.e., the first subband), and N_fft may represent the number of points of the FFT performed on the first sub-band.
Then, inputting S (1, k) into the noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band; determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively performing noise reduction processing on time domain information of the other sub-bands according to the gains of the other sub-bands; and converting the frequency domain information subjected to noise reduction of the first sub-band into time domain information.
And then, the noise-reduced time domain information of the first sub-band and of the other sub-bands is synthesized by a synthesis filter bank.
Each filter in the synthesis filter bank needs to correspond to a filter in the analysis filter bank. For example, 3 discrete cosine modulated filters may be employed in the synthesis filter bank, where the filters f_1(n), f_2(n) and f_3(n) in the synthesis filter bank correspond to h_1(n), h_2(n) and h_3(n), respectively. The noise-reduced time domain information of each subband is filtered by f_1(n), f_2(n) and f_3(n), which may be expressed as:
syn(m, n) = out(m, n) × f_m(n);
where out(m, n) may represent the time domain information of the n-th sampling point of the m-th subband after noise reduction, and syn(m, n) may represent the filtered time domain information of the n-th point of the m-th subband. Convolving the time domain information of each sub-band with the corresponding sub-band synthesis filter lays a foundation for synthesizing the noise-reduced audio frame data.
Finally, upsampling by a factor of M is performed on syn(m, n) to obtain up(m, n), and the final noise-reduced (enhanced) audio frame data is obtained by synthesis, which may be expressed as:

enh(n) = Σ_{m=1}^{M} up(m, n);

where enh(n) may represent the final enhanced audio data.
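To tie the steps of this embodiment together, a minimal, heavily simplified Python sketch of the time-domain pipeline is given below (analysis filtering, downsampling, FFT of the first sub-band only, gain mapping, synthesis filtering and upsampling). The filter designs, the fake_model stand-in for the trained noise reduction model, the map_gain mapping and the factor-M gain compensation are all illustrative assumptions, not the exact filters or formulas of this disclosure.

```python
import numpy as np
from scipy.signal import firwin

FS, M, NTAPS = 48000, 3, 129

# Stand-in analysis/synthesis filters (plain FIR bands instead of a true
# cosine-modulated bank).
analysis_bank = [
    firwin(NTAPS, 8000, fs=FS),
    firwin(NTAPS, [8000, 16000], pass_zero=False, fs=FS),
    firwin(NTAPS, 16000, pass_zero=False, fs=FS),
]
synthesis_bank = analysis_bank  # placeholder; a real design pairs matched filters

def fake_model(first_band_spectrum):
    """Placeholder for the trained noise reduction model: returns the denoised
    spectrum, per-bin gains and a decision probability."""
    gains = np.clip(np.abs(first_band_spectrum) /
                    (np.abs(first_band_spectrum).max() + 1e-8), 0.0, 1.0)
    return gains * first_band_spectrum, gains, 0.8

def map_gain(gains, vad, bw):
    """Simplified stand-in for the gain mapping detailed in Embodiment Four."""
    avg_gain_h = gains[-bw:].mean()
    avg_prob_h = np.sqrt(avg_gain_h * vad)
    gain_h = np.tanh(avg_prob_h)
    return gain_h * avg_prob_h          # single gain value for the other sub-bands

def denoise_frame(frame):
    # 1. sub-band split, then downsample each band by M
    bands = [np.convolve(frame, h, mode="same")[::M] for h in analysis_bank]

    # 2. denoise the first sub-band in the frequency domain
    spec = np.fft.rfft(bands[0])
    denoised_spec, gains, vad = fake_model(spec)
    bands[0] = np.fft.irfft(denoised_spec, n=len(bands[0]))

    # 3. map the first-band gain to the other sub-bands (time-domain scaling)
    other_gain = map_gain(gains, vad, bw=len(gains) // 4)
    for m in range(1, len(bands)):
        bands[m] = other_gain * bands[m]

    # 4. upsample (zero-stuff), synthesis-filter and sum
    out = np.zeros(len(frame))
    for m, band in enumerate(bands):
        up = np.zeros(len(band) * M)
        up[::M] = band
        out += M * np.convolve(up[:len(frame)], synthesis_bank[m], mode="same")
    return out

# usage
enhanced = denoise_frame(np.random.randn(480))   # one 10 ms frame at 48 kHz
```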
According to the technical scheme of the embodiment of the disclosure, the steps of performing the subband processing in the time domain and correspondingly synthesizing the noise-reduced audio frame data in the time domain are detailed, enriching the subband processing modes. In addition, downsampling reduces the amount of computation of the noise reduction model and improves noise reduction efficiency to a certain extent, while upsampling preserves the sound quality of the audio frame data. Moreover, the noise reduction method for audio data provided by this embodiment and the noise reduction method for audio data provided by the above embodiment belong to the same technical concept; technical details not described in this embodiment can be found in the above embodiment, and the same technical features have the same beneficial effects in both embodiments.
Embodiment Three
The embodiments of the present disclosure and various alternatives in the noise reduction method of audio data provided in the above embodiments may be combined. The method for reducing noise of audio data provided by this embodiment optimizes the steps of performing the subband processing in the frequency domain, and correspondingly synthesizing the noise-reduced audio frame data in the frequency domain, and enriches the subband processing modes.
Fig. 3 is a schematic flowchart of a noise reduction method for audio data according to a third embodiment of the present disclosure. Referring to fig. 3, the method for reducing noise of audio data provided in this embodiment includes:
and S310, after the audio frame data are converted into the frequency domain information, performing molecular band processing on the frequency domain information of the audio frame data, wherein the frequency intervals of all sub-bands are different, and the sub-band corresponding to the frequency interval with the minimum interval maximum value is used as a first sub-band.
The time domain information of the audio frame data may be directly Fourier transformed to obtain the frequency domain information of the full-band audio frame data. Further, performing the subband processing on the frequency domain information of the audio frame data may include: grouping the frequency domain information of the audio frame data according to the frequency intervals to obtain the frequency domain information of each sub-band.
The frequency domain information of the audio frame data may include the component information at each frequency point obtained after time-frequency conversion of the current audio frame data, for example, the components contained at frequency points such as 1 khz, 2 khz, 3 khz, ..., 24 khz. Grouping the frequency domain information of the audio frame data according to the frequency intervals can be understood as dividing the frequency points into frequency intervals and taking the frequency domain information in each frequency interval as the frequency domain information of a different sub-band.
For example, if the frequency range of the full frequency band is 0hz-24khz, two frequency points of 8khz and 16khz may be used as the division frequency points, and 0hz-24khz may be divided into three frequency ranges of 0hz-8khz, 8khz-16khz and 16khz-24 khz. The frequency domain information in each frequency interval may be considered as frequency domain information of different subbands.
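A minimal sketch of mapping these division frequency points to FFT bin indices and grouping the frequency domain information accordingly is given below; the FFT size is an illustrative assumption.

```python
import numpy as np

FS, N_FFT = 48000, 2048                      # illustrative FFT size
edges_hz = [0, 8000, 16000, 24000]           # division frequency points from the example

freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)   # frequency of each rfft bin
edges_bin = [int(np.searchsorted(freqs, f)) for f in edges_hz[:-1]] + [len(freqs)]

def group_bins(spectrum):
    """Slice a full-band rfft spectrum (length N_FFT // 2 + 1) into the
    frequency domain information of each sub-band."""
    return [spectrum[edges_bin[m]:edges_bin[m + 1]] for m in range(len(edges_bin) - 1)]

# usage: subband_specs = group_bins(np.fft.rfft(frame, n=N_FFT))
```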
And S320, inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
After determining the frequency domain information of the first sub-band, only the frequency domain information of the first sub-band may be input to the noise reduction model, and the frequency domain data of other sub-bands may not be input to the noise reduction model.
S330, determining the gains of the other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively performing noise reduction processing on the frequency domain information of the other sub-bands according to the gains of the other sub-bands.
After the gains of other sub-bands are obtained, the frequency domain information of other sub-bands can be directly multiplied by the corresponding gains to implement the noise reduction processing on the frequency domain information of other sub-bands.
S340, splicing the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands to obtain the frequency domain information subjected to noise reduction of the audio frame data.
Splicing the noise-reduced frequency domain information of the first sub-band and the noise-reduced frequency domain information of the other sub-bands includes: splicing them according to the frequency intervals.
Since the frequency domain information was divided into sub-bands according to frequency intervals, the full-band noise-reduced frequency domain information of the audio frame data can be obtained by splicing the noise-reduced frequency domain information of the sub-bands according to the divided frequency intervals.
S350, converting the noise-reduced frequency domain information of the audio frame data into time domain information to obtain the time domain information of the noise-reduced audio frame data.
The noise-reduced frequency domain information of the audio frame data can be converted into time domain information by using an inverse Fourier transform, thereby obtaining the time domain information of the noise-reduced audio frame data.
For example, when the number of subbands required by the user is 3, the process of denoising the audio frame data x(n), where n = 1, 2, ..., L (L being the total number of sampling points), may also be as follows.
First, an L-point FFT is performed on the audio frame data x(n), which can be simply expressed as:

X(k) = Σ_n x(n)·e^(-j2πkn/L);

where X(k) is the frequency domain information of the k-th point of the audio frame data.
Secondly, X(k) is grouped according to the frequency intervals to obtain the frequency domain information of each sub-band: S(m, k) collects the values of X(k) whose frequency points fall in the m-th frequency interval, where M may represent the number of subbands and S(m, k) may represent the frequency domain information corresponding to the k-th point of the m-th subband.
Thirdly, S (1, k) may be input into the noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction, and the gain of the first sub-band; and determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the frequency domain information of the other sub-bands according to the gains of the other sub-bands.
Then, all sub-bands S(m, k) may be spliced together to obtain the full-band frequency domain information S(k), and the time domain information of the noise-reduced audio frame data may be obtained through an inverse FFT, which may be simply expressed as:
enh(n)=ifft(S(k));
where enh (n) may represent the final enhanced audio data.
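For comparison with the time-domain pipeline of the previous embodiment, a minimal Python sketch of the frequency-domain flow of this embodiment is given below. The model interface and the simplified gain mapping are assumptions; edges_bin is assumed to hold rfft bin boundaries for an FFT of len(frame) points (for example built as in the earlier bin-grouping sketch).

```python
import numpy as np

def denoise_frame_freq(frame, edges_bin, model):
    """Frequency-domain flow: FFT, group bins into sub-bands, denoise only the
    first sub-band with `model`, map its gain to the other sub-bands, splice
    and inverse-FFT. `model` is assumed to return (denoised_band, gains, vad).
    """
    spec = np.fft.rfft(frame)
    bands = [spec[edges_bin[m]:edges_bin[m + 1]] for m in range(len(edges_bin) - 1)]

    denoised_first, gains, vad = model(bands[0])
    bands[0] = denoised_first

    # Simplified stand-in for the gain mapping of Embodiment Four.
    bw = max(1, len(gains) // 4)
    other_gain = np.tanh(np.sqrt(np.mean(gains[-bw:]) * vad))
    for m in range(1, len(bands)):
        bands[m] = other_gain * bands[m]

    return np.fft.irfft(np.concatenate(bands), n=len(frame))
```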
According to the technical scheme of the embodiment of the disclosure, the steps of performing the subband processing in the frequency domain and correspondingly synthesizing the noise-reduced audio frame data in the frequency domain are detailed, enriching the subband processing modes. In addition, the noise reduction method for audio data provided by this embodiment and the noise reduction method for audio data provided by the above embodiment belong to the same technical concept; technical details not described in this embodiment can be found in the above embodiment, and the same technical features have the same beneficial effects in both embodiments.
Embodiment Four
The embodiments of the present disclosure and various alternatives in the noise reduction method of audio data provided in the above embodiments may be combined. The method for reducing noise of audio data provided by this embodiment optimizes the step of determining the gains of other subbands, and can implement mapping from the gain of the first subband to the gains of other subbands, thereby laying a foundation for noise reduction processing of other subbands.
In some optional implementation manners of the embodiment of the present disclosure, in the frequency interval of the first subband, determining gains of other subbands except the first subband by synthesizing an average gain from a preset frequency point value to a maximum value of the interval and a decision probability that the first subband output by the noise reduction model belongs to a preset category includes:
taking the average gain from a preset frequency point value to the maximum value of the interval, within the frequency interval of the first sub-band, as a first gain factor; determining the decision probability that the sub-bands other than the first sub-band belong to the preset category according to the first gain factor and the decision probability, output by the noise reduction model, that the first sub-band belongs to the preset category; determining a second gain factor of the other sub-bands according to the decision probability that the other sub-bands belong to the preset category; and combining the first gain factor, the second gain factor and the decision probability that the other sub-bands belong to the preset category to determine the gains of the sub-bands other than the first sub-band.
For example, it is assumed that in the current audio frame data, the gain of the k-th frequency point of the m-th subband is denoted as G(m, k); for convenience of description, the gain G(1, k) of the first subband may be abbreviated as G(k).
In the frequency interval corresponding to the first sub-band, the gain of the frequency point with the larger frequency value has higher relevance with the gain of other sub-bands corresponding to the opposite high-frequency interval. Therefore, the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band can be used as a first gain factor for determining the gains of other sub-bands, so as to ensure that the sub-bands subjected to noise reduction according to the gains can be smoothly transited.
The calculation of the first gain factor can be simply expressed as:

avgGainH = (1/bw)·Σ G(k), where the sum runs over the bw frequency points from the preset frequency point value up to the maximum value of the interval;

where avgGainH may represent the first gain factor; nFFT is a constant and may represent the total number of frequency points after time-frequency conversion; bw is also a constant and may represent the number of frequency points between the preset frequency point value and the maximum value of the interval, which is typically an empirical value, for example between nFFT/4 and nFFT/3.
The decision probability that the first subband estimated by the noise reduction model belongs to the preset category may be regarded as the decision probability that the current audio frame data belongs to the preset category, and may be denoted as vad.
The decision probability that the sub-bands other than the first sub-band belong to the preset category is determined according to the first gain factor avgGainH and the decision probability vad of the first sub-band output by the noise reduction model; for example, it may be determined based on the following formula:
avgProbH = √(avgGainH · vad)
avgProbH may represent the decision probability that the sub-bands other than the first sub-band belong to the preset category. Besides the square root of the product of avgGainH and vad, avgProbH may also be obtained as a weighted sum of avgGainH and vad; other ways of fusing avgGainH and vad to determine the decision probability that the sub-bands other than the first sub-band belong to the preset category may also be applied here, and no specific limitation is imposed.
The second gain factor of the other sub-bands is determined according to the decision probability avgProbH that the other sub-bands belong to the preset category; for example, it may be determined based on the following formula:
gainH = tanh(avgProbH)
gainH may represent the second gain factor of the other sub-bands; gainH needs to be positively correlated with avgProbH, and its value range needs to lie within [0, 1]. Besides determining gainH through a hyperbolic tangent (tanh) of avgProbH, possibly with a scaling factor inside the tanh, any other way of determining gainH that satisfies the positive correlation between gainH and avgProbH and keeps gainH within [0, 1] may also be applied, and no specific limitation is imposed here.
The first gain factor avgGainH, the second gain factor gainH, and the decision probability avgProbH that the other sub-bands belong to the preset category are then combined to determine the gains of the sub-bands other than the first sub-band; for example, they may be fused as a simple product:
gain = avgGainH · gainH · avgProbH
Here gain may represent the gain of the sub-bands other than the first sub-band. Because the relatively high-frequency intervals corresponding to the sub-bands other than the first sub-band contain less component information of effective sound, setting the gain of every frequency point in the other sub-bands to this same gain value is sufficient to meet their noise reduction requirements.
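As an illustration of the gain mapping described above, the following Python/NumPy sketch computes a single gain for the higher sub-bands from the first sub-band's per-bin gains G(k) and the decision probability vad. The averaging window over the top bw bins, the geometric-mean fusion, the bare tanh, and the final product follow the example formulas given here and are assumptions made for the sketch, not the exact implementation of this disclosure.

```python
import numpy as np

def map_gain_to_other_subbands(first_band_gains: np.ndarray, vad: float, bw: int) -> float:
    """Map the per-bin gains G(k) of the first sub-band and the frame-level
    decision probability vad to one shared gain for the other sub-bands.

    Illustrative assumptions: the last `bw` bins of the first sub-band are
    averaged, avgProbH is the geometric mean of avgGainH and vad, gainH is a
    plain tanh mapping, and the three quantities are fused by a product.
    """
    # First gain factor: average gain over the top bw bins of the first sub-band.
    avg_gain_h = float(np.mean(first_band_gains[-bw:]))
    # Decision probability that the other sub-bands belong to the preset category.
    avg_prob_h = float(np.sqrt(avg_gain_h * vad))
    # Second gain factor: positively correlated with avg_prob_h, bounded in [0, 1].
    gain_h = float(np.tanh(avg_prob_h))
    # Fuse the three quantities into the gain shared by every bin of the other sub-bands.
    return avg_gain_h * gain_h * avg_prob_h
```

The returned scalar is applied uniformly to every frequency point of the higher sub-bands, which matches the observation above that those bands carry little effective-sound detail.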
According to the technical scheme of the embodiment of the present disclosure, the step of determining the gains of the other sub-bands is optimized, the mapping from the gain of the first sub-band to the gains of the other sub-bands can be realized, and a foundation is laid for the noise reduction processing of the other sub-bands. In addition, the noise reduction method for audio data provided by this embodiment of the present disclosure and the noise reduction methods for audio data provided by the above embodiments belong to the same technical concept; for technical details not described in detail in this embodiment, reference may be made to the above embodiments, and the same technical features have the same beneficial effects in this embodiment and in the above embodiments.
EXAMPLE five
Fig. 4 is a schematic structural diagram of a noise reduction apparatus for audio data according to a fifth embodiment of the present disclosure. The noise reduction apparatus for audio data provided by this embodiment is suitable for noise reduction of audio data, and is particularly suitable for real-time noise reduction of audio data with a high sampling rate.
As shown in fig. 4, the noise reduction apparatus for audio data includes:
a subband module 410, configured to perform subband processing on the audio frame data, where frequency intervals of the subbands are different, and a subband corresponding to a frequency interval with a minimum interval maximum value is used as a first subband;
a first noise reduction module 420, configured to input the frequency domain information of the first sub-band into a noise reduction model, so that the noise reduction model outputs the noise-reduced frequency domain information of the first sub-band and the gain of the first sub-band;
a second noise reduction module 430, configured to determine the gains of the sub-bands other than the first sub-band based on the gain of the first sub-band, and to perform noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and a noise reduction data determining module 440, configured to determine the noise-reduced audio frame data according to the noise-reduced frequency domain information of the first sub-band and the result of the noise reduction processing on the other sub-bands (a minimal sketch combining these four modules follows).
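Purely to illustrate how the four modules cooperate, the Python sketch below wires hypothetical stand-ins for them into a per-frame pipeline; split_subbands, noise_reduction_model, synthesize, and map_gain_to_other_subbands are placeholder callables introduced for this sketch, and only the control flow reflects the apparatus described above.

```python
import numpy as np

def denoise_frame(frame: np.ndarray, split_subbands, noise_reduction_model,
                  synthesize, map_gain_to_other_subbands, bw: int = 64) -> np.ndarray:
    """One pass of the apparatus: sub-band module -> first/second noise
    reduction modules -> noise reduction data determining module.
    All callables are hypothetical placeholders for the described components."""
    # Sub-band module 410: split the frame; the first element is the lowest sub-band.
    first_band, other_bands = split_subbands(frame)
    # First noise reduction module 420: the model denoises the first sub-band and
    # returns its per-bin gains plus a frame-level decision probability (vad).
    first_denoised, first_gains, vad = noise_reduction_model(first_band)
    # Second noise reduction module 430: map the first sub-band's gains to one
    # shared gain and scale the other sub-bands with it.
    gain_other = map_gain_to_other_subbands(first_gains, vad, bw)
    other_denoised = [gain_other * band for band in other_bands]
    # Noise reduction data determining module 440: recombine all sub-bands.
    return synthesize(first_denoised, other_denoised)
```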
In some optional implementations of embodiments of the present disclosure, the sub-band module includes:
a time domain sub-band submodule, configured to perform sub-band processing on the audio frame data in the time domain;
correspondingly, the second noise reduction module is configured to respectively perform noise reduction processing on the time domain information of the other sub-bands according to the gains of the other sub-bands;
and the noise reduction data determining module is used for converting the frequency domain information of the first sub-band subjected to noise reduction into time domain information, and synthesizing the time domain information of the first sub-band subjected to noise reduction and the time domain information of other sub-bands subjected to noise reduction to obtain the time domain information of the audio frame data subjected to noise reduction.
In some further implementation manners of the embodiments of the present disclosure, the time domain subband module is specifically configured to perform subband processing on the audio frame data through an analysis filter bank, where passbands of filters in the analysis filter bank are different;
correspondingly, the noise reduction data determining module is specifically configured to synthesize, by the synthesis filter bank, the noise-reduced time domain information of the first sub-band and the noise-reduced time domain information of the other sub-bands.
In some further implementations of embodiments of the present disclosure, the time domain sub-band submodule is further configured to perform downsampling on the audio frame data before performing sub-band processing on the audio frame data by the analysis filter bank; or, after the audio frame data is subjected to sub-band processing through the analysis filter bank, to perform downsampling processing on the time domain information of the first sub-band;
correspondingly, the noise reduction data determining module is further configured to perform upsampling processing on the de-noised time domain information of the first sub-band and the de-noised time domain information of the other sub-bands before synthesizing the de-noised time domain information of the first sub-band and the de-noised time domain information of the other sub-bands through the synthesis filter bank; or, after synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands through the synthesis filter bank, to perform upsampling processing on the time domain information of the noise-reduced audio frame data.
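The time-domain variant (analysis filter bank, optional down-sampling of the first sub-band, synthesis filter bank with up-sampling) can be sketched with SciPy as below; the two-band split, the FIR filter length, the cutoff frequency, and the decimation factor of 2 are arbitrary choices for illustration rather than values taken from this disclosure.

```python
import numpy as np
from scipy.signal import firwin, lfilter, resample_poly

def analysis_two_band(frame: np.ndarray, fs: int, cutoff_hz: float = 8000.0, taps: int = 129):
    """Split a time-domain frame into a low band (first sub-band) and a high band.
    fs must exceed 2 * cutoff_hz."""
    lp = firwin(taps, cutoff_hz, fs=fs)                   # low-pass analysis filter
    hp = firwin(taps, cutoff_hz, fs=fs, pass_zero=False)  # complementary high-pass filter
    low = lfilter(lp, 1.0, frame)
    high = lfilter(hp, 1.0, frame)
    # Down-sample only the first sub-band so the noise reduction model can run
    # at a lower rate (one of the two orderings described above).
    low_ds = resample_poly(low, up=1, down=2)
    return low_ds, high

def synthesis_two_band(low_denoised_ds: np.ndarray, high_denoised: np.ndarray) -> np.ndarray:
    """Up-sample the denoised first sub-band back to the full rate and sum the bands."""
    low_us = resample_poly(low_denoised_ds, up=2, down=1)
    n = min(len(low_us), len(high_denoised))
    return low_us[:n] + high_denoised[:n]
```

A production system would use a properly matched analysis/synthesis filter pair (for example a QMF pair) so that the two stages reconstruct the signal accurately; the sketch only shows the data flow.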
In some optional implementations of embodiments of the present disclosure, the sub-band module includes:
a frequency domain sub-band submodule, configured to convert the audio frame data into frequency domain information and then perform sub-band processing on the frequency domain information of the audio frame data;
correspondingly, the second noise reduction module is configured to perform noise reduction processing on the frequency domain information of the other sub-bands according to the gains of the other sub-bands;
the noise reduction data determining module is configured to splice the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands to obtain the frequency domain information of the audio frame data subjected to noise reduction; and to convert the frequency domain information of the audio frame data subjected to noise reduction into time domain information to obtain the time domain information of the noise-reduced audio frame data.
In some further implementation manners of the embodiments of the present disclosure, the frequency domain sub-band module is specifically configured to group frequency domain information of the audio frame data according to frequency intervals to obtain frequency domain information of each sub-band;
correspondingly, the noise reduction data determining module is specifically configured to splice the noise-reduced frequency domain information of the first sub-band and the noise-reduced frequency domain information of the other sub-bands according to the frequency intervals.
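The frequency-domain variant reduces to grouping FFT bins by frequency interval and splicing the denoised groups back in interval order before the inverse transform, as in the following sketch; the FFT size and the bin boundary of the first sub-band are placeholder values.

```python
import numpy as np

def split_bins(frame: np.ndarray, n_fft: int = 512, first_band_bins: int = 129):
    """Convert a frame to frequency domain information and group the bins:
    the lowest `first_band_bins` bins form the first sub-band."""
    spec = np.fft.rfft(frame, n=n_fft)              # frequency domain information
    return spec[:first_band_bins], spec[first_band_bins:]

def splice_and_invert(first_denoised: np.ndarray, others_denoised: np.ndarray, n_fft: int = 512):
    """Splice the denoised groups back in frequency order and return time domain data."""
    spec = np.concatenate([first_denoised, others_denoised])
    return np.fft.irfft(spec, n=n_fft)
```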
In some optional implementation manners of the embodiment of the present disclosure, a training manner of the noise reduction model includes: acquiring sample frame data having the same frequency interval as the first sub-band, together with target frequency domain information of the sample frame data; and training the noise reduction model with the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model converges.
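As a hedged sketch of this training manner, the code below assumes a PyTorch-style recurrent model whose input is the noisy frequency-domain information of sample frames covering the same frequency interval as the first sub-band and whose target is the corresponding clean (target) frequency-domain information; the architecture, loss, optimizer, and stopping criterion are placeholder choices, not those of this disclosure.

```python
import torch
from torch import nn

class SubbandDenoiser(nn.Module):
    """Placeholder model: maps noisy magnitude spectra of the first sub-band to per-bin gains."""
    def __init__(self, n_bins: int = 129, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, noisy_mag):                 # shape: (batch, frames, n_bins)
        h, _ = self.rnn(noisy_mag)
        gains = torch.sigmoid(self.out(h))        # per-bin gains in [0, 1]
        return gains * noisy_mag                  # denoised frequency domain information

def train(model, loader, epochs: int = 10, lr: float = 1e-3):
    """Fit on (noisy spectrum, target spectrum) pairs until approximate convergence."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for noisy_mag, target_mag in loader:      # sample frame data + target frequency domain info
            opt.zero_grad()
            loss = loss_fn(model(noisy_mag), target_mag)
            loss.backward()
            opt.step()
    return model
```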
In some optional implementations of embodiments of the present disclosure, the second noise reduction module includes:
and the gain mapping sub-module is used for integrating the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band and the decision probability of the first sub-band output by the noise reduction model belonging to a preset category and determining the gains of other sub-bands except the first sub-band.
In some further implementation manners of the embodiments of the present disclosure, the gain mapping sub-module is specifically configured to use, as the first gain factor, an average gain from a preset frequency point value to an interval maximum value in a frequency interval of the first subband; determining the decision probability that the other sub-bands except the first sub-band belong to the preset category according to the first gain factor and the decision probability that the first sub-band output by the noise reduction model belongs to the preset category; determining second gain factors of other sub-bands according to the decision probability that the other sub-bands belong to the preset category; and integrating the first gain factor, the second gain factor and the decision probability of the other sub-bands belonging to the preset category to determine the gains of the other sub-bands except the first sub-band.
In some optional implementations of the embodiment of the present disclosure, the frequency intervals of the subbands are sequentially adjacent to each other.
In some optional implementation manners of the embodiment of the present disclosure, the noise reduction device for audio data is applied to noise reduction of voice data, and accordingly, a frequency interval with a minimum interval maximum value includes a human voice frequency interval.
The noise reduction device for audio data provided by the embodiment of the disclosure can execute the noise reduction method for audio data provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
EXAMPLE six
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., the terminal device or the server in fig. 5) 500 suitable for implementing the noise reduction method for audio data according to the embodiment of the disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing device (e.g., a central processor, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the noise reduction method of audio data of the embodiment of the present disclosure when executed by the processing device 501.
The electronic device provided by the embodiment of the present disclosure and the method for reducing noise of audio data provided by the above embodiment belong to the same disclosure concept, and technical details that are not described in detail in the embodiment of the present disclosure may be referred to the above embodiment, and the embodiment has the same beneficial effects as the above embodiment.
EXAMPLE seven
The disclosed embodiments provide a computer storage medium having stored thereon a computer program that, when executed by a processor, implements the method of noise reduction of audio data provided by the above-described embodiments.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or FLASH Memory (FLASH), an optical fiber, a portable compact disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
performing subband processing on audio frame data, wherein frequency intervals of all subbands are different, and a subband corresponding to a frequency interval with the minimum interval maximum value is used as a first subband; inputting the frequency domain information of the first sub-band into the noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band; determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands; and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit, module does not in some cases constitute a limitation of the unit, module itself, for example, a first noise reduction module may also be described as a "first sub-band noise reduction module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Part (ASSP), a System On Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a method of noise reduction of audio data, the method comprising:
performing subband processing on audio frame data, wherein frequency intervals of all subbands are different, and a subband corresponding to a frequency interval with the minimum interval maximum value is used as a first subband;
inputting the frequency domain information of the first sub-band into a noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
According to one or more embodiments of the present disclosure, [ example two ] there is provided a method of noise reduction of audio data, further comprising:
in some optional implementation manners of the embodiments of the present disclosure, the performing a molecular band process on the audio frame data includes:
performing molecular band processing on the audio frame data in a time domain;
correspondingly, the performing noise reduction processing on the other subbands according to the gains of the other subbands includes: respectively carrying out noise reduction processing on the time domain information of other sub-bands according to the gains of the other sub-bands;
determining the audio frame data subjected to noise reduction according to the frequency domain information subjected to noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands, including:
and converting the frequency domain information subjected to noise reduction of the first sub-band into time domain information, and synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of other sub-bands to obtain the time domain information of the audio frame data subjected to noise reduction.
According to one or more embodiments of the present disclosure, [ example three ] there is provided a method of noise reduction of audio data, further comprising:
in some optional implementation manners of the embodiments of the present disclosure, the performing a subband processing on the audio frame data in the time domain includes:
performing subband processing on audio frame data through an analysis filter bank, wherein the passband of each filter in the analysis filter bank is different;
correspondingly, the synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands includes: synthesizing, through a synthesis filter bank, the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands.
According to one or more embodiments of the present disclosure, [ example four ] there is provided a method of noise reduction of audio data, further comprising:
before the audio frame data is subjected to sub-band processing through the analysis filter bank, performing downsampling processing on the audio frame data; or after the audio frame data is subjected to sub-band processing through the analysis filter bank, performing downsampling processing on the time domain information of the first sub-band;
correspondingly, before the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands are synthesized through the synthesis filter bank, performing upsampling processing on the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands; or after synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands through the synthesis filter bank, performing upsampling processing on the time domain information of the audio frame data subjected to noise reduction.
According to one or more embodiments of the present disclosure, [ example five ] there is provided a method of noise reduction of audio data, further comprising:
in some optional implementation manners of the embodiments of the present disclosure, the performing a molecular band process on the audio frame data includes:
after converting audio frame data into frequency domain information, performing molecular band processing on the frequency domain information of the audio frame data;
correspondingly, the performing noise reduction processing on the other subbands according to the gains of the other subbands includes: respectively carrying out noise reduction processing on the frequency domain information of other sub-bands according to the gains of the other sub-bands;
determining the audio frame data subjected to noise reduction according to the frequency domain information subjected to noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands, including:
splicing the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands to obtain the frequency domain information of the audio frame data subjected to noise reduction; and converting the frequency domain information of the audio frame data subjected to noise reduction into time domain information to obtain the time domain information of the audio frame data subjected to noise reduction.
According to one or more embodiments of the present disclosure, [ example six ] there is provided a noise reduction method of audio data, further comprising:
in some optional implementation manners of the embodiment of the present disclosure, the performing a subband processing on the frequency domain information of the audio frame data includes:
grouping the frequency domain information of the audio frame data according to frequency intervals to obtain frequency domain information of each sub-band;
correspondingly, the splicing the frequency domain information after the noise reduction of the first sub-band and the frequency domain information after the noise reduction of the other sub-bands comprises:
and splicing the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands according to a frequency interval.
According to one or more embodiments of the present disclosure, [ example seven ] there is provided a noise reduction method of audio data, further comprising:
in some optional implementation manners of the embodiment of the present disclosure, the training manner of the noise reduction model includes:
acquiring sample frame data having the same frequency interval as the first sub-band, and target frequency domain information of the sample frame data;
and training the noise reduction model by using the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model is converged.
According to one or more embodiments of the present disclosure, [ example eight ] there is provided a noise reduction method of audio data, further comprising:
in some optional implementations of the embodiments of the present disclosure, the determining, based on the gain of the first subband, gains of subbands other than the first subband includes:
and in the frequency interval of the first sub-band, the average gain from a preset frequency point value to the maximum value of the interval and the decision probability of the first sub-band output by the noise reduction model belonging to a preset category are integrated to determine the gains of other sub-bands except the first sub-band.
According to one or more embodiments of the present disclosure, [ example nine ] there is provided a method of noise reduction of audio data, further comprising:
in some optional implementations of the embodiment of the present disclosure, the determining, by integrating an average gain from a preset frequency point value to a maximum value of the interval in the frequency interval of the first subband and a decision probability that the first subband output by the noise reduction model belongs to a preset category, gains of subbands other than the first subband includes:
taking the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band as a first gain factor;
determining the decision probability that the other sub-bands except the first sub-band belong to the preset category according to the first gain factor and the decision probability that the first sub-band output by the noise reduction model belongs to the preset category;
determining second gain factors of other sub-bands according to the decision probability that the other sub-bands belong to the preset category;
and integrating the first gain factor, the second gain factor and the decision probability of other sub-bands belonging to a preset category to determine the gains of other sub-bands except the first sub-band.
According to one or more embodiments of the present disclosure, [ example ten ] there is provided a noise reduction method of audio data, further comprising:
in some optional implementation manners of the embodiment of the present disclosure, the frequency intervals of the sub-bands are sequentially adjacent to each other.
According to one or more embodiments of the present disclosure, [ example eleven ] there is provided a noise reduction method of audio data, further comprising:
in some optional implementation manners of the embodiments of the present disclosure, the noise reduction method for audio data is applied to noise reduction of voice data, and correspondingly, the frequency interval with the minimum maximum value of the interval includes a human voice frequency interval.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (14)

1. A method for noise reduction of audio data, comprising:
performing subband processing on audio frame data, wherein frequency intervals of all subbands are different, and a subband corresponding to a frequency interval with the minimum interval maximum value is used as a first subband;
inputting the frequency domain information of the first sub-band into a noise reduction model, so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
determining gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
2. The method of claim 1, wherein performing the subband processing on the audio frame data comprises:
performing sub-band processing on the audio frame data in the time domain;
correspondingly, the performing noise reduction processing on the other subbands according to the gains of the other subbands includes: respectively carrying out noise reduction processing on the time domain information of other sub-bands according to the gains of the other sub-bands;
determining the audio frame data subjected to noise reduction according to the frequency domain information subjected to noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands, including:
and converting the frequency domain information subjected to noise reduction of the first sub-band into time domain information, and synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of other sub-bands to obtain the time domain information of the audio frame data subjected to noise reduction.
3. The method of claim 2, wherein performing the subband processing on the audio frame data in the time domain comprises:
performing subband processing on audio frame data through an analysis filter bank, wherein the passband of each filter in the analysis filter bank is different;
correspondingly, the synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands comprises: synthesizing, through a synthesis filter bank, the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands.
4. The method of claim 3, further comprising:
before the audio frame data is subjected to sub-band processing through the analysis filter bank, performing downsampling processing on the audio frame data; or after the audio frame data is subjected to sub-band processing through the analysis filter bank, performing downsampling processing on the time domain information of the first sub-band;
correspondingly, before the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands are synthesized through the synthesis filter bank, performing upsampling processing on the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands; or after synthesizing the time domain information subjected to noise reduction of the first sub-band and the time domain information subjected to noise reduction of the other sub-bands through the synthesis filter bank, performing upsampling processing on the time domain information of the audio frame data subjected to noise reduction.
5. The method of claim 1, wherein performing the subband processing on the audio frame data comprises:
after converting the audio frame data into frequency domain information, performing sub-band processing on the frequency domain information of the audio frame data;
correspondingly, the performing noise reduction processing on the other subbands according to the gains of the other subbands includes: respectively carrying out noise reduction processing on the frequency domain information of other sub-bands according to the gains of the other sub-bands;
determining the audio frame data subjected to noise reduction according to the frequency domain information subjected to noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands, including:
splicing the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands to obtain the frequency domain information of the audio frame data subjected to noise reduction; and converting the frequency domain information of the audio frame data subjected to noise reduction into time domain information to obtain the time domain information of the audio frame data subjected to noise reduction.
6. The method of claim 5, wherein the performing the subband processing on the frequency domain information of the audio frame data comprises:
grouping the frequency domain information of the audio frame data according to frequency intervals to obtain frequency domain information of each sub-band;
correspondingly, the splicing the frequency domain information after the noise reduction of the first sub-band and the frequency domain information after the noise reduction of the other sub-bands comprises:
and splicing the frequency domain information subjected to noise reduction of the first sub-band and the frequency domain information subjected to noise reduction of the other sub-bands according to a frequency interval.
7. The method according to claim 1, wherein the training mode of the noise reduction model comprises:
acquiring sample frame data having the same frequency interval as the first sub-band, and target frequency domain information of the sample frame data;
and training the noise reduction model by using the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model is converged.
8. The method of claim 1, wherein determining gains for subbands other than the first subband based on the gain for the first subband comprises:
and in the frequency interval of the first sub-band, the average gain from a preset frequency point value to the maximum value of the interval and the decision probability of the first sub-band output by the noise reduction model belonging to a preset category are integrated to determine the gains of other sub-bands except the first sub-band.
9. The method according to claim 8, wherein the integrating the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band and the decision probability that the first sub-band output by the noise reduction model belongs to a preset category to determine the gains of the sub-bands other than the first sub-band comprises:
taking the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band as a first gain factor;
determining the decision probability that the other sub-bands except the first sub-band belong to the preset category according to the first gain factor and the decision probability that the first sub-band output by the noise reduction model belongs to the preset category;
determining second gain factors of other sub-bands according to the decision probability that the other sub-bands belong to the preset category;
and integrating the first gain factor, the second gain factor and the decision probability of other sub-bands belonging to a preset category to determine the gains of other sub-bands except the first sub-band.
10. The method of claim 1, wherein the frequency bins of each subband are sequentially adjacent.
11. A method according to any one of claims 1 to 10, applied to noise reduction of voice data, wherein, correspondingly, the frequency interval with the minimum interval maximum value comprises the human voice frequency interval.
12. An apparatus for noise reduction of audio data, comprising:
the molecular band module is used for carrying out molecular band processing on the audio frame data, wherein the frequency intervals of all sub-bands are different, and the sub-band corresponding to the frequency interval with the minimum interval maximum value is used as a first sub-band;
a first denoising module, configured to input the frequency domain information of the first subband into a denoising model, so that the denoising model outputs the frequency domain information of the first subband after denoising and a gain of the first subband;
the second noise reduction module is used for determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and the noise reduction data determining module is used for determining the audio frame data after noise reduction according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
13. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement a method of noise reduction of audio data as claimed in any of claims 1-11.
14. A storage medium containing computer executable instructions for performing a method of noise reduction of audio data as claimed in any one of claims 1-11 when executed by a computer processor.
CN202011098018.1A 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium Active CN112259116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011098018.1A CN112259116B (en) 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112259116A true CN112259116A (en) 2021-01-22
CN112259116B CN112259116B (en) 2024-03-15

Family

ID=74243667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011098018.1A Active CN112259116B (en) 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112259116B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5490233A (en) * 1992-11-30 1996-02-06 At&T Ipm Corp. Method and apparatus for reducing correlated errors in subband coding systems with quantizers
US20040078200A1 (en) * 2002-10-17 2004-04-22 Clarity, Llc Noise reduction in subbanded speech signals
CN1684143A (en) * 2004-04-14 2005-10-19 华为技术有限公司 Method for strengthening sound
CN101083640A (en) * 2006-04-26 2007-12-05 扎尔林克半导体股份有限公司 Low complexity noise reduction method
CN101477800A (en) * 2008-12-31 2009-07-08 瑞声声学科技(深圳)有限公司 Voice enhancing process
US20120263317A1 (en) * 2011-04-13 2012-10-18 Qualcomm Incorporated Systems, methods, apparatus, and computer readable media for equalization
CN104919523A (en) * 2013-01-08 2015-09-16 杜比国际公司 Model based prediction in a critically sampled filterbank
US20160012828A1 (en) * 2014-07-14 2016-01-14 Navin Chatlani Wind noise reduction for audio reception
CN111477237A (en) * 2019-01-04 2020-07-31 北京京东尚科信息技术有限公司 Audio noise reduction method and device and electronic equipment
CN110335620A (en) * 2019-07-08 2019-10-15 广州欢聊网络科技有限公司 A kind of noise suppressing method, device and mobile terminal
CN111554315A (en) * 2020-05-29 2020-08-18 展讯通信(天津)有限公司 Single-channel voice enhancement method and device, storage medium and terminal

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160847A (en) * 2021-03-02 2021-07-23 广州朗国电子科技有限公司 Audio noise reduction model establishing method and audio noise reduction circuit
WO2022240442A1 (en) * 2021-05-08 2022-11-17 Cerence Operating Company Noise reduction based on dynamic neural networks
CN113611321A (en) * 2021-07-14 2021-11-05 中国传媒大学 Voice enhancement method and system
CN113611321B (en) * 2021-07-14 2024-04-26 中国传媒大学 Voice enhancement method and system
CN113936698B (en) * 2021-09-26 2023-04-28 度小满科技(北京)有限公司 Audio data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN112259116B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN112259116B (en) Noise reduction method and device for audio data, electronic equipment and storage medium
CN107068161B (en) Speech noise reduction method and device based on artificial intelligence and computer equipment
ES2347760T3 (en) NOISE REDUCTION PROCEDURE AND DEVICE.
CN112634928B (en) Sound signal processing method and device and electronic equipment
CN108564963A (en) Method and apparatus for enhancing voice
CN113299313B (en) Audio processing method and device and electronic equipment
CN111370019A (en) Sound source separation method and device, and model training method and device of neural network
CN111724807A (en) Audio separation method and device, electronic equipment and computer readable storage medium
EP4254408A1 (en) Speech processing method and apparatus, and apparatus for processing speech
CN111508519A (en) Method and device for enhancing voice of audio signal
CN115588437A (en) Speech enhancement method, apparatus, device and storage medium
CN113611324A (en) Method and device for inhibiting environmental noise in live broadcast, electronic equipment and storage medium
CN114898762A (en) Real-time voice noise reduction method and device based on target person and electronic equipment
CN111968651A (en) WT (WT) -based voiceprint recognition method and system
CN111739544A (en) Voice processing method and device, electronic equipment and storage medium
CN113674752B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
CN116403594B (en) Speech enhancement method and device based on noise update factor
CN112309425A (en) Sound tone changing method, electronic equipment and computer readable storage medium
CN116913258A (en) Speech signal recognition method, device, electronic equipment and computer readable medium
WO2022227932A1 (en) Method and apparatus for processing sound signals, and electronic device
CN113763976B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
CN114783455A (en) Method, apparatus, electronic device and computer readable medium for voice noise reduction
CN114360572A (en) Voice denoising method and device, electronic equipment and storage medium
CN114255778A (en) Audio stream noise reduction method, device, equipment and storage medium
CN117316160B (en) Silent speech recognition method, silent speech recognition apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant