CN113571076A

CN113571076A - Signal processing method, signal processing device, electronic equipment and storage medium

Info

Publication number: CN113571076A
Application number: CN202110669275.4A
Authority: CN
Inventors: 操陈斌; 何梦楠
Original assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2021-10-29

Abstract

The application provides a signal processing method, a signal processing device, an electronic device and a storage medium. The specific implementation scheme is as follows: the method comprises the steps of obtaining an original signal collected by a microphone, carrying out short-time Fourier transform on the original signal collected by the microphone to obtain a frequency domain signal, adopting a Kalman filter to estimate a reverberation spectrum of the frequency domain signal, carrying out reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal, carrying out cepstrum smoothing on each frequency spectrum part in the dereverberation signal by adopting a corresponding smoothing coefficient, and carrying out short-time Fourier inverse transformation on the dereverberation signal after smoothing to obtain a target signal.

Description

Signal processing method, signal processing device, electronic equipment and storage medium

Technical Field

The present application relates to the field of signal processing technologies, and in particular, to a signal processing method and apparatus, an electronic device, and a storage medium.

Background

Reverberation is a kind of acoustic noise that arises in an enclosed space through multiple reflections and diffractions of sound on room walls and objects. In a reverberant environment, previously emitted speech overlaps and masks the current speech signal via reflections, which blurs spectral features and thereby degrades speech intelligibility and speech quality. The reverberant signal is composed of three consecutive parts: direct sound, early reflections, and late reverberation. Studies have shown that early reflections help to improve human perception, while late reverberation creates overlap-masking effects on the direct sound that degrade speech quality and intelligibility, and therefore needs to be removed or suppressed.

Meanwhile, in the statistical processing of noise signals, estimation errors are inevitable, so that abnormal values occur in the spectral gain adaptation process, resulting in an auditory phenomenon called musical noise.

Therefore, how to improve speech quality in a reverberant environment is a technical problem that urgently needs to be solved.

Disclosure of Invention

The application provides a signal processing method, a device, an electronic device and a storage medium for improving the voice quality in a reverberation environment.

According to an aspect of the present application, there is provided a signal processing method including:

acquiring an original signal acquired by a microphone;

carrying out short-time Fourier transform on the original signal acquired by the microphone to obtain a frequency domain signal;

estimating a reverberation spectrum for the frequency domain signal using a kalman filter;

according to the reverberation spectrum, performing reverberation suppression on the frequency domain signal to obtain a dereverberation signal;

performing cepstrum smoothing on each frequency spectrum part in the dereverberation signal by adopting a corresponding smoothing coefficient;

and carrying out short-time Fourier inverse transformation according to the smoothed dereverberation signal to obtain a target signal.

According to another aspect of the present application, there is provided a signal processing apparatus including:

the acquisition module is used for acquiring an original signal acquired by the microphone;

the first processing module is used for carrying out short-time Fourier transform on the original signal acquired by the microphone to obtain a frequency domain signal;

the second processing module is used for estimating a reverberation spectrum of the frequency domain signal by adopting a Kalman filter;

the third processing module is used for carrying out reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal;

the fourth processing module is used for performing cepstrum smoothing on each frequency spectrum part in the dereverberation signal by adopting a corresponding smoothing coefficient;

and the fifth processing module is used for carrying out short-time Fourier inversion according to the smoothed dereverberation signal to obtain a target signal.

According to another aspect of the present application, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the signal processing method of the first aspect.

According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the signal processing method of the first aspect.

According to another aspect of the application, a computer program product is provided, comprising a computer program, characterized in that the computer program realizes the signal processing method of the first aspect when executed by a processor.

The technical scheme provided by the embodiment of the application has the following beneficial effects:

the method comprises the steps of obtaining an original signal collected by a microphone, carrying out short-time Fourier transform on the original signal collected by the microphone to obtain a frequency domain signal, adopting a Kalman filter to estimate a reverberation spectrum of the frequency domain signal, carrying out reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal, carrying out cepstrum smoothing on each frequency spectrum part in the dereverberation signal by adopting a corresponding smoothing coefficient, and carrying out short-time Fourier inverse transformation on the dereverberation signal after smoothing to obtain a target signal.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

fig. 1 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of another signal processing method according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of a signal processing apparatus according to an embodiment of the present disclosure;

FIG. 4 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the purpose of understanding, which are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

A signal processing method, an apparatus, an electronic device, and a storage medium of the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.

As shown in fig. 1, the method comprises the steps of:

step 101, acquiring an original signal acquired by a microphone.

In this embodiment, the original signal collected by the microphone is an unprocessed signal that carries the reverberation signal.

Step 102, performing short-time Fourier transform on the original signal acquired by the microphone to obtain a frequency domain signal.

In this embodiment, the original signal acquired by the microphone is first divided into a plurality of speech frames, each speech frame is windowed, and short-time Fourier transform is performed on the windowed frame to obtain the frequency domain signal of that frame.

Step 103, estimating a reverberation spectrum for the frequency domain signal by using a Kalman filter.

In an implementation manner of the embodiment of the application, the estimation prediction error for the reverberation spectrum estimation of the current audio frame is predicted according to the estimation prediction error obtained when reverberation spectrum estimation was performed on a historical audio frame of the frequency domain signal; the coefficient of the Kalman filter is updated according to the estimation prediction error for the current audio frame; and the reverberation spectrum of the current audio frame is predicted according to the dereverberation signal of the historical audio frame and the updated coefficient of the Kalman filter.

Step 104, performing reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal.

In an implementation manner of the embodiment of the application, a Wiener filtering algorithm is used, according to the reverberation spectrum, to suppress late reverberation in the frequency domain signal and obtain the dereverberation signal, which improves speech quality and intelligibility.

Step 105, performing cepstrum smoothing on each frequency spectrum part in the dereverberation signal by using a corresponding smoothing coefficient.

In the embodiment of the application, the dereverberation signal is divided into frequency spectrum parts: a first frequency spectrum part at a set low frequency, a second frequency spectrum part where the pitch and the frequency points adjacent to the pitch are located, and a third frequency spectrum part other than the first and second frequency spectrum parts. The second frequency spectrum part and the third frequency spectrum part are selected for cepstrum smoothing, and the smoothing coefficient used for the second frequency spectrum part is smaller than that used for the third frequency spectrum part. Applying different cepstrum smoothing coefficients to different frequency spectrum parts of the dereverberation signal reduces artifacts such as musical noise introduced while suppressing the reverberation, and improves speech quality.

Step 106, performing short-time inverse Fourier transform according to the smoothed dereverberation signal to obtain a target signal.

In the application, the smoothed dereverberation signal is subjected to short-time inverse Fourier transform, which converts it back into a time domain signal and yields the target signal with reverberation and musical noise suppressed.

In the signal processing method of the embodiment of the application, the original signal collected by the microphone is obtained, and short-time Fourier transform is performed on it to obtain the frequency domain signal. A Kalman filter is used to estimate the reverberation spectrum of the frequency domain signal, and reverberation suppression is performed on the frequency domain signal according to the reverberation spectrum to obtain the dereverberation signal. Cepstrum smoothing is performed on each frequency spectrum part in the dereverberation signal by using the corresponding smoothing coefficient, and short-time inverse Fourier transform is performed on the smoothed dereverberation signal to obtain the target signal. In the application, the reverberation spectrum is adaptively estimated by the Kalman filter and the estimated reverberation spectrum is suppressed by a spectral enhancement technique to obtain the dereverberation signal, which enhances the spectral features and improves speech quality. At the same time, the cepstrum smoothing method effectively addresses the musical noise introduced during spectral enhancement, further improving speech quality.

Based on the foregoing embodiment, this embodiment provides another signal processing method, which specifically illustrates how, for each acquired frame of the speech signal, the reverberation spectrum of the speech signal acquired by the microphone is adaptively estimated by a Kalman filter so as to remove the reverberation component in the microphone input signal, and how musical noise is then removed from the resulting dereverberated speech signal to further improve speech quality. Fig. 2 is a schematic flow chart of another signal processing method according to an embodiment of the present application. As shown in fig. 2, the method includes the following steps:

step 201, acquiring an original signal collected by a microphone.

In one implementation of this embodiment, the original signal collected by the microphone can be expressed by a convolution transfer function model (the formula of this application uses l to represent the frame, and k to represent the frequency):

where S(l, k) is the frequency domain representation of the non-reverberant speech signal after Fourier transform, H(l, k) is the convolution filter coefficient, X(l, k) is the reverberant speech signal, and V(l, k) is additive noise.

In this embodiment, the reverberant signal can be divided into two parts, namely early reflections and late reverberation:

X(l, k) = X_D(l, k) + R_D(l, k)

where the delay D is used to distinguish early reflections from late reverberation, X_D(l, k) is the early reflection part, and R_D(l, k) is the late reverberation part. Early reflections have been found to improve speech intelligibility, while late reverberation produces an overlap-masking effect on the direct sound that degrades speech quality and intelligibility, and therefore needs to be removed or suppressed.

Step 202, performing short-time fourier transform on the original signal acquired by the microphone to obtain a frequency domain signal.

As one implementation, the window in the windowing operation is a hanning window.

Y(l) = fft(y .* win)

where Y(l) is the vector form of the frequency domain signal Y(l, k) obtained by Fourier transform of the microphone input signal, l represents the frame, k represents the frequency, y is the vector of the microphone input signal, win is the short-time analysis window, .* denotes element-wise multiplication, and fft(·) is the Fourier transform.

y = [y(n), y(n-1), …, y(n-N+1)]^T;

win = [0; sqrt(hanning(N-1))];

hanning(n) = 0.5 * [1 - cos(2π * n / N)];

where hanning(N-1) is a Hanning window of length N-1, n represents the index of the sample points, and one frame of the signal has N sample points.

In this embodiment, after the original signal collected by the microphone is subjected to framing and windowing, fourier transform is performed to convert the original frame of signal from the time domain to the frequency domain, so as to facilitate subsequent processing.
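As a minimal sketch of the windowing and transform described in step 202, the following Python/NumPy code builds the square-root Hanning analysis window and transforms one frame. The function names, the use of a one-sided FFT (`np.fft.rfft`), and the choice of window samples n = 1..N-1 are assumptions made for illustration; the application does not specify them.

```python
import numpy as np

def sqrt_hann_window(N):
    """win = [0; sqrt(hanning(N-1))] with hanning(n) = 0.5*(1 - cos(2*pi*n/N)).

    Assumption: the N-1 Hanning samples are taken at n = 1..N-1.
    """
    n = np.arange(1, N)
    hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))
    return np.concatenate(([0.0], np.sqrt(hann)))

def stft_frame(y, win):
    """Y(l) = fft(y .* win) for one real frame of N samples.

    Assumption: a one-sided FFT is used, giving N/2 + 1 frequency points.
    """
    return np.fft.rfft(y * win)
```

With this sample choice, the squared window values at samples n and n + N/2 sum to one, which would suit a 50% frame overlap if the same window is applied again at synthesis; the overlap itself is not stated in the application.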

Step 203, predicting the estimation prediction error of the reverberation spectrum estimation of the current audio frame according to the estimation prediction error when the reverberation spectrum estimation is performed on the historical audio frame of the frequency domain signal.

In an implementation manner of the embodiment of the application, a Kalman filter is used to predict the estimation prediction error for the reverberation spectrum estimation of the current audio frame, according to the reverberation spectrum estimation performed on a historical audio frame of the frequency domain signal of the microphone.

First, a historical audio frame of the microphone frequency domain signal, namely the audio frame preceding the current audio frame, is obtained, and the prior error signal E(l | l-1, k) of the current audio frame is determined according to the estimated dereverberation signal of the historical audio frame and the coefficient of the Kalman filter corresponding to the historical audio frame.

In the expression for E(l | l-1, k), Y(l, k) is the frequency domain signal of the microphone, the filter coefficient term is the coefficient of the Kalman adaptive filter corresponding to the previous audio frame, and X^T(l-1, k) is the dereverberation signal estimated for the previous audio frame.

Furthermore, the prior error variance of the current audio frame is determined according to the prior error signal E(l | l-1, k) of the current audio frame.

Further, the posterior error variance of the previous audio frame is obtained.

A weighted average of the acquired posterior error variance of the previous audio frame and the prior error variance of the current audio frame is then calculated to predict the estimation prediction error for the reverberation spectrum estimation of the current audio frame.

Here β is a forgetting factor with 0 ≤ β ≤ 1. It should be noted that the posterior error variance of the previous audio frame is the posterior error variance of the reverberation spectrum estimation performed on the previous audio frame, determined according to the dereverberation signal of the previous audio frame.
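The prediction in step 203 can be sketched as follows. The formulas are not reproduced in this text, so the sketch rests on stated assumptions: the Kalman filter's prediction is taken as the inner product x^T g of a vector of past dereverberated frames `x_hist` with the previous coefficients `g_prev`, the prior error variance is taken as |E(l | l-1, k)|^2, and β weights the previous posterior variance. All names are hypothetical.

```python
import numpy as np

def prior_error(Y_lk, g_prev, x_hist):
    """A priori error E(l|l-1,k) for one frequency point.

    Assumption: the filter predicts the observation as x_hist^T g_prev.
    """
    return Y_lk - np.dot(x_hist, g_prev)

def predict_error_variance(post_var_prev, E_prior, beta=0.8):
    """Weighted average of the previous posterior error variance and the
    current prior error variance, with forgetting factor 0 <= beta <= 1.

    Assumption: the prior error variance is |E_prior|^2 and beta weights
    the previous-frame term.
    """
    prior_var = np.abs(E_prior) ** 2
    return beta * post_var_prev + (1.0 - beta) * prior_var
```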

Step 204, updating the coefficient of the Kalman filter according to the estimation prediction error of the reverberation spectrum estimation of the current audio frame.

In one implementation of this embodiment, the estimated posterior state error covariance matrix of the previous frame, P(l-1, k), and the process noise covariance matrix Φ_w(l, k) are obtained to determine the prior state error covariance matrix P(l | l-1, k) of the current audio frame:

P(l | l-1, k) = P(l-1, k) + Φ_w(l, k);

where Φ_w(l, k) may be a scaled identity matrix, I is an identity matrix, and the scaling factor is a parameter that controls the uncertainty of the target coefficient g(l, k) of the Kalman adaptive filter.

Further, the Kalman gain K(l, k) corresponding to the current audio frame is determined according to the estimation prediction error of the reverberation spectrum estimation of the current audio frame and the prior state error covariance matrix P(l | l-1, k) of the current audio frame.

Further, the Kalman adaptive filter coefficient is updated: the coefficient corresponding to the current audio frame is predicted according to the Kalman gain K(l, k) corresponding to the current audio frame, the prior error signal E(l | l-1, k) of the current audio frame, and the Kalman adaptive filter coefficient estimated for the previous audio frame.
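A minimal sketch of the update in step 204 for a single frequency point is given below. Only the prior covariance equation P(l | l-1, k) = P(l-1, k) + Φ_w(l, k) appears in the text; the gain, coefficient update, and posterior covariance use the textbook Kalman filter form for an observation model Y = x^T g + error, which is an assumption. The names `kalman_update`, `x_hist`, `err_var`, and `sigma_w2` are hypothetical.

```python
import numpy as np

def kalman_update(g_prev, P_prev, x_hist, Y_lk, err_var, sigma_w2=1e-4):
    """One Kalman step for the filter coefficients at one frequency point.

    g_prev  : previous coefficients, shape (L,)
    P_prev  : previous posterior state error covariance, shape (L, L)
    x_hist  : past dereverberated frames used as regressor, shape (L,)
    err_var : predicted estimation error variance for the current frame
    sigma_w2: scale of the process noise, Phi_w = sigma_w2 * I (assumed)
    """
    L = g_prev.shape[0]
    # Prior state error covariance: P(l|l-1,k) = P(l-1,k) + Phi_w(l,k)
    P_prior = P_prev + sigma_w2 * np.eye(L)
    # Prior error signal (assumed observation model Y = x^T g + error)
    E_prior = Y_lk - x_hist @ g_prev
    # Kalman gain in the textbook form (assumption)
    Px = P_prior @ np.conj(x_hist)
    K = Px / (np.real(x_hist @ Px) + err_var)
    # Coefficient update from gain, prior error, and previous coefficients
    g_new = g_prev + K * E_prior
    # Posterior covariance, reused as P(l-1,k) for the next frame
    P_post = (np.eye(L) - np.outer(K, x_hist)) @ P_prior
    return g_new, P_post, K, E_prior
```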

Step 205, predicting the reverberation spectrum of the current audio frame according to the dereverberation signal of the historical audio frame and the updated coefficient of the Kalman filter.

In an implementation manner of the embodiment of the application, a first-order recursive averaging algorithm is used to determine the reverberation spectrum of the current audio frame according to the dereverberation signal of the historical audio frame and the updated coefficient of the Kalman filter.

In this recursion, the result is the predicted reverberation spectrum of the current audio frame, the recursive term is the estimation prediction error of the reverberation spectrum estimation of the previous audio frame, and λ is a forgetting factor with 0 ≤ λ ≤ 1.
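First-order recursive averaging as used in step 205 can be sketched as below. The recursion itself is not reproduced in this text, so the form is an assumption: the late reverberation component is predicted as x^T g from past dereverberated frames, and its squared magnitude is averaged with the previous-frame term using the forgetting factor λ. The names are hypothetical.

```python
import numpy as np

def update_reverb_spectrum(spec_prev, g, x_hist, lam=0.7):
    """First-order recursive average of the reverberation spectrum
    at one frequency point, with forgetting factor 0 <= lam <= 1.

    Assumption: the instantaneous estimate is |x_hist^T g|^2, where x_hist
    holds past dereverberated frames and g the updated Kalman coefficients.
    """
    r_hat = x_hist @ g                      # predicted late reverberation
    return lam * spec_prev + (1.0 - lam) * np.abs(r_hat) ** 2
```

A larger λ gives a smoother but slower-reacting estimate; the value 0.7 is only a placeholder.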

Step 206, performing reverberation suppression on the frequency domain signal according to the reverberation spectrum of the current audio frame to obtain a dereverberation signal of the current audio frame.

In an implementation manner of this embodiment, a Wiener filtering algorithm is used, according to the reverberation spectrum of the current audio frame, to suppress late reverberation in the frequency domain signal and obtain the dereverberation signal of the current audio frame. This suppresses the late reverberation in the frequency domain signal and improves the quality of the signal acquired by the microphone.

Specifically, an estimated power spectrum of the microphone input signal Y(l, k) is calculated.

The posterior signal-to-noise ratio γ(l, k) of the current audio frame is determined from the reverberation spectrum and the estimated power spectrum of the microphone input signal Y(l, k).

Further, the estimated power spectrum of the target signal of the frame preceding the current audio frame is determined, that is, the estimated power spectrum of the target signal with late reverberation removed.

Further, the estimated a priori signal-to-noise ratio of the current audio frame is calculated by the decision-directed method from the reverberation spectrum of the current audio frame and the estimated power spectrum of the target signal of the previous frame,

where η is a forgetting factor with 0 ≤ η ≤ 1, and ξ_min is the minimum a priori signal-to-noise ratio.

Further, a gain function G(l, k) for removing the late reverberation signal is determined.

Thus, the determined gain function and the current frame signal input by the microphone are used to obtain the dereverberation signal X^T(l, k) of the current audio frame, from which late reverberation has been removed: X^T(l, k) = Y(l, k) * G(l, k).
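The suppression in step 206 can be sketched with the widely used decision-directed estimate of the a priori signal-to-noise ratio and a Wiener gain. The exact formulas are not reproduced in this text, so the standard forms below are assumptions, as are the function name, the default parameter values, and the small constant `eps` that guards against division by zero.

```python
import numpy as np

def wiener_dereverb(Y, psd_Y, psd_R, psd_X_prev, eta=0.98, xi_min=0.03, eps=1e-12):
    """Suppress late reverberation in one frame (arrays over frequency points).

    Y          : complex spectrum of the current microphone frame
    psd_Y      : estimated power spectrum of Y
    psd_R      : estimated reverberation spectrum of the current frame
    psd_X_prev : estimated power spectrum of the previous target frame
    """
    # Posterior SNR: input power over reverberation power
    gamma = psd_Y / (psd_R + eps)
    # Decision-directed a priori SNR (standard form, assumed), floored at xi_min
    xi = eta * psd_X_prev / (psd_R + eps) + (1.0 - eta) * np.maximum(gamma - 1.0, 0.0)
    xi = np.maximum(xi, xi_min)
    # Wiener gain (standard form, assumed) and dereverberated spectrum
    G = xi / (1.0 + xi)
    return Y * G, G
```

The floor ξ_min limits the maximum attenuation, which trades residual reverberation against gain outliers.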

Step 207, determining the posterior error variance of the reverberation spectrum estimation of the current audio frame according to the dereverberation signal of the current audio frame.

The posterior error variance of the current audio frame is used to predict the estimation prediction error for the reverberation spectrum estimation of the subsequent audio frame. The estimation prediction error indicates the difference between the signal of the current audio frame and an ideal target signal without reverberation. It is determined iteratively during reverberation spectrum estimation, so that this difference becomes smaller and smaller while the coefficient of the Kalman filter is continuously updated. The Kalman filter finally reaches its optimal state, which improves the accuracy of the reverberation spectrum estimated with the Kalman filter.

In this embodiment, the posterior error variance of the reverberation spectrum estimation of the current audio frame is determined according to the dereverberation signal X^T(l, k) of the current audio frame.

Step 208, dividing the dereverberation signal of the current audio frame to obtain spectrum parts, wherein the spectrum parts comprise a first spectrum part at a set low frequency, a second spectrum part where the pitch and the frequency points adjacent to the pitch are located, and a third spectrum part other than the first spectrum part and the second spectrum part.

In this embodiment, the dereverberation signal of the current audio frame is divided into spectrum parts. Each spectrum part contains its own frequency points, the frequency points of different spectrum parts differ in frequency, and each frequency point has a corresponding gain function.

Step 209, selecting the second spectral portion and the third spectral portion from the spectral portions for cepstral smoothing.

The smoothing coefficient used for the second spectral portion is smaller than that used for the third spectral portion.

In this embodiment, the first spectral portion at the set low frequency is not smoothed, so that speech onsets and the spectral envelopes of fricatives and plosives are not distorted by the smoothing. In addition, a smaller smoothing coefficient is used for the second spectral portion where the pitch and its adjacent frequency points are located; because voiced sounds last longer, this leaves the fine structure of the speech signal (such as the pitch and harmonics) unaffected. In summary, the following strategy is adopted for selecting the smoothing coefficient β:

1) For the first spectral portion at the set low frequency, no smoothing is performed.

2) For the second spectral portion where the pitch and its adjacent frequency points are located, a smaller smoothing coefficient is used.

3) For spectral portions other than the first spectral portion and the second spectral portion, a larger smoothing coefficient is used.

Here k'_low denotes the highest frequency point of the first spectral portion at the set low frequency; the cepstral gain coefficients of frequency points below it are not smoothed. k'_pitch denotes the cepstral frequency point of the pitch, which can be determined in a speech segment by searching the range between 80 Hz and 500 Hz for the frequency point at which G_cep(l, k') is maximal; the three adjacent frequency points use the smaller smoothing parameter β_low. The other cepstral frequency points are smoothed with the larger smoothing parameter β_high, where β_high is greater than β_low.
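The coefficient selection strategy can be sketched as a per-point vector of smoothing coefficients. Several choices below are assumptions not fixed by the text: cepstral point k' is mapped to frequency fs / k', "three adjacent frequency points" is read as the pitch point and its two neighbours, "not smoothed" is realized as β = 0 for points up to k'_low, and the vector is mirrored so a symmetric real cepstrum stays symmetric. The numeric defaults and the function name are hypothetical, and no voiced or unvoiced decision is made here.

```python
import numpy as np

def smoothing_coeffs(G_cep, fs, k_low=4, beta_low=0.2, beta_high=0.8,
                     f_min=80.0, f_max=500.0):
    """Per-point smoothing coefficients for a full-length real cepstrum."""
    n = G_cep.shape[0]
    half = n // 2
    beta = np.full(n, beta_high)              # third portion: larger coefficient
    # Pitch search between f_min and f_max (assumed mapping: frequency = fs / k')
    q_lo = int(np.ceil(fs / f_max))
    q_hi = min(int(np.floor(fs / f_min)), half)
    k_pitch = q_lo + int(np.argmax(G_cep[q_lo:q_hi + 1]))
    # Second portion: pitch point and its neighbours use the smaller coefficient
    beta[max(k_pitch - 1, 0):k_pitch + 2] = beta_low
    # First portion: points up to k_low are left unsmoothed (beta = 0)
    beta[:k_low + 1] = 0.0
    # Mirror onto the upper half so that beta[n - k] == beta[k]
    beta[half + 1:] = beta[1:n - half][::-1]
    return beta, k_pitch
```

For example, at an assumed sampling rate of 16 kHz the search range 80 Hz to 500 Hz corresponds to cepstral points 32 to 200.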

In one implementation of this embodiment, the gain function G(l, k) of the dereverberation signal of the current audio frame is written in vector form G(l), and an inverse Fourier transform is applied to the logarithm of the gain vector to obtain the cepstral domain representation G_cep(l) of the gain function:

G_cep(l) = ifft(log(G(l))).

Further, the cepstral domain gain function obtained by cepstral smoothing with the smoothing parameter is:

G_{sm_cep}(l, k') = β G_{sm_cep}(l-1, k') + (1-β) G_cep(l, k');

where k' is the frequency point of the cepstral domain.

Further, the smoothed cepstral domain gain function is converted back to the frequency domain by Fourier transform to obtain:

G_sm(l) = exp(fft(G_{sm_cep}(l)));

where G_{sm_cep}(l) is the vector form of G_{sm_cep}(l, k') over all frequency points.

Further, the smoothed dereverberation signal X(l) of the current audio frame is obtained from the smoothed gain function and the current frame signal input by the microphone, that is, X(l) is the smoothed enhanced signal with reverberation and musical noise removed, where X(l) = Y(l) .* G_sm(l).
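The three cepstral smoothing formulas can be sketched in NumPy as below. The sketch assumes a real gain defined on the one-sided spectrum (N/2 + 1 points), so `np.fft.irfft` and `np.fft.rfft` stand in for ifft and fft, and a small floor `g_floor` is added before the logarithm; these choices and the function name are assumptions.

```python
import numpy as np

def cepstral_smooth_gain(G, G_sm_cep_prev, beta, g_floor=1e-6):
    """Recursive cepstral smoothing of a spectral gain for one frame.

    G             : real gain on the one-sided spectrum, shape (N/2 + 1,)
    G_sm_cep_prev : smoothed cepstral gain of the previous frame, shape (N,)
    beta          : smoothing coefficient(s), scalar or shape (N,)
    """
    # G_cep(l) = ifft(log(G(l))); the floor avoids log(0) (assumption)
    G_cep = np.fft.irfft(np.log(np.maximum(G, g_floor)))
    # G_sm_cep(l,k') = beta * G_sm_cep(l-1,k') + (1 - beta) * G_cep(l,k')
    G_sm_cep = beta * G_sm_cep_prev + (1.0 - beta) * G_cep
    # G_sm(l) = exp(fft(G_sm_cep(l))); the real part is kept as the gain
    G_sm = np.exp(np.real(np.fft.rfft(G_sm_cep)))
    return G_sm, G_sm_cep
```

With β = 0 at every point the gain passes through unchanged, and with β = 1 the previous smoothed cepstrum is retained, which gives a quick sanity check of the recursion.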

In this embodiment, abnormal values that occur during reverberation suppression, that is, during spectral gain adaptation, can cause an auditory phenomenon called musical noise. Different smoothing strategies are therefore set for different spectral portions. Specifically, smoothing the cepstral domain gains corresponding to the second spectral portion and the third spectral portion reduces the fluctuation of the signal over time caused by excessively high gains, suppresses the generation of musical noise, and improves the quality of the speech signal.

Step 210, performing short-time inverse Fourier transform according to the smoothed dereverberation signal of the current speech frame to obtain a target signal of the current speech frame.

In an implementation manner of this embodiment, the target signal of the current speech frame is:

e = ifft(Y(l) .* G_sm(l)) .* win.
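The synthesis formula can be sketched as follows. `synthesize_frame` follows the formula directly, using a one-sided inverse FFT as an assumption. Combining successive frames by overlap-add with a hop of half a frame is not described in the application and is included only as a commonly used, assumed way to rebuild a continuous time domain signal; all names are hypothetical.

```python
import numpy as np

def synthesize_frame(Y, G_sm, win):
    """e = ifft(Y(l) .* G_sm(l)) .* win for one frame (one-sided spectrum)."""
    return np.fft.irfft(Y * G_sm, n=len(win)) * win

def overlap_add(frames, hop):
    """Sum windowed output frames at multiples of hop (assumed scheme)."""
    N = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + N)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + N] += frame
    return out
```

If the analysis and synthesis windows are both square-root Hanning windows and the hop is N/2, a unit gain reproduces the input in the fully overlapped region.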

In the signal processing method of this embodiment, an original signal collected by a microphone is obtained, and short-time Fourier transform is performed on it to obtain a frequency domain signal. A Kalman filter is used to estimate a reverberation spectrum of the frequency domain signal, and reverberation suppression is performed on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal. Cepstrum smoothing is performed on each frequency spectrum part in the dereverberation signal by using a corresponding smoothing coefficient, and short-time inverse Fourier transform is performed according to the smoothed dereverberation signal to obtain a target signal. In the present application, the reverberation spectrum is adaptively estimated by the Kalman filter and the estimated reverberation spectrum is suppressed by a spectral enhancement technique to obtain the dereverberation signal, which enhances the spectral features and improves speech quality. At the same time, the cepstrum smoothing method effectively addresses the musical noise introduced during spectral enhancement, further improving speech quality.

In order to implement the above embodiments, the present embodiment provides a signal processing apparatus.

Fig. 3 is a schematic structural diagram of a signal processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the apparatus includes:

an obtaining module 31, configured to obtain an original signal collected by a microphone;

the first processing module 32 is configured to perform short-time fourier transform on an original signal acquired by the microphone to obtain a frequency domain signal;

a second processing module 33, configured to estimate a reverberation spectrum for the frequency domain signal by using a kalman filter;

the third processing module 34 is configured to perform reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal;

a fourth processing module 35, configured to perform cepstrum smoothing on each frequency spectrum portion in the dereverberation signal by using a corresponding smoothing coefficient;

and a fifth processing module 36, configured to perform short-time inverse fourier transform on the smoothed dereverberation signal to obtain a target signal.

Further, in an implementation manner of the embodiment of the present application, the second processing module 33 is specifically configured to:

predict an estimation prediction error for the reverberation spectrum estimation of the current audio frame according to the estimation prediction error obtained when the reverberation spectrum estimation was performed on a historical audio frame of the frequency domain signal;

update the coefficients of the Kalman filter according to the estimation prediction error for the reverberation spectrum estimation of the current audio frame; and

predict the reverberation spectrum of the current audio frame according to the dereverberation signal of the historical audio frame and the updated coefficients of the Kalman filter.
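The three operations above can be sketched for a single frequency bin as follows. This is a textbook Kalman filter with a random-walk coefficient model, given only as an illustration. The names `kalman_step`, `g`, `P_post_prev`, `z`, `y`, and the noise parameters `q` and `r` are assumptions and are not taken from the application, whose exact update equations are given in the method embodiment.

```python
import numpy as np

def kalman_step(g, P_post_prev, z, y, q=1e-6, r=1e-4):
    """One per-bin Kalman iteration (illustrative sketch).

    g           : current filter coefficients, shape (L,)
    P_post_prev : posterior error covariance of the previous frame, (L, L)
    z           : dereverberated values of L historical frames, shape (L,)
    y           : observed value of the current frame (complex scalar)
    q, r        : assumed process and observation noise variances
    """
    # 1) Predict the estimation error of the current frame from the
    #    error of the historical frame (random-walk state model).
    P_prior = P_post_prev + q * np.eye(len(g))
    # 2) Update the coefficients using the predicted error.
    innovation = y - z @ g
    gain = P_prior @ z.conj() / (z @ P_prior @ z.conj() + r)
    g_new = g + gain * innovation
    # 3) Predict the reverberation component of the current frame from
    #    the historical dereverberated frames and updated coefficients.
    reverb = z @ g_new
    # Posterior error covariance, reused when the next frame is processed.
    P_post = P_prior - np.outer(gain, z) @ P_prior
    return g_new, P_post, reverb
```

In this sketch the returned `P_post` plays the role of the posterior error that feeds the prediction for the subsequent frame.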

In an implementation manner of the embodiment of the present application, the second processing module 33 is further specifically configured to:

determine the reverberation spectrum by using a first-order recursive averaging algorithm according to the dereverberation signal of the historical audio frame and the updated coefficients of the Kalman filter.
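First-order recursive averaging can be sketched as below. The function name `update_reverb_psd`, the smoothing constant `alpha`, and the choice of the squared magnitude of the predicted reverberation as the instantaneous term are assumptions made for illustration; this section of the application does not state them.

```python
import numpy as np

def update_reverb_psd(prev_psd, coeffs, history, alpha=0.8):
    """First-order recursive average of the reverberation power spectrum.

    prev_psd : estimate from the previous frame, shape (bins,)
    coeffs   : updated Kalman filter coefficients, shape (bins, L)
    history  : dereverberated historical frames, shape (bins, L)
    alpha    : assumed smoothing constant, 0 <= alpha < 1
    """
    # Predicted reverberation in each bin from historical frames.
    predicted = np.sum(history * coeffs, axis=-1)
    instantaneous = np.abs(predicted) ** 2
    # Blend the previous estimate with the new instantaneous power.
    return alpha * prev_psd + (1.0 - alpha) * instantaneous
```

A larger `alpha` gives a steadier estimate that reacts more slowly to changes in the reverberation level.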

In an implementation manner of the embodiment of the present application, the apparatus further includes:

a determining module, configured to determine the posterior error variance of the reverberation spectrum estimation of the current audio frame according to the dereverberation signal of the current audio frame, wherein the posterior error variance of the current audio frame is used to predict the estimation prediction error of the reverberation spectrum estimation of a subsequent audio frame.

In an implementation manner of the embodiment of the present application, the third processing module 34 is specifically configured to:

suppress late reverberation in the frequency domain signal by using a Wiener filtering algorithm according to the reverberation spectrum, to obtain the dereverberation signal.
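A Wiener-type suppression gain can be sketched as follows. The function name `wiener_dereverb`, the gain floor, and the simple maximum-likelihood estimate of the signal-to-reverberation ratio are assumptions for illustration and are not specified in this section.

```python
import numpy as np

def wiener_dereverb(Y, reverb_psd, gain_floor=0.1, eps=1e-12):
    """Suppress late reverberation in one frame with a Wiener gain.

    Y          : complex frequency domain frame, shape (bins,)
    reverb_psd : estimated reverberation power spectrum, shape (bins,)
    gain_floor : assumed lower bound on the gain, limits distortion
    """
    obs_psd = np.abs(Y) ** 2
    # Signal-to-reverberation ratio estimate, clipped at zero.
    xi = np.maximum(obs_psd / (reverb_psd + eps) - 1.0, 0.0)
    # Wiener gain xi / (1 + xi), bounded below by the floor.
    gain = np.maximum(xi / (1.0 + xi), gain_floor)
    return gain * Y, gain
```

Bins dominated by reverberation are attenuated down to the floor, while bins where the observed power far exceeds the reverberation estimate pass almost unchanged.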

In an implementation manner of the embodiment of the present application, the fourth processing module 35 is specifically configured to:

divide the dereverberation signal to obtain the frequency spectrum parts, wherein the frequency spectrum parts include a first frequency spectrum part at a set low frequency, a second frequency spectrum part where the fundamental tone and the frequency points adjacent to the fundamental tone are located, and a third frequency spectrum part other than the first frequency spectrum part and the second frequency spectrum part; and

select the second frequency spectrum part and the third frequency spectrum part from the frequency spectrum parts for cepstrum smoothing, wherein the smoothing coefficient used for the second frequency spectrum part is smaller than the smoothing coefficient used for the third frequency spectrum part.
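The part-dependent smoothing can be illustrated with a sketch of temporal cepstrum smoothing. Several points are assumptions here: that the division is made over cepstral bin indices, that the smoothed quantity is a positive per-bin value `mag` derived from the dereverberation signal, and all names and default values (`cepstral_smooth`, `low_cutoff`, `pitch_halfwidth`, `beta_pitch`, `beta_rest`).

```python
import numpy as np

def cepstral_smooth(mag, prev_cepstrum, pitch_bin, low_cutoff=4,
                    pitch_halfwidth=2, beta_pitch=0.2, beta_rest=0.9):
    """Recursive cepstrum smoothing with part-dependent coefficients.

    mag           : positive per-bin values of one frame, shape (N/2 + 1,)
    prev_cepstrum : smoothed cepstrum of the previous frame, shape (N,)
    pitch_bin     : cepstral index of the fundamental tone (assumed known)
    """
    n_fft = 2 * (len(mag) - 1)
    # Real cepstrum of the log spectrum (symmetric, length n_fft).
    cep = np.fft.irfft(np.log(np.maximum(mag, 1e-6)), n=n_fft)
    beta = np.full(n_fft // 2 + 1, beta_rest)  # third part: strong smoothing
    beta[:low_cutoff] = 0.0                    # first part: not smoothed
    lo = max(pitch_bin - pitch_halfwidth, low_cutoff)
    hi = pitch_bin + pitch_halfwidth + 1
    beta[lo:hi] = beta_pitch                   # second part: light smoothing
    # Mirror the coefficients so the cepstrum stays symmetric.
    beta_full = np.concatenate([beta, beta[-2:0:-1]])
    smoothed = beta_full * prev_cepstrum + (1.0 - beta_full) * cep
    # Back to the spectral domain.
    new_mag = np.exp(np.fft.rfft(smoothed).real)
    return new_mag, smoothed
```

The smaller coefficient around the fundamental tone lets the pitch structure follow the current frame closely, while the larger coefficient elsewhere averages out the isolated fluctuations that are heard as musical noise.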

It should be noted that the foregoing explanation of the method embodiment also applies to the apparatus of this embodiment. The principle is the same and is not repeated here.

In the signal processing apparatus of this embodiment, an original signal collected by a microphone is obtained, and a short-time Fourier transform is performed on the original signal to obtain a frequency domain signal. A Kalman filter is used to estimate a reverberation spectrum of the frequency domain signal, and reverberation suppression is performed on the frequency domain signal according to the reverberation spectrum to obtain a dereverberation signal. Cepstrum smoothing is performed on each frequency spectrum part in the dereverberation signal by using a corresponding smoothing coefficient, and a short-time inverse Fourier transform is performed on the smoothed dereverberation signal to obtain a target signal. In the present application, the reverberation spectrum is adaptively estimated by the Kalman filter, and the estimated reverberation spectrum is suppressed by a spectrum enhancement technique to obtain the dereverberation signal, so that the spectral features are enhanced and the voice quality is improved. Meanwhile, the cepstrum smoothing method effectively addresses the musical noise introduced in the spectrum enhancement process, which further improves the voice quality.

In order to implement the above embodiments, the present embodiment provides an electronic device, including:

at least one processor; and

a memory, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the signal processing method of the foregoing method embodiments.

In order to implement the foregoing embodiments, the present embodiment provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause the computer to execute the signal processing method of the foregoing method embodiment.

In order to implement the above embodiments, the present embodiment provides a computer program product comprising a computer program which, when executed by a processor, implements the signal processing method described in the aforementioned method embodiments.

FIG. 4 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present application. The electronic device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a memory 28, and a bus 18 that couples various system components including the memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the electronic device 12 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processing unit 16 executes various functional applications and data processing by executing programs stored in the memory 28, for example, implementing the methods mentioned in the foregoing embodiments.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A signal processing method, comprising the steps of:

acquiring an original signal acquired by a microphone;

2. The signal processing method of claim 1, wherein said estimating a reverberation spectrum of said frequency domain signal using a Kalman filter comprises:

predicting an estimation prediction error for performing reverberation spectrum estimation on the current audio frame according to an estimation prediction error when performing reverberation spectrum estimation on the historical audio frame of the frequency domain signal;

3. The signal processing method according to claim 2, wherein predicting the reverberation spectrum of the current audio frame according to the dereverberation signal of the historical audio frame and the updated coefficients of the Kalman filter comprises:

4. The signal processing method according to claim 2, wherein after performing reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberated signal, the method further comprises:

determining the posterior error variance of the current audio frame for carrying out reverberation spectrum estimation according to the dereverberation signal of the current audio frame; the posterior error variance of the current audio frame is used for predicting an estimation prediction error of a subsequent audio frame for carrying out reverberation spectrum estimation.

5. The signal processing method according to any one of claims 1 to 4, wherein the performing reverberation suppression on the frequency domain signal according to the reverberation spectrum to obtain a dereverberated signal comprises:

6. The signal processing method according to any one of claims 1 to 4, wherein said performing cepstral smoothing on each spectral portion in the dereverberated signal by using a corresponding smoothing coefficient comprises:

dividing the dereverberation signal to obtain spectrum parts, wherein each spectrum part comprises a first spectrum part with set low frequency, a second spectrum part with fundamental tone and adjacent frequency points of the fundamental tone, and a third spectrum part except the first spectrum part and the second spectrum part;

7. A signal processing apparatus, characterized by comprising:

and a fifth processing module, configured to perform a short-time inverse Fourier transform on the smoothed dereverberation signal to obtain a target signal.

8. The apparatus according to claim 7, wherein the second processing module is specifically configured to:

9. The apparatus of claim 8, wherein the second processing module is further specifically configured to:

10. The apparatus of claim 8, further comprising:

a determining module, configured to determine the posterior error variance of the reverberation spectrum estimation of the current audio frame according to the dereverberation signal of the current audio frame, wherein the posterior error variance of the current audio frame is used to predict the estimation prediction error of the reverberation spectrum estimation of a subsequent audio frame.

11. The apparatus according to any one of claims 7 to 10, wherein the third processing module is specifically configured to:

12. The apparatus according to any of claims 7-10, wherein the fourth processing module is specifically configured to:

13. An electronic device, comprising:

at least one processor; and

a memory, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the signal processing method of any one of claims 1-6.

14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the signal processing method according to any one of claims 1 to 6.

15. A computer program product comprising a computer program, characterized in that the computer program realizes the signal processing method according to any one of claims 1-6 when executed by a processor.