CN114900730A

CN114900730A - Method and device for acquiring delay estimation steady state value, electronic equipment and storage medium

Info

Publication number: CN114900730A
Application number: CN202210614157.8A
Authority: CN
Inventors: 廖达松
Original assignee: Guangzhou Cubesili Information Technology Co Ltd
Current assignee: Guangzhou Cubesili Information Technology Co Ltd
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2022-08-12
Anticipated expiration: 2042-05-31
Also published as: CN114900730B

Abstract

The application relates to the technical field of audio and live webcasting, and provides a method and a device for obtaining a delay estimation steady-state value, an electronic device and a storage medium, which can increase the speed at which the delay estimation steady-state value is output. The method comprises: acquiring the similarity between a target frame microphone signal and each frame of reference signal, and a preset coefficient adapted to the current signal transmission stage; inputting the similarities and the preset coefficient into a delay probability determination model to obtain the delay probability, output by the model, corresponding to each frame of reference signal, wherein, for each reference signal, the model performs a weighted summation of the similarities between each frame of microphone signal in the target frame microphone signal and that reference signal, using delay probability weights obtained by adjusting the preset coefficient according to the corresponding frame number differences; and, based on the delay probability of each frame of reference signal, acquiring the frame number of the reference signal with the maximum delay probability, and obtaining the delay estimation steady-state value from that frame number and the frame number of the current frame microphone signal.

Description

Method and device for acquiring delay estimation steady-state value, electronic equipment and storage medium

Technical Field

The present application relates to the field of audio and webcast technologies, and in particular, to a method and an apparatus for obtaining a steady-state value of a delay estimation, an electronic device, and a computer-readable storage medium.

Background

In the voice communication system of a live webcast, echo signals must be cancelled during signal transmission, and a delay estimation unit in the echo cancellation system shifts and aligns the reference signal and the microphone signal. Echo cancellation removes the echo component from the signal captured by the microphone, while delay estimation estimates the relative delay between the two signal paths.

In the current technology, delay estimation is performed in two steps: the first computes a transient value of the delay estimate, and the second obtains its steady-state value. The technique must accumulate delay estimation transient values over a certain length of time, compute the probability distribution of these transient values, and output, as the steady-state value, the transient value with the maximum probability. Because a long history of transient values is required, the steady-state value is output slowly, which slows the convergence of the delay estimation and in turn prolongs the time during which residual echo remains.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, an apparatus, an electronic device and a computer-readable storage medium for obtaining a steady-state value of a delay estimation.

In a first aspect, the present application provides a method for obtaining a steady-state value of a delay estimation. The method comprises the following steps:

acquiring the similarity between the target frame microphone signal and each frame reference signal;

determining a preset coefficient adaptive to the current signal transmission stage;

inputting the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient into a preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the preset delay probability determination model; the preset delay probability determination model is used for performing weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting the preset coefficient based on the difference of the microphone frame numbers; the microphone frame number difference is the frame number difference between each frame of microphone signal and the current frame of microphone signal in the target frame of microphone signals;

acquiring the frame number of the reference signal corresponding to the maximum delay probability based on the delay probability corresponding to each frame of reference signal;

and obtaining a delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.
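The last two steps can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes the delay probabilities have already been produced by the model, that the frame number of each reference frame is known, and that the steady-state value is taken as the difference between the current microphone frame number and the frame number of the most probable reference frame. The function name `steady_state_delay` is hypothetical.

```python
from typing import Sequence


def steady_state_delay(
    delay_probs: Sequence[float],
    ref_frame_numbers: Sequence[int],
    current_mic_frame: int,
) -> int:
    """Return a frame delay from per-reference-frame delay probabilities.

    Hypothetical sketch: the steady-state value is assumed to be the
    difference between the current microphone frame number and the frame
    number of the reference frame with the maximum delay probability.
    """
    if not delay_probs or len(delay_probs) != len(ref_frame_numbers):
        raise ValueError("need one delay probability per reference frame")
    # Index of the reference frame with the maximum delay probability
    # (the first one wins if several are tied).
    best = max(range(len(delay_probs)), key=lambda i: delay_probs[i])
    return current_mic_frame - ref_frame_numbers[best]


# Reference frames 95..97 evaluated against current microphone frame 100.
print(steady_state_delay([0.1, 0.7, 0.2], [95, 96, 97], 100))  # 4
```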

In one embodiment, the determining the preset coefficient adapted to the current signal transmission stage includes:

acquiring a preset corresponding relation between a signal transmission stage and a coefficient; and acquiring a preset coefficient adaptive to the current signal transmission stage based on the preset corresponding relation.

In an embodiment, the obtaining a preset coefficient adapted to the current signal transmission stage based on the preset corresponding relationship includes:

based on the preset corresponding relation, when the current signal transmission stage is an initial stage, determining a preset coefficient adapted to the current signal transmission stage as a first preset coefficient; when the current signal transmission stage is a single talk stage, determining the preset coefficient adapted to the current signal transmission stage as a second preset coefficient; and when the current signal transmission stage is a double-talk stage, determining that the preset coefficient adaptive to the current signal transmission stage is a third preset coefficient.

In one embodiment, the target frame microphone signal comprises the current frame microphone signal and each preceding frame microphone signal; inputting the similarity between the target frame microphone signal and each frame reference signal, together with the preset coefficient, into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model includes:

inputting the similarity between the current frame microphone signal and each frame reference signal, the similarity between each preceding frame microphone signal and each frame reference signal, and the preset coefficient into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the preset delay probability determination model;

the preset delay probability determination model is used for: determining the microphone frame number difference corresponding to each frame of microphone signal, and obtaining the delay probability weight corresponding to each frame of microphone signal by using that frame number difference as an exponent adjustment factor of the preset coefficient; for each frame of reference signal, multiplying the similarity between each frame of microphone signal (the current frame and each preceding frame) and that frame of reference signal by the corresponding delay probability weight, to obtain the delay probability contribution value of each frame of microphone signal; and summing the delay probability contribution values of all frames of microphone signal to obtain the delay probability corresponding to that frame of reference signal.

In an embodiment, obtaining the delay probability weight corresponding to each frame of microphone signal by using the microphone frame number difference as an exponent adjustment factor of the preset coefficient includes:

obtaining a first partial weight from the preset coefficient, and obtaining a second partial weight by using the microphone frame number difference as the exponent of the preset coefficient; and multiplying the first partial weight by the second partial weight to obtain the delay probability weight.
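The weighting described in these embodiments can be sketched in Python. The text does not state how the first partial weight is derived from the preset coefficient γ; this sketch assumes it is (1 - γ), and that the second partial weight is γ raised to the microphone frame number difference m. Under that assumption the weighted sum equals a first-order recursive smoothing P(n, i) = γ·P(n-1, i) + (1-γ)·P_xd(n, i) started from zero, which is one possible reading of the "coefficient feedback" mentioned in the summary. All function names are hypothetical.

```python
from typing import List, Sequence


def delay_probability_weight(gamma: float, frame_diff: int) -> float:
    # Assumed first partial weight: (1 - gamma).
    # Second partial weight: gamma raised to the microphone frame number difference.
    return (1.0 - gamma) * gamma ** frame_diff


def delay_probabilities(similarities: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """similarities[m][i]: similarity of microphone frame n - m (m = 0 is the
    current frame) with reference frame i. Returns one delay probability per
    reference frame as the weighted sum of per-frame contributions."""
    num_ref = len(similarities[0])
    probs = [0.0] * num_ref
    for m, row in enumerate(similarities):
        weight = delay_probability_weight(gamma, m)
        for i in range(num_ref):
            probs[i] += weight * row[i]  # contribution of microphone frame n - m
    return probs


def recursive_update(prev: Sequence[float], current: Sequence[float], gamma: float) -> List[float]:
    # One-frame-at-a-time form: P(n, i) = gamma * P(n-1, i) + (1 - gamma) * s(n, i)
    return [gamma * p + (1.0 - gamma) * s for p, s in zip(prev, current)]


sims = [[0.9, 0.1], [0.5, 0.6], [0.2, 0.8]]  # rows: frames n, n-1, n-2
direct = delay_probabilities(sims, gamma=0.5)

state = [0.0, 0.0]
for row in reversed(sims):  # feed the oldest frame first
    state = recursive_update(state, row, gamma=0.5)

print(direct, state)  # both approximately [0.6, 0.3]
```

The recursive form needs only one stored value per reference frame, whereas the explicit weighted sum keeps the similarities of all target frames.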

In one embodiment, the obtaining the similarity between the target frame microphone signal and each frame reference signal includes:

obtaining the spectral similarity between the target frame microphone signal and each frame reference signal; and, for each frame of reference signal, calculating the weighted sum of the spectral similarity over frequency bins to obtain the similarity between the target frame microphone signal and that frame of reference signal.

In one embodiment, obtaining the spectral similarity between the target frame microphone signal and each frame reference signal comprises:

acquiring an autocorrelation spectrum of the target frame microphone signal, acquiring respective autocorrelation spectra of the reference signals of each frame, and acquiring cross-correlation spectra of the target frame microphone signal and the reference signals of each frame respectively; and for each frame of reference signal, normalizing the cross-correlation spectrum of the target frame microphone signal and the frame of reference signal by using the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of the frame of reference signal to obtain the spectral similarity of the target frame microphone signal and the frame of reference signal.

In a second aspect, the present application further provides an apparatus for obtaining a steady-state value of delay estimation. The device comprises:

the similarity acquisition module is used for acquiring the similarity between the target frame microphone signal and each frame reference signal;

the coefficient determining module is used for determining a preset coefficient adaptive to the current signal transmission stage;

a probability determination module, configured to input the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient into a preset delay probability determination model, so as to obtain delay probabilities corresponding to the frames of reference signals output by the preset delay probability determination model; the preset delay probability determination model is used for performing weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting the preset coefficient based on the difference of the microphone frame numbers; the microphone frame number difference is the frame number difference between each frame of microphone signal and the current frame of microphone signal in the target frame of microphone signals;

a frame number obtaining module, configured to obtain, based on the delay probability corresponding to each frame of reference signal, a frame number of the reference signal corresponding to the maximum delay probability;

and the steady state value obtaining module is used for obtaining the delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.

In a third aspect, the present application further provides an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor realizes the following steps when executing the computer program:

acquiring the similarity between the target frame microphone signal and each frame reference signal; determining a preset coefficient adaptive to the current signal transmission stage; respectively inputting the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient into a preset delay probability determination model to obtain delay probabilities corresponding to the frame reference signals output by the preset delay probability determination model; the preset delay probability determination model is used for performing weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting the preset coefficient based on the difference of the microphone frame numbers; the microphone frame number difference is the frame number difference between each frame of microphone signal and the current frame of microphone signal in the target frame of microphone signals; acquiring the frame number of the reference signal corresponding to the maximum delay probability based on the delay probability corresponding to each frame of reference signal; and obtaining a delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

acquiring the similarity between the target frame microphone signal and each frame reference signal; determining a preset coefficient adaptive to the current signal transmission stage; inputting the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient into a preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the preset delay probability determination model; the preset delay probability determination model is used for performing weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting the preset coefficient based on the difference of the microphone frame numbers; the microphone frame number difference is the frame number difference between each frame of microphone signal and the current frame of microphone signal in the target frame of microphone signals; acquiring the frame number of the reference signal corresponding to the maximum delay probability based on the delay probability corresponding to each frame of reference signal; and obtaining a delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.

According to the method, the device, the electronic equipment and the storage medium for obtaining the delay estimation steady-state value, the similarity between the target frame microphone signal and each frame reference signal is obtained, the preset coefficient which is adaptive to the current signal transmission stage is determined, the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient are input into the preset delay probability determination model, and the delay probability which corresponds to each frame reference signal output by the preset delay probability determination model is obtained; the preset delay probability determination model is used for carrying out weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting a preset coefficient based on the difference of the microphone frame numbers; and acquiring the frame number of the reference signal corresponding to the maximum delay probability based on the delay probability corresponding to each frame of reference signal, and acquiring a delay estimation steady-state value according to the frame number and the frame number of the current frame of microphone signal. 
The scheme obtains the similarity between the target frame microphone signal and each frame of reference signal and determines a preset coefficient adapted to the current signal transmission stage. The preset delay probability determination model then performs a weighted summation of the similarities between each frame of microphone signal and the reference signal, using delay probability weights obtained by adjusting the preset coefficient according to the corresponding frame number differences, and the delay estimation steady-state value is derived from the result. The steady-state value is thus obtained through coefficient feedback, without calculating delay estimation transient values or their probability distribution. This increases the output speed of the delay estimation steady-state value, accelerates the convergence of the delay estimation, shortens the time during which residual echo remains, reduces the probability of echo, and improves echo cancellation performance in application scenarios such as voice rooms and live-stream co-hosting (mic-linking).

Drawings

FIG. 1 is an application scenario diagram of a method for obtaining a steady-state value of a delay estimation in an embodiment of the present application;

FIG. 2 is a schematic diagram comparing the method for obtaining a steady-state value of delay estimation according to the present application with the current method for obtaining a steady-state value of delay estimation;

FIG. 3 is a schematic flow chart of a method for obtaining a steady-state value of a delay estimation according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating the step of obtaining the similarity between the target frame microphone signal and the reference signal according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating the step of obtaining the spectral similarity between the target frame microphone signal and the reference signal according to an embodiment of the present application;

FIG. 6 is a structural block diagram of an apparatus for obtaining a steady-state value of a delay estimation in an embodiment of the present application;

FIG. 7 is an internal structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The method for obtaining the delay estimation steady-state value provided by the embodiments of the present application can be applied in the scenario shown in FIG. 1, which may include a terminal 110, a microphone 120 and a speaker 130. The terminal 110 may be a personal computer, a smartphone, a tablet computer, a portable wearable device, or the like; the portable wearable device may be a smart watch, a smart band, a head-mounted device, or the like. In this scenario, a user may hold a voice communication session with other terminals through the terminal 110, the microphone 120 and the speaker 130, and the terminal 110 may perform delay estimation and echo cancellation during the session. The terminal 110 may include a rendering unit (Render), a capturing unit (Capture), a delay estimation unit and an echo cancellation unit. The rendering unit (Render) provides the corresponding reference signal to the delay estimation unit according to the obtained playback signal. The capturing unit (Capture) obtains the microphone signal from the signal input by the user through the microphone 120 together with the signal played by the speaker 130, and provides the microphone signal to both the delay estimation unit and the echo cancellation unit. The delay estimation unit obtains an aligned reference signal from the microphone signal and the reference signal and provides it to the echo cancellation unit, which performs echo cancellation using the aligned reference signal and the microphone signal and outputs the result.

In practical applications, the method for obtaining the delay estimation steady-state value of the present application may be executed by the terminal 110. Specifically, it may be embodied as an apparatus for obtaining the delay estimation steady-state value that forms part of the delay estimation unit in the terminal 110, so that subsequent processing, such as shift alignment of the reference signal and the microphone signal and echo cancellation, can be performed.
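The shift alignment that follows the delay estimate can be illustrated with a small delay line. This Python sketch assumes the render side pushes one reference frame per microphone frame and that the delay is expressed in whole frames; the class `ReferenceDelayLine` is hypothetical and not part of the described apparatus.

```python
from collections import deque
from typing import Deque, List, Optional


class ReferenceDelayLine:
    """Hypothetical helper that keeps recent reference frames so that the
    frame aligned with the current microphone frame can be fetched once a
    frame delay (estDelay) is known."""

    def __init__(self, max_delay_frames: int) -> None:
        # Keep the newest frame plus max_delay_frames older ones.
        self._frames: Deque[List[float]] = deque(maxlen=max_delay_frames + 1)

    def push(self, ref_frame: List[float]) -> None:
        self._frames.append(ref_frame)

    def aligned(self, est_delay: int) -> Optional[List[float]]:
        # Reference frame pushed est_delay frames ago, or None if the
        # requested delay exceeds the stored history.
        if est_delay < 0 or est_delay >= len(self._frames):
            return None
        return self._frames[-1 - est_delay]


line = ReferenceDelayLine(max_delay_frames=3)
for value in range(5):
    line.push([float(value)])
print(line.aligned(2))  # [2.0]
```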

To clearly show the difference between the method for obtaining a delay estimation steady-state value of the present application and that of the prior art, the two methods are first compared with reference to FIG. 2, which is a schematic diagram comparing the method of the present application with the prior-art method.

Specifically, the purpose of delay estimation is to obtain the frame delay estDelay between the reference signal and the microphone signal, and then shift the reference signal by estDelay frames so that it is aligned with the microphone signal. In the current method for obtaining the delay estimation steady-state value, a delay estimation transient value for the current frame is obtained from the microphone signal and multiple frames of reference signal and stored in a buffer queue of size M. Once enough transient values have accumulated in the buffer queue, the probability distribution of the accumulated transient values is computed, and the delay estimation steady-state value is finally output based on the difference between the transient value with the maximum probability and the current frame microphone signal.
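For contrast, the buffered prior-art scheme can be sketched as below. The sketch assumes each transient value is an integer frame delay and that the steady-state output is simply the most frequent value once the size-M buffer is full; it is meant only to show why no output is available until M transient values have been collected, not to reproduce any specific product. The class name is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Optional


class TransientHistogramEstimator:
    """Hypothetical sketch of the prior-art approach: buffer M transient
    delay values and output the most frequent one as the steady-state value."""

    def __init__(self, buffer_size: int) -> None:
        self._buffer: Deque[int] = deque(maxlen=buffer_size)

    def push(self, transient_delay: int) -> Optional[int]:
        self._buffer.append(transient_delay)
        if len(self._buffer) < self._buffer.maxlen:
            return None  # not enough transient values accumulated yet
        # Mode of the buffered transient values (maximum of the histogram).
        value, _count = Counter(self._buffer).most_common(1)[0]
        return value


est = TransientHistogramEstimator(buffer_size=4)
print([est.push(d) for d in [3, 3, 5, 3]])  # [None, None, None, 3]
```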

In the method for obtaining the delay estimation steady-state value of the present application, the delay probability corresponding to each frame of reference signal is obtained through a preset delay probability determination model from the similarity between the target frame microphone signal and each frame of reference signal and the preset coefficient fed back according to the signal transmission stage. The frame number of the reference signal with the maximum delay probability is then obtained, and the delay estimation steady-state value is derived from that frame number and the frame number of the current frame microphone signal in the target frame microphone signal. Because neither delay estimation transient values nor their probability distribution need to be calculated, the method outputs the steady-state value faster, accelerates the convergence of delay estimation, shortens the time during which residual echo remains, reduces the probability of echo, and improves echo cancellation performance in application scenarios such as voice rooms and live-stream co-hosting (mic-linking).

The following describes a method for obtaining a steady-state value of a delay estimation according to the present application with reference to an embodiment and a corresponding drawing.

In one embodiment, as shown in fig. 3, there is provided a method for obtaining a steady-state value of a delay estimation, including the following steps:

step S301, obtaining the similarity between the target frame microphone signal and each frame reference signal.

The target frame microphone signal is the microphone signal used to obtain the delay estimation steady-state value. It may include the current frame microphone signal and one or more frames of microphone signal preceding it, referred to as preceding frame microphone signals. For example, if the frame number of the current frame microphone signal is n, the frame numbers of the preceding frame microphone signals may be n-1, n-2, and so on, and the target frame microphone signal may include the current frame microphone signal (frame number n), the preceding frame microphone signal with frame number n-1, the preceding frame microphone signal with frame number n-2, etc. The reference signal may take N frames. In this step, the similarity between the target frame microphone signal and each frame of reference signal is obtained. For convenience of description, taking the current frame microphone signal in the target frame microphone signal and the i-th frame reference signal as an example, their similarity may be expressed as P_xd(n, i), where n denotes the frame number of the current frame microphone signal, the subscript x corresponds to the reference signal, and the subscript d corresponds to the microphone signal. The similarity may be measured in a feature dimension such as the frequency spectrum or the power spectrum.

In some embodiments, as shown in fig. 4, step S301 may include:

step S401, obtaining the spectral similarity between the target frame microphone signal and each frame reference signal.

In this step, for each frame of reference signal, the spectral similarity between the target frame of microphone signal and the frame of reference signal may be calculated based on the cross-correlation spectrum between the target frame of microphone signal and the frame of reference signal.

Specifically, as shown in fig. 5, as an embodiment, the step S401 specifically includes:

Step S501, obtaining the autocorrelation spectrum of the target frame microphone signal, obtaining the autocorrelation spectrum of each frame of reference signal, and obtaining the cross-correlation spectrum of the target frame microphone signal and each frame of reference signal.

Step S502, for each frame of reference signal, normalizing the cross-correlation spectrum of the target frame microphone signal and the frame of reference signal by using the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of the frame of reference signal, so as to obtain the spectral similarity between the target frame microphone signal and the frame of reference signal.

Specifically, let E{a} denote the expectation of a, which can be estimated from N values a_1, ..., a_N:

E\{a\} = \sum_{i=1}^{N} \alpha_i a_i

where α_i is a weighting coefficient satisfying:

\sum_{i=1}^{N} \alpha_i = 1

Specifically, let xf(k) denote the transform-domain output signal obtained by applying a signal transform to the time-domain signal x(t), where k is the output coordinate in the transform domain. The transform may be, but is not limited to, the Fourier transform; for example, the output signal of each frequency band may be extracted by a band-splitting filter bank, converting the time-domain signal into the desired transform-domain representation.
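As an illustration of the transform step, the sketch below computes a plain discrete Fourier transform of one time-domain frame. It is a stand-in under stated assumptions: the document allows any suitable transform (including a band-splitting filter bank), and a practical system would use an FFT with windowing and overlap, none of which is shown here. The function name `dft` is hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(L^2) DFT of one frame: X(k) = sum_t x(t) * exp(-j*2*pi*k*t/L)."""
    length = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / length) for t in range(length))
        for k in range(length)
    ]


# A unit impulse has a flat spectrum: every bin has magnitude 1.
print([round(abs(v), 6) for v in dft([1.0, 0.0, 0.0, 0.0])])  # [1.0, 1.0, 1.0, 1.0]
```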

In this embodiment, the autocorrelation spectra of the target frame microphone signal and of each frame of reference signal may be calculated first. The transform-domain representation Xf(i, k) of the reference signal is stored in a buffer, and the autocorrelation spectrum Sx(i, k) of the reference signal is calculated and likewise stored in the buffer:

S_x(i,k) = E\{|X_f(i,k)|^2\}

The autocorrelation spectrum Sd(k) of the target frame microphone signal is calculated from its transform-domain representation Df(k):

S_d(k) = E\{|D_f(k)|^2\}

The cross-correlation spectrum Sxd(i, k) of the target frame microphone signal and the i-th frame reference signal is calculated as follows, where ^* denotes complex conjugation:

S_{xd}(i,k) = E\{X_f(i,k) D_f^*(k)\}

This yields the autocorrelation spectrum Sx(i, k) of each frame of reference signal, the autocorrelation spectrum Sd(k) of the target frame microphone signal, and the cross-correlation spectrum Sxd(i, k) of the target frame microphone signal and the i-th frame reference signal.
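The three spectra can be sketched in Python with the weighted expectation E{·} applied across N stored frames. Assumptions: the expectation is a fixed weighted average over the N most recent spectral snapshots with weights α that sum to 1, spectra are lists of complex bins, and all names (`expectation`, `correlation_spectra`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def expectation(values: Sequence[complex], alphas: Sequence[float]) -> complex:
    # E{a} = sum_i alpha_i * a_i over N values; the alphas are assumed to sum to 1.
    if len(values) != len(alphas):
        raise ValueError("need one weighting coefficient per value")
    return sum(alpha * value for alpha, value in zip(alphas, values))


def correlation_spectra(
    x_frames: Sequence[Sequence[complex]],
    d_frames: Sequence[Sequence[complex]],
    alphas: Sequence[float],
) -> Tuple[List[float], List[float], List[complex]]:
    """x_frames[j][k]: reference spectrum Xf(i, k) in the j-th of N snapshots;
    d_frames[j][k]: microphone spectrum Df(k) in the same snapshot.
    Returns Sx(i, k), Sd(k) and Sxd(i, k) for every bin k."""
    num_bins = len(x_frames[0])
    sx, sd, sxd = [], [], []
    for k in range(num_bins):
        xs = [frame[k] for frame in x_frames]
        ds = [frame[k] for frame in d_frames]
        sx.append(expectation([abs(x) ** 2 for x in xs], alphas).real)
        sd.append(expectation([abs(d) ** 2 for d in ds], alphas).real)
        # Cross-correlation uses the complex conjugate of the microphone spectrum.
        sxd.append(expectation([x * d.conjugate() for x, d in zip(xs, ds)], alphas))
    return sx, sd, sxd


sx, sd, sxd = correlation_spectra(
    x_frames=[[1 + 0j, 2 + 0j], [1 + 0j, 2j]],
    d_frames=[[1 + 0j, 1j], [1 + 0j, 1 + 0j]],
    alphas=[0.5, 0.5],
)
print(sx, sd)  # [1.0, 4.0] [1.0, 1.0]
```

In the second bin the cross terms have opposite phases in the two snapshots and cancel, so Sxd is zero there even though both signals carry energy.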

Then, for each frame of reference signal, for example the i-th frame, the cross-correlation spectrum Sxd(i, k) of the target frame microphone signal and the i-th frame reference signal is normalized by the autocorrelation spectrum Sd(k) of the target frame microphone signal and the autocorrelation spectrum Sx(i, k) of the i-th frame reference signal, giving the spectral similarity P_xd(i, k) between the target frame microphone signal and the i-th frame reference signal:

P_{xd}(i,k) = \mathrm{normalize}\{S_{xd}(i,k)\}

Applying the same calculation to every frame yields the spectral similarity between the target frame microphone signal and each frame of reference signal.
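The document leaves the exact normalization open. One common choice, assumed here rather than taken from the source, is the magnitude-squared coherence P_xd(i, k) = |Sxd(i, k)|² / (Sx(i, k)·Sd(k)), which lies between 0 and 1 when the weights α are non-negative. A minimal Python sketch with a hypothetical function name:

```python
from typing import List, Sequence


def spectral_similarity(
    sxd: Sequence[complex],
    sx: Sequence[float],
    sd: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Hypothetical normalization (magnitude-squared coherence) of the
    cross-correlation spectrum by the two autocorrelation spectra.
    eps guards against division by zero in silent bins."""
    return [abs(c) ** 2 / (px * pd + eps) for c, px, pd in zip(sxd, sx, sd)]


print(spectral_similarity([2 + 0j, 0j], [4.0, 4.0], [1.0, 1.0]))  # approximately [1.0, 0.0]
```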

Step S402, calculating the weighted frequency point sum of the spectrum similarity of each frame of reference signals to obtain the similarity between the microphone signal of the target frame and the reference signal of the frame.

Using the spectral similarity P of the microphone signal of the target frame and the reference signal of the i-th frame _xd (i, k) for example, calculate the target frameSpectral similarity P of a wind signal to an i-th frame reference signal _xd Weighted sum of frequency points of (i, k)

Taking the weighted frequency point sum as the similarity P between the microphone signal of the target frame and the reference signal of the ith frame _xd (n,i)：

where β_k is a weighting coefficient satisfying:

This gives the similarity P_xd(n,i) between the target frame microphone signal and the i-th frame reference signal; the similarity between the target frame microphone signal and every other frame reference signal is obtained in the same way.
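The constraint on β_k is not recoverable from the text. The sketch below assumes the weights are non-negative and sum to 1, and defaults to the hypothetical choice of equal weights 1/K for all K frequency points.

```python
from typing import List, Optional


def frame_similarity(P_bins: List[float], beta: Optional[List[float]] = None) -> float:
    """Weighted frequency-point sum P_xd(n, i) = sum_k beta_k * P_xd(i, k).

    beta is assumed non-negative and summing to 1; by default every bin gets
    the same (hypothetical) weight 1/K.
    """
    K = len(P_bins)
    if beta is None:
        beta = [1.0 / K] * K
    if len(beta) != K:
        raise ValueError("beta must have one weight per frequency point")
    return sum(b * p for b, p in zip(beta, P_bins))
```

If the weights sum to 1 and each P_xd(i,k) lies in [0, 1], the frame similarity also stays in [0, 1], which keeps the later delay probabilities on a comparable scale.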

Step S302, determining a preset coefficient adapted to the current signal transmission stage.

This step determines the preset coefficient adapted to the current signal transmission stage. The current signal transmission stage, also called the call state, is the signal transmission stage in effect while the delay estimation steady-state value is being acquired; in practice it may be an initial stage, a single-talk stage or a double-talk stage. Specifically, the current signal transmission stage may be determined from signal transmission state information provided by the rendering unit and the capturing unit. Several preset coefficients may be preconfigured, one for each signal transmission stage, so that once the current stage is determined, the preset coefficient γ(n) adapted to it is selected from the preconfigured coefficients. In this way the preset coefficient adapts to the current signal transmission stage.

As an example, step S302 may include:

In this embodiment, a preset correspondence between signal transmission stages and coefficients, configured in advance by relevant personnel, may be obtained; after the current signal transmission stage is determined, the preset coefficient γ(n) adapted to it is obtained based on this correspondence.

Further, in some embodiments, obtaining the preset coefficient adapted to the current signal transmission stage based on the preset correspondence may specifically include:

based on the preset correspondence, determining the preset coefficient adapted to the current signal transmission stage to be a first preset coefficient when the current stage is the initial stage, a second preset coefficient when it is the single-talk stage, and a third preset coefficient when it is the double-talk stage.

In this embodiment, the preset correspondence allows the corresponding preset coefficient to be fed back adaptively according to the current signal transmission stage. Specifically, the preset correspondence may be expressed as:

$$\gamma(n) = \begin{cases} \gamma_1, & \text{initial stage} \\ \gamma_2, & \text{single-talk stage} \\ \gamma_3, & \text{double-talk stage} \end{cases}$$

That is, based on the preset correspondence, the preset coefficient γ(n) adapted to the current signal transmission stage is the first preset coefficient γ₁ in the initial stage, the second preset coefficient γ₂ in the single-talk stage, and the third preset coefficient γ₃ in the double-talk stage.
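A minimal sketch of the preset correspondence as a lookup table. The stage names and the numeric values of γ₁, γ₂ and γ₃ are hypothetical; the text only states that the three coefficients are preconfigured.

```python
from enum import Enum


class Stage(Enum):
    """Signal transmission stages named in the text (identifiers are hypothetical)."""
    INITIAL = "initial"
    SINGLE_TALK = "single-talk"
    DOUBLE_TALK = "double-talk"


# Hypothetical values for gamma_1, gamma_2, gamma_3; the text does not give them.
PRESET_GAMMA = {
    Stage.INITIAL: 0.5,
    Stage.SINGLE_TALK: 0.9,
    Stage.DOUBLE_TALK: 0.99,
}


def preset_coefficient(stage: Stage) -> float:
    """Return the preset coefficient gamma(n) adapted to the current stage."""
    return PRESET_GAMMA[stage]
```

Under the exponential weighting (1-γ(n))·γ(n)^q described later, a smaller γ favours recent microphone frames and a larger γ leans on history. The ordering of the example values reflects that reading (fast start-up, conservative during double talk) and is not stated in the text.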

Step S303, inputting the similarity between the target frame microphone signal and each frame reference signal, together with the preset coefficient, into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model.

In this step, the similarities P_xd(n,i) between the target frame microphone signal and each frame reference signal and the preset coefficient γ(n) are used as input data of the preset delay probability determination model, which outputs from them the delay probability Prob_xd(n,i) corresponding to each frame reference signal. The model performs a weighted summation of the similarities between each frame of microphone signal in the target frame microphone signal and the reference signal, using the corresponding delay probability weights, to obtain the delay probability corresponding to that reference signal. Each delay probability weight is obtained by adjusting the preset coefficient based on the microphone frame number difference, that is, the difference between the frame number of that frame of microphone signal and the frame number of the current frame microphone signal.

Specifically, the target frame microphone signal may include the current frame microphone signal (frame number n) and a preceding frame microphone signal (frame number n-1). The microphone frame number difference q is 0 for the current frame microphone signal and 1 for the preceding frame microphone signal. The preset delay probability determination model can therefore scale the preset coefficient γ(n) according to q to obtain the delay probability weight of each frame of microphone signal; for example, microphone signals with a larger q may be given a smaller delay probability weight on the basis of γ(n). In this way q acts as a scaling factor of γ(n) that determines how much the similarity of each frame of microphone signal contributes to the delay probability. Then, for each frame of reference signal, the model performs a weighted summation of the similarities between each frame of microphone signal and that reference signal, using the corresponding delay probability weights, to obtain the delay probability of that reference signal. In this way the model obtains and outputs the delay probability corresponding to each frame reference signal.

Further, in some embodiments, the target frame microphone signal may include the current frame microphone signal and each preceding frame microphone signal; step S303 then further includes:

inputting the similarity between the current frame microphone signal and each frame reference signal, the similarity between each preceding frame microphone signal and each frame reference signal, and the preset coefficient into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model.

In this embodiment, the delay probability corresponding to each frame of reference signal is calculated jointly from the current frame microphone signal and the preceding frame microphone signals (of which there may be N-1), which improves the accuracy and reliability of the delay estimation steady-state value. Specifically, the preset delay probability determination model of this embodiment is further configured to: determine the microphone frame number difference of each frame of microphone signal and use it as an exponent adjustment factor of the preset coefficient to obtain the delay probability weight of that frame of microphone signal; for each frame of reference signal, multiply the similarity between each frame of microphone signal (the current frame and each preceding frame) and that reference signal by the corresponding delay probability weight to obtain the delay probability contribution value of each frame of microphone signal; and sum these contribution values to obtain the delay probability corresponding to that frame of reference signal.

That is, when the target frame microphone signal includes the current frame microphone signal and its preceding frame microphone signals, a delay probability weight is calculated for each of these frames of microphone signal. Then, to calculate the delay probability of a given frame of reference signal, the contribution value of each frame of microphone signal is obtained from its similarity to that reference signal and its delay probability weight, and the contribution values of all frames of microphone signal are summed. Applying this calculation to every frame of reference signal, the model obtains the delay probability corresponding to each frame reference signal.

In one embodiment, obtaining the delay probability weight of each frame of microphone signal by using the microphone frame number difference as an exponent adjustment factor of the preset coefficient specifically includes:

obtaining a first partial weight from the preset coefficient, obtaining a second partial weight by using the microphone frame number difference as an exponent adjustment factor of the preset coefficient, and multiplying the first partial weight by the second partial weight to obtain the delay probability weight.

In this embodiment, the microphone frame number difference q is used as the exponent of the preset coefficient γ(n) to obtain the second partial weight of the delay probability weight, which may be expressed, for example, as γ(n)^q. The first partial weight is also obtained from γ(n) and may be expressed, for example, as (1-γ(n)). Multiplying the two gives the delay probability weight (1-γ(n))·γ(n)^q. Thus q, as the exponent in γ(n)^q, determines how strongly the similarity of each frame of microphone signal influences its delay probability contribution value.
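A minimal sketch of the two-part delay probability weight, assuming 0 < γ(n) < 1 (the text does not state the admissible range). The function name is hypothetical.

```python
from typing import List


def delay_probability_weights(gamma: float, N: int) -> List[float]:
    """Return (1 - gamma) * gamma**q for q = 0 .. N-1.

    q is the microphone frame number difference (0 for the current frame).
    gamma is assumed to lie strictly between 0 and 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma is assumed to lie in (0, 1)")
    first_partial = 1.0 - gamma  # first partial weight
    # multiply by the second partial weight gamma**q for each frame difference q
    return [first_partial * gamma ** q for q in range(N)]
```

The weights decrease strictly with q, and, being a geometric series, they sum to 1 - γ^N. The delay probability therefore never exceeds (1 - γ^N) times the largest similarity in the window.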

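In this embodiment, the preset delay probability determination model may be specifically expressed as:

$$\mathrm{Prob}_{xd}(n,i) = \sum_{q=0}^{N-1} \bigl(1-\gamma(n)\bigr)\,\gamma(n)^{q}\,P_{xd}(n-q,i)$$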

where n is the frame number of the current frame microphone signal, and n-q is the frame number of the preceding frame microphone signal whose frame number differs from it by q. Specifically, for each frame of microphone signal, the microphone frame number difference q between it and the current frame microphone signal is determined; the first partial weight (1-γ(n)) is obtained from γ(n), and the second partial weight γ(n)^q is obtained by using q as the exponent of γ(n); multiplying them gives the delay probability weight (1-γ(n))·γ(n)^q. Multiplying this weight by the similarity P_xd(n-q,i) between that frame of microphone signal and the i-th frame reference signal gives the delay probability contribution value (1-γ(n))·γ(n)^q·P_xd(n-q,i) of that frame of microphone signal.

In this way, the model obtains the delay probability contribution values of the N frames of microphone signal in the target frame microphone signal and sums the contribution values (1-γ(n))·γ(n)^q·P_xd(n-q,i) to obtain the delay probability Prob_xd(n,i) of the i-th frame reference signal. Repeating this for every i, the model obtains and outputs the delay probability Prob_xd(n,i) corresponding to each frame reference signal.
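The first function below sketches the model as a direct weighted sum over the N-frame window. The second is an observation derived here, not stated in the text: if γ(n) stays constant across the window, the same value can be updated from the previous frame as Prob_xd(n,i) = (1-γ)·P_xd(n,i) + γ·Prob_xd(n-1,i) - (1-γ)·γ^N·P_xd(n-N,i), so the per-frame cost no longer grows with N. When the signal transmission stage (and hence γ) changes, the direct form should be used. All names are hypothetical.

```python
from typing import List, Sequence


def delay_probability(P_hist: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Prob_xd(n, i) = sum_q (1 - gamma) * gamma**q * P_xd(n - q, i).

    P_hist[q][i] holds P_xd(n - q, i): row 0 is the current microphone frame,
    row q the preceding frame whose frame number differs by q.
    """
    prob = [0.0] * len(P_hist[0])
    for q, row in enumerate(P_hist):
        weight = (1.0 - gamma) * gamma ** q   # delay probability weight
        for i, p in enumerate(row):
            prob[i] += weight * p             # delay probability contribution value
    return prob


def delay_probability_recursive(prev_prob: Sequence[float], P_now: Sequence[float],
                                P_leaving: Sequence[float], gamma: float,
                                N: int) -> List[float]:
    """Update from Prob_xd(n-1, i); equals delay_probability only for constant gamma.

    P_leaving holds P_xd(n - N, i), the frame that drops out of the N-frame window.
    """
    drop = (1.0 - gamma) * gamma ** N
    return [(1.0 - gamma) * p + gamma * pp - drop * pl
            for p, pp, pl in zip(P_now, prev_prob, P_leaving)]
```

For example, with γ = 0.5, a current-frame similarity row [0.2, 0.9] and a preceding-frame row [0.4, 0.6], `delay_probability` gives [0.5·0.2 + 0.25·0.4, 0.5·0.9 + 0.25·0.6] = [0.2, 0.6].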

Step S304, obtaining, based on the delay probability corresponding to each frame of reference signal, the frame number of the reference signal corresponding to the maximum delay probability.

In this step, after the delay probability Prob_xd(n,i) corresponding to each frame of reference signal is obtained, the frame number of the reference signal corresponding to the maximum delay probability is found as follows:

$$i^{*} = \arg\max_{i}\ \mathrm{Prob}_{xd}(n,i)$$
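A minimal sketch of the search for the maximum delay probability, assuming the buffered reference frames are stored alongside their frame numbers. The tie-breaking rule (earliest entry wins) is a hypothetical choice; the text does not address ties.

```python
from typing import Sequence


def frame_of_max_probability(prob: Sequence[float],
                             ref_frame_numbers: Sequence[int]) -> int:
    """Return the frame number of the reference signal with the largest Prob_xd(n, i).

    prob[j] and ref_frame_numbers[j] describe the same buffered reference frame.
    Python's max() returns the first maximal element, so ties go to the earliest entry.
    """
    if not prob or len(prob) != len(ref_frame_numbers):
        raise ValueError("need one frame number per entry of a non-empty probability list")
    best = max(range(len(prob)), key=lambda j: prob[j])
    return ref_frame_numbers[best]
```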

step S305, obtaining a delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.

In this step, the delay estimation steady-state value estdelay(n) is obtained from the difference between the frame number of the reference signal corresponding to the maximum delay probability and the frame number n of the current frame microphone signal.
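The text does not give the sign convention of this difference. The sketch assumes that reference and microphone frames are numbered on one shared frame clock and that the reference frame precedes the microphone frame it echoes into, so estdelay(n) = n - (frame number of the maximum). The millisecond conversion and its `frame_shift_ms` parameter are hypothetical additions.

```python
def steady_state_delay(n: int, ref_frame_of_max: int) -> int:
    """estdelay(n) in frames, assuming a shared frame clock and that the
    reference frame precedes the microphone frame it echoes into."""
    return n - ref_frame_of_max


def delay_in_ms(delay_frames: int, frame_shift_ms: float) -> float:
    """Convert a delay in frames to milliseconds; frame_shift_ms is hypothetical."""
    return delay_frames * frame_shift_ms
```

For example, if the current microphone frame is 100 and the most probable reference frame is 88, the steady-state estimate is 12 frames, or 120 ms at an assumed 10 ms frame shift.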

In the method for obtaining the delay estimation steady-state value described above, the similarity between the target frame microphone signal and each frame of reference signal is obtained, a preset coefficient adapted to the current signal transmission stage is determined, the preset delay probability determination model outputs the delay probability of each frame of reference signal from the similarities and the preset coefficient, and the delay estimation steady-state value is obtained from the frame number of the reference signal with the maximum delay probability. This realizes coefficient-feedback-based acquisition of the delay estimation steady-state value without calculating delay estimation transient values or their probability distribution. As a result, the delay estimation steady-state value is output faster, delay estimation converges faster, the duration of residual echo is shortened, and the probability of echo is reduced.

The method for obtaining the delay estimation steady-state value can be applied to live webcasting to improve echo cancellation in scenarios such as voice rooms and live co-hosting (mic-linking). Specifically, the application scenario shown in fig. 1 may correspond to live co-hosting, in which a local user co-hosts with a far-end user by means of the terminal 110, the microphone 120 and the speaker 130. During co-hosting, the voice signal transmitted from the far-end user to the local user is played by the speaker 130, may be picked up by the microphone 120 together with the local user's voice, and is then transmitted back to the far-end user, producing an echo. To address this, the terminal 110 may execute the method for obtaining the delay estimation steady-state value provided by the present application, which speeds up the output of the delay estimation steady-state value, accelerates the convergence of delay estimation, shortens residual echo during co-hosting, reduces the probability of echo, and thus improves echo cancellation and the user experience of live webcasting.

It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the execution order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides an apparatus for obtaining the delay estimation steady-state value, which implements the method described above. The solution provided by the apparatus is similar to that described for the method, so for the specific limitations in the one or more apparatus embodiments below, reference may be made to the limitations of the method above; they are not repeated here.

In one embodiment, as shown in fig. 6, there is provided an apparatus for obtaining a steady-state value of a delay estimation, the apparatus 600 includes:

a similarity obtaining module 601, configured to obtain similarities between a target frame microphone signal and each frame reference signal;

a coefficient determining module 602, configured to determine a preset coefficient adapted to a current signal transmission stage;

a probability determining module 603, configured to input the similarity between the target frame microphone signal and each frame reference signal and the preset coefficient into a preset delay probability determining model, so as to obtain delay probabilities corresponding to the frames of reference signals output by the preset delay probability determining model; the preset delay probability determination model is used for performing weighted summation according to the similarity between each frame of microphone signals in the target frame of microphone signals and the reference signals and the corresponding delay probability weight to obtain the delay probability corresponding to the reference signals; the delay probability weight is obtained by adjusting the preset coefficient based on the difference of the microphone frame numbers; the microphone frame number difference is the frame number difference between each frame of microphone signal and the current frame of microphone signal in the target frame of microphone signals;

a frame number obtaining module 604, configured to obtain, based on the delay probability corresponding to each frame of reference signal, a frame number of the reference signal corresponding to the maximum delay probability;

a steady state value obtaining module 605, configured to obtain a delay estimation steady state value according to the frame number of the reference signal corresponding to the maximum delay probability and the frame number of the current frame microphone signal.

In one embodiment, the coefficient determining module 602 is configured to obtain a preset correspondence between signal transmission stages and coefficients, and to obtain the preset coefficient adapted to the current signal transmission stage based on the preset correspondence.

In an embodiment, the coefficient determining module 602 is configured to determine, based on the preset correspondence, that a preset coefficient adapted to the current signal transmission stage is a first preset coefficient when the current signal transmission stage is an initial stage; when the current signal transmission stage is a single talk stage, determining the preset coefficient adapted to the current signal transmission stage as a second preset coefficient; and when the current signal transmission stage is a double-talk stage, determining that the preset coefficient adaptive to the current signal transmission stage is a third preset coefficient.

In one embodiment, the target frame microphone signal comprises the current frame microphone signal and each preceding frame microphone signal. The probability determining module 603 is configured to input the similarity between the current frame microphone signal and each frame reference signal, the similarity between each preceding frame microphone signal and each frame reference signal, and the preset coefficient into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model. The preset delay probability determination model is configured to: determine the microphone frame number difference of each frame of microphone signal and use it as an exponent adjustment factor of the preset coefficient to obtain the delay probability weight of that frame of microphone signal; for each frame of reference signal, multiply the similarity between each frame of microphone signal (the current frame and each preceding frame) and that reference signal by the corresponding delay probability weight to obtain the delay probability contribution value of each frame of microphone signal; and sum these contribution values to obtain the delay probability corresponding to that frame of reference signal.

In an embodiment, the preset delay probability determination model is specifically configured to obtain a first partial weight from the preset coefficient, obtain a second partial weight by using the microphone frame number difference as an exponent adjustment factor of the preset coefficient, and multiply the first partial weight by the second partial weight to obtain the delay probability weight.

In one embodiment, the similarity obtaining module 601 is configured to obtain the spectral similarity between the target frame microphone signal and each frame reference signal, and, for each frame of reference signal, to calculate the weighted frequency-point sum of the spectral similarity to obtain the similarity between the target frame microphone signal and that frame of reference signal.

In one embodiment, the similarity obtaining module 601 is configured to obtain the autocorrelation spectrum of the target frame microphone signal, the autocorrelation spectrum of each frame reference signal, and the cross-correlation spectrum of the target frame microphone signal and each frame reference signal; and, for each frame of reference signal, to normalize the cross-correlation spectrum of the target frame microphone signal and that reference signal using the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of that reference signal, obtaining the spectral similarity of the target frame microphone signal and that frame of reference signal.

The modules in the apparatus for obtaining the delay estimation steady-state value may be implemented wholly or partially by software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor of the electronic device in hardware form, or stored in a memory of the electronic device in software form, so that the processor can call them and execute the corresponding operations.

In one embodiment, an electronic device is provided, which may be a terminal, and its internal structure may be as shown in fig. 7. The electronic device comprises a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The communication interface is used for wired or wireless communication with an external terminal; wireless communication may be realized through Wi-Fi, a mobile cellular network, NFC (near field communication) or other technologies. When executed by the processor, the computer program implements the method for obtaining the delay estimation steady-state value. The display screen may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touchpad on the housing of the electronic device, or an external keyboard, touchpad or mouse.

Those skilled in the art will appreciate that the architecture shown in fig. 7 is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the electronic devices to which the subject application may be applied, and that a particular electronic device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, an electronic device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a preset correspondence between signal transmission stages and coefficients; and acquiring the preset coefficient adapted to the current signal transmission stage based on the preset correspondence.

In one embodiment, the processor, when executing the computer program, further performs the steps of: based on the preset corresponding relation, when the current signal transmission stage is an initial stage, determining a preset coefficient adapted to the current signal transmission stage as a first preset coefficient; when the current signal transmission stage is a single talk stage, determining the preset coefficient adapted to the current signal transmission stage as a second preset coefficient; and when the current signal transmission stage is a double-talk stage, determining that the preset coefficient adaptive to the current signal transmission stage is a third preset coefficient.

In one embodiment, the target frame microphone signal comprises the current frame microphone signal and each preceding frame microphone signal; the processor, when executing the computer program, further performs the steps of: inputting the similarity between the current frame microphone signal and each frame reference signal, the similarity between each preceding frame microphone signal and each frame reference signal, and the preset coefficient into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model. The preset delay probability determination model is configured to: determine the microphone frame number difference of each frame of microphone signal and use it as an exponent adjustment factor of the preset coefficient to obtain the delay probability weight of that frame of microphone signal; for each frame of reference signal, multiply the similarity between each frame of microphone signal (the current frame and each preceding frame) and that reference signal by the corresponding delay probability weight to obtain the delay probability contribution value of each frame of microphone signal; and sum these contribution values to obtain the delay probability corresponding to that frame of reference signal.

In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining a first partial weight from the preset coefficient, obtaining a second partial weight by using the microphone frame number difference as an exponent adjustment factor of the preset coefficient, and multiplying the first partial weight by the second partial weight to obtain the delay probability weight.

In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining the spectral similarity between the target frame microphone signal and each frame reference signal; and, for each frame of reference signal, calculating the weighted frequency-point sum of the spectral similarity to obtain the similarity between the target frame microphone signal and that frame of reference signal.

In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring an autocorrelation spectrum of the target frame microphone signal, acquiring respective autocorrelation spectrums of the frame reference signals, and acquiring cross-correlation spectrums of the target frame microphone signal and the frame reference signals respectively; and for each frame of reference signal, normalizing the cross-correlation spectrum of the target frame microphone signal and the frame of reference signal by using the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of the frame of reference signal to obtain the spectral similarity of the target frame microphone signal and the frame of reference signal.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, when executed by the processor, the computer program further performs the following steps: acquiring a preset correspondence between signal transmission stages and coefficients; and acquiring, based on the preset correspondence, the preset coefficient adapted to the current signal transmission stage.

In one embodiment, when executed by the processor, the computer program further performs the following steps based on the preset correspondence: when the current signal transmission stage is the initial stage, determining the preset coefficient adapted to it as a first preset coefficient; when the current signal transmission stage is a single-talk stage, determining the preset coefficient adapted to it as a second preset coefficient; and when the current signal transmission stage is a double-talk stage, determining the preset coefficient adapted to it as a third preset coefficient.
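The preset correspondence can be sketched as a lookup table keyed by stage. The numeric coefficients below are placeholders, since the application does not disclose the values. If the weighting is exponential in the preset coefficient (an assumption), one plausible tuning is a small coefficient in the initial stage so the estimate settles quickly, and a coefficient close to 1 in the double-talk stage, where near-end speech lowers the similarity, so the estimate does not drift.

```python
from enum import Enum


class Stage(Enum):
    INITIAL = "initial"
    SINGLE_TALK = "single_talk"
    DOUBLE_TALK = "double_talk"


# Preset correspondence between signal transmission stage and coefficient.
# PLACEHOLDER values: the application does not disclose them.
STAGE_TO_COEFFICIENT = {
    Stage.INITIAL: 0.5,       # first preset coefficient
    Stage.SINGLE_TALK: 0.9,   # second preset coefficient
    Stage.DOUBLE_TALK: 0.99,  # third preset coefficient
}


def preset_coefficient(stage: Stage) -> float:
    """Look up the preset coefficient adapted to the current signal transmission stage."""
    return STAGE_TO_COEFFICIENT[stage]


print(preset_coefficient(Stage.DOUBLE_TALK))  # → 0.99
```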

In one embodiment, the target frame microphone signals comprise a current frame microphone signal and each preceding frame microphone signal. When executed by the processor, the computer program further performs the following steps: inputting the similarity between the current frame microphone signal and each frame reference signal, the similarity between each preceding frame microphone signal and each frame reference signal, and the preset coefficient into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model. The preset delay probability determination model determines the microphone frame number difference corresponding to each frame microphone signal and obtains the delay probability weight corresponding to each frame microphone signal by using that microphone frame number difference as an exponential adjustment factor of the preset coefficient. For each frame reference signal, the model multiplies the similarity between each frame microphone signal (the current frame and each preceding frame) and that reference frame by the corresponding delay probability weight to obtain the delay probability contribution value of each frame microphone signal, and sums these contribution values to obtain the delay probability corresponding to that frame reference signal.

In one embodiment, when executed by the processor, the computer program further performs the following steps: obtaining a first weight part from the preset coefficient, and obtaining a second weight part by using the microphone frame number difference as an exponential adjustment factor of the preset coefficient; and multiplying the first weight part by the second weight part to obtain the delay probability weight.

In one embodiment, when executed by the processor, the computer program further performs the following steps: obtaining the spectral similarity between the target frame microphone signal and each frame reference signal; and, for each frame reference signal, calculating a weighted sum of the spectral similarity over frequency bins to obtain the similarity between the target frame microphone signal and that frame reference signal.

In one embodiment, when executed by the processor, the computer program further performs the following steps: acquiring the autocorrelation spectrum of the target frame microphone signal, the autocorrelation spectrum of each frame reference signal, and the cross-correlation spectrum between the target frame microphone signal and each frame reference signal; and, for each frame reference signal, normalizing the cross-correlation spectrum of the target frame microphone signal and that frame reference signal by the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of that frame reference signal to obtain the spectral similarity between them.

Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like, without limitation.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they shall not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for obtaining a steady-state value of a delay estimation is characterized by comprising the following steps:

2. The method of claim 1, wherein determining the preset coefficient adapted to the current signal transmission stage comprises:

acquiring a preset correspondence between signal transmission stages and coefficients;

and acquiring, based on the preset correspondence, the preset coefficient adapted to the current signal transmission stage.

3. The method according to claim 2, wherein the obtaining of the preset coefficient adapted to the current signal transmission stage based on the preset correspondence relationship comprises:

4. The method of claim 1, wherein the target frame microphone signals comprise a current frame microphone signal and each preceding frame microphone signal; and inputting the similarity between the target frame microphone signals and each frame reference signal, together with the preset coefficient, into the preset delay probability determination model to obtain the delay probability corresponding to each frame reference signal output by the model comprises:

5. The method according to claim 4, wherein obtaining the delay probability weight corresponding to each frame microphone signal by using the microphone frame number difference as an exponential adjustment factor of the preset coefficient comprises:

obtaining a first weight part from the preset coefficient, and obtaining a second weight part by using the microphone frame number difference as an exponential adjustment factor of the preset coefficient;

and multiplying the first weight part by the second weight part to obtain the delay probability weight.

6. The method according to any one of claims 1 to 5, wherein the obtaining the similarity between the target frame microphone signal and each frame reference signal comprises:

obtaining the spectral similarity between the target frame microphone signal and each frame reference signal;

and, for each frame reference signal, calculating a weighted sum of the spectral similarity over frequency bins to obtain the similarity between the target frame microphone signal and that frame reference signal.

7. The method of claim 6, wherein obtaining the spectral similarity between the target frame microphone signal and each frame reference signal comprises:

acquiring the autocorrelation spectrum of the target frame microphone signal, the autocorrelation spectrum of each frame reference signal, and the cross-correlation spectrum between the target frame microphone signal and each frame reference signal;

and, for each frame reference signal, normalizing the cross-correlation spectrum of the target frame microphone signal and that frame reference signal by the autocorrelation spectrum of the target frame microphone signal and the autocorrelation spectrum of that frame reference signal to obtain the spectral similarity between them.

8. An apparatus for obtaining a steady state value of a delay estimation, the apparatus comprising:

9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.