CN113808609A

CN113808609A - Echo detection method and device, computer readable storage medium and terminal equipment

Info

Publication number: CN113808609A
Application number: CN202111110928.1A
Authority: CN
Inventors: 潘思伟; 雍雅琴; 纪伟; 董斐
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2021-09-18
Filing date: 2021-09-18
Publication date: 2021-12-17

Abstract

An echo detection method and device, a computer-readable storage medium, and a terminal device are provided. The echo detection method comprises: acquiring an uplink voice signal and performing an initial filtering operation on it to obtain a filtered signal; calculating the correlation between the filtered signal and a downlink reference signal; and determining an echo state based at least on the correlation, the echo state being either that an abnormal echo is present or that no echo is present. The technical scheme of the invention improves call performance in the double-talk stage.

Description

Echo detection method and device, computer readable storage medium and terminal equipment

Technical Field

The present invention relates to the field of speech processing technologies, and in particular, to an echo detection method and apparatus, a computer-readable storage medium, and a terminal device.

Background

In audio systems, acoustic echo arises from coupling between the loudspeaker and the microphone, so that the microphone signal contains not only the useful uplink speech signal but also the echo. If the microphone signal is not processed, the echo is transmitted to the far-end loudspeaker and played back, and the far-end caller hears a delayed copy of their own voice, which is uncomfortable and interferes with the uplink voice signal, degrading the call. With the rapid development of technology, communication modes and application scenarios are increasingly diverse, and communication terminals are increasingly miniaturized and portable, so the coupling between loudspeaker and microphone becomes stronger and the echo channel becomes more complex and changeable. This poses great challenges for acoustic echo cancellation in voice communication.

To ensure good speech quality, it is common practice to remove echo using an adaptive echo canceller (AEC) together with a non-linear echo suppressor (non-linear processing, NLP). The basic principle of AEC is to use a filter to adaptively estimate the echo propagation path, from it estimate the echo signal received by the microphone, and subtract the estimated echo from the microphone signal, thereby removing the echo. In the hands-free state, the adaptive filter can remove about 20 decibels (dB) of echo, covering linear echo and part of the nonlinear echo; the residual echo must then be handled by echo suppression so that all echo is ultimately eliminated.
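The AEC principle described above (adaptively estimate the echo path, predict the echo, subtract it from the microphone signal) can be illustrated with a normalized LMS (NLMS) filter. This is only a minimal sketch: the source does not say which adaptive algorithm is used, and the function name `nlms_echo_cancel`, the tap count, the step size `mu`, and the three-coefficient echo path are all hypothetical choices made for the illustration.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return the error signal: microphone sample minus the estimated echo.

    NLMS is an assumed algorithm here; the source only says "adaptive filter".
    """
    w = [0.0] * taps      # current estimate of the echo-path coefficients
    hist = [0.0] * taps   # most recent reference samples, newest first
    err = []
    for d, x in zip(mic, ref):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, hist))
        e = d - echo_est                      # what remains after subtraction
        norm = sum(xi * xi for xi in hist) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, hist)]
        err.append(e)
    return err

def power(signal):
    """Mean squared value of a list of samples."""
    return sum(v * v for v in signal) / len(signal)

random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]   # downlink reference
path = [0.6, -0.3, 0.1]                                  # hypothetical linear echo path
mic = [sum(path[j] * ref[n - j] for j in range(len(path)) if n - j >= 0)
       for n in range(len(ref))]                         # echo only, no near-end speech
err = nlms_echo_cancel(mic, ref)
```

In this idealized case (purely linear path, no noise, no near-end speech) the error power at the end of the run is far below the microphone power. Real devices also contain nonlinear echo, which is why the text describes a further suppression stage after the adaptive filter.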

The sound emitted by the loudspeaker eventually reaches the microphone, which picks it up, through air or another propagation medium; this propagation path is defined as the echo path. When an object obstructs the speaker, the microphone, or the propagation path (for example, a hand covers the speaker, or a face approaches a telephone watch), the propagation path of the acoustic echo changes: the obstructing object, the speaker, and the microphone form a new echo path, and the echo received by the microphone changes. Usually, the nonlinear components of the echo signal increase abnormally and the overall amplitude of the echo signal increases. The original parameters and filter coefficients in the device cannot eliminate the abnormally large echo, so the residual echo is large. To cancel the abnormal echo after the echo path changes, a strong echo suppression parameter may be set in advance so that the echo can be completely cancelled after the change.

However, when the echo path has not changed, the strong echo suppression parameter causes the uplink speech to be cancelled together with the echo during double talk (DT), i.e., when uplink speech and echo are present simultaneously, resulting in poor double-talk performance.

Disclosure of Invention

The invention solves the technical problem of how to improve the conversation performance in the double-talk stage.

In order to solve the above technical problem, an embodiment of the present invention provides an echo detection method, which includes: acquiring an uplink voice signal and performing an initial filtering operation on it to obtain a filtered signal; calculating the correlation between the filtered signal and a downlink reference signal; and determining an echo state based at least on the correlation, the echo state being either that an abnormal echo is present or that no echo is present.

Optionally, the echo detection method further includes: and determining to directly output the filtered signal or perform secondary filtering operation on the filtered signal according to the echo state.

Optionally, the filtered signal is a current frame speech signal, and the calculating a correlation between the filtered signal and a downlink reference signal includes: and calculating a cross-correlation value of the current frame voice signal and the downlink reference signal to serve as the correlation.

Optionally, the calculating the correlation between the filtered signal and the downlink reference signal includes: and calculating the average value of the cross correlation values of the voice signals in the frequency band and the downlink reference signals to be used as the correlation.

Optionally, the determining, according to the echo state, to directly output the filtered signal or perform a secondary filtering operation on the filtered signal includes: and if the correlation value is greater than a first preset threshold and the amplitude of the downlink reference signal is greater than a second preset threshold, performing secondary filtering operation on the filtered signal.

Optionally, the determining, according to the echo state, to directly output the filtered signal or perform a secondary filtering operation on the filtered signal includes: and if the correlation value is lower than a first preset threshold or the amplitude of the downlink reference signal is lower than a second preset threshold, directly outputting the filtered signal.

Optionally, performing the secondary filtering operation on the filtered signal includes: setting the filtered signal to zero. A processing step may also be added that performs nonlinear preprocessing on the downlink signal and performs adaptive filtering again.

The embodiment of the invention also discloses an echo detection device, which comprises: the acquisition module is used for acquiring uplink voice signals and carrying out initial filtering operation on the acquired uplink voice signals to obtain filtered signals; a calculating module, configured to calculate a correlation between the filtered signal and a downlink reference signal; and the judging module is used for determining an echo state at least according to the correlation, wherein the echo state comprises abnormal echo and no echo.

The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program is executed by a processor to execute the steps of the echo detection method.

The embodiment of the invention also discloses terminal equipment which comprises a memory and a processor, wherein the memory is stored with a computer program capable of running on the processor, and the processor executes the steps of the echo detection method when running the computer program.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

In the technical scheme of the invention, echo under a normal echo path can be eliminated through the initial filtering operation. By calculating the correlation between the filtered signal and the downlink reference signal, it can be determined whether the filtered signal contains abnormal echo, and thus the echo state; abnormal echo can therefore be distinguished from normal double talk. This provides a reference for the subsequent echo cancellation scheme, ensures the echo cancellation effect while keeping normal double talk continuous and unaffected, and gives the far-end listener a good call experience.

Further, if the correlation value is greater than a first preset threshold and the amplitude of the downlink reference signal is greater than a second preset threshold, performing a secondary filtering operation on the filtered signal. According to the technical scheme, when the abnormal echo is detected, echo suppression is enhanced, the abnormal echo is eliminated completely, and the purpose of improving the echo elimination and the double-talk performance is achieved at the same time.

Drawings

FIG. 1 is a flow chart of an echo detection method according to an embodiment of the present invention;

FIGS. 2 to 4 are schematic diagrams of a specific application scenario in the embodiment of the present invention;

FIG. 5 is a flowchart illustrating an echo detection method according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an echo detection device according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating an echo cancellation effect according to an embodiment of the present invention.

Detailed Description

As described in the background, when the echo path has not changed, a strong echo suppression parameter causes the uplink speech to be cancelled together with the echo during double talk (DT), i.e., when uplink speech and echo are present simultaneously, resulting in poor double-talk performance.

In the prior art, abnormal echo caused by a change in the echo path is detected mainly from the cross-correlation between the microphone signal and the signal after AEC. This approach has difficulty distinguishing normal double talk from abnormal echo, which reduces detection accuracy, easily leads to missed detection of abnormal echo or false detection of normal double talk, and makes it hard to balance echo suppression between normal double talk and abnormal echo in hands-free calls.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

Fig. 1 is a flowchart of an echo detection method according to an embodiment of the present invention.

The echo detection method of the embodiment of the invention can be used on the terminal device side; that is, the terminal device can execute each step of the method. The terminal device may be a device with a call function that is provided with a speaker and a microphone, such as a mobile phone, a computer, a wearable device, and the like.

Specifically, the echo detection method may include the steps of:

step 101: acquiring an uplink voice signal, and performing initial filtering operation on the acquired uplink voice signal to obtain a filtered signal;

step 102: calculating the correlation between the filtered signal and a downlink reference signal;

step 103: determining an echo state based at least on the correlation, the echo state being either that an abnormal echo is present or that no echo is present.

It should be noted that the sequence numbers of the steps in this embodiment do not represent a limitation on the execution sequence of the steps.

It will be appreciated that in a specific implementation, the echo detection method may be implemented by a software program running on a processor integrated within a chip or a chip module.

In a specific implementation of step 101, the uplink speech signal may be collected by a microphone of the terminal device. The uplink voice signal may include user voice and may further include echo, and the echo may specifically include normal echo and abnormal echo. Normal echo is echo emitted by the speaker and picked up by the microphone via a normal path, and abnormal echo is echo emitted by the speaker and picked up by the microphone via an abnormal propagation path formed by an obstacle. The level (in decibels) of the abnormal echo is greater than that of the normal echo.

Referring specifically to fig. 2, a microphone 102 at the far end 100 sends a downstream signal 103 to a speaker 104 at the near end 115. The direct echo 105 is emitted by the loudspeaker 104 at the near end 115 and is directly picked up by the microphone 108 at the near end 115, and the indirect echo 106 is emitted by the loudspeaker 104 at the near end 115 and is indirectly picked up by the near end microphone 108 via ambient reflections. At the same time as the echo is picked up, near-end speech 107, if present, is picked up by microphone 108. The processed upstream signal 109 is sent to a speaker 110 at the remote end 100 for playback.

In some cases, for example when a face or palm blocks the near-end speaker 104 or the near-end microphone 108, the path of the direct echo 105 changes and a new echo reflection path is formed. When the new path differs greatly from the original one, the echo picked up by the near-end microphone 108 increases significantly; this echo is an abnormal echo.

Referring to fig. 3 and 4 together, fig. 3 shows a layout of a speaker and a microphone of a mobile terminal. The sound of the mobile terminal is played by a loudspeaker at the lower left corner of the back of the mobile phone, and a microphone is positioned at the right side of the bottom of the mobile phone. When the user's hand is located at the middle upper part of the mobile phone during the call, the echo path between the speaker and the microphone is a normal echo path (the dotted line is the schematic line of the echo path). When the hand of the user is positioned at the lower part of the mobile phone, at the moment, the palm of the user is close to the loudspeaker and the microphone, a new echo path can be formed by the palm, the loudspeaker and the microphone, the original echo path is changed, and at the moment, the echo received by the microphone is abnormally increased and is difficult to eliminate.

Fig. 4 shows a layout of a speaker and a microphone of a wearable device. The speaker is located below the device and the microphone is located above and to the right of the device. When there is no obstacle between the speaker and the microphone during a call, the direct echo path between the speaker and the microphone is a normal echo path (the dotted line is the schematic line of the echo path). When the face is close to the watch or the palm is close to the watch, the face or the palm, the loudspeaker and the microphone form a new echo path together, the original echo path is changed, and at the moment, the echo received by the microphone is abnormally increased and is difficult to eliminate.

In the specific implementation of step 102, the similarity between the filtered signal and the downlink reference signal can be obtained by calculating the correlation between the filtered signal and the downlink reference signal. Further, in the implementation of step 103, the echo status, i.e. whether there is abnormal echo or no echo in the filtered signal, can be determined according to the correlation.

Specifically, the downlink reference signal may be a voice signal in a downlink transmitted from the far end to the near end. Because the echo in the uplink voice signal is formed by reflecting the downlink reference signal, the strength of the echo can be determined by comparing the similarity of the uplink voice signal and the downlink reference signal, so as to determine whether the abnormal echo exists.

In a specific embodiment, a cross-correlation value of the current frame speech signal and the downlink reference signal may be calculated as the correlation.

In this embodiment, the following formula may be used to calculate the cross-correlation value:

where Corr(m, k) represents the cross-correlation value, X(m, k) represents the spectrum of the filtered signal, Y(m, k) represents the spectrum of the downlink reference signal, m represents the frame index, k represents the frequency-bin index, and * denotes the complex conjugate.
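The formula itself is not reproduced in this text, so the exact normalization and any smoothing are unknown. As a hedged illustration only, the sketch below computes one common per-bin correlation measure from the same ingredients (the spectrum of the filtered signal, the conjugated spectrum of the downlink reference, frame index, bin index): a recursively smoothed, normalized cross-spectrum magnitude. The class name `BinCorrelation`, the smoothing factor `alpha`, the frame length, and the normalization are all assumptions, not the patent's definition.

```python
import cmath
import random

def dft_half(frame):
    """Plain DFT of a real frame, returning bins 0..N/2 (standard library only)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

class BinCorrelation:
    """Smoothed, normalized cross-correlation per frequency bin (assumed form)."""

    def __init__(self, bins, alpha=0.9, eps=1e-12):
        self.alpha, self.eps = alpha, eps
        self.sxy = [0j] * bins    # smoothed cross-spectrum X(m,k) * conj(Y(m,k))
        self.sxx = [0.0] * bins   # smoothed power of the filtered signal
        self.syy = [0.0] * bins   # smoothed power of the downlink reference

    def update(self, x_frame, y_frame):
        """Consume one frame pair; return a value near 0..1 for every bin."""
        a = self.alpha
        X, Y = dft_half(x_frame), dft_half(y_frame)
        corr = []
        for k, (xk, yk) in enumerate(zip(X, Y)):
            self.sxy[k] = a * self.sxy[k] + (1 - a) * xk * yk.conjugate()
            self.sxx[k] = a * self.sxx[k] + (1 - a) * abs(xk) ** 2
            self.syy[k] = a * self.syy[k] + (1 - a) * abs(yk) ** 2
            corr.append(abs(self.sxy[k]) ** 2 / (self.sxx[k] * self.syy[k] + self.eps))
        return corr

random.seed(1)
frame_len = 32
bins = frame_len // 2 + 1
echo_like, speech_like = BinCorrelation(bins), BinCorrelation(bins)
for _ in range(50):
    ref = [random.gauss(0, 1) for _ in range(frame_len)]    # downlink reference frame
    near = [random.gauss(0, 1) for _ in range(frame_len)]   # unrelated near-end frame
    corr_echo = echo_like.update([0.5 * v for v in ref], ref)   # filtered signal = residual echo
    corr_speech = speech_like.update(near, ref)                 # filtered signal = near-end speech
```

With this measure, a filtered signal that is a scaled copy of the reference scores close to 1 in every bin, while an unrelated signal scores low on average, which is the property the text relies on to tell residual echo apart from near-end speech.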

In another specific embodiment, an average value of the cross-correlation values of the speech signals of the frames in the frequency band and the downlink reference signal is calculated as the correlation.

In this embodiment, an average value of the cross-correlation values in a certain frequency band may be calculated as the correlation. The formula specifically adopted is as follows:

where Corr_avg(m) represents the mean of the cross-correlation values between each frame of the signal and the downlink reference signal within the selected frequency band, and N₁ and N₂ are the lower and upper frequency-bin limits of the band, respectively, with N₂ ≥ N₁.
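Read literally, the band average is the arithmetic mean of the per-bin values from the lower limit to the upper limit. A minimal sketch follows, assuming both limits are inclusive, so that Corr_avg(m) = (Corr(m, N₁) + ... + Corr(m, N₂)) / (N₂ − N₁ + 1), and assuming the per-bin values are already real numbers. The function name `band_average` and the sample values are hypothetical.

```python
def band_average(corr_bins, n1, n2):
    """Mean of corr_bins[n1..n2], both limits inclusive (assumed convention)."""
    if not 0 <= n1 <= n2 < len(corr_bins):
        raise ValueError("need 0 <= n1 <= n2 < number of bins")
    band = corr_bins[n1:n2 + 1]
    return sum(band) / len(band)

# Hypothetical per-bin correlation values for one frame m.
corr_m = [0.9, 0.8, 0.7, 0.1, 0.1]
corr_avg = band_average(corr_m, 0, 2)   # average over the three lowest bins only
```

Restricting the average to a band lets the detector ignore bins where the reference carries little energy; which band to use is a tuning decision the text leaves open.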

In one non-limiting embodiment, the direct output of the filtered signal or the second filtering operation on the filtered signal may be determined based on the echo status.

In this embodiment, directly outputting the filtered signal means that only the initial filtering operation is performed on the collected voice signal. And the secondary filtering operation is carried out on the filtered signal, so that a stronger echo suppression effect can be realized on the voice signal.

In other words, when the echo path is unchanged, the normal echo suppression parameters are used to ensure double-talk continuity; when the echo path changes, the abnormal echo is detected and the echo suppression parameters are strengthened so that the abnormal echo is eliminated. Good two-way voice call quality can thus be guaranteed to the greatest extent.

Further, if the correlation value is greater than a first preset threshold and the amplitude of the downlink reference signal is greater than a second preset threshold, performing a secondary filtering operation on the filtered signal.

In specific implementation, it is determined whether the correlation value and the amplitude of the downlink reference signal satisfy the following formula:

Corr_avg(m) > PCD_thr, (3)

y_level(m) > EC_thr, (4)

where Corr_avg(m) represents the correlation value, y_level(m) represents the amplitude of the downlink reference signal, PCD_thr represents the first preset threshold, and EC_thr represents the second preset threshold. When the conditions of formula (3) and formula (4) are both satisfied, the filtered signal still contains substantial residual echo; that is, it is judged to be in the abnormal echo state, and echo suppression can then be enhanced by performing the secondary filtering operation on the filtered signal.

Further, if the correlation value is lower than a first preset threshold, or the amplitude of the downlink reference signal is lower than a second preset threshold, the filtered signal is directly output.

In this embodiment, if the conditions of the formula (3) and the formula (4) cannot be satisfied at the same time, it is determined that the filtered signal has no significant residual echo, that is, it is determined that the filtered signal is in a normal echo state or an echo-free state (echo cancellation does not need to be started), and at this time, the filtered signal is directly output without any processing.
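The two threshold conditions amount to a simple per-frame decision rule, sketched below. The names `EchoState` and `detect_echo_state` and the numeric thresholds are hypothetical; the source gives no values for PCD_thr or EC_thr, so the numbers here are placeholders for illustration.

```python
from enum import Enum

class EchoState(Enum):
    ABNORMAL_ECHO = "abnormal echo present"
    NO_ECHO = "no echo"   # also covers the normal echo state described above

def detect_echo_state(corr_avg, y_level, pcd_thr, ec_thr):
    """Abnormal echo only if BOTH values exceed their thresholds."""
    if corr_avg > pcd_thr and y_level > ec_thr:
        return EchoState.ABNORMAL_ECHO
    return EchoState.NO_ECHO

# Placeholder thresholds; real values would be tuned per device.
PCD_THR = 0.6
EC_THR = 0.01

state_a = detect_echo_state(corr_avg=0.85, y_level=0.20, pcd_thr=PCD_THR, ec_thr=EC_THR)
state_b = detect_echo_state(corr_avg=0.85, y_level=0.001, pcd_thr=PCD_THR, ec_thr=EC_THR)
state_c = detect_echo_state(corr_avg=0.10, y_level=0.20, pcd_thr=PCD_THR, ec_thr=EC_THR)
```

Requiring the downlink level to exceed its threshold as well prevents a high correlation value from triggering enhanced suppression when the far end is essentially silent and there is nothing to echo.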

Referring to fig. 5, fig. 5 shows a specific flow of an echo detection method.

An input signal 502 from the microphone 501 and a downlink reference signal 504 fed to the loudspeaker 503 yield an error signal 506 after iterative updating and filtering by the linear AEC unit 505. The error signal 506 typically contains linear residual echo and non-linear echo and, when the near end is speaking, near-end speech. Non-linear filtering of the error signal 506 can achieve further echo suppression; the unit that performs it is called the NLP unit 507. After filtering by the NLP unit 507, a filtered signal 508 (which may also be referred to as a near-end speech estimation signal) is obtained. Under a normal echo path, the near-end speech in the filtered signal 508 should be fully retained and the downlink speech fully eliminated. When a change in the echo path causes the echo in the input signal 502 of the microphone 501 to increase abnormally, the echo cannot be completely eliminated by the NLP unit 507, so the filtered signal 508 contains uncancelled residual echo, possibly a large amount.

The abnormal echo detector 509 calculates the correlation 510 between the filtered signal and the downlink reference signal (also referred to as the feature value of the residual echo of the current frame), i.e., the cross-correlation value calculated from the filtered signal 508 and the downlink reference signal 504.

The feature value 510 of the residual echo of the current frame is fed into the enhanced echo suppression judgment unit 511, which judges according to formulas (3) and (4) above whether the filtered signal 508 contains substantial residual echo. If the conditions of formula (3) and formula (4) are both satisfied, the echo state is judged to be abnormal echo; echo suppression is then enhanced (i.e., the secondary filtering operation is performed on the filtered signal) so that the residual echo is inaudible, and the processing result 512 after enhanced echo suppression is output.

If the conditions of the formula (3) and the formula (4) cannot be satisfied at the same time, it is determined that the filtered signal 508 has no significant residual echo, i.e., it is determined to be in a normal echo state or no echo state (no echo cancellation needs to be turned on), and at this time, the result 512 is directly output without any processing, i.e., the filtered signal 508 is directly output.
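The per-frame behaviour of units 509 to 511 can be sketched as a small post-processing loop. This is an illustration under stated assumptions: the secondary filtering is modelled as setting the frame to zero (one option mentioned in the summary), the downlink level is taken as the mean absolute amplitude of the reference frame (the source does not define how the amplitude is measured), and the correlation values are supplied by the caller. The names `mean_abs_level` and `post_process` and all numeric values are hypothetical.

```python
def mean_abs_level(frame):
    """Assumed level measure: mean absolute sample amplitude of one frame."""
    return sum(abs(v) for v in frame) / len(frame)

def post_process(filtered_frames, reference_frames, corr_avgs, pcd_thr, ec_thr):
    """Zero the frames judged to contain abnormal echo; pass the rest through."""
    output = []
    for frame, ref, corr_avg in zip(filtered_frames, reference_frames, corr_avgs):
        abnormal = corr_avg > pcd_thr and mean_abs_level(ref) > ec_thr
        # Secondary filtering modelled as zeroing; otherwise output unchanged.
        output.append([0.0] * len(frame) if abnormal else list(frame))
    return output

# Three hypothetical frames: residual echo, near-end speech only, double talk.
filtered = [[0.3, -0.2, 0.25], [0.4, 0.1, -0.3], [0.2, 0.2, -0.1]]
reference = [[0.6, -0.4, 0.5], [0.0, 0.0, 0.0], [0.5, -0.5, 0.4]]
corr_avgs = [0.9, 0.05, 0.2]
result = post_process(filtered, reference, corr_avgs, pcd_thr=0.6, ec_thr=0.01)
```

Only the first frame is muted; the second and third frames reach the output untouched, which mirrors the stated goal of suppressing abnormal echo without degrading uplink-only speech or normal double talk.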

In an alternative embodiment of the present invention, in addition to filtering the input signal 502 and the downlink reference signal 504 with the linear AEC unit 505, nonlinear preprocessing may be applied to the input signal 502 and the downlink reference signal 504, and adaptive filtering may then be performed again.

Referring to fig. 6, an embodiment of the invention discloses an echo detection device 60. The echo detecting device 60 may include:

the acquisition module 601 is configured to acquire an uplink voice signal and perform initial filtering operation on the acquired uplink voice signal to obtain a filtered signal;

a calculating module 602, configured to calculate a correlation between the filtered signal and a downlink reference signal;

a determining module 603, configured to determine an echo state according to at least the correlation, where the echo state includes existence of an abnormal echo and no echo.

In a specific implementation, the echo detection device may correspond to a chip having an echo detection function in a terminal device, such as a System-on-a-Chip (SoC) or a baseband chip; or to a chip module having an echo detection function in the terminal device; or to a chip module that includes a chip with a data processing function; or to the terminal device itself.

For more details of the operation principle and the operation mode of the echo detection device 60, reference may be made to the relevant descriptions in fig. 1 to 5, which are not described herein again.

Each module/unit included in each apparatus and product described in the above embodiments may be a software module/unit, or may also be a hardware module/unit, or may also be a part of a software module/unit and a part of a hardware module/unit. For example, for each device or product applied to or integrated into a chip, each module/unit included in the device or product may be implemented by hardware such as a circuit, or at least a part of the module/unit may be implemented by a software program running on a processor integrated within the chip, and the rest (if any) part of the module/unit may be implemented by hardware such as a circuit; for each device or product applied to or integrated with the chip module, each module/unit included in the device or product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least some of the modules/units may be implemented by using a software program running on a processor integrated within the chip module, and the rest (if any) of the modules/units may be implemented by using hardware such as a circuit; for each device and product applied to or integrated in the terminal, each module/unit included in the device and product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least part of the modules/units may be implemented by using a software program running on a processor integrated in the terminal, and the rest (if any) part of the modules/units may be implemented by using hardware such as a circuit.

Referring to FIG. 7, FIG. 7 shows spectrograms of the relevant signals when the echo increases abnormally during a hands-free call. The horizontal axis represents time in seconds (s), and the vertical axis represents frequency in hertz (Hz). The original sampling rate of the signals in this example is 8 kHz.

The top panel of FIG. 7 shows the downlink reference signal, with the echo energy concentrated mainly in the periods 0-40 s and 50-60 s. The middle panel of FIG. 7 shows the result after the initial filtering operation but before abnormal echo detection (i.e., the filtered signal 508 output by the NLP unit shown in FIG. 5). Around 45 s there is uplink-only speech. In the 50-60 s period, which is a normal double-talk segment, very little echo remains, the result is close to uplink-only speech, and duplex performance is good. In the 0-40 s period, which is an echo-only segment, the residual echo is abnormally large and is not eliminated. The bottom panel of FIG. 7 shows the output after abnormal echo detection and processing: the abnormal echo in 0-40 s is completely eliminated, while the uplink-only speech around 45 s and the normal double talk in 50-60 s are fully consistent with the middle panel. It can be seen that, with this scheme, abnormal echo can be accurately detected and effectively suppressed, while normal uplink-only speech and double-talk speech remain continuous and unaffected.

The embodiment of the present invention also discloses a storage medium, which is a computer-readable storage medium, and a computer program is stored on the storage medium, and when the computer program runs, the steps of the echo detection method shown in fig. 1 or fig. 5 may be executed.

The embodiment of the invention also discloses a terminal device, which can comprise a memory and a processor, wherein the memory stores a computer program that can run on the processor. The processor, when running the computer program, may perform the steps of the echo detection method shown in FIG. 1 or FIG. 5. The terminal device includes but is not limited to devices with a call function such as a mobile phone, a computer, a tablet computer, and a wearable device.

It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this document indicates that the objects before and after it are in an "or" relationship.

The "plurality" appearing in the embodiments of the present application means two or more.

The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.

The term "connect" in the embodiments of the present application refers to various connection manners, such as direct connection or indirect connection, to implement communication between devices, which is not limited in this embodiment of the present application.

It should be understood that, in the embodiment of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

It will also be appreciated that the memory in the embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. Volatile memory can be random access memory (RAM), which acts as external cache memory. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the methods according to the embodiments of the present invention.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An echo detection method, comprising:

acquiring an uplink voice signal, and performing an initial filtering operation on the acquired uplink voice signal to obtain a filtered signal;

calculating the correlation between the filtered signal and a downlink reference signal;

determining an echo state based at least on the correlation, the echo state including presence of abnormal echo and absence of echo.

2. The echo detection method of claim 1, further comprising:

determining, according to the echo state, whether to directly output the filtered signal or to perform a secondary filtering operation on the filtered signal.

3. The method of claim 1, wherein the filtered signal is a current frame speech signal, and wherein the calculating the correlation between the filtered signal and a downlink reference signal comprises:

calculating a cross-correlation value between the current frame voice signal and the downlink reference signal to serve as the correlation.

4. The method of claim 1, wherein the calculating the correlation of the filtered signal with a downlink reference signal comprises:

calculating the average of the cross-correlation values between the voice signal of each frame in the frequency band and the downlink reference signal, the average serving as the correlation.
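The correlation measures of claims 3 and 4 can be illustrated with a minimal sketch. The claims do not fix a formula, so the zero-lag normalized cross-correlation used here is an assumption, as are the function names `normalized_cross_correlation` and `mean_cross_correlation` and the convention that a silent frame yields a correlation of zero.

```python
import math


def normalized_cross_correlation(frame, reference):
    """Zero-lag normalized cross-correlation of two equal-length frames.

    Assumed formula (not stated in the claims): the inner product of the
    two frames divided by the square root of the product of their energies.
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same length")
    dot = sum(x * y for x, y in zip(frame, reference))
    energy_frame = sum(x * x for x in frame)
    energy_ref = sum(y * y for y in reference)
    if energy_frame == 0.0 or energy_ref == 0.0:
        # Assumed convention: a silent frame gives no evidence of echo.
        return 0.0
    return dot / math.sqrt(energy_frame * energy_ref)


def mean_cross_correlation(frames, references):
    """Average the per-frame cross-correlation values (one reading of claim 4)."""
    values = [
        normalized_cross_correlation(f, r) for f, r in zip(frames, references)
    ]
    return sum(values) / len(values) if values else 0.0
```

A single-frame call corresponds to claim 3, while averaging over several frames corresponds to claim 4; a practical implementation would likely also search over a range of delays, which this sketch omits.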

5. The echo detection method of claim 2, wherein the determining, according to the echo state, whether to directly output the filtered signal or to perform a secondary filtering operation on the filtered signal comprises:

if the correlation is greater than a first preset threshold and the amplitude of the downlink reference signal is greater than a second preset threshold, performing the secondary filtering operation on the filtered signal.

6. The echo detection method of claim 2, wherein the determining, according to the echo state, whether to directly output the filtered signal or to perform a secondary filtering operation on the filtered signal comprises:

if the correlation is lower than a first preset threshold or the amplitude of the downlink reference signal is lower than a second preset threshold, directly outputting the filtered signal.

7. The method of claim 2, wherein performing the secondary filtering operation on the filtered signal comprises:

setting the filtered signal to zero.
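The decision logic of claims 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `detect_echo_state` and `apply_secondary_filter`, the state labels, and the threshold values in the usage example are hypothetical, and mapping the condition of claim 5 to the abnormal-echo state is an inference from claims 1 and 2. The claims leave the case of a value exactly equal to a threshold unspecified; here it is treated as no echo.

```python
ABNORMAL_ECHO = "abnormal_echo"  # hypothetical label for "abnormal echo present"
NO_ECHO = "no_echo"              # hypothetical label for "no echo"


def detect_echo_state(correlation, reference_amplitude,
                      corr_threshold, amp_threshold):
    """Classify the echo state from the correlation and reference amplitude.

    Abnormal echo is reported only when BOTH values exceed their thresholds
    (claim 5); otherwise the state is no echo (claim 6). Equality with a
    threshold is treated as no echo, which is an assumption.
    """
    if correlation > corr_threshold and reference_amplitude > amp_threshold:
        return ABNORMAL_ECHO
    return NO_ECHO


def apply_secondary_filter(filtered_frame, echo_state):
    """Zero the frame on abnormal echo (claim 7), otherwise pass it through."""
    if echo_state == ABNORMAL_ECHO:
        return [0.0] * len(filtered_frame)
    return list(filtered_frame)


# Usage with made-up threshold values (0.6 and 0.1 are illustrative only).
frame = [0.2, -0.1, 0.05]
state = detect_echo_state(0.9, 0.5, corr_threshold=0.6, amp_threshold=0.1)
output = apply_secondary_filter(frame, state)
```

Requiring a sufficiently large downlink reference amplitude in addition to a high correlation keeps the secondary filter from muting the uplink signal when the far end is silent and no echo can be present.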

8. An echo detection device, comprising:

an acquisition module, configured to acquire an uplink voice signal and perform an initial filtering operation on the acquired uplink voice signal to obtain a filtered signal;

a calculating module, configured to calculate a correlation between the filtered signal and a downlink reference signal;

a judging module, configured to determine an echo state based at least on the correlation, the echo state including presence of abnormal echo and absence of echo.

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the echo detection method according to any one of claims 1 to 7.

10. A terminal device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the echo detection method according to any one of claims 1 to 7.