CN115665642B

CN115665642B - Noise elimination method and system

Info

Publication number: CN115665642B
Application number: CN202211587566.XA
Authority: CN
Inventors: 曹祖杨; 陈孟飞; 周航; 侯佩佩; 陶慧芳; 张永全; 包君健
Original assignee: Hangzhou Crysound Electronics Co Ltd
Current assignee: Hangzhou Crysound Electronics Co Ltd
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-03-17
Anticipated expiration: 2042-12-12
Also published as: CN115665642A

Abstract

The invention relates to a noise elimination method. A first time interval before audio generation, a second time interval during audio generation and a third time interval after audio generation are recorded to obtain a first audio, a second audio and a third audio, respectively. The first audio is eliminated from the third audio to obtain an echo audio; the front section of the echo audio is intercepted to obtain an equalized echo section; the middle section of the second audio is intercepted to obtain a section to be processed; and the first audio and the equalized echo section are eliminated from the section to be processed to obtain the noise-eliminated audio. The method can accurately extract the echo part and the environmental sound part of the audio collected by the microphone, and eliminate the echo and the environmental sound from the collected signal. The method does not require an additional acoustic shielding environment, nor does such an environment need to be adjusted as the frequency of the test audio changes, which improves test efficiency and reduces cost.

Description

Noise elimination method and system

Technical Field

The invention belongs to the technical field of noise elimination, and particularly relates to a noise elimination method and a noise elimination system.

Background

In microphone testing, the sound pickup capability of a microphone needs to be tested. Specifically, a continuous single-frequency audio signal is played to the microphone, and this is repeated multiple times; the frequency and stability of the acoustic signal collected by the microphone are then compared with the original single-frequency audio signal, and the performance parameters of the microphone in each frequency band are evaluated.

When the microphone collects audio, the collected signal has three parts: the first part is the audio signal that travels directly from the sound source to the microphone; the second part is the audio signal reflected back by the ceiling, floor, walls and the like after the sound source sounds; and the third part is environmental noise.

In the prior art, three methods are commonly used to eliminate echo. The first is to reduce sound reflection by treating the materials of the sound field environment, i.e. replacing the walls and ceiling with sound-absorbing material, or using an acoustic shielding box. This suppresses echoes more directly, but at higher cost and with greater restrictions. The second is to use an echo suppressor, in which a level comparison unit alternately switches the loudspeaker and the microphone on and off for testing; however, the output signal of the power amplifier equipment is then discontinuous, and the overall effect is poor. The third is to eliminate echo by acoustic echo suppression or acoustic echo cancellation; however, since the echo path transfer function is physically continuous and of infinite length, the acoustic echo canceller can only approximate it with discrete, finite-length filter coefficients. The acoustic echo canceller therefore cannot completely cancel the echo, and its output contains residual echo.

Among the three methods, the latter two strongly affect the audio signal acquired by the microphone and distort the test result, while the first requires a special test room or an acoustic shielding box, which increases the test cost and restricts the test time.

Therefore, a noise cancellation method is needed, which can reduce the requirements on environment and equipment during microphone testing, and can obtain better cancellation effects of noise such as echo and environmental sound, so that the microphone testing can obtain accurate results.

Disclosure of Invention

Based on the above-mentioned shortcomings and drawbacks of the prior art, it is an object of the present invention to at least solve one or more of the above-mentioned problems of the prior art, in other words, to provide a noise cancellation method and system that meets one or more of the above-mentioned needs.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a noise cancellation method, which specifically includes:

before the test audio is played, using a microphone to record a first time interval before the audio is generated, thereby generating a first audio signal; after the first time interval, starting to play the test audio and continuing to record with the microphone during a second time interval in which the audio is generated, thereby generating a second audio signal; after playback of the test audio is complete, continuing to record with the microphone during a third time interval after the audio has stopped, thereby generating a third audio signal;

converting the first audio and the third audio into frequency domain signals, and spectrally subtracting the first audio from the third audio in the frequency domain, so as to eliminate the environmental sound part of the third audio that overlaps the first audio and obtain the echo audio;

intercepting the front section of the echo audio to obtain a balanced echo section;

intercepting the middle section of the second audio to obtain a section to be processed;

and eliminating, from the section to be processed intercepted from the second audio, the first audio containing the environmental sound and the equalized echo section containing the stable echo, to obtain the noise-eliminated audio signal acquired by the microphone.

As a preferred aspect of the first aspect, the method for eliminating the first audio and equalized echo segment from the segment to be processed specifically includes:

performing short-time Fourier transform on the first audio, the equalized echo section and the section to be processed to convert them into frequency domain signals, and, at each frequency, subtracting the amplitude of the first audio and the amplitude of the equalized echo section from the amplitude of the section to be processed;

and restoring the audio subjected to the spectral subtraction into a time-domain signal through inverse Fourier transform, thereby obtaining the audio subjected to noise elimination.

As a further preferable solution of the first aspect, before converting the first audio, the equalized echo section and the section to be processed into the frequency domain signal, the method further includes:

copying the equalized echo section multiple times and splicing the copies together, so as to extend the equalized echo section into a repeated equalized echo section of a specified length;

and cutting the extended equalized echo section, the first audio and the section to be processed to the specified length, and performing time alignment.

As a further preferable scheme of the first aspect, the method for converting the first audio frequency, the equalized echo section, and the section to be processed into the frequency domain signal specifically includes the following steps:

dividing the first audio, the equalized echo section and the section to be processed into frames with a length of several milliseconds;

and performing short-time Fourier transform on the framed first audio, equalized echo section and section to be processed to obtain the corresponding frequency domain signals.

As still another preferable aspect of the first aspect, the front section of the echo audio is intercepted by taking the time point at which the echo audio starts to decay as a boundary, and the part of the echo audio before the decay is taken as the equalized echo section.

Preferably, the middle section of the second audio is intercepted according to the following method:

intercepting the portion of specified length having the highest amplitude in the waveform of the second audio, or intercepting a specified length of the second audio after a specified time delay.

As a further preferred version of the first aspect, the first audio and the third audio are of equal length.

In a second aspect, the present invention further provides a noise cancellation system, specifically including:

the recording unit is used for recording a first time interval before audio generation, a second time interval during audio generation and a third time interval after audio generation to respectively obtain a first audio, a second audio and a third audio;

the echo extracting unit is used for eliminating the first audio frequency from the third audio frequency to obtain an echo audio frequency;

the first interception unit is used for intercepting the front section of the echo audio to obtain a balanced echo section;

the second intercepting unit is used for intercepting the middle section of the second audio to obtain a section to be processed;

and the noise elimination unit is used for eliminating the first audio and the balanced echo section from the section to be processed to obtain the audio after noise elimination.

As a preferable aspect of the second aspect, the echo extracting unit, when eliminating the first audio and equalized echo segment from the segment to be processed, is further configured to:

performing short-time Fourier transform on the first audio, the equalized echo section and the section to be processed to convert them into frequency domain signals, and, at each frequency, subtracting the amplitude of the first audio and the amplitude of the equalized echo section from the amplitude of the section to be processed;

and restoring the audio subjected to the spectral subtraction into a time domain signal through inverse Fourier transform, thereby obtaining the audio subjected to noise elimination.

As a further preferable aspect of the second aspect, before the echo extracting unit converts the first audio, the equalized echo section and the section to be processed into the frequency domain signal, it is further configured to:

As a further preferable solution of the second aspect, the echo extracting unit converts the first audio, the equalized echo segment and the segment to be processed into frequency domain signals, and is further configured to:

As another preferable mode of the second aspect, the first clipping unit, when clipping the front segment of the echo audio, takes a time point at which the echo audio starts to attenuate as a boundary, and clips a part of the echo audio before attenuation as the equalized echo segment.

As another preferable aspect of the second aspect, the second clipping unit, when clipping the middle section of the second audio, clips according to the following method:

and intercepting the specified length with the highest amplitude intensity in the waveform of the second audio, or intercepting the specified length of the second audio after the specified time delay.

In a third aspect, the present invention also provides an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any one of the methods described above.

In a fourth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon, the computer-readable storage medium having instructions stored thereon, which, when executed on a computer or processor, cause the computer or processor to perform the steps of any of the methods described above.

Compared with the prior art, the invention has the beneficial effects that:

the method can accurately extract the echo part and the environmental sound part in the audio collected by the microphone, and eliminate the echo and the environmental sound from the collected signals;

the method of the invention does not need to additionally construct an acoustic shielding environment, and does not need to adjust the acoustic shielding environment along with the frequency change of the test audio, thereby improving the test efficiency and reducing the cost.

Drawings

FIG. 1 is a flow chart of a noise cancellation method of an embodiment of the present application;

FIG. 2 is a time domain diagram of the audio signal acquired in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

In the following description, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The following description provides embodiments of the present application, which may be combined or interchanged with one another, and therefore the present application is also to be construed as encompassing all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, C and another embodiment includes features B, D, then this application should also be construed to include embodiments that include all other possible combinations of one or more of A, B, C, D, although such embodiments may not be explicitly recited in the following text.

The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.

Fig. 1 is a flowchart illustrating a noise cancellation method according to an embodiment of the present application.

When the microphone is tested, the direct part of the audio signal and the echo part propagate to the microphone as follows: because the path of the audio signal travelling directly to the microphone is shorter than the path of the audio signal reflected to the microphone, the direct signal reaches the microphone and is collected before the signal reflected by the walls, the floor, the ceiling and the like.

Based on these characteristics, the time domain diagram of the audio signal collected from a period before the test audio is played to a period after playback ends is shown in FIG. 2:
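As a rough illustration of the propagation rule above, the lag between the direct sound and a reflection follows from the difference in path length divided by the speed of sound. The sketch below assumes a speed of sound of 343 m/s (dry air at about 20 °C); the function name and the example distances are hypothetical and do not come from the patent.

```python
# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def arrival_delay_ms(direct_path_m, reflected_path_m):
    """Return how many milliseconds after the direct sound a reflection arrives."""
    if reflected_path_m < direct_path_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    extra_path_m = reflected_path_m - direct_path_m
    return extra_path_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Hypothetical geometry: 1 m direct path, 4.43 m path via a wall.
delay = arrival_delay_ms(1.0, 4.43)
```

With these hypothetical distances the reflection trails the direct sound by about 10 ms, which is why the echo appears in the recording only after a short delay.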

before the test audio is played, a microphone collects pure environmental sounds; when the test audio starts to play, the microphone collects the environmental sound and the test audio body, and starts to collect the echo caused by the reflection of the test audio in the surrounding environment after a short time delay.

When playback of the test audio is complete, the microphone no longer collects the test audio itself; at this moment it collects the environmental sound and the residual echoes that are still being reflected.

After a delay, the microphone can only pick up ambient sounds because the echo fades away as the sound source stops sounding.

The embodiments of the present application are performed in the following order:

S1. Collection starts before the test audio is played, when no echo has yet been generated: the microphone under test records a first time interval before the audio is generated, yielding a first audio signal containing only environmental sound.

Collection continues when the test audio starts to play: the microphone under test records a second time interval during which the audio is generated, yielding a second audio signal containing the environmental sound, the test audio and the echo.

When playback of the test audio is complete, the microphone records a third time interval after playback has stopped, yielding a third audio signal containing the environmental sound and an echo that is initially equalized and then gradually fades.

It is understood that, in the above process, the microphone may record continuously across the transitions from the first time interval to the second and from the second to the third; alternatively, recording may be paused after the first time interval, resumed at the beginning of the second, paused at the end of the second, and resumed at the beginning of the third. Either is acceptable as long as the idea of the above method is realized, i.e. a first audio signal containing only the environmental sound, a second audio signal containing the environmental sound, the test audio and the echo, and a third audio signal containing the echo and the environmental sound are acquired.

S2. To obtain the echo audio alone, the first audio is eliminated from the third audio, where the first audio contains only the environmental sound and the third audio contains the environmental sound and an echo that is initially equalized and then gradually fades.
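For the continuous-recording variant, step S1 amounts to cutting one recording at the moments playback starts and stops. The following is a minimal sketch under that assumption; `split_recording`, `play_start_s` and `play_stop_s` are hypothetical names, and the patent does not specify how the playback times are obtained.

```python
import numpy as np


def split_recording(recording, sample_rate, play_start_s, play_stop_s):
    """Split one continuous recording into the first, second and third audio.

    play_start_s / play_stop_s are the times (in seconds, relative to the
    start of the recording) at which the test audio starts and stops playing.
    """
    recording = np.asarray(recording, dtype=float)
    start = int(round(play_start_s * sample_rate))
    stop = int(round(play_stop_s * sample_rate))
    if not 0 < start < stop < len(recording):
        raise ValueError("playback window must lie strictly inside the recording")
    first = recording[:start]        # environmental sound only
    second = recording[start:stop]   # environmental sound + test audio + echo
    third = recording[stop:]         # environmental sound + fading echo
    return first, second, third
```

Choosing the playback window so that the first and third intervals have the same duration gives first and third audio of equal length, which simplifies the later subtraction.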

Specifically, the elimination of the first audio from the third audio may be implemented using the following method:

S21. The first audio and the third audio are divided into millisecond-length frames, and a short-time Fourier transform is applied to the framed first audio and third audio to convert them into the frequency domain;

S22. The first audio is subtracted from the third audio in the frequency domain to obtain the echo audio.

In some embodiments of the present application, to facilitate the subtraction of the first audio and the third audio, the captured first audio and the third audio are made equal in length.
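Steps S21 and S22 can be sketched as a frame-by-frame magnitude subtraction. The sketch below makes several assumptions the patent does not state: frames are non-overlapping with a rectangular window, negative magnitudes are clipped to zero, and the phase of the third audio is reused when returning to the time domain. The function names are hypothetical.

```python
import numpy as np


def frame_spectra(signal, frame_len):
    """Cut a signal into non-overlapping frames and return one spectrum per frame."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    return np.fft.rfft(frames, axis=1)


def extract_echo(first_audio, third_audio, sample_rate, frame_ms=1.0):
    """Subtract the ambient-only first audio from the third audio, frame by frame."""
    first_audio = np.asarray(first_audio, dtype=float)
    third_audio = np.asarray(third_audio, dtype=float)
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    n = min(len(first_audio), len(third_audio))  # use equal lengths
    ambient = frame_spectra(first_audio[:n], frame_len)
    mixed = frame_spectra(third_audio[:n], frame_len)
    # Assumption: clip at zero so that no magnitude becomes negative.
    magnitude = np.maximum(np.abs(mixed) - np.abs(ambient), 0.0)
    # Assumption: keep the phase of the third audio.
    echo_spectra = magnitude * np.exp(1j * np.angle(mixed))
    echo_frames = np.fft.irfft(echo_spectra, n=frame_len, axis=1)
    return echo_frames.reshape(-1)  # splice the frames back together
```

Overlapping windowed frames would be a common alternative; the non-overlapping form is used here only because it keeps the framing and splicing easy to follow.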

The echo audio obtained through the above steps is initially equalized and then gradually fades. Echo cancellation requires an echo that stays equalized throughout, so step S3 is executed: the front segment of the echo audio is intercepted to obtain an equalized echo segment.

As a preferred solution, the front segment of the echo audio is determined by the following method:

Based on the waveform of the echo audio, the time at which the echo audio starts to attenuate is taken as a boundary, and the part of the echo audio before attenuation is intercepted as the equalized echo segment.

In addition, since the echo gradually builds up at the beginning of the second time interval, a portion in which the echo is equalized is needed so that the equalized echo segment can be used for cancellation. Step S4 is therefore executed: the middle segment of the second audio is intercepted, discarding the portion in which the echo is still building up, to obtain a segment to be processed in which the echo is equalized.
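The patent does not say how the start of attenuation is detected from the waveform. One possible sketch, under the assumption that attenuation begins when the short-time RMS level falls below a fixed fraction of its initial level, is shown below; `window_ms` and `drop_ratio` are hypothetical tuning parameters.

```python
import numpy as np


def cut_equalized_echo(echo_audio, sample_rate, window_ms=10.0, drop_ratio=0.7):
    """Return the part of the echo audio that precedes the start of attenuation."""
    echo_audio = np.asarray(echo_audio, dtype=float)
    win = max(1, int(round(sample_rate * window_ms / 1000.0)))
    n_win = len(echo_audio) // win
    if n_win == 0:
        raise ValueError("echo audio is shorter than one analysis window")
    blocks = np.reshape(echo_audio[:n_win * win], (n_win, win))
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))  # short-time level envelope
    reference = rms[0]                           # level right after playback stops
    # Assumption: attenuation starts at the first window that falls
    # below drop_ratio times the initial level.
    below = np.nonzero(rms < drop_ratio * reference)[0]
    boundary = int(below[0]) * win if below.size else n_win * win
    return echo_audio[:boundary]
```

If the level never drops below the threshold, the sketch returns all complete windows, on the reading that the whole recording is still equalized.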

In some embodiments of the present application, the middle segment of the second audio intercepted in step S4 is intercepted according to the following method:

The waveform of the second audio is divided into segments 1 s long, and the segment with the highest amplitude among all segments is selected and intercepted.

In other embodiments of the present application, the middle segment of the second audio intercepted in step S4 is intercepted according to the following method:

The segment of the second audio from 1 s to 2 s is always intercepted.
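Both variants of step S4 can be sketched in a few lines. The patent does not define how "amplitude intensity" is measured, so the mean absolute amplitude used below is an assumption, and the function names are hypothetical.

```python
import numpy as np


def loudest_segment(second_audio, sample_rate, segment_s=1.0):
    """Split the second audio into fixed-length segments and return the loudest one."""
    second_audio = np.asarray(second_audio, dtype=float)
    seg = int(round(segment_s * sample_rate))
    n_seg = len(second_audio) // seg
    if n_seg == 0:
        raise ValueError("second audio is shorter than one segment")
    blocks = np.reshape(second_audio[:n_seg * seg], (n_seg, seg))
    # Assumption: "amplitude intensity" is taken as the mean absolute amplitude.
    intensity = np.mean(np.abs(blocks), axis=1)
    best = int(np.argmax(intensity))
    return second_audio[best * seg:(best + 1) * seg]


def fixed_segment(second_audio, sample_rate, start_s=1.0, stop_s=2.0):
    """Return the part of the second audio between two fixed time offsets."""
    second_audio = np.asarray(second_audio, dtype=float)
    start = int(round(start_s * sample_rate))
    stop = int(round(stop_s * sample_rate))
    return second_audio[start:stop]
```

The fixed-offset variant is simpler and deterministic; the loudest-segment variant adapts to rooms in which the echo takes longer to build up.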

S5. The first audio containing the environmental sound and the equalized echo segment containing the stable echo are eliminated from the segment to be processed intercepted from the second audio, to obtain the noise-eliminated test audio signal acquired by the microphone. Compared with the acquired signal without noise elimination, the noise-eliminated audio signal more accurately reflects the microphone's acquisition performance for test audio at the current frequency.

In some embodiments of the present application, the first audio and the equalized echo segment are eliminated from the segment to be processed in step S5 specifically using the following method:

S51. Short-time Fourier transform is performed on the first audio, the equalized echo segment and the segment to be processed to convert them into frequency domain signals, and, at each frequency, the amplitude of the first audio and the amplitude of the equalized echo segment are subtracted from the amplitude of the segment to be processed.

S52. The audio after spectral subtraction is restored to a time domain signal through inverse Fourier transform, thereby obtaining the noise-eliminated audio.

In some embodiments of the present application, before the step S51 of converting the first audio frequency, the equalized echo section, and the section to be processed into the frequency domain signal, the method further includes:

S501. The equalized echo segment is copied multiple times and the copies are spliced together, extending it into a repeated equalized echo segment of a specified length, so that an equalized echo signal of sufficient length is obtained.

S502. The extended equalized echo segment, the first audio and the segment to be processed are cut to the specified length and time-aligned.
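The copy-and-splice operation of S501 and the cutting of S502 can be sketched as follows. The function names are hypothetical, and the time alignment part of S502 is deliberately left out of this sketch.

```python
import numpy as np


def repeat_to_length(segment, target_len):
    """Tile a short segment end to end and cut the result to target_len samples."""
    segment = np.asarray(segment, dtype=float)
    if len(segment) == 0:
        raise ValueError("cannot repeat an empty segment")
    repeats = -(-target_len // len(segment))  # ceiling division
    return np.tile(segment, repeats)[:target_len]


def cut_to_common_length(first_audio, echo_segment, pending, target_len):
    """Bring all three signals to exactly target_len samples."""
    first_audio = np.asarray(first_audio, dtype=float)
    pending = np.asarray(pending, dtype=float)
    if min(len(first_audio), len(pending)) < target_len:
        raise ValueError("first audio and pending segment must cover target_len")
    return (first_audio[:target_len],
            repeat_to_length(echo_segment, target_len),
            pending[:target_len])
```

Only the equalized echo segment is repeated here, since it is the signal that was shortened in step S3; the other two are simply cut.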

The above time alignment specifically operates using the following formula:

[formula image not reproduced]

where f(t) and g(t) are the two functions to be time-aligned.
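The alignment formula itself is not shown in this text, so the sketch below should not be read as the patent's formula. It substitutes a common technique, cross-correlation, to estimate the lag between two signals f and g and then trims them to their overlapping part; the function names are hypothetical.

```python
import numpy as np


def estimate_lag(f, g):
    """Return the number of samples by which g is delayed relative to f.

    Assumption: the lag is taken as the shift that maximises the
    cross-correlation of the two signals.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    corr = np.correlate(g, f, mode="full")
    # Index 0 of the "full" output corresponds to a lag of -(len(f) - 1).
    return int(np.argmax(corr)) - (len(f) - 1)


def align(f, g):
    """Drop leading samples so that f and g line up, then cut to equal length."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    lag = estimate_lag(f, g)
    if lag >= 0:
        g = g[lag:]
    else:
        f = f[-lag:]
    n = min(len(f), len(g))
    return f[:n], g[:n]
```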

In some embodiments of the present application, the step S52 of converting the first audio, the equalized echo section, and the section to be processed into the frequency domain signal specifically includes the following steps:

S521. The first audio, the equalized echo segment and the segment to be processed are divided into frames with a length of several milliseconds.

In some embodiments of the present application, step S521 divides the first audio, the equalized echo segment and the segment to be processed into frames 1 ms long.

S522. Short-time Fourier transform is performed on the framed first audio, equalized echo segment and segment to be processed to obtain the corresponding frequency domain signals.

Let the framed signal of the segment to be processed be y, the signal after eliminating the environmental sound and the equalized echo be x, and the signal of the environmental sound and the equalized echo be d. In frame n, the time domain functions of the first audio, the equalized echo segment and the segment to be processed satisfy:

$$y_n(t) = x_n(t) + d_n(t)$$

Converting the time domain functions into frequency domain functions gives:

$$Y_n(\omega) = X_n(\omega) + D_n(\omega)$$

The audio with the environmental sound and the equalized echo eliminated is obtained through spectral subtraction as:

$$|\hat{X}_n(\omega)| = |Y_n(\omega)| - |D_n(\omega)|$$

The signal $\hat{X}_n(\omega)$ obtained after the above spectral subtraction is a frequency domain signal, and is converted into a framed time domain signal by inverse Fourier transform.

The above processing is performed on each frame of the first audio, the equalized echo segment and the segment to be processed, and the results are spliced in the time domain to obtain the complete audio signal collected by the microphone with the environmental sound and echo eliminated.
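The per-frame processing and splicing described above can be sketched as follows. As assumptions not stated in the patent, the sketch uses non-overlapping rectangular frames, clips negative magnitudes to zero, and reuses the phase of the segment to be processed for the inverse transform. `spectral_subtract` is a hypothetical name, and the three inputs are expected to be already cut to a common length and aligned.

```python
import numpy as np


def spectral_subtract(pending, first_audio, echo_segment, frame_len):
    """Remove ambient sound and equalized echo from the segment to be processed."""
    pending = np.asarray(pending, dtype=float)
    first_audio = np.asarray(first_audio, dtype=float)
    echo_segment = np.asarray(echo_segment, dtype=float)
    n_frames = min(len(pending), len(first_audio), len(echo_segment)) // frame_len
    n = n_frames * frame_len

    def spectra(x):
        # One spectrum per non-overlapping frame.
        return np.fft.rfft(np.reshape(x[:n], (n_frames, frame_len)), axis=1)

    y_spec = spectra(pending)
    # Amplitude of the first audio plus amplitude of the equalized echo segment.
    d_mag = np.abs(spectra(first_audio)) + np.abs(spectra(echo_segment))
    # Assumption: clip at zero so that no magnitude becomes negative.
    x_mag = np.maximum(np.abs(y_spec) - d_mag, 0.0)
    # Assumption: keep the phase of the segment to be processed.
    x_spec = x_mag * np.exp(1j * np.angle(y_spec))
    frames = np.fft.irfft(x_spec, n=frame_len, axis=1)
    return frames.reshape(-1)  # splice the frames in the time domain
```

At a hypothetical sample rate of 48 kHz, the 1 ms frame length mentioned in the embodiment would correspond to `frame_len=48`.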

In addition, the present application also provides a noise cancellation system, which specifically includes:

the recording unit 1 is used for recording a first time interval before audio generation, a second time interval during audio generation and a third time interval after audio generation to respectively obtain a first audio, a second audio and a third audio;

the echo extracting unit 2 is used for eliminating the first audio frequency from the third audio frequency to obtain an echo audio frequency;

the first interception unit 31 is configured to intercept a front segment of the echo audio to obtain a balanced echo segment;

a second intercepting unit 32, configured to intercept a middle segment of the second audio to obtain a segment to be processed;

and the noise elimination unit 4 is used for eliminating the first audio and the equalized echo section from the section to be processed to obtain the audio subjected to noise elimination.

It is clear to a person skilled in the art that the solution according to the embodiments of the present application can be implemented by means of software and/or hardware. The term "unit" in this specification refers to software and/or hardware capable of performing a specific function independently or in cooperation with other components, wherein the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.

In some embodiments of the present invention, corresponding to the above method, the echo extracting unit 2, when eliminating the first audio and equalized echo segment from the segment to be processed, is further configured to:

In some embodiments of the present invention, corresponding to the above method, the echo extracting unit 2 is further configured to, before converting the first audio, the equalized echo segment and the segment to be processed into the frequency domain signal:

In some embodiments of the present invention, corresponding to the above method, the echo extracting unit 2 specifically performs the following operations in the process of converting the first audio frequency, the equalized echo segment, and the segment to be processed into the frequency domain signal:

performing frame calculation on the first audio, the balanced echo section and the section to be processed by taking a plurality of milliseconds as the length;

In some embodiments of the present invention, corresponding to the above method, when the first clipping unit 31 clips the front segment of the echo audio, the first clipping unit clips the part of the echo audio before attenuation into the equalized echo segment by using the time point when the echo audio starts to attenuate as a boundary.

In some embodiments of the present invention, corresponding to the above method, the second clipping unit 32, when clipping the middle section of the second audio, clips according to the following method:

intercepting a designated length with the highest amplitude intensity in the waveform of the second audio;

or in other embodiments, truncating according to the following method:

and intercepting the specified length of the second audio after the specified time delay.

In a third aspect, the present invention also provides an electronic device comprising at least one processor 501, at least one memory 502 and a computer program stored on the memory 502 and executable on the processor 501.

Among other things, the processor 501 may include one or more processing cores. The processor 501 connects the various parts of the electronic device through various interfaces and lines, and performs the device's functions and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may be implemented in at least one hardware form of DSP, FPGA and PLA. The processor 501 may integrate one or a combination of a CPU, a GPU, a modem and the like. The CPU mainly handles the operating system, user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed on the display screen.

The memory 502 may include a RAM or a ROM. Optionally, the memory 502 includes a non-transitory computer-readable medium. The memory 502 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 502 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. The memory 502 may alternatively be at least one memory device located remotely from the processor 501. The memory 502, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a noise cancellation application program therein.

After the electronic device is connected to the microphone and the audio testing playing device, it can automatically execute the application program for eliminating the noise stored in the memory 502. The playing device is controlled to play the test audio, the microphone is controlled to collect the first to third audio, then the application program is automatically executed to perform noise elimination on the audio signal, and the audio signal collected by the microphone and subjected to noise elimination is stored back in the memory 502.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a computer-readable memory, and the memory may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A noise cancellation method is characterized by specifically comprising:

recording a first time interval before audio generation, a second time interval during audio generation and a third time interval after audio generation to respectively obtain a first audio, a second audio and a third audio;

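eliminating the first audio from the third audio to obtain an echo audio;

intercepting a front section of the echo audio to obtain an equalized echo section;

intercepting a middle section of the second audio to obtain a section to be processed;

and eliminating the first audio and the equalized echo section from the section to be processed to obtain the audio after noise elimination.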
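The recording step of claim 1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the three time intervals are captured as one continuous recording and are split afterwards by duration; the function name `split_recording` and its parameters are hypothetical and do not appear in the specification.

```python
def split_recording(samples, sample_rate, t_before, t_during, t_after):
    """Split one continuous recording into the first, second and third audio.

    Assumption: the three time intervals are contiguous and their durations
    (in seconds) are known in advance.
    """
    n1 = int(round(t_before * sample_rate))   # samples before audio generation
    n2 = int(round(t_during * sample_rate))   # samples while the audio is generated
    n3 = int(round(t_after * sample_rate))    # samples after audio generation
    if n1 + n2 + n3 > len(samples):
        raise ValueError("recording is shorter than the three intervals")
    first_audio = samples[:n1]
    second_audio = samples[n1:n1 + n2]
    third_audio = samples[n1 + n2:n1 + n2 + n3]
    return first_audio, second_audio, third_audio


# Usage: a 10-sample recording at 1 sample per second, split 3 s / 4 s / 3 s.
first, second, third = split_recording(list(range(10)), 1, 3, 4, 3)
```

Choosing equal durations for the first and third intervals, as in the usage line, yields a first audio and a third audio of equal length.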

2. The method of claim 1, wherein the eliminating the first audio and the equalized echo section from the section to be processed comprises:

converting the first audio, the equalized echo section and the section to be processed into frequency domain signals, and subtracting, by spectral subtraction in the frequency domain, the first audio and the equalized echo section from the section to be processed;

and restoring the audio subjected to the spectral subtraction into a time domain signal to obtain the audio after noise elimination.
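The frequency-domain elimination of claim 2 can be sketched as follows. This is only one possible form of spectral subtraction, under assumptions not stated in the claims: magnitudes are subtracted bin by bin, negative results are floored at zero, and the phase of the section to be processed is reused for reconstruction. A naive DFT is used so that the sketch needs only the Python standard library; the names `dft`, `idft` and `spectral_subtract` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short sketches."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(segment, first_audio, echo_section):
    """Subtract the magnitude spectra of the ambient sound (first audio) and
    of the equalized echo section from the section to be processed.

    Assumptions: all three inputs have the same length; negative magnitudes
    are floored at zero; the phase of `segment` is kept unchanged.
    """
    if not (len(segment) == len(first_audio) == len(echo_section)):
        raise ValueError("all three signals must have the same length")
    seg_f, amb_f, echo_f = dft(segment), dft(first_audio), dft(echo_section)
    cleaned = []
    for s, a, e in zip(seg_f, amb_f, echo_f):
        magnitude = max(abs(s) - abs(a) - abs(e), 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(s)))
    return idft(cleaned)  # restore the time domain signal
```

In this sketch the three inputs must already have the same length, so any length matching has to happen before the conversion to the frequency domain.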

3. The noise cancellation method according to claim 2, wherein before converting the first audio, the equalized echo section and the section to be processed into frequency domain signals, the method further comprises the steps of:

extending the equalized echo section to a specified length by copying;

and aligning, in time and at the specified length, the extended equalized echo section with the first audio and the section to be processed.
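The copy-and-extend step of claim 3 can be sketched as below, assuming that extension by copying means repeating the equalized echo section end to end and truncating to the specified length, and that alignment means trimming the other two signals to that same length from their start. Both readings are assumptions; the helper names are hypothetical.

```python
def extend_by_copying(section, target_len):
    """Repeat `section` end to end and cut the result to `target_len` samples."""
    if not section:
        raise ValueError("section must not be empty")
    repetitions = -(-target_len // len(section))  # ceiling division
    return (list(section) * repetitions)[:target_len]


def align_to_length(first_audio, echo_section, segment, target_len):
    """Return the three signals, each exactly `target_len` samples long."""
    if len(first_audio) < target_len or len(segment) < target_len:
        raise ValueError("first audio and segment must cover the specified length")
    return (list(first_audio[:target_len]),
            extend_by_copying(echo_section, target_len),
            list(segment[:target_len]))


# Usage: a 3-sample echo section extended to 7 samples.
extended = extend_by_copying([1, 2, 3], 7)
```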

4. The method according to claim 2, wherein said converting the first audio, the equalized echo section, and the section to be processed into frequency domain signals comprises:

5. The method of claim 1, wherein the intercepting the front section of the echo audio is performed by using the time at which the echo audio starts to attenuate as the dividing point for obtaining the equalized echo section.
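One reading of claim 5 is that the equalized echo section is the part of the echo audio that precedes the onset of attenuation. The sketch below follows that reading and detects the onset with a frame-wise RMS envelope. The detection rule (the first frame whose RMS falls below `drop_ratio` times the RMS of the first frame) and all names are assumptions made for illustration, not taken from the specification.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each complete, non-overlapping frame."""
    return [math.sqrt(sum(v * v for v in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def attenuation_onset(echo, frame_len, drop_ratio=0.9):
    """Sample index of the first frame whose RMS is below drop_ratio times
    the RMS of the first frame; end of the last full frame if none is."""
    levels = frame_rms(echo, frame_len)
    if not levels:
        raise ValueError("echo audio is shorter than one frame")
    reference = levels[0]
    for index, level in enumerate(levels):
        if level < drop_ratio * reference:
            return index * frame_len
    return len(levels) * frame_len


def equalized_echo_section(echo, frame_len, drop_ratio=0.9):
    """Front section of the echo audio, up to the attenuation onset."""
    return echo[:attenuation_onset(echo, frame_len, drop_ratio)]
```

The threshold `drop_ratio` trades sensitivity against robustness: a value closer to 1 reacts to smaller level drops but is more easily triggered by residual ambient sound.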

6. The noise cancellation method according to claim 1, wherein the intercepting the middle section of the second audio is performed according to the following method:

intercepting a segment of a specified length having the highest amplitude intensity in the waveform of the second audio, or intercepting a segment of the specified length from the second audio after a specified time delay.
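The two alternatives of claim 6 can be sketched as below. The highest amplitude intensity is interpreted here as the largest sum of absolute sample values over a window of the specified length, which is an assumption; the function names are hypothetical.

```python
def loudest_window(second_audio, length):
    """Return (start, segment) for the window of `length` samples whose sum of
    absolute amplitudes is largest; the earliest such window wins ties."""
    if length <= 0 or length > len(second_audio):
        raise ValueError("length must be between 1 and len(second_audio)")
    current = sum(abs(v) for v in second_audio[:length])
    best, best_start = current, 0
    for start in range(1, len(second_audio) - length + 1):
        # Slide the window by one sample: add the new sample, drop the old one.
        current += abs(second_audio[start + length - 1]) - abs(second_audio[start - 1])
        if current > best:
            best, best_start = current, start
    return best_start, second_audio[best_start:best_start + length]


def delayed_window(second_audio, delay, length):
    """Return `length` samples starting `delay` samples into the second audio."""
    if delay < 0 or delay + length > len(second_audio):
        raise ValueError("window does not fit inside the second audio")
    return second_audio[delay:delay + length]


# Usage: pick a 2-sample window from a short waveform.
start, segment = loudest_window([0, 1, 3, 3, 1, 0], 2)
```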

7. A noise cancellation method according to claim 1, wherein said first audio and said third audio are equal in length.

8. A noise cancellation system, comprising:

and a noise elimination unit, configured to eliminate the first audio and the equalized echo section from the section to be processed to obtain the audio after noise elimination.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform the steps of the method according to any one of claims 1-7.