CN111261179A - Echo cancellation method and device and intelligent equipment - Google Patents
- Publication number
- CN111261179A (application CN201811453725.0A)
- Authority
- CN
- China
- Prior art keywords
- echo cancellation
- signal
- preset
- reference signal
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
The application discloses an echo cancellation method and device and an intelligent device, in which an echo cancellation model with a multi-layer recursive network processes the input signals. This largely mitigates the influence of nonlinear factors, so that echo cancellation achieves a better effect and better meets practical requirements. Moreover, the application takes into account the case of multiple loudspeakers and multiple microphone arrays, which makes it more widely and conveniently applicable.
Description
Technical Field
The present application relates to, but is not limited to, artificial intelligence technologies, and in particular to an echo cancellation method and apparatus and an intelligent device.
Background
Echo feedback is a common problem in electro-acoustic devices such as telephones and hearing aids. It seriously degrades the quality of the speech signal, often causes noise problems such as howling and whistling, reduces the gain of the system, and changes the response of the system.
Adaptive Echo Control (AEC) exploits the correlation between the output signal of a loudspeaker and the multipath echo that this output signal generates: it subtracts an echo estimate from the input signal of the sound pickup device, thereby cancelling the echo.
With the advent of intelligent devices, an intelligent voice device also needs to cancel the sound that it plays itself. As shown in FIG. 1, after the reference signal (i.e., the input signal entering the loudspeaker of the intelligent device) is played by the loudspeaker, it is picked up by the microphone array of the intelligent device together with the original signal, such as a human voice, to form the received signal (also referred to herein as the radio signal). For example, a smart speaker needs to cancel the music it is playing, and a smart television needs to cancel the sound of the program it is playing. Echo cancellation is therefore frequently required in scenarios such as voice wake-up and speech recognition.
In the related art, echo cancellation mainly processes the multi-channel sound at the signal level before voice wake-up or speech recognition is performed. On the one hand, the original sound signal and the reference signal are processed linearly, whereas in practice reverberation, device structure and similar factors introduce a large number of nonlinear transformations whose influence a linear method cannot overcome. On the other hand, such a method can only judge the AEC effect by listening, and an improvement in perceived audio quality does not imply an improvement in voice wake-up or speech recognition performance.
Disclosure of Invention
The application provides an echo cancellation method and device and intelligent equipment, which can enable echo cancellation to generate a better effect, so that actual requirements can be better met.
The embodiment of the invention provides an echo cancellation method, which comprises the following steps:
respectively inputting all reference signals into an echo cancellation model according to the number of channels of the received signals, and calculating to obtain a reference signal estimation value;
subtracting the reference signal estimation value corresponding to each channel from the radio signal of each channel to obtain an original signal estimation value;
and carrying out normalization processing on the original signal estimation values corresponding to all channels to obtain original signals.
In one illustrative example, the method further comprises generating the echo cancellation model, comprising:
simulating a radio signal by using a preset original signal and a preset reference signal;
and training the network to be trained by taking the radio signal obtained by simulation and a preset reference signal as input and taking a preset original signal as a modeling target to obtain the echo cancellation model.
In one illustrative example, the echo cancellation model includes a multi-layer recursive network.
In one illustrative example, simulating the radio signal comprises:
applying a preset impulse response to the preset reference signal and then adding a preset environmental noise signal to obtain a first signal;
and superposing the first signal and a preset original signal to obtain the simulated radio signal.
In one illustrative example, the network to be trained includes at least one of:
a feedforward sequential memory neural network (FSMN), a deep feedforward sequential memory neural network (DFSMN), a long short-term memory (LSTM) unit, a bidirectional long short-term memory (BLSTM) unit, or a gated recurrent unit (GRU).
In one illustrative example, the reference signal comprises at least one channel and the original signal comprises at least one channel.
In one illustrative example, the method further comprises:
performing joint training with a voice wake-up model and a speech recognition model by using the obtained original signal.
The application also provides an echo cancellation processing method, which comprises the following steps:
simulating a radio signal by using a preset original signal and a preset reference signal;
and training the network to be trained by taking the radio signal obtained by simulation and a preset reference signal as input and taking a preset original signal as a modeling target to obtain the echo cancellation model.
In one illustrative example, simulating the radio signal comprises:
applying a preset impulse response to the preset reference signal and then adding a preset environmental noise signal to obtain a first signal;
and superposing the first signal and a preset original signal to obtain the simulated radio signal.
In one illustrative example, the network to be trained includes at least one of:
a feedforward sequential memory neural network (FSMN), a deep feedforward sequential memory neural network (DFSMN), a long short-term memory (LSTM) unit, a bidirectional long short-term memory (BLSTM) unit, or a gated recurrent unit (GRU).
The present application further provides a computer-readable storage medium storing computer-executable instructions for performing the echo cancellation method of any one of the above and/or performing the echo cancellation processing method of any one of the above.
The present application further provides an echo cancellation device, comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the echo cancellation method of any one of the above.
The application also provides an echo cancellation device, comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the echo cancellation processing method of any one of the above.
The present application further provides a smart device comprising a memory and a processor, wherein the memory has stored therein the following instructions executable by the processor:
according to the number of channels of the received radio signals, inputting all reference signals entering a loudspeaker into an echo cancellation model respectively, and calculating to obtain a plurality of reference signal estimation values;
for each channel, subtracting a reference signal estimation value corresponding to the channel from the radio signal of each channel to obtain a plurality of original signal estimation values;
and carrying out normalization processing on the original signal estimation values corresponding to all the channels to obtain original signals entering the microphone array.
In one illustrative example, the smart device comprises a smart speaker or a smart television.
According to the echo cancellation method and device, an echo cancellation model with a multi-layer recursive network processes the input signals. This largely mitigates the influence of nonlinear factors, so that echo cancellation achieves a better effect and better meets practical requirements. Moreover, the application takes into account the case of multiple loudspeakers and multiple microphone arrays, which makes it more widely and conveniently applicable.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
FIG. 1 is a schematic diagram of radio signal generation in a smart device;
FIG. 2 is a flow chart of an echo cancellation processing method according to the present application;
FIG. 3 is a schematic diagram of a network architecture in which echo cancellation and speech recognition are used jointly according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the structure of an echo cancellation processing apparatus according to the present application;
FIG. 5 is a flow chart of the echo cancellation method of the present application;
FIG. 6 is a schematic diagram of an embodiment of an echo cancellation network according to the present application;
FIG. 7 is a schematic diagram of a structure of an echo cancellation device according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In one exemplary configuration of the present application, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Through research on echo cancellation technologies, the inventors of the present application found that if AEC is implemented with a deep neural network, the strong nonlinear modeling capability of the deep neural network can be fully exploited to handle the nonlinear factors that arise in practice.
FIG. 2 is a flowchart of the echo cancellation processing method of the present application, which is used to train and generate the echo cancellation model. As shown in FIG. 2, the method includes:
Step 200: simulating a radio signal by using a preset original signal and a preset reference signal.
In one illustrative example, simulating the radio signal may include:
applying a preset impulse response to the preset reference signal and then adding a preset environmental noise signal to obtain a first signal;
and superposing the first signal and the preset original signal to obtain the simulated radio signal.
Taking a microphone array (including 4 microphones) as an example, the far field data is usually simulated by using the near field data, and the formula is as follows:
$$y_i(t) = x(t) * h_{si}(t) + n(t), \quad i = 1, 2, 3, 4$$
where $y_i(t)$ denotes the simulated far-field data of the $i$-th microphone, $x(t)$ denotes the near-field data, $h_{si}(t)$ denotes the impulse response of the $i$-th microphone, which is determined by the room, the environment and the microphone position, $*$ denotes the convolution operation, and $n(t)$ denotes the ambient noise.
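The simulation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the impulse responses, signal lengths and noise level are made-up values, the function names `convolve`, `simulate_far_field` and `superpose` are hypothetical, and the same noise sequence is shared by all microphones because the formula uses a single n(t).

```python
import random

def convolve(x, h):
    """Causal linear convolution x * h, truncated to len(x) samples."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]

def simulate_far_field(x, impulse_responses, noise):
    """Return y_i(t) = x(t) * h_si(t) + n(t) for each microphone i."""
    return [
        [c + v for c, v in zip(convolve(x, h), noise)]
        for h in impulse_responses
    ]

def superpose(first_signals, original):
    """Add the preset original signal to each channel's first signal."""
    return [[f + s for f, s in zip(first, original)] for first in first_signals]

random.seed(0)
n = 16
reference = [random.uniform(-1, 1) for _ in range(n)]  # preset reference signal
original = [random.uniform(-1, 1) for _ in range(n)]   # preset original signal
noise = [random.gauss(0.0, 0.01) for _ in range(n)]    # preset environmental noise
# Hypothetical impulse responses h_s1..h_s4, one per microphone (assumed values).
impulse_responses = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]]

first = simulate_far_field(reference, impulse_responses, noise)
received = superpose(first, original)  # one simulated radio signal per microphone
```

Here `received` plays the role of the simulated radio signal that, together with `reference`, would be fed to the network to be trained, with `original` as the target.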
Step 201: training the network to be trained, with the simulated radio signal and the preset reference signal as input and the preset original signal as the modeling target, to obtain the echo cancellation model.
In an exemplary embodiment, the modeling criterion may be a minimum mean square error criterion or may be modeled in the form of a mask. The general goal is to establish a mapping from the received signal to the original signal.
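The two modeling criteria mentioned above can be illustrated as follows. The text does not define the mask, so the clipped magnitude-ratio mask below is an assumption chosen for illustration, and the names `mse_loss`, `ratio_mask` and `apply_mask` are hypothetical.

```python
def mse_loss(estimate, target):
    """Mean squared error between the network output and the original signal."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

def ratio_mask(target_mag, received_mag, eps=1e-8):
    """Assumed mask definition: per-bin magnitude ratio, clipped to at most 1."""
    return [min(1.0, t / (r + eps)) for t, r in zip(target_mag, received_mag)]

def apply_mask(mask, received_mag):
    """Recover an estimate of the original magnitudes from the received ones."""
    return [m * r for m, r in zip(mask, received_mag)]

# Direct regression: the network output is compared with the original signal.
loss = mse_loss([1.0, 2.0], [1.0, 4.0])

# Mask form: the network would predict `mask`, which is then applied to the input.
mask = ratio_mask([1.0, 0.0], [2.0, 1.0])
recovered = apply_mask(mask, [2.0, 1.0])
```

In the mask form the training target is the mask rather than the signal itself; either way the network learns a mapping from the received signal to the original signal.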
In an illustrative example, the network to be trained may include a multi-layer recursive network such as a feedforward sequential memory neural network (FSMN), a deep feedforward sequential memory neural network (DFSMN), a long short-term memory (LSTM) unit, a bidirectional long short-term memory (BLSTM) unit, or a gated recurrent unit (GRU). LSTM is a type of recurrent neural network (RNN).
It should be noted that the specific way in which the network is trained to obtain the echo cancellation model does not limit the scope of the present application. What the application emphasizes is that the simulated multi-channel radio signal and the multi-channel reference signal are used as input, and that the network structure of the application is used to train the multi-layer echo cancellation neural network.
In one illustrative example, the reference signal includes at least one channel and the original signal includes at least one channel.
In an exemplary embodiment, the multi-channel speech signal after echo cancellation can be further used for model training of subsequent voice wakeup and speech recognition, and can be subjected to joint training.
In an illustrative example in which echo cancellation is followed by a speech recognition model, assume that the collected signals are 2 channels of original signals (wav1 and wav2 in FIG. 3) and 2 channels of reference signals (ref1 and ref2 in FIG. 3). The input of the network is then the 4 collected channels. Echo cancellation is performed on these 4 channels by the part marked NN Front-end in the dashed frame in FIG. 3, which implements the AEC function; the AEC-processed signals are combined with the reference signals and then input to the Acoustic Model (AM) part. During training, the NN Front-end and the AM are first trained independently, and the two networks are then connected in series and trained jointly.
The structure shown in FIG. 3 suits a wide range of application scenarios. For example, the multi-channel signals collected by a microphone array can all be fed to the input of the neural network at the same time. As another example, it covers cases with different types of signals, such as when an internal signal is collected together with an external signal.
The echo cancellation model obtained by training is a multi-layer recursive network, which is well suited to overcoming the influence of nonlinear factors, so that echo cancellation achieves a better effect and better meets practical requirements.
The present application further provides a computer-readable storage medium storing computer-executable instructions for performing the echo cancellation processing method of any one of the above.
The present application further provides an echo cancellation model generation apparatus, comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the echo cancellation processing method of any one of the above.
The present application further provides a smart device comprising a memory and a processor, wherein the memory has stored therein the following instructions executable by the processor:
according to the number of channels of the received radio signals, inputting all reference signals entering a loudspeaker into an echo cancellation model respectively, and calculating to obtain a plurality of reference signal estimation values;
for each channel, subtracting a reference signal estimation value corresponding to the channel from the radio signal of each channel to obtain a plurality of original signal estimation values;
and carrying out normalization processing on the original signal estimation values corresponding to all the channels to obtain original signals entering the microphone array.
In one illustrative example, the smart device comprises a smart speaker or a smart television.
FIG. 4 is a schematic structural diagram of the echo cancellation processing apparatus of the present application. As shown in FIG. 4, the apparatus at least includes a signal processing module and a training module, wherein:
the signal processing module is used for simulating a radio signal by using the original signal and the reference signal;
and the training module is used for training to obtain the echo cancellation model, with the simulated radio signal and the reference signal as input and the original signal as the modeling target.
In an illustrative example, the signal processing module is specifically configured to:
apply a preset impulse response to the reference signal and then add a preset environmental noise signal to obtain a first signal; and superpose the first signal and the original signal to obtain the simulated radio signal.
In an illustrative example, the network to be trained may include a multi-layer neural network such as FSMN, or DFSMN, or LSTM, or BLSTM, or GRU.
In one illustrative example, the reference signal includes at least one channel and the original signal includes at least one channel.
FIG. 5 is a flowchart of the echo cancellation method of the present application. As shown in FIG. 5, the method includes:
Step 500: according to the number of channels of the received signal, inputting all reference signals into the echo cancellation model respectively, and calculating reference signal estimates.
In an exemplary embodiment, the echo cancellation model is a multi-layer recursive network trained using a radio signal simulated from an original signal and a reference signal.
In one illustrative example, the radio signal or the reference signal may be represented in forms that include, but are not limited to: a raw waveform (WAV) signal, a spectrum obtained by a Fast Fourier Transform (FFT), or FilterBank (FBank) features commonly used in voice wake-up and speech recognition.
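As a rough illustration of moving from the waveform representation to a spectral one, the sketch below computes the magnitude spectrum of a single frame with a direct discrete Fourier transform. It is written for readability rather than speed (a real system would use an FFT routine), and the frame length, test tone and function name `dft_magnitude` are assumptions made for this example.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude spectrum of one frame via a direct DFT (O(N^2), illustration only)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

# A test tone with exactly 2 cycles in a 16-sample frame (assumed values).
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = dft_magnitude(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

FBank features would additionally apply a bank of mel-spaced filters to such a spectrum and take the logarithm; that step is omitted here.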
In an exemplary embodiment, the multi-layer recursive network that implements the echo cancellation model is divided into a plurality of sub-networks according to the number of channels of the received signal, and each sub-network is an echo cancellation model that takes all reference signals as input. The echo cancellation model calculation yields the reference signal estimate corresponding to that channel.
In an exemplary embodiment, in the sub-network corresponding to each channel, all reference signals undergo nonlinear (including linear) processing by a multi-layer recursive network. The sub-network includes multiple layers, such as recursive network layers (for example FSMN, or an LSTM-based RNN), multiple normalization layers, and the direct connections of a residual network.
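The layer types named above can be combined roughly as in the following sketch. It is not the network of the application: a plain Elman recurrence stands in for the FSMN or LSTM layers, layer normalization is assumed as the normalization, the weights are untrained placeholder values, and all function names are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one frame to zero mean and unit variance across its features."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def recurrent_layer(frames, w_in, w_rec):
    """Plain Elman recurrence h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    size = len(w_in)
    h = [0.0] * size
    outputs = []
    for x in frames:
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][j] * h[j] for j in range(size))
            )
            for i in range(size)
        ]
        outputs.append(h)
    return outputs

def sub_network(frames, w_in, w_rec, w_out):
    """Normalize, apply the recurrent layer, add the residual, read out one value per frame."""
    normed = [layer_norm(x) for x in frames]
    hidden = recurrent_layer(normed, w_in, w_rec)
    residual = [[h + x for h, x in zip(hs, xs)] for hs, xs in zip(hidden, normed)]
    return [sum(w * r for w, r in zip(w_out, rs)) for rs in residual]

# Each frame holds one sample from each of two reference channels (assumed layout).
frames = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
w_in = [[0.5, -0.2], [0.1, 0.3]]    # placeholder, untrained weights
w_rec = [[0.1, 0.0], [0.0, 0.1]]
w_out = [0.7, 0.3]
echo_estimate = sub_network(frames, w_in, w_rec, w_out)
```

One such sub-network would exist per received channel, each producing the reference signal estimate for its own channel from all reference signals.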
Step 501: subtracting the reference signal estimate corresponding to each channel from the radio signal of that channel to obtain an original signal estimate.
In one exemplary embodiment, the steps include: for each channel, the estimated reference signal is subtracted from the received signal to obtain the raw signal estimate.
FIG. 6 is a schematic diagram of an embodiment of the echo cancellation network of the present application. As shown in FIG. 6, this embodiment assumes 2 loudspeakers and 2 microphone arrays, so the input consists of 2 channels of received signals and 2 channels of reference signals: received signal 1 (channel 1), received signal 2 (channel 2), reference signal 1 (channel 1) and reference signal 2 (channel 2), labelled as such in FIG. 6. As shown by the dashed frame in FIG. 6, the multi-layer recursive network that implements the echo cancellation model in this embodiment is divided into 2 sub-networks according to the number of channels of the received signal. Each sub-network contains, for example, two recursive network layers (such as FSMN, DFSMN, LSTM, BLSTM or GRU) and a plurality of normalization layers; in this embodiment, one normalization layer is applied when processing all the reference signals in each sub-network, and one normalization layer is applied to the reference signal estimate produced by each sub-network.
Step 502: normalizing the original signal estimates corresponding to all channels to obtain the original signal.
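Steps 500 to 502 can be sketched end to end as follows. This is a minimal illustration, not the method of the application: a fixed weighted sum of the references stands in for each trained sub-network, and because the text does not specify the normalization, a joint peak normalization across channels is assumed. The names `make_linear_model`, `peak_normalize` and `cancel_echo` are hypothetical.

```python
def make_linear_model(gains):
    """Stand-in for a trained sub-network: a fixed weighted sum of the references."""
    def model(references):
        length = len(references[0])
        return [sum(g * r[t] for g, r in zip(gains, references)) for t in range(length)]
    return model

def peak_normalize(channels, eps=1e-12):
    """Assumed normalization: scale all channels by their common peak magnitude."""
    peak = max(abs(v) for ch in channels for v in ch)
    return [[v / (peak + eps) for v in ch] for ch in channels]

def cancel_echo(received, references, sub_networks, normalize=peak_normalize):
    estimates = []
    for channel, model in zip(received, sub_networks):
        echo_est = model(references)                                   # Step 500
        estimates.append([y - e for y, e in zip(channel, echo_est)])   # Step 501
    return normalize(estimates)                                        # Step 502

references = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]   # two reference channels
original = [0.5, -0.25, 0.125]                     # signal to be recovered
gains = [(0.5, 0.25), (0.25, 0.5)]                 # hypothetical echo paths per microphone
received = [
    [s + g1 * r1 + g2 * r2 for s, r1, r2 in zip(original, *references)]
    for g1, g2 in gains
]
models = [make_linear_model(g) for g in gains]
cleaned = cancel_echo(received, references, models)
```

Because the stand-in models match the simulated echo paths exactly, each channel of `cleaned` is the original signal up to the normalization scale; a trained network would only approximate this.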
According to the echo cancellation method and device, an echo cancellation model with a multi-layer recursive network processes the input signals. This largely mitigates the influence of nonlinear factors, so that echo cancellation achieves a better effect and better meets practical requirements. Moreover, the application takes into account the case of multiple loudspeakers and multiple microphone arrays, which makes it more widely and conveniently applicable.
In an exemplary embodiment, because the echo cancellation processing of the present application is itself implemented as a network, it can be trained jointly with back-end voice wake-up and speech recognition models more flexibly.
The present application further provides a computer-readable storage medium having stored thereon computer-executable instructions for performing the echo cancellation method of any of the above.
The present application further provides an echo cancellation device, comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the echo cancellation method of any one of the above.
FIG. 7 is a schematic structural diagram of the echo cancellation device of the present application. As shown in FIG. 7, the device at least includes a first estimation module, a second estimation module and a processing module, wherein:
the first estimation module is used for respectively inputting all reference signals into the echo cancellation model according to the number of the channels of the received signals and calculating to obtain a reference signal estimation value;
the second estimation module is used for subtracting the reference signal estimation value obtained by calculation corresponding to each channel from the radio signal of each channel to obtain an original signal estimation value;
and the processing module is used for carrying out normalization processing on the original signal estimation value corresponding to each channel to obtain an expected original signal.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.
Claims (15)
1. An echo cancellation method, comprising:
respectively inputting all reference signals into an echo cancellation model according to the number of channels of the received signals, and calculating to obtain a reference signal estimation value;
subtracting the reference signal estimation value corresponding to each channel from the radio signal of each channel to obtain an original signal estimation value;
and carrying out normalization processing on the original signal estimation values corresponding to all channels to obtain original signals.
2. The echo cancellation method of claim 1, further comprising, beforehand, generating the echo cancellation model, which comprises:
simulating a radio signal by using a preset original signal and a preset reference signal;
and training the network to be trained by taking the radio signal obtained by simulation and a preset reference signal as input and taking a preset original signal as a modeling target to obtain the echo cancellation model.
3. The echo cancellation method of claim 2, wherein the echo cancellation model comprises a multi-layer recurrent network.
4. The echo cancellation method of claim 2, wherein simulating the radio signal comprises:
applying a preset impulse response to the preset reference signal, and then adding a preset environmental noise signal to obtain a first signal;
and superposing the first signal and a preset original signal to obtain the simulated radio signal.
5. The echo cancellation method according to claim 2, wherein the network to be trained comprises at least one of:
a feedforward sequential memory network (FSMN), a deep feedforward sequential memory network (DFSMN), a long short-term memory (LSTM) unit, a bidirectional long short-term memory (BLSTM) unit, or a gated recurrent unit (GRU).
6. The echo cancellation method according to claim 1, wherein the reference signal comprises at least one path, and the original signal comprises at least one path.
7. The echo cancellation method of claim 1, the method further comprising:
performing joint training using the obtained original signal together with a voice wake-up model and a voice recognition model.
8. An echo cancellation processing method, comprising:
simulating a radio signal by using a preset original signal and a preset reference signal;
and training the network to be trained by taking the radio signal obtained by simulation and a preset reference signal as input and taking a preset original signal as a modeling target to obtain the echo cancellation model.
9. The echo cancellation processing method of claim 8, wherein simulating the radio signal comprises:
applying a preset impulse response to the preset reference signal, and then adding a preset environmental noise signal to obtain a first signal;
and superposing the first signal and a preset original signal to obtain the simulated radio signal.
10. The echo cancellation processing method of claim 8, wherein the network to be trained comprises at least one of:
a feedforward sequential memory network (FSMN), a deep feedforward sequential memory network (DFSMN), a long short-term memory (LSTM) unit, a bidirectional long short-term memory (BLSTM) unit, or a gated recurrent unit (GRU).
11. A computer-readable storage medium storing computer-executable instructions for performing the echo cancellation method of any one of claims 1 to 7 and/or performing the echo cancellation processing method of any one of claims 8 to 10.
12. An apparatus for echo cancellation comprising a memory and a processor, wherein the memory has stored therein the following instructions executable by the processor: steps for performing the echo cancellation method of any one of claims 1 to 7.
13. An apparatus for echo cancellation comprising a memory and a processor, wherein the memory has stored therein the following instructions executable by the processor: steps for performing the echo cancellation processing method of any one of claims 8 to 10.
14. A smart device comprising a memory and a processor, wherein the memory has stored therein the following instructions executable by the processor:
according to the number of channels of the received radio signals, inputting all reference signals entering a loudspeaker into an echo cancellation model respectively, and calculating to obtain a plurality of reference signal estimation values;
for each channel, subtracting the reference signal estimation value corresponding to that channel from the radio signal of that channel, to obtain a plurality of original signal estimation values;
and carrying out normalization processing on the original signal estimation values corresponding to all the channels to obtain original signals entering the microphone array.
15. The smart device of claim 14, wherein the smart device comprises: a smart speaker or a smart TV.
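The signal simulation recited in claims 4 and 9, and the pairing of inputs and modeling target recited in claims 2 and 8, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, "applying a preset impulse response" is assumed to mean a causal convolution truncated to the length of the reference signal, and the network training itself is omitted.

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def apply_impulse_response(signal: Sequence[float], impulse_response: Sequence[float]) -> Signal:
    """Causal convolution of `signal` with `impulse_response`, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out


def simulate_radio_signal(
    original: Sequence[float],
    reference: Sequence[float],
    impulse_response: Sequence[float],
    noise: Sequence[float],
) -> Signal:
    """Simulate the radio signal from preset original, reference, and noise signals."""
    echoed = apply_impulse_response(reference, impulse_response)
    # First signal: reference after the impulse response, plus environmental noise.
    first_signal = [e + v for e, v in zip(echoed, noise)]
    # Simulated radio signal: first signal superposed with the preset original signal.
    return [f + s for f, s in zip(first_signal, original)]


def make_training_example(
    original: Sequence[float],
    reference: Sequence[float],
    impulse_response: Sequence[float],
    noise: Sequence[float],
) -> Tuple[Tuple[Signal, Signal], Signal]:
    """Return ((simulated radio signal, reference signal), modeling target)."""
    radio = simulate_radio_signal(original, reference, impulse_response, noise)
    return (radio, list(reference)), list(original)


if __name__ == "__main__":
    inputs, target = make_training_example(
        original=[1.0, 0.0, 0.0, 0.0],
        reference=[0.0, 1.0, 0.0, 0.0],
        impulse_response=[0.5, 0.25],
        noise=[0.1, 0.1, 0.1, 0.1],
    )
    print(inputs[0], target)
```

Each returned example supplies the simulated radio signal and the preset reference signal as network inputs, with the preset original signal as the modeling target; any of the network types listed in claims 5 and 10 could then be trained on such pairs.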
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453725.0A CN111261179A (en) | 2018-11-30 | 2018-11-30 | Echo cancellation method and device and intelligent equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453725.0A CN111261179A (en) | 2018-11-30 | 2018-11-30 | Echo cancellation method and device and intelligent equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111261179A true CN111261179A (en) | 2020-06-09 |
Family
ID=70946490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811453725.0A Pending CN111261179A (en) | 2018-11-30 | 2018-11-30 | Echo cancellation method and device and intelligent equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111261179A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111883154A (en) * | 2020-07-17 | 2020-11-03 | 海尔优家智能科技(北京)有限公司 | Echo cancellation method and apparatus, computer-readable storage medium, and electronic apparatus |
CN112102816A (en) * | 2020-08-17 | 2020-12-18 | 北京百度网讯科技有限公司 | Speech recognition method, apparatus, system, electronic device and storage medium |
CN112202778A (en) * | 2020-09-30 | 2021-01-08 | 联想(北京)有限公司 | Information processing method and device and electronic equipment |
CN112420073A (en) * | 2020-10-12 | 2021-02-26 | 北京百度网讯科技有限公司 | Voice signal processing method, device, electronic equipment and storage medium |
CN112634923A (en) * | 2020-12-14 | 2021-04-09 | 广州智讯通信***有限公司 | Audio echo cancellation method, device and storage medium based on command scheduling system |
CN113450819A (en) * | 2021-05-21 | 2021-09-28 | 音科思(深圳)技术有限公司 | Signal processing method and related product |
CN114512136A (en) * | 2022-03-18 | 2022-05-17 | 北京百度网讯科技有限公司 | Model training method, audio processing method, device, apparatus, storage medium, and program |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1249889A (en) * | 1997-03-06 | 2000-04-05 | 旭化成工业株式会社 | Device and method for processing speech |
WO2000072564A2 (en) * | 1999-05-24 | 2000-11-30 | Matthias Waldorf | Echo compensation device |
CN101370323A (en) * | 2007-08-15 | 2009-02-18 | 美商富迪科技股份有限公司 | Apparatus capable of performing acoustic echo cancellation and a method thereof |
CN103152500A (en) * | 2013-02-21 | 2013-06-12 | 中国对外翻译出版有限公司 | Method for eliminating echo from multi-party call |
CN104157293A (en) * | 2014-08-28 | 2014-11-19 | 福建师范大学福清分校 | Signal processing method for enhancing target voice signal pickup in sound environment |
CN105144674A (en) * | 2013-05-03 | 2015-12-09 | 高通股份有限公司 | Multi-channel echo cancellation and noise suppression |
CN106157953A (en) * | 2015-04-16 | 2016-11-23 | 科大讯飞股份有限公司 | continuous speech recognition method and system |
CN106210368A (en) * | 2016-06-20 | 2016-12-07 | 百度在线网络技术(北京)有限公司 | The method and apparatus eliminating multiple channel acousto echo |
CN107105366A (en) * | 2017-06-15 | 2017-08-29 | 歌尔股份有限公司 | A kind of multi-channel echo eliminates circuit, method and smart machine |
US20170330071A1 (en) * | 2016-05-10 | 2017-11-16 | Google Inc. | Audio processing with neural networks |
CN107483761A (en) * | 2016-06-07 | 2017-12-15 | 电信科学技术研究院 | A kind of echo suppressing method and device |
CN107564539A (en) * | 2017-08-29 | 2018-01-09 | 苏州奇梦者网络科技有限公司 | Towards the acoustic echo removing method and device of microphone array |
US20180040333A1 (en) * | 2016-08-03 | 2018-02-08 | Apple Inc. | System and method for performing speech enhancement using a deep neural network-based signal |
CN107910014A (en) * | 2017-11-23 | 2018-04-13 | 苏州科达科技股份有限公司 | Test method, device and the test equipment of echo cancellor |
US9947338B1 (en) * | 2017-09-19 | 2018-04-17 | Amazon Technologies, Inc. | Echo latency estimation |
CN108028982A (en) * | 2015-09-23 | 2018-05-11 | 三星电子株式会社 | Electronic equipment and its audio-frequency processing method |
CN108429994A (en) * | 2017-02-15 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Audio identification, echo cancel method, device and equipment |
US20180261225A1 (en) * | 2017-03-13 | 2018-09-13 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for Multichannel End-to-End Speech Recognition |
CN108604452A (en) * | 2016-02-15 | 2018-09-28 | 三菱电机株式会社 | Voice signal intensifier |
WO2018190547A1 (en) * | 2017-04-14 | 2018-10-18 | 한양대학교 산학협력단 | Deep neural network-based method and apparatus for combined noise and echo removal |
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1249889A (en) * | 1997-03-06 | 2000-04-05 | 旭化成工业株式会社 | Device and method for processing speech |
WO2000072564A2 (en) * | 1999-05-24 | 2000-11-30 | Matthias Waldorf | Echo compensation device |
CN101370323A (en) * | 2007-08-15 | 2009-02-18 | 美商富迪科技股份有限公司 | Apparatus capable of performing acoustic echo cancellation and a method thereof |
CN103152500A (en) * | 2013-02-21 | 2013-06-12 | 中国对外翻译出版有限公司 | Method for eliminating echo from multi-party call |
CN105144674A (en) * | 2013-05-03 | 2015-12-09 | 高通股份有限公司 | Multi-channel echo cancellation and noise suppression |
CN104157293A (en) * | 2014-08-28 | 2014-11-19 | 福建师范大学福清分校 | Signal processing method for enhancing target voice signal pickup in sound environment |
CN106157953A (en) * | 2015-04-16 | 2016-11-23 | 科大讯飞股份有限公司 | continuous speech recognition method and system |
CN108028982A (en) * | 2015-09-23 | 2018-05-11 | 三星电子株式会社 | Electronic equipment and its audio-frequency processing method |
CN108604452A (en) * | 2016-02-15 | 2018-09-28 | 三菱电机株式会社 | Voice signal intensifier |
US20170330071A1 (en) * | 2016-05-10 | 2017-11-16 | Google Inc. | Audio processing with neural networks |
CN107483761A (en) * | 2016-06-07 | 2017-12-15 | 电信科学技术研究院 | A kind of echo suppressing method and device |
CN106210368A (en) * | 2016-06-20 | 2016-12-07 | 百度在线网络技术(北京)有限公司 | The method and apparatus eliminating multiple channel acousto echo |
US20180040333A1 (en) * | 2016-08-03 | 2018-02-08 | Apple Inc. | System and method for performing speech enhancement using a deep neural network-based signal |
CN108429994A (en) * | 2017-02-15 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Audio identification, echo cancel method, device and equipment |
US20180261225A1 (en) * | 2017-03-13 | 2018-09-13 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for Multichannel End-to-End Speech Recognition |
WO2018190547A1 (en) * | 2017-04-14 | 2018-10-18 | 한양대학교 산학협력단 | Deep neural network-based method and apparatus for combined noise and echo removal |
CN107105366A (en) * | 2017-06-15 | 2017-08-29 | 歌尔股份有限公司 | A kind of multi-channel echo eliminates circuit, method and smart machine |
CN107564539A (en) * | 2017-08-29 | 2018-01-09 | 苏州奇梦者网络科技有限公司 | Towards the acoustic echo removing method and device of microphone array |
US9947338B1 (en) * | 2017-09-19 | 2018-04-17 | Amazon Technologies, Inc. | Echo latency estimation |
CN107910014A (en) * | 2017-11-23 | 2018-04-13 | 苏州科达科技股份有限公司 | Test method, device and the test equipment of echo cancellor |
Non-Patent Citations (2)
Title |
---|
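余尤好: "神经网络在通信***回音对消中的应用" [Application of neural networks to echo cancellation in communication ***], no. 09, pages 1 - 5 * |
崔海徽, 王石刚, 王高中, 蒋志辉: "基于前馈神经网络的自适应回声消除方法" [Adaptive echo cancellation method based on a feedforward neural network], no. 02 * |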
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111883154A (en) * | 2020-07-17 | 2020-11-03 | 海尔优家智能科技(北京)有限公司 | Echo cancellation method and apparatus, computer-readable storage medium, and electronic apparatus |
CN111883154B (en) * | 2020-07-17 | 2023-11-28 | 海尔优家智能科技(北京)有限公司 | Echo cancellation method and device, computer-readable storage medium, and electronic device |
CN112102816A (en) * | 2020-08-17 | 2020-12-18 | 北京百度网讯科技有限公司 | Speech recognition method, apparatus, system, electronic device and storage medium |
CN112202778A (en) * | 2020-09-30 | 2021-01-08 | 联想(北京)有限公司 | Information processing method and device and electronic equipment |
CN112420073A (en) * | 2020-10-12 | 2021-02-26 | 北京百度网讯科技有限公司 | Voice signal processing method, device, electronic equipment and storage medium |
CN112420073B (en) * | 2020-10-12 | 2024-04-16 | 北京百度网讯科技有限公司 | Voice signal processing method, device, electronic equipment and storage medium |
CN112634923A (en) * | 2020-12-14 | 2021-04-09 | 广州智讯通信***有限公司 | Audio echo cancellation method, device and storage medium based on command scheduling system |
CN112634923B (en) * | 2020-12-14 | 2021-11-19 | 广州智讯通信***有限公司 | Audio echo cancellation method, device and storage medium based on command scheduling system |
CN113450819A (en) * | 2021-05-21 | 2021-09-28 | 音科思(深圳)技术有限公司 | Signal processing method and related product |
CN114512136A (en) * | 2022-03-18 | 2022-05-17 | 北京百度网讯科技有限公司 | Model training method, audio processing method, device, apparatus, storage medium, and program |
CN114512136B (en) * | 2022-03-18 | 2023-09-26 | 北京百度网讯科技有限公司 | Model training method, audio processing method, device, equipment, storage medium and program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111261179A (en) | Echo cancellation method and device and intelligent equipment | |
CN107452389B (en) | Universal single-track real-time noise reduction method | |
CN111161752B (en) | Echo cancellation method and device | |
JP5587396B2 (en) | System, method and apparatus for signal separation | |
US20190222691A1 (en) | Data driven echo cancellation and suppression | |
Li et al. | Online direction of arrival estimation based on deep learning | |
Mertins et al. | Room impulse response shortening/reshaping with infinity-and $ p $-norm optimization | |
CN108429994B (en) | Audio identification and echo cancellation method, device and equipment | |
Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation | |
CN106537501B (en) | Reverberation estimator | |
KR102076760B1 (en) | Method for cancellating nonlinear acoustic echo based on kalman filtering using microphone array | |
CN114283795A (en) | Training and recognition method of voice enhancement model, electronic equipment and storage medium | |
KR102401959B1 (en) | Joint training method and apparatus for deep neural network-based dereverberation and beamforming for sound event detection in multi-channel environment | |
CN116030823B (en) | Voice signal processing method and device, computer equipment and storage medium | |
Su et al. | Perceptually-motivated environment-specific speech enhancement | |
CN109379652B (en) | Earphone active noise control secondary channel off-line identification method | |
WO2022256577A1 (en) | A method of speech enhancement and a mobile computing device implementing the method | |
Pfeifenberger et al. | Deep complex-valued neural beamformers | |
KR101587844B1 (en) | Microphone signal compensation apparatus and method of the same | |
JP2024526679A (en) | Data Augmentation for Speech Improvement | |
KR102045953B1 (en) | Method for cancellating mimo acoustic echo based on kalman filtering | |
Ayrapetian et al. | Asynchronous acoustic echo cancellation over wireless channels | |
JP4094523B2 (en) | Echo canceling apparatus, method, echo canceling program, and recording medium recording the program | |
Tang et al. | A Time-Varying Forgetting Factor-Based QRRLS Algorithm for Multichannel Speech Dereverberation | |
US20240127842A1 (en) | Apparatus, Methods and Computer Programs for Audio Signal Enhancement Using a Dataset |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40031322; Country of ref document: HK |