CN108766456B - Voice processing method and device - Google Patents


Info

Publication number
CN108766456B
Authority
CN
China
Prior art keywords
signal
processing
output signal
echo
beam forming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810496822.1A
Other languages
Chinese (zh)
Other versions
CN108766456A (en)
Inventor
周舒然
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen China Investment Co Ltd
Mobvoi Innovation Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chumen Wenwen Information Technology Co Ltd filed Critical Chumen Wenwen Information Technology Co Ltd
Priority to CN201810496822.1A priority Critical patent/CN108766456B/en
Publication of CN108766456A publication Critical patent/CN108766456A/en
Priority to PCT/CN2019/087301 priority patent/WO2019223603A1/en
Application granted granted Critical
Publication of CN108766456B publication Critical patent/CN108766456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a voice processing method and a device, wherein the method comprises the following steps: acquiring at least one near-end signal through a microphone array; performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal; respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal; performing nonlinear echo suppression processing on at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal; and carrying out noise reduction and gain processing on the nonlinear echo suppression output signal. Therefore, the scheme provided by the invention can improve the signal-to-noise ratio.

Description

Voice processing method and device
Technical Field
The embodiment of the invention relates to the technical field of voice processing, in particular to a voice processing method and device.
Background
Intelligent voice technology is increasingly widely used, and intelligent voice devices use it to interact with users. The voice signal received by an intelligent voice device may include a near-end signal and a reference signal. When the reference signal received by the voice terminal is played through a loudspeaker, it can form an echo.
Currently, speech processing is usually applied to reduce the echo. However, reducing the residual echo during speech processing often distorts the near-end speech, so that the sound is uneven and harsh. It can be seen that, in the existing approach, speech processing results in a low signal-to-noise ratio.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for speech processing, which mainly aim to improve a signal-to-noise ratio.
In a first aspect, an embodiment of the present invention provides a speech processing method, where the speech processing method includes:
acquiring at least one near-end signal through a microphone array;
performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal;
performing nonlinear echo suppression processing on at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;
and carrying out noise reduction and gain processing on the nonlinear echo suppression output signal.
Optionally,
the noise reduction and gain processing of the nonlinear echo suppression output signal includes:
determining a signal-to-noise ratio of the nonlinear echo suppression output signal;
determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a formula (1);
[Formula (1): rendered as an image in the original publication (Figure BDA0001669407260000021); not reproduced in this text]
where T denotes the noise reduction output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
Optionally,
the noise reduction and gain processing of the nonlinear echo suppression output signal includes:
determining at least one frequency point corresponding to the noise reduction output signal;
determining a first gain value corresponding to each frequency point according to a formula (2);
[Formula (2): rendered as an image in the original publication (Figure BDA0001669407260000022); not reproduced in this text]
where N_i denotes the first gain value corresponding to the i-th frequency point, H_i denotes the peak value corresponding to the i-th frequency point, K1 denotes a first constant, and K2 denotes a second constant;
and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
Optionally,
the performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal includes:
filtering the reference signal by using at least one preset filter to obtain an estimated echo signal;
performing, for each near-end signal: removing the estimated echo signal from the near-end signal to obtain the residual echo signal corresponding to that near-end signal.
Optionally,
the performing beamforming processing on the at least one near-end signal and the at least one residual echo signal respectively includes:
respectively performing time delay adjustment on the at least one near-end signal and the at least one residual echo signal;
performing beamforming on the at least one delay-adjusted near-end signal to obtain one beamformed near-end signal;
and performing beamforming on the at least one delay-adjusted residual echo signal to obtain one beamformed residual echo signal.
Optionally,
the acquiring at least one near-end signal by the microphone array includes:
determining the direction of the near-end speech using the microphone array;
performing beamforming on the near-end speech toward the determined direction;
and acquiring the at least one near-end signal from the beamformed near-end speech.
In a second aspect, an embodiment of the present invention provides a speech processing apparatus, including:
the acquisition module is used for acquiring at least one near-end signal through the microphone array;
the echo cancellation module is used for performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
a beam forming module, configured to perform beam forming processing on the at least one near-end signal and the at least one residual echo signal respectively;
the nonlinear echo suppression module is used for performing nonlinear echo suppression processing on at least one path of near-end signal and at least one path of residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;
and the processing module is used for carrying out noise reduction and gain processing on the nonlinear echo suppression output signal.
Optionally,
the processing module comprises: a noise reduction submodule;
the noise reduction sub-module is used for determining the signal-to-noise ratio of the nonlinear echo suppression output signal; determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a formula (1);
[Formula (1): rendered as an image in the original publication (Figure BDA0001669407260000031); not reproduced in this text]
where T denotes the noise reduction output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
Optionally,
the processing module comprises: a gain sub-module;
the gain submodule is used for determining at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a formula (2); and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
[Formula (2): rendered as an image in the original publication (Figure BDA0001669407260000041); not reproduced in this text]
where N_i denotes the first gain value corresponding to the i-th frequency point, H_i denotes the peak value corresponding to the i-th frequency point, K1 denotes a first constant, and K2 denotes a second constant.
In a third aspect, an embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute any one of the foregoing speech processing methods.
In a fourth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes a processor, a memory, and a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform any one of the above-described speech processing methods.
The embodiments of the invention provide a voice processing method and device, in which one or more near-end signals are acquired through a microphone array, and echo cancellation processing is performed on each near-end signal to obtain one or more residual echo signals. Beamforming processing is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression processing is performed on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. As can be seen from the above, after the microphone array acquires the near-end signals, they undergo echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing. The echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the scheme provided by the embodiments of the invention can improve the signal-to-noise ratio.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method of speech processing according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method of speech processing according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a speech processing apparatus according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a speech processing method, where the speech processing method includes:
Step 101: acquiring at least one near-end signal through a microphone array;
Step 102: performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
Step 103: performing beamforming processing on the at least one near-end signal and on the at least one residual echo signal, respectively;
Step 104: performing nonlinear echo suppression processing on the at least one near-end signal and the at least one residual echo signal after the beamforming processing to obtain a nonlinear echo suppression output signal;
Step 105: performing noise reduction and gain processing on the nonlinear echo suppression output signal.
According to the embodiment shown in FIG. 1, one or more near-end signals are acquired through a microphone array, and echo cancellation processing is performed on each near-end signal to obtain one or more residual echo signals. Beamforming processing is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression processing is performed on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. As can be seen from the above, after the microphone array acquires the near-end signals, they undergo echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing. The echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the scheme provided by the embodiment of the invention can improve the signal-to-noise ratio.
In an embodiment of the present invention, the step 101 in the flowchart shown in fig. 1 of obtaining at least one near-end signal by using a microphone array may include:
determining the direction of the near-end speech using the microphone array;
performing beamforming on the near-end speech toward the determined direction;
and acquiring the at least one near-end signal from the beamformed near-end speech.
In this embodiment, the number and type of microphones included in the microphone array may be determined according to service requirements. For example, the microphone array includes 4 microphones.
In this embodiment, each microphone in the microphone array may pick up sound omnidirectionally. To acquire the near-end signal more reliably, the direction of the near-end speech is determined whenever near-end speech is present. Beamforming is then performed toward the determined direction, so that sound from that direction is amplified and sound from other directions is suppressed. Each microphone then acquires a near-end signal from the beamformed near-end speech.
In this embodiment, when the microphone array includes N microphones, N near-end signals are acquired.
According to this embodiment, when near-end speech is present, its direction is first determined using the microphone array, beamforming is performed toward that direction, and one or more near-end signals are acquired from the beamformed near-end speech. Because beamforming applies directional gain and suppression to the near-end speech, the noise in the acquired near-end signals is low.
In an embodiment of the present invention, the step 102 in the flowchart shown in fig. 1 performs echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal, where the echo cancellation processing may include:
filtering the reference signal by using at least one preset filter to obtain an estimated echo signal;
performing, for each near-end signal: removing the estimated echo signal from the near-end signal to obtain the residual echo signal corresponding to that near-end signal.
In this embodiment, the type and number of filters may be determined according to service requirements. When there is one filter, it filters the reference signal using its own filtering method to obtain the estimated echo signal. When there are two or more filters, each filter filters the reference signal, a preferred filter is determined according to the filtering results, and the estimated echo signal is obtained using the preferred filter.
In this embodiment, the method for determining the residual echo signal corresponding to any near-end signal may be: the estimated echo signal is removed from the near-end signal to obtain a residual echo signal.
According to the above embodiment, the filter is used to filter the reference signal to obtain the estimated echo signal, and the estimated echo signal is cancelled in each path of near-end signal to obtain the residual echo signal corresponding to each path of near-end signal. Since the estimated echo signal is cancelled out in each of the near-end signals, the echo can be reduced.
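The echo cancellation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the preset filters are modelled as fixed FIR tap lists, and the "preferred" filter is taken to be the one that leaves the least total residual energy, a criterion the text does not specify. The names `fir_filter`, `energy`, and `cancel_echo` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fir_filter(reference: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter the reference signal with FIR taps to estimate the echo."""
    estimated = []
    for n in range(len(reference)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * reference[n - k]
        estimated.append(acc)
    return estimated


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def cancel_echo(
    near_signals: Sequence[Sequence[float]],
    reference: Sequence[float],
    filters: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Return the estimated echo and one residual echo signal per near-end signal."""
    candidates = [fir_filter(reference, taps) for taps in filters]

    def total_residual_energy(estimate: Sequence[float]) -> float:
        return sum(
            energy([x - e for x, e in zip(near, estimate)])
            for near in near_signals
        )

    # Assumed selection rule: the preferred filter is the candidate that
    # leaves the least residual energy across all near-end signals.
    estimated = min(candidates, key=total_residual_energy)

    # Remove the estimated echo from every near-end signal.
    residuals = [
        [x - e for x, e in zip(near, estimated)] for near in near_signals
    ]
    return estimated, residuals
```

With a single filter in `filters`, the selection step is trivial and the sketch reduces to filtering the reference signal and subtracting the result from each near-end signal.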
In an embodiment of the present invention, after the step of canceling the estimated echo signal in the near-end signal to obtain a residual echo signal corresponding to the near-end signal, the method in the previous embodiment may further include:
determining a second gain value corresponding to the near-end signal;
and performing echo compression processing on the residual echo signal by using the second gain value.
In this embodiment, the second gain value corresponding to a near-end signal may be determined as follows, taking one near-end signal as an example. At least one frequency point corresponding to the near-end signal is determined, and a correlation coefficient is determined for each frequency point from the near-end signal and its residual echo signal. For each frequency point, the frequency point gain value is set to the correlation coefficient corresponding to that frequency point. Overload processing and smoothing processing are then applied to the frequency point gain values to obtain the second gain value. The residual echo signal is then compressed using the second gain value to obtain the echo-compressed output.
According to the above embodiment, echo compression processing is performed on the residual echo signal using the second gain value corresponding to the near-end signal, so the echo can be reduced as far as possible.
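The second-gain computation can be illustrated with the sketch below. Several details are assumptions, since the text does not define them: the signals are taken to be already transformed into per-frame spectra (lists of complex bins), the correlation coefficient is computed as a normalized cross-correlation magnitude over the frames, "overload processing" is interpreted as limiting the gain to a range, and smoothing is a first-order recursive average. The names and the parameters `floor` and `smoothing` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[complex]


def bin_correlation(
    near_frames: Sequence[Frame], residual_frames: Sequence[Frame]
) -> List[float]:
    """Per-frequency-point correlation coefficient in [0, 1]."""
    coeffs = []
    for k in range(len(near_frames[0])):
        cross = sum(
            n[k].conjugate() * r[k]
            for n, r in zip(near_frames, residual_frames)
        )
        near_power = sum(abs(n[k]) ** 2 for n in near_frames)
        residual_power = sum(abs(r[k]) ** 2 for r in residual_frames)
        denom = (near_power * residual_power) ** 0.5
        coeffs.append(abs(cross) / denom if denom > 0.0 else 0.0)
    return coeffs


def second_gain(
    coeffs: Sequence[float],
    previous: Sequence[float],
    floor: float = 0.05,
    smoothing: float = 0.7,
) -> List[float]:
    """Limit and smooth the per-frequency-point gain values."""
    gains = []
    for coeff, prev in zip(coeffs, previous):
        # Assumed reading of "overload processing": keep the gain in [floor, 1].
        limited = min(1.0, max(floor, coeff))
        # Assumed smoothing: first-order recursive average with the last gain.
        gains.append(smoothing * prev + (1.0 - smoothing) * limited)
    return gains


def compress_echo(residual_frame: Frame, gains: Sequence[float]) -> List[complex]:
    """Apply the second gain value to one residual echo spectrum."""
    return [g * x for g, x in zip(gains, residual_frame)]
```

In this reading, a frequency point where the residual echo signal closely tracks the near-end signal keeps a gain near 1, while a weakly correlated frequency point is attenuated.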
In an embodiment of the present invention, the step 103 in the flowchart shown in fig. 1 respectively performs beamforming on the at least one near-end signal and the at least one residual echo signal, which may include:
respectively performing time delay adjustment on the at least one near-end signal and the at least one residual echo signal;
performing beamforming on the at least one delay-adjusted near-end signal to obtain one beamformed near-end signal;
and performing beamforming on the at least one delay-adjusted residual echo signal to obtain one beamformed residual echo signal.
In this embodiment, because the near-end signals are acquired at different times, delay adjustment is performed on each near-end signal to align the near-end signals in time. Beamforming is then performed on the delay-adjusted near-end signals to obtain one beamformed near-end signal, which is clearer.
In this embodiment, because the residual echo signals are acquired at different times, delay adjustment is performed on each residual echo signal to align the residual echo signals in time. Beamforming is then performed on the delay-adjusted residual echo signals to obtain one beamformed residual echo signal, which is clearer.
According to the above embodiment, delay adjustment is performed on each near-end signal and each residual echo signal, and beamforming is then performed on the near-end signals and on the residual echo signals to obtain one beamformed near-end signal and one beamformed residual echo signal. Therefore, the beamformed near-end signal and the beamformed residual echo signal are clearer.
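The delay adjustment and beamforming described above can be illustrated with a delay-and-sum sketch. This is one common beamforming technique chosen for illustration; the text does not state which beamformer the embodiment uses. The per-channel delays are assumed to be known, non-negative integer sample counts, and the names `align` and `delay_and_sum` are hypothetical.

```python
from typing import List, Sequence


def align(signal: Sequence[float], delay: int) -> List[float]:
    """Advance a signal by `delay` samples, zero-padding the tail."""
    if delay < 0 or delay > len(signal):
        raise ValueError("delay must be between 0 and the signal length")
    return list(signal[delay:]) + [0.0] * delay


def delay_and_sum(
    signals: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Align every channel in time, then average them into one signal."""
    aligned = [align(s, d) for s, d in zip(signals, delays)]
    count = len(aligned)
    return [sum(samples) / count for samples in zip(*aligned)]
```

The same function would be called twice, once on the near-end signals and once on the residual echo signals, which yields one beamformed near-end signal and one beamformed residual echo signal.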
In an embodiment of the present invention, the step 104 in the flowchart shown in fig. 1 performs a nonlinear echo suppression process on the at least one near-end signal and the at least one residual echo signal after the beamforming process to obtain a nonlinear echo suppressed output signal, which may include:
and carrying out nonlinear echo suppression processing on one path of the beam forming near-end signal and one path of the beam forming residual echo signal to obtain the nonlinear echo suppression output signal.
According to the above embodiment, the nonlinear echo suppression processing is performed on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal. Echo can be further reduced.
In an embodiment of the present invention, the performing, in step 105 of the flowchart shown in fig. 1, noise reduction and gain processing on the nonlinear echo suppression output signal may include:
determining a signal-to-noise ratio of the nonlinear echo suppression output signal;
determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a formula (1);
[Formula (1): rendered as an image in the original publication (Figure BDA0001669407260000091); not reproduced in this text]
where T denotes the noise reduction output signal, P denotes the nonlinear echo suppression output signal, and S denotes the signal-to-noise ratio.
In this embodiment, the signal-to-noise ratio of the nonlinear echo suppression output signal may be determined as follows: determine the total energy corresponding to the beamformed near-end signal and the beamformed residual echo signal, and then determine the quotient of the total energy and the energy corresponding to the beamformed residual echo signal. The quotient is the signal-to-noise ratio.
According to the above embodiment, because noise reduction processing is performed on the nonlinear echo suppression output signal according to the signal-to-noise ratio, noise can be reduced as far as possible.
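The signal-to-noise ratio described above can be computed as in the following sketch. It assumes that "total energy" means the sum of the energies of the two beamformed signals, and that energy is the sum of squared samples; both are interpretations of the text. Formula (1) itself is not reproduced in this text, so the sketch stops at the signal-to-noise ratio and does not attempt the noise reduction output. The function names are hypothetical.

```python
from typing import Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def signal_to_noise_ratio(
    beamformed_near: Sequence[float], beamformed_residual: Sequence[float]
) -> float:
    """Quotient of the total energy and the residual echo energy."""
    residual_energy = energy(beamformed_residual)
    # Assumed reading of "total energy": sum of both signal energies.
    total_energy = energy(beamformed_near) + residual_energy
    if residual_energy == 0.0:
        # No residual echo energy at all: the ratio is unbounded.
        return float("inf")
    return total_energy / residual_energy
```

For example, a beamformed near-end signal with energy 25 and a beamformed residual echo signal with energy 5 give a ratio of (25 + 5) / 5 = 6.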
In an embodiment of the present invention, the performing, in step 105 of the flowchart shown in fig. 1, noise reduction and gain processing on the nonlinear echo suppression output signal may include:
determining at least one frequency point corresponding to the noise reduction output signal;
determining a first gain value corresponding to each frequency point according to a formula (2);
[Formula (2): rendered as an image in the original publication (Figure BDA0001669407260000092); not reproduced in this text]
where N_i denotes the first gain value corresponding to the i-th frequency point, H_i denotes the peak value corresponding to the i-th frequency point, K1 denotes a first constant, and K2 denotes a second constant;
and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
In this embodiment, K1 and K2 are the two end points of a preset interval, for example the interval [K1, K2]. The values of K1 and K2 may be determined according to service requirements. For example, K1 is 0.9 and K2 is 0.99.
In this embodiment, as can be seen from formula (2), when the peak value corresponding to a frequency point lies below the preset interval, a larger gain value is determined so as to enhance the nonlinear echo suppression output signal at that frequency point. When the peak value lies above the preset interval, a smaller gain value is determined, so that after the gain is applied the peak value falls within the interval and the gain output signal is not distorted.
According to this embodiment, because a targeted gain is applied at each frequency point corresponding to the nonlinear echo suppression output signal, the signal-to-noise ratio is high.
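The qualitative behaviour described for the first gain value can be illustrated as follows. This sketch is not formula (2), which is not reproduced in this text. It uses a hypothetical rule that only matches the described behaviour: a peak below the interval [K1, K2] is scaled up to K1, a peak above it is scaled down to K2, and a peak inside it is left unchanged. The function names are hypothetical; the default values 0.9 and 0.99 are the example values given in the text.

```python
from typing import List, Sequence


def first_gain(peak: float, k1: float = 0.9, k2: float = 0.99) -> float:
    """Hypothetical gain rule; NOT formula (2) of the publication."""
    if peak <= 0.0:
        return 1.0  # nothing to scale at a silent frequency point
    if peak < k1:
        return k1 / peak  # larger gain: lift the peak to the interval
    if peak > k2:
        return k2 / peak  # smaller gain: bring the peak back into the interval
    return 1.0


def apply_gain(peaks: Sequence[float], k1: float = 0.9, k2: float = 0.99) -> List[float]:
    """Return the gained peak value for every frequency point."""
    return [first_gain(h, k1, k2) * h for h in peaks]
```

Under this assumed rule, every non-zero gained peak ends up inside [K1, K2], which is the property the text attributes to the gain step.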
In the following, a speech processing method will be described by taking an example in which the microphone array includes 4 microphones. As shown in fig. 2, the speech processing method includes:
Step 201: The direction of the near-end speech is determined using the microphone array.
Step 202: Beamforming is performed on the near-end speech toward the determined direction.
In this step, beamforming toward the determined direction amplifies sound from that direction and suppresses sound from other directions.
Step 203: At least one near-end signal is acquired from the beamformed near-end speech using the microphone array.
In this step, 4 near-end signals are acquired by 4 microphones.
Step 204: and filtering the reference signal by using at least one preset filter to obtain an estimated echo signal.
In this step, the filter may perform filtering processing on the reference signal by using its own filtering method, and obtain an estimated echo signal. The estimated echo signal is removed from the near-end signal to obtain a residual echo signal.
Step 205: performing for each near-end signal: eliminating the estimated echo signal in the near-end signal to obtain a residual echo signal corresponding to the near-end signal; and determining a second gain value corresponding to the near-end signal, and performing echo compression processing on the residual echo signal by using the second gain value.
Step 206: and respectively carrying out time delay adjustment on each near-end signal and each residual echo signal.
In this step, since the time for acquiring each near-end signal is different, time delay adjustment needs to be performed on each near-end signal, so as to respectively unify the time of each near-end signal.
Step 207: and performing beam forming on each path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals.
Step 208: and performing beam forming on each path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.
Step 209: and carrying out nonlinear echo suppression processing on the one path of beam forming near-end signal and the one path of beam forming residual echo signal to obtain a nonlinear echo suppression output signal.
Step 210: a signal-to-noise ratio of the nonlinear echo suppression output signal is determined.
In this step, the total energy corresponding to the beamformed near-end signal and the beamformed residual echo signal is determined, and the quotient between the total energy and the energy corresponding to the beamformed residual echo signal is determined. The quotient is the signal-to-noise ratio.
Step 211: and determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal according to the signal-to-noise ratio.
In this step, a noise reduction output signal corresponding to the nonlinear echo suppression output signal is determined using formula (1).
Step 212: determining at least one frequency point corresponding to the noise-reduced nonlinear echo suppression output signal.
Step 213: determining a first gain value corresponding to each frequency point.
In this step, a first gain value corresponding to each frequency point is determined according to formula (2).
In this step, when the peak value corresponding to a frequency point lies below the set interval (that is, it is smaller than every value in the interval), a larger gain value can be determined so as to enhance the nonlinear echo suppression output signal at that frequency point. When the peak value lies above the set interval (that is, it is greater than every value in the interval), a smaller gain value can be determined so that, after the gain is applied, the peak value falls within the interval and the gain output signal is not distorted.
Step 214: determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
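The behaviour described for steps 212 to 214 can be illustrated with the sketch below. It is not formula (2), which is not reproduced in this text: it assumes a simple rule that raises a peak below the interval to the lower bound and lowers a peak above the interval to the upper bound. The names `first_gain`, `apply_gains`, and `interval` are hypothetical.

```python
from typing import List, Sequence, Tuple


def first_gain(peak: float, interval: Tuple[float, float]) -> float:
    """Assumed gain rule: boost peaks below the interval, attenuate peaks
    above it, and leave peaks inside the interval unchanged."""
    low, high = interval
    if peak <= 0.0:
        return 1.0  # nothing to scale for a silent frequency point
    if peak < low:
        return low / peak   # larger gain: lifts the peak to the lower bound
    if peak > high:
        return high / peak  # smaller gain: brings the peak down to the upper bound
    return 1.0


def apply_gains(
    values: Sequence[float], peaks: Sequence[float], interval: Tuple[float, float]
) -> List[float]:
    """Scale the value at each frequency point by that point's first gain."""
    return [first_gain(p, interval) * v for v, p in zip(values, peaks)]


interval = (0.2, 0.8)
peaks = [0.1, 0.5, 1.6]
gained = apply_gains(peaks, peaks, interval)  # peaks after gain
```

With this rule every gained peak ends up inside the interval, which matches the stated goal of enhancing weak frequency points while keeping strong ones from distorting.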
As shown in fig. 3, an embodiment of the present invention provides a speech processing apparatus, including:
an obtaining module 301, configured to obtain at least one near-end signal through a microphone array;
an echo cancellation module 302, configured to perform echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
a beam forming module 303, configured to perform beam forming processing on the at least one near-end signal and the at least one residual echo signal respectively;
a nonlinear echo suppression module 304, configured to perform nonlinear echo suppression processing on the at least one near-end signal and the at least one residual echo signal after the beamforming processing to obtain a nonlinear echo suppression output signal;
a processing module 305, configured to perform noise reduction and gain processing on the nonlinear echo suppression output signal.
According to the embodiment shown in fig. 3, when the microphone array acquires a near-end signal, the near-end signal is subjected to echo cancellation processing, beam forming processing, nonlinear echo suppression processing, and noise reduction and gain processing. The echo is thereby suppressed while distortion of the sound is kept to a minimum. The embodiment provided by the invention can therefore improve the signal-to-noise ratio.
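The way modules 301 to 305 hand signals to one another can be sketched as a small pipeline. This only illustrates the data flow; each stage is injected as a callable, and the class name, field names, and the stand-in stages in the usage lines are hypothetical placeholders rather than the algorithms of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]


@dataclass
class SpeechProcessingApparatus:
    acquire: Callable[[], List[Signal]]                          # module 301
    cancel_echo: Callable[[List[Signal]], List[Signal]]          # module 302
    beamform: Callable[[List[Signal]], Signal]                   # module 303
    suppress_nonlinear_echo: Callable[[Signal, Signal], Signal]  # module 304
    denoise_and_gain: Callable[[Signal], Signal]                 # module 305

    def run(self) -> Signal:
        near_end = self.acquire()
        residual = self.cancel_echo(near_end)
        # Beam forming is applied separately to both groups of signals.
        bf_near_end = self.beamform(near_end)
        bf_residual = self.beamform(residual)
        suppressed = self.suppress_nonlinear_echo(bf_near_end, bf_residual)
        return self.denoise_and_gain(suppressed)


# Trivial stand-in stages, only to show the wiring.
apparatus = SpeechProcessingApparatus(
    acquire=lambda: [[1.0, 2.0], [3.0, 4.0]],
    cancel_echo=lambda chans: [[0.5 * x for x in ch] for ch in chans],
    beamform=lambda chans: [sum(col) / len(col) for col in zip(*chans)],
    suppress_nonlinear_echo=lambda near, res: [n - r for n, r in zip(near, res)],
    denoise_and_gain=lambda sig: sig,
)
output = apparatus.run()
```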
In one embodiment of the invention, as shown in fig. 4, the processing module 305 may include a noise reduction sub-module 3051 for determining a signal-to-noise ratio of the nonlinear echo suppression output signal; determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a formula (1);
[Formula (1): shown as image BDA0001669407260000121 in the original publication; not reproduced in this text]
wherein T characterizes the noise reduction output signal; the P characterizes the nonlinear echo suppression output signal; the S characterizes the signal-to-noise ratio.
In an embodiment of the present invention, as shown in fig. 4, the processing module 305 may include a gain sub-module 3052, configured to determine at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a formula (2); and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
[Formula (2): shown as image BDA0001669407260000122 in the original publication; not reproduced in this text]
wherein N_i characterizes the first gain value corresponding to the i-th frequency point; H_i characterizes the peak value corresponding to the i-th frequency point; K1 characterizes a first constant; K2 characterizes a second constant.
In an embodiment of the present invention, the echo cancellation module 302 is configured to perform filtering processing on a reference signal by using at least one preset filter, so as to obtain an estimated echo signal; performing, for each of the near-end signals: and eliminating the estimated echo signal in the near-end signal to obtain a residual echo signal corresponding to the near-end signal.
In an embodiment of the present invention, the echo cancellation module 302 is further configured to determine a second gain value corresponding to the near-end signal; and performing echo compression processing on the residual echo signal by using the second gain value.
In an embodiment of the present invention, the beam forming module 303 is configured to perform delay adjustment on the at least one near-end signal and the at least one residual echo signal, respectively; performing beam forming on the at least one path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals; and performing beam forming on the at least one path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.
In an embodiment of the present invention, the nonlinear echo suppression module 304 is configured to perform nonlinear echo suppression processing on the one path of beam forming near-end signal and the one path of beam forming residual echo signal to obtain the nonlinear echo suppression output signal.
In an embodiment of the present invention, the obtaining module 301 is configured to utilize the microphone array to orient near-end speech; beamforming the oriented near-end voice; and acquiring the at least one near-end signal from the near-end voice after beam forming.
An embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the voice processing method described in any one of the above.
In one embodiment of the present invention, an electronic device is provided, as shown in fig. 5, which includes a processor 401, a memory 402, and a bus 403; the processor 401 and the memory 402 complete communication with each other through the bus 403; the processor 401 is configured to call program instructions in the memory 402 to perform any one of the above-mentioned speech processing methods.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
The embodiments of the invention have at least the following beneficial effects:
1. In the embodiment of the invention, one or more paths of near-end signals are obtained through the microphone array, and echo cancellation processing is performed on each path of near-end signal to obtain one or more paths of residual echo signals. Beam forming processing is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression processing is performed on the near-end signal and the residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. Thus, when the microphone array acquires a near-end signal, that signal undergoes echo cancellation processing, beam forming processing, nonlinear echo suppression processing, and noise reduction and gain processing, so that the echo is suppressed while distortion of the sound is kept to a minimum. The scheme provided by the embodiment of the invention can therefore improve the signal-to-noise ratio.
2. In the embodiment of the invention, when near-end voice exists, the near-end voice is firstly oriented by using the microphone array, the beam forming is carried out on the oriented near-end voice, and one or more near-end signals are obtained from the beam-formed near-end voice. Since the near-end speech signal is subjected to directional gain and suppression by beamforming, the noise in the acquired near-end signal is low.
3. In the embodiment of the invention, the filter is used for filtering the reference signal to obtain the estimated echo signal, and the estimated echo signal is respectively eliminated from each path of near-end signal to obtain the residual echo signal corresponding to each path of near-end signal. Since the estimated echo signal is cancelled out in each of the near-end signals, the echo can be reduced.
4. In the embodiment of the invention, the echo compression processing is carried out on the residual echo signal by utilizing the second gain value corresponding to the near-end signal. Therefore, echo can be minimized.
5. In the embodiment of the invention, time delay adjustment is respectively carried out on each path of near-end signal and each path of residual echo signal, and beam forming is respectively carried out on each path of near-end signal and each path of residual echo signal, so that one path of beam forming near-end signal and one path of beam forming residual echo signal are obtained. Therefore, the obtained beam-formed near-end signal and the beam-formed residual echo signal are clearer.
6. In the embodiment of the invention, the beam forming near-end signal and the beam forming residual echo signal are subjected to nonlinear echo suppression processing to obtain a nonlinear echo suppression output signal. Echo can be further reduced.
7. In the embodiment of the invention, noise reduction processing is performed on the nonlinear echo suppression output signal according to the signal-to-noise ratio, so that the noise can be reduced to the maximum extent.
8. In the embodiment of the invention, the signal-to-noise ratio is higher because the targeted gain is carried out on each frequency point corresponding to the nonlinear echo suppression output signal.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of speech processing, comprising:
acquiring at least one near-end signal through a microphone array;
performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal;
performing nonlinear echo suppression processing on at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;
carrying out noise reduction and gain processing on the nonlinear echo suppression output signal;
the acquiring at least one near-end signal by the microphone array includes:
directing near-end speech with the microphone array;
beamforming the oriented near-end voice;
and acquiring the at least one near-end signal from the near-end voice after beam forming.
2. The speech processing method according to claim 1,
the noise reduction and gain processing of the nonlinear echo suppression output signal includes:
determining a signal-to-noise ratio of the nonlinear echo suppression output signal;
determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a first formula;
the first formula includes:
[First formula: shown as image FDA0002239127060000011 in the original publication; not reproduced in this text]
wherein T characterizes the noise reduction output signal; the P characterizes the nonlinear echo suppression output signal; the S characterizes the signal-to-noise ratio.
3. The speech processing method according to claim 2,
the noise reduction and gain processing of the nonlinear echo suppression output signal includes:
determining at least one frequency point corresponding to the noise reduction output signal;
determining a first gain value corresponding to each frequency point according to a second formula;
the second formula includes:
wherein N_i characterizes the first gain value corresponding to the i-th frequency point; H_i characterizes the peak value corresponding to the i-th frequency point; K1 characterizes a first constant; K2 characterizes a second constant;
and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.
4. The speech processing method according to any one of claims 1-3,
the performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal includes:
filtering the reference signal by using at least one preset filter to obtain an estimated echo signal;
performing, for each of the near-end signals: and eliminating the estimated echo signal in the near-end signal to obtain a residual echo signal corresponding to the near-end signal.
5. The speech processing method according to any one of claims 1-3,
the performing beamforming processing on the at least one near-end signal and the at least one residual echo signal respectively includes:
respectively performing time delay adjustment on the at least one near-end signal and the at least one residual echo signal;
performing beam forming on the at least one path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals;
and performing beam forming on the at least one path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.
6. A speech processing apparatus, comprising:
the acquisition module is used for acquiring at least one near-end signal through the microphone array;
the echo cancellation module is used for performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;
a beam forming module, configured to perform beam forming processing on the at least one near-end signal and the at least one residual echo signal respectively;
the nonlinear echo suppression module is used for performing nonlinear echo suppression processing on at least one path of near-end signal and at least one path of residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;
the processing module is used for carrying out noise reduction and gain processing on the nonlinear echo suppression output signal;
the acquisition module is used for utilizing the microphone array to orient near-end voice; beamforming the oriented near-end voice; and acquiring the at least one near-end signal from the near-end voice after beam forming.
7. The speech processing apparatus according to claim 6,
the processing module comprises: a noise reduction submodule;
the noise reduction sub-module is used for determining the signal-to-noise ratio of the nonlinear echo suppression output signal; determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a first formula;
the first formula includes:
wherein T characterizes the noise reduction output signal; the P characterizes the nonlinear echo suppression output signal; the S characterizes the signal-to-noise ratio.
8. The speech processing apparatus according to claim 7,
the processing module comprises: a gain sub-module;
the gain submodule is used for determining at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a second formula; determining a gain output signal corresponding to each frequency point by using a first gain value corresponding to each frequency point;
the second formula includes:
wherein N_i characterizes the first gain value corresponding to the i-th frequency point; H_i characterizes the peak value corresponding to the i-th frequency point; K1 characterizes a first constant; K2 characterizes a second constant.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the voice processing method according to any one of claims 1 to 5.
10. An electronic device, wherein the electronic device comprises a processor, a memory and a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform the speech processing method of any one of claims 1 to 5.
CN201810496822.1A 2018-05-22 2018-05-22 Voice processing method and device Active CN108766456B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810496822.1A CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device
PCT/CN2019/087301 WO2019223603A1 (en) 2018-05-22 2019-05-16 Voice processing method and apparatus and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810496822.1A CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device

Publications (2)

Publication Number Publication Date
CN108766456A CN108766456A (en) 2018-11-06
CN108766456B true CN108766456B (en) 2020-01-07

Family

ID=64007626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810496822.1A Active CN108766456B (en) 2018-05-22 2018-05-22 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN108766456B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019223603A1 (en) * 2018-05-22 2019-11-28 出门问问信息科技有限公司 Voice processing method and apparatus and electronic device
CN109920405A (en) * 2019-03-05 2019-06-21 百度在线网络技术(北京)有限公司 Multi-path voice recognition methods, device, equipment and readable storage medium storing program for executing
CN109901113B (en) * 2019-03-13 2020-08-11 出门问问信息科技有限公司 Voice signal positioning method, device and system based on complex environment
CN110097891B (en) * 2019-04-22 2022-04-12 广州视源电子科技股份有限公司 Microphone signal processing method, device, equipment and storage medium
CN110310655B (en) * 2019-04-22 2021-10-22 广州视源电子科技股份有限公司 Microphone signal processing method, device, equipment and storage medium
CN110335618B (en) * 2019-06-06 2021-07-30 福建星网智慧软件有限公司 Method for improving nonlinear echo suppression and computer equipment
CN111583949A (en) * 2020-04-10 2020-08-25 南京拓灵智能科技有限公司 Howling suppression method, device and equipment
CN111524532B (en) * 2020-04-29 2022-12-13 展讯通信(上海)有限公司 Echo suppression method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175291B2 (en) * 2007-12-19 2012-05-08 Qualcomm Incorporated Systems, methods, and apparatus for multi-microphone based speech enhancement
US9053697B2 (en) * 2010-06-01 2015-06-09 Qualcomm Incorporated Systems, methods, devices, apparatus, and computer program products for audio equalization
US9357307B2 (en) * 2011-02-10 2016-05-31 Dolby Laboratories Licensing Corporation Multi-channel wind noise suppression system and method
US9226088B2 (en) * 2011-06-11 2015-12-29 Clearone Communications, Inc. Methods and apparatuses for multiple configurations of beamforming microphone arrays
CN102957819B (en) * 2011-09-30 2015-01-28 斯凯普公司 Method and apparatus for processing audio signals
US9497544B2 (en) * 2012-07-02 2016-11-15 Qualcomm Incorporated Systems and methods for surround sound echo reduction
US9936290B2 (en) * 2013-05-03 2018-04-03 Qualcomm Incorporated Multi-channel echo cancellation and noise suppression

Also Published As

Publication number Publication date
CN108766456A (en) 2018-11-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230620

Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Patentee after: New Technology Co.,Ltd.

Patentee after: VOLKSWAGEN (CHINA) INVESTMENT Co.,Ltd.

Address before: 100094 1001, 10th floor, office building a, 19 Zhongguancun Street, Haidian District, Beijing

Patentee before: MOBVOI INFORMATION TECHNOLOGY Co.,Ltd.