CN108766456B - Voice processing method and device

Info

Publication number: CN108766456B
Application number: CN201810496822.1A
Authority: CN
Inventors: 周舒然; 李志飞
Original assignee: Chumen Wenwen Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2020-01-07
Anticipated expiration: 2038-05-22
Also published as: CN108766456A

Abstract

The invention provides a voice processing method and a device, wherein the method comprises the following steps: acquiring at least one near-end signal through a microphone array; performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal; respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal; performing nonlinear echo suppression processing on at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal; and carrying out noise reduction and gain processing on the nonlinear echo suppression output signal. Therefore, the scheme provided by the invention can improve the signal-to-noise ratio.

Description

Voice processing method and device

Technical Field

The embodiment of the invention relates to the technical field of voice processing, in particular to a voice processing method and device.

Background

Intelligent voice technology is increasingly widely used, and smart voice devices use it to interact with users. The voice signal received by a smart voice device may include a near-end signal and a reference signal. When the reference signal received by the voice terminal is played through a loudspeaker, it can form an echo.

Currently, speech processing is usually applied in order to reduce the echo. However, reducing the residual echo during speech processing often distorts the near-end speech, so that the sound is uneven and harsh. It can be seen that, in the existing manner, speech processing results in a low signal-to-noise ratio.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for speech processing, which mainly aim to improve a signal-to-noise ratio.

In a first aspect, an embodiment of the present invention provides a speech processing method, where the speech processing method includes:

acquiring at least one near-end signal through a microphone array;

performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;

respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal;

performing nonlinear echo suppression processing on at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;

and carrying out noise reduction and gain processing on the nonlinear echo suppression output signal.

Optionally,

the noise reduction and gain processing of the nonlinear echo suppression output signal includes:

determining a signal-to-noise ratio of the nonlinear echo suppression output signal;

determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a formula (1);

wherein T characterizes the noise reduction output signal; P characterizes the nonlinear echo suppression output signal; and S characterizes the signal-to-noise ratio.

Optionally, the noise reduction and gain processing of the nonlinear echo suppression output signal further includes:

determining at least one frequency point corresponding to the noise reduction output signal;

determining a first gain value corresponding to each frequency point according to a formula (2);

wherein N_i represents the first gain value corresponding to the i-th frequency point; H_i represents the peak value corresponding to the i-th frequency point; K1 characterizes a first constant; and K2 characterizes a second constant;

and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.

Optionally,

the performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal includes:

filtering the reference signal by using at least one preset filter to obtain an estimated echo signal;

performing, for each of the near-end signals: cancelling the estimated echo signal from the near-end signal to obtain a residual echo signal corresponding to the near-end signal.

Optionally,

the performing beamforming processing on the at least one near-end signal and the at least one residual echo signal respectively includes:

respectively performing time delay adjustment on the at least one near-end signal and the at least one residual echo signal;

performing beam forming on the at least one path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals;

and performing beam forming on the at least one path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.

Optionally,

the acquiring at least one near-end signal by the microphone array includes:

directing near-end speech with the microphone array;

beamforming the oriented near-end voice;

and acquiring the at least one near-end signal from the near-end voice after beam forming.

In a second aspect, an embodiment of the present invention provides a speech processing apparatus, including:

the acquisition module is used for acquiring at least one near-end signal through the microphone array;

the echo cancellation module is used for performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;

a beam forming module, configured to perform beam forming processing on the at least one near-end signal and the at least one residual echo signal respectively;

the nonlinear echo suppression module is used for performing nonlinear echo suppression processing on at least one path of near-end signal and at least one path of residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;

and the processing module is used for carrying out noise reduction and gain processing on the nonlinear echo suppression output signal.

Optionally,

the processing module comprises: a noise reduction submodule;

the noise reduction sub-module is used for determining the signal-to-noise ratio of the nonlinear echo suppression output signal, and determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through formula (1).

Optionally,

the processing module comprises: a gain sub-module;

the gain submodule is used for determining at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a formula (2); and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.

Wherein N_i represents the first gain value corresponding to the i-th frequency point; H_i represents the peak value corresponding to the i-th frequency point; K1 characterizes a first constant; and K2 characterizes a second constant.

In a third aspect, an embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute any one of the foregoing speech processing methods.

In a fourth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes a processor, a memory, and a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform any one of the above-described speech processing methods.

The embodiments of the invention provide a voice processing method and a voice processing device. One or more near-end signals are obtained through a microphone array, and echo cancellation processing is performed on each near-end signal to obtain one or more residual echo signals. Beamforming processing is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression processing is performed on the beamformed near-end and residual echo signals to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. As can be seen from the above, once the microphone array has acquired the near-end signal, that signal undergoes echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing, so the echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the scheme provided by the embodiments of the invention can improve the signal-to-noise ratio.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart illustrating a method of speech processing according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a method of speech processing according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a speech processing apparatus according to another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in fig. 1, an embodiment of the present invention provides a speech processing method, where the speech processing method includes:

Step 101: acquiring at least one near-end signal through a microphone array;

Step 102: performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;

Step 103: respectively carrying out beam forming processing on the at least one near-end signal and the at least one residual echo signal;

Step 104: performing nonlinear echo suppression processing on the at least one near-end signal and the at least one residual echo signal after the beam forming processing to obtain a nonlinear echo suppression output signal;

Step 105: carrying out noise reduction and gain processing on the nonlinear echo suppression output signal.

According to the embodiment shown in FIG. 1, one or more near-end signals are obtained by a microphone array, and echo cancellation processing is performed on each near-end signal to obtain one or more residual echo signals. Beamforming processing is then performed separately on the near-end signals and on the residual echo signals, and nonlinear echo suppression processing is performed on the beamformed near-end and residual echo signals to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. As can be seen from the above, once the microphone array has acquired the near-end signal, that signal undergoes echo cancellation, beamforming, nonlinear echo suppression, and noise reduction and gain processing, so the echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the scheme provided by the embodiment of the invention can improve the signal-to-noise ratio.
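The ordering of steps 101 to 105 can be illustrated with a minimal Python sketch. The function name `process` and the four stage callables are hypothetical placeholders introduced only for illustration; the text does not prescribe how any stage is implemented, so each stage is passed in by the caller.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def process(
    near_end: Sequence[Signal],
    cancel_echo: Callable[[Signal], Signal],
    beamform: Callable[[Sequence[Signal]], Signal],
    suppress_nonlinear_echo: Callable[[Signal, Signal], Signal],
    denoise_and_gain: Callable[[Signal], Signal],
) -> Signal:
    """Chain the stages in the order of steps 102 to 105.

    `near_end` holds the channels acquired in step 101. All stage
    callables are hypothetical stand-ins supplied by the caller.
    """
    # Step 102: one residual echo signal per near-end channel.
    residual = [cancel_echo(channel) for channel in near_end]
    # Step 103: beamform the near-end channels and the residual
    # echo channels separately, giving one signal each.
    bf_near = beamform(near_end)
    bf_residual = beamform(residual)
    # Step 104: nonlinear echo suppression on the two beamformed signals.
    suppressed = suppress_nonlinear_echo(bf_near, bf_residual)
    # Step 105: noise reduction and gain processing.
    return denoise_and_gain(suppressed)
```

The sketch only fixes the data flow: the near-end channels and the residual echo channels are beamformed independently, and both beamformed signals feed the nonlinear echo suppression stage.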

In an embodiment of the present invention, the step 101 in the flowchart shown in fig. 1 of obtaining at least one near-end signal by using a microphone array may include:

directing near-end speech with the microphone array;

beamforming the oriented near-end voice;

and acquiring the at least one near-end signal from the near-end voice after beam forming.

In the present embodiment, the number and the type of the microphones included in the microphone array may be determined according to the service requirement. For example, the microphone array includes 4 microphones.

In the present embodiment, each of the microphones included in the microphone array may be omnidirectional. In order to better acquire the near-end signal, the near-end speech needs to be oriented when near-end speech is present. Beamforming is then performed according to the oriented near-end speech, so that gain is applied in the oriented direction and sound from other directions is suppressed. Each microphone then acquires a near-end signal from the beamformed near-end speech.

In this embodiment, when the microphone array includes N microphones, N near-end signals are acquired.

According to the embodiment, when near-end voice exists, the near-end voice is firstly oriented by using the microphone array, the oriented near-end voice is subjected to beam forming, and one or more near-end signals are obtained from the beam-formed near-end voice. Since the near-end speech signal is subjected to directional gain and suppression by beamforming, the noise in the acquired near-end signal is low.

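In an embodiment of the present invention, step 102 in the flowchart shown in FIG. 1, performing echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal, may include:

filtering the reference signal by using at least one preset filter to obtain an estimated echo signal;

performing, for each of the near-end signals: cancelling the estimated echo signal from the near-end signal to obtain a residual echo signal corresponding to the near-end signal.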

In this embodiment, the type and number of filters can be determined according to the service requirement. When there is one filter, it filters the reference signal using its own filtering method to obtain the estimated echo signal. When there are two or more filters, each filter filters the reference signal, a preferred filter is determined according to the filtering results, and the estimated echo signal is obtained by using the preferred filter.

In this embodiment, the method for determining the residual echo signal corresponding to any near-end signal may be: the estimated echo signal is removed from the near-end signal to obtain a residual echo signal.

According to the above embodiment, the filter is used to filter the reference signal to obtain the estimated echo signal, and the estimated echo signal is cancelled in each path of near-end signal to obtain the residual echo signal corresponding to each path of near-end signal. Since the estimated echo signal is cancelled out in each of the near-end signals, the echo can be reduced.
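As a minimal sketch of the single-filter case, the following Python code filters the reference signal with a fixed FIR filter and subtracts the estimated echo from every near-end channel. The FIR structure, the function names `fir_filter` and `cancel_echo`, and the `taps` parameter are assumptions made for illustration; the text only requires "at least one preset filter" and does not specify its type.

```python
from typing import List, Sequence


def fir_filter(reference: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Estimate the echo by convolving the reference with preset taps.

    Assumption: the preset filter is a causal FIR filter.
    """
    estimated = []
    for n in range(len(reference)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * reference[n - k]
        estimated.append(acc)
    return estimated


def cancel_echo(
    near_end_channels: Sequence[Sequence[float]],
    reference: Sequence[float],
    taps: Sequence[float],
) -> List[List[float]]:
    """Subtract the same estimated echo from each near-end channel.

    Returns one residual echo signal per near-end channel.
    """
    estimated_echo = fir_filter(reference, taps)
    return [
        [x - e for x, e in zip(channel, estimated_echo)]
        for channel in near_end_channels
    ]
```

With several candidate filters, each one could be run in the same way and the preferred filter chosen from the results; the selection criterion is not stated in the text and is left open here.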

In an embodiment of the present invention, after the step of canceling the estimated echo signal in the near-end signal to obtain a residual echo signal corresponding to the near-end signal, the method in the previous embodiment may further include:

determining a second gain value corresponding to the near-end signal;

and performing echo compression processing on the residual echo signal by using the second gain value.

In this embodiment, the second gain value corresponding to the near-end signal may be determined as follows, taking one near-end signal as an example. At least one frequency point corresponding to the near-end signal is determined, and a correlation coefficient is determined for each frequency point according to the near-end signal and the residual echo signal of the near-end signal. For each frequency point, the frequency point gain value is set to the correlation coefficient corresponding to that frequency point. Overload processing and smoothing processing are then applied to the obtained frequency point gain values to obtain the second gain value. The residual echo signal is then compressed by using the second gain value to obtain the echo-compressed output.

According to the above embodiment, because echo compression processing is performed on the residual echo signal by using the second gain value corresponding to the near-end signal, the echo can be reduced as far as possible.
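The second gain value can be sketched in Python as follows. Several details are assumptions, since the text does not define them: the correlation coefficient is taken as the normalized cross-correlation magnitude per frequency point over a few spectral frames, "overload processing" is interpreted as clipping to [0, 1], and smoothing is a first-order recursive average with a hypothetical factor `alpha`. Inputs are spectra given as lists of complex values, one list per frame.

```python
from typing import List, Optional, Sequence

Spectrum = Sequence[complex]


def bin_correlation(
    near_frames: Sequence[Spectrum], resid_frames: Sequence[Spectrum]
) -> List[float]:
    """Per-frequency-point correlation between near-end and residual spectra."""
    coeffs = []
    for k in range(len(near_frames[0])):
        cross = sum(
            x[k] * e[k].conjugate() for x, e in zip(near_frames, resid_frames)
        )
        p_near = sum(abs(x[k]) ** 2 for x in near_frames)
        p_resid = sum(abs(e[k]) ** 2 for e in resid_frames)
        denom = (p_near * p_resid) ** 0.5
        coeffs.append(abs(cross) / denom if denom > 0.0 else 0.0)
    return coeffs


def second_gain(
    coeffs: Sequence[float],
    previous: Optional[Sequence[float]] = None,
    alpha: float = 0.7,
) -> List[float]:
    """Clip the per-point gains (assumed overload handling), then smooth."""
    clipped = [min(1.0, max(0.0, c)) for c in coeffs]
    if previous is None:
        return clipped
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, clipped)]


def compress_echo(resid_frame: Spectrum, gain: Sequence[float]) -> List[complex]:
    """Scale each frequency point of the residual echo by its gain."""
    return [g * e for g, e in zip(gain, resid_frame)]
```

Under this interpretation, a frequency point where the residual still tracks the near-end signal keeps a gain near 1, while a point where the two are weakly correlated is attenuated.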

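In an embodiment of the present invention, step 103 in the flowchart shown in FIG. 1, respectively performing beamforming processing on the at least one near-end signal and the at least one residual echo signal, may include:

respectively performing time delay adjustment on the at least one near-end signal and the at least one residual echo signal;

performing beam forming on the at least one path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals;

and performing beam forming on the at least one path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.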

In this embodiment, because the near-end signals are acquired at different times, delay adjustment needs to be performed on each near-end signal so that the near-end signals are aligned in time. The delay-adjusted near-end signals are then beamformed to obtain one beamformed near-end signal, which is clearer.

In this embodiment, because the residual echo signals are acquired at different times, delay adjustment needs to be performed on each residual echo signal so that the residual echo signals are aligned in time. The delay-adjusted residual echo signals are then beamformed to obtain one beamformed residual echo signal, which is clearer.

According to the above embodiment, the time delay adjustment is performed on each near-end signal and each residual echo signal, and the beamforming is performed on each near-end signal and each residual echo signal, so as to obtain one beamforming near-end signal and one beamforming residual echo signal. Therefore, the obtained beam-formed near-end signal and the beam-formed residual echo signal are clearer.
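A minimal Python sketch of delay adjustment followed by beamforming is given below. It uses delay-and-sum beamforming with integer-sample delays estimated by cross-correlation against the first channel; this particular beamformer, the helper names, and the `max_lag` parameter are assumptions for illustration, since the text does not name a specific beamforming algorithm.

```python
from typing import List, Sequence


def estimate_delay(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) by which `sig` trails `ref`.

    The lag maximizing the cross-correlation within +/- max_lag is chosen.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shift(sig: Sequence[float], lag: int) -> List[float]:
    """Advance `sig` by `lag` samples, padding with zeros."""
    n = len(sig)
    return [sig[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


def delay_and_sum(channels: Sequence[Sequence[float]], max_lag: int = 8) -> List[float]:
    """Align every channel to the first one, then average them."""
    ref = channels[0]
    aligned = [shift(ch, estimate_delay(ref, ch, max_lag)) for ch in channels]
    return [sum(column) / len(aligned) for column in zip(*aligned)]
```

The same `delay_and_sum` call would be applied once to the near-end channels and once to the residual echo channels, yielding one beamformed near-end signal and one beamformed residual echo signal.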

In an embodiment of the present invention, the step 104 in the flowchart shown in fig. 1 performs a nonlinear echo suppression process on the at least one near-end signal and the at least one residual echo signal after the beamforming process to obtain a nonlinear echo suppressed output signal, which may include:

and carrying out nonlinear echo suppression processing on one path of the beam forming near-end signal and one path of the beam forming residual echo signal to obtain the nonlinear echo suppression output signal.

According to the above embodiment, the nonlinear echo suppression processing is performed on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal. Echo can be further reduced.

In an embodiment of the present invention, the performing, in step 105 of the flowchart shown in fig. 1, noise reduction and gain processing on the nonlinear echo suppression output signal may include:

In this embodiment, the method for determining the signal-to-noise ratio of the nonlinear echo suppression output signal may be: determining a total energy corresponding to the beamformed near-end signal and the beamformed residual echo signal, and determining a quotient between the total energy and an energy corresponding to the beamformed residual echo signal. The quotient is the signal-to-noise ratio.

According to the above embodiment, because noise reduction processing is performed on the nonlinear echo suppression output signal according to the signal-to-noise ratio, noise can be reduced as far as possible.
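The signal-to-noise ratio described above can be sketched in Python as an energy quotient. The sketch assumes that the "total energy" is the sum of the energies of the beamformed near-end signal and the beamformed residual echo signal, and adds a small hypothetical `eps` term to avoid division by zero. Formula (1) is not reproduced in this text, so the sketch stops at the signal-to-noise ratio and does not compute the noise reduction output signal.

```python
from typing import Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def snr_estimate(
    bf_near: Sequence[float], bf_residual: Sequence[float], eps: float = 1e-12
) -> float:
    """Quotient of the total energy and the residual echo energy.

    Assumption: total energy = energy(bf_near) + energy(bf_residual).
    """
    total = energy(bf_near) + energy(bf_residual)
    return total / (energy(bf_residual) + eps)
```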

In the present embodiment, K1 and K2 are the two end points of a preset interval, for example the interval [K1, K2]. The values of K1 and K2 can be determined according to the service requirements. For example, K1 is 0.9 and K2 is 0.99.

In this embodiment, as can be seen from formula (2), when the peak value corresponding to a frequency point lies outside the set interval and is smaller than every value in the interval, a larger gain value is determined so as to enhance the nonlinear echo suppression output signal corresponding to that frequency point. When the peak value corresponding to a frequency point lies outside the set interval and is greater than every value in the interval, a smaller gain value is determined, so that after the gain is applied to the nonlinear echo suppression output signal corresponding to that frequency point, the resulting peak value falls within the interval and the gain output signal is not distorted.

According to the embodiment, the signal-to-noise ratio is high because a targeted gain is applied at each frequency point corresponding to the nonlinear echo suppression output signal.
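The behaviour described for the first gain value can be illustrated with a hedged Python sketch. Formula (2) is not reproduced in this text, so the rule below is only an assumption that is consistent with the description: peaks below K1 are raised to K1, peaks above K2 are lowered to K2, and peaks already inside [K1, K2] are left unchanged. The function names are hypothetical.

```python
from typing import List, Sequence


def first_gain(peak: float, k1: float = 0.9, k2: float = 0.99) -> float:
    """Hypothetical gain rule; NOT the patent's formula (2).

    Returns a gain that moves `peak` into the interval [k1, k2].
    """
    if peak <= 0.0:
        return 1.0  # nothing to scale at an empty frequency point
    if peak < k1:
        return k1 / peak  # larger gain: enhance a weak frequency point
    if peak > k2:
        return k2 / peak  # smaller gain: avoid overshoot and distortion
    return 1.0


def apply_gain(peaks: Sequence[float], k1: float = 0.9, k2: float = 0.99) -> List[float]:
    """Apply the per-frequency-point gain to each peak value."""
    return [first_gain(h, k1, k2) * h for h in peaks]
```

With K1 = 0.9 and K2 = 0.99 as in the example values above, a peak of 0.45 receives a gain of about 2 and a peak of 1.98 receives a gain of about 0.5, so both end up inside the interval.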

In the following, a speech processing method will be described by taking an example in which the microphone array includes 4 microphones. As shown in fig. 2, the speech processing method includes:

Step 201: The near-end speech is oriented using the microphone array.

Step 202: The oriented near-end speech is beamformed.

In this step, beamforming is performed on the near-end speech after its direction is determined, so that gain is applied in the determined direction and speech from other directions is suppressed.

Step 203: At least one near-end signal is acquired from the beamformed near-end speech by using the microphone array.

In this step, 4 near-end signals are acquired by the 4 microphones.

Step 204: The reference signal is filtered by using at least one preset filter to obtain an estimated echo signal.

In this step, the filter may filter the reference signal by using its own filtering method to obtain the estimated echo signal. The estimated echo signal is then removed from the near-end signal to obtain a residual echo signal.

Step 205: For each near-end signal, the estimated echo signal is cancelled from the near-end signal to obtain a residual echo signal corresponding to the near-end signal; a second gain value corresponding to the near-end signal is determined, and echo compression processing is performed on the residual echo signal by using the second gain value.

Step 206: Time delay adjustment is performed on each near-end signal and each residual echo signal respectively.

In this step, because the near-end signals are acquired at different times, time delay adjustment needs to be performed on each near-end signal so that the near-end signals are aligned in time.

Step 207: The delay-adjusted near-end signals are beamformed to obtain one beamformed near-end signal.

Step 208: The delay-adjusted residual echo signals are beamformed to obtain one beamformed residual echo signal.

Step 209: Nonlinear echo suppression processing is performed on the beamformed near-end signal and the beamformed residual echo signal to obtain a nonlinear echo suppression output signal.

Step 210: A signal-to-noise ratio of the nonlinear echo suppression output signal is determined.

In this step, the total energy corresponding to the beamformed near-end signal and the beamformed residual echo signal is determined, and the quotient between the total energy and the energy corresponding to the beamformed residual echo signal is determined. The quotient is the signal-to-noise ratio.

Step 211: A noise reduction output signal corresponding to the nonlinear echo suppression output signal is determined according to the signal-to-noise ratio.

In this step, the noise reduction output signal corresponding to the nonlinear echo suppression output signal is determined using formula (1).

Step 212: At least one frequency point corresponding to the nonlinear echo suppression output signal after the noise reduction processing is determined.

Step 213: A first gain value corresponding to each frequency point is determined.

In this step, the first gain value corresponding to each frequency point is determined according to formula (2).

When the peak value corresponding to a frequency point lies outside the set interval and is smaller than every value in the interval, a larger gain value is determined so as to enhance the nonlinear echo suppression output signal corresponding to that frequency point. When the peak value corresponding to a frequency point lies outside the set interval and is greater than every value in the interval, a smaller gain value is determined, so that after the gain is applied the resulting peak value falls within the interval and the gain output signal is not distorted.

Step 214: The gain output signal corresponding to each frequency point is determined by using the first gain value corresponding to each frequency point.

As shown in fig. 3, an embodiment of the present invention provides a speech processing apparatus, including:

an obtaining module 301, configured to obtain at least one near-end signal through a microphone array;

an echo cancellation module 302, configured to perform echo cancellation processing on the at least one near-end signal to obtain at least one residual echo signal;

a beam forming module 303, configured to perform beam forming processing on the at least one near-end signal and the at least one residual echo signal respectively;

a nonlinear echo suppression module 304, configured to perform nonlinear echo suppression processing on the at least one near-end signal and the at least one residual echo signal after the beamforming processing to obtain a nonlinear echo suppression output signal;

a processing module 305, configured to perform noise reduction and gain processing on the nonlinear echo suppression output signal.

According to the embodiment shown in FIG. 3, once the microphone array has acquired a near-end signal, that signal undergoes echo cancellation processing, beamforming processing, nonlinear echo suppression processing, and noise reduction and gain processing, so the echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the embodiment provided by the invention can improve the signal-to-noise ratio.

In one embodiment of the invention, as shown in FIG. 4, the processing module 305 may include a noise reduction sub-module 3051, configured to determine a signal-to-noise ratio of the nonlinear echo suppression output signal, and to determine a noise reduction output signal corresponding to the nonlinear echo suppression output signal through formula (1).

In an embodiment of the present invention, as shown in fig. 4, the processing module 305 may include a gain sub-module 3052, configured to determine at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a formula (2); and determining the gain output signal corresponding to each frequency point by using the first gain value corresponding to each frequency point.

In an embodiment of the present invention, the echo cancellation module 302 is configured to perform filtering processing on a reference signal by using at least one preset filter, so as to obtain an estimated echo signal; and to perform, for each of the near-end signals: cancelling the estimated echo signal from the near-end signal to obtain a residual echo signal corresponding to the near-end signal.

In an embodiment of the present invention, the echo cancellation module 302 is further configured to determine a second gain value corresponding to the near-end signal; and performing echo compression processing on the residual echo signal by using the second gain value.

In an embodiment of the present invention, the beam forming module 303 is configured to perform delay adjustment on the at least one near-end signal and the at least one residual echo signal, respectively; performing beam forming on the at least one path of near-end signals after the time delay adjustment to obtain a path of beam forming near-end signals; and performing beam forming on the at least one path of residual echo signals after the time delay adjustment to obtain a path of beam forming residual echo signals.

In an embodiment of the present invention, the nonlinear echo suppression module 304 is configured to perform nonlinear echo suppression processing on one of the beam-forming near-end signals and one of the beam-forming residual echo signals to obtain the nonlinear echo suppression output signal.

In an embodiment of the present invention, the obtaining module 301 is configured to utilize the microphone array to orient near-end speech; beamforming the oriented near-end voice; and acquiring the at least one near-end signal from the near-end voice after beam forming.

An embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the voice processing method described in any one of the above.

In one embodiment of the present invention, an electronic device is provided, as shown in fig. 5, which includes a processor 401, a memory 402, and a bus 403; the processor 401 and the memory 402 complete communication with each other through the bus 403; the processor 401 is configured to call program instructions in the memory 402 to perform any one of the above-mentioned speech processing methods.

Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.

The embodiments of the invention have at least the following beneficial effects:

1. In the embodiment of the invention, one or more near-end signals are acquired through the microphone array, and echo cancellation processing is performed on each near-end signal to obtain one or more residual echo signals. Beam forming processing is then performed separately on each near-end signal and each residual echo signal, and nonlinear echo suppression processing is performed on the beam-formed near-end signals and residual echo signals to obtain a nonlinear echo suppression output signal. Finally, noise reduction and gain processing are performed on the nonlinear echo suppression output signal. Thus, in this scheme, the near-end signal acquired by the microphone array undergoes echo cancellation, beam forming, nonlinear echo suppression, and noise reduction and gain processing in turn, so that the echo is suppressed while distortion of the sound is avoided as far as possible. Therefore, the scheme provided by the embodiment of the invention can improve the signal-to-noise ratio.

2. In the embodiment of the invention, when near-end voice exists, the near-end voice is firstly oriented by using the microphone array, the beam forming is carried out on the oriented near-end voice, and one or more near-end signals are obtained from the beam-formed near-end voice. Since the near-end speech signal is subjected to directional gain and suppression by beamforming, the noise in the acquired near-end signal is low.

3. In the embodiment of the invention, a filter is used to filter the reference signal to obtain an estimated echo signal, and the estimated echo signal is removed from each near-end signal to obtain the residual echo signal corresponding to that near-end signal. Since the estimated echo signal is cancelled from each of the near-end signals, the echo can be reduced.

4. In the embodiment of the invention, the echo compression processing is carried out on the residual echo signal by utilizing the second gain value corresponding to the near-end signal. Therefore, echo can be minimized.

5. In the embodiment of the invention, time delay adjustment is respectively carried out on each path of near-end signal and each path of residual echo signal, and beam forming is respectively carried out on each path of near-end signal and each path of residual echo signal, so that one path of beam forming near-end signal and one path of beam forming residual echo signal are obtained. Therefore, the obtained beam-formed near-end signal and the beam-formed residual echo signal are clearer.

6. In the embodiment of the invention, the beam-formed near-end signal and the beam-formed residual echo signal are subjected to nonlinear echo suppression processing to obtain a nonlinear echo suppression output signal, so that the echo can be further reduced.

7. In the embodiment of the invention, noise reduction processing is performed on the nonlinear echo suppression output signal according to its signal-to-noise ratio, so that the noise can be reduced to the maximum extent.

8. In the embodiment of the invention, a targeted gain is applied at each frequency point corresponding to the nonlinear echo suppression output signal, so that the signal-to-noise ratio is higher.
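As a non-limiting illustration of beneficial effect 3 above, the following sketch filters a reference signal to obtain an estimated echo signal and subtracts that estimate from each near-end signal to obtain the corresponding residual echo signal. The function names `estimate_echo` and `cancel_echo`, the coefficients `taps`, and the use of a fixed (non-adaptive) FIR filter are assumptions made for illustration only; the embodiment is not limited to this filter.

```python
from typing import List, Sequence


def estimate_echo(reference: Sequence[float],
                  taps: Sequence[float]) -> List[float]:
    """FIR-filter the reference: echo[n] = sum_k taps[k] * reference[n - k]."""
    return [
        sum(taps[k] * reference[n - k]
            for k in range(len(taps)) if n - k >= 0)
        for n in range(len(reference))
    ]


def cancel_echo(near_end_channels: Sequence[Sequence[float]],
                reference: Sequence[float],
                taps: Sequence[float]) -> List[List[float]]:
    """Subtract the estimated echo from every near-end channel.

    Returns one residual echo signal per near-end signal.
    """
    echo = estimate_echo(reference, taps)
    return [[x - e for x, e in zip(ch, echo)] for ch in near_end_channels]


# Hypothetical example: the microphone picks up speech plus an echo of the
# reference signal shaped by the (assumed known) echo path `taps`.
reference = [1.0, 0.0, 0.0]
taps = [0.5, 0.25]
speech = [1.0, 2.0, 3.0]
mic = [s + e for s, e in zip(speech, estimate_echo(reference, taps))]
residual = cancel_echo([mic], reference, taps)
```

Because the filter in this example matches the echo path exactly, the residual equals the speech; in practice the estimate is imperfect, which is why the later nonlinear echo suppression stage operates on the residual echo signal.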

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.

Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments can be implemented by program instructions controlling the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of speech processing, comprising:

acquiring at least one near-end signal through a microphone array;

carrying out noise reduction and gain processing on the nonlinear echo suppression output signal;

the acquiring at least one near-end signal by the microphone array includes:

orienting near-end speech with the microphone array;

beamforming the oriented near-end voice;

2. The speech processing method according to claim 1,

determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a first formula;

the first formula includes:

3. The speech processing method according to claim 2,

determining a first gain value corresponding to each frequency point according to a second formula;

the second formula includes:

4. The speech processing method according to any one of claims 1-3,

5. The speech processing method according to any one of claims 1-3,

6. A speech processing apparatus, comprising:

the processing module is used for carrying out noise reduction and gain processing on the nonlinear echo suppression output signal;

the acquisition module is used for orienting near-end voice with the microphone array; beamforming the oriented near-end voice; and acquiring the at least one near-end signal from the beam-formed near-end voice.

7. The speech processing apparatus according to claim 6,

the processing module comprises: a noise reduction submodule;

the noise reduction sub-module is used for determining the signal-to-noise ratio of the nonlinear echo suppression output signal; determining a noise reduction output signal corresponding to the nonlinear echo suppression output signal through a first formula;

the first formula includes:

8. The speech processing apparatus according to claim 7,

the processing module comprises: a gain sub-module;

the gain submodule is used for determining at least one frequency point corresponding to the noise reduction output signal; determining a first gain value corresponding to each frequency point according to a second formula; determining a gain output signal corresponding to each frequency point by using a first gain value corresponding to each frequency point;

the second formula includes:

9. A storage medium, characterized in that the storage medium includes a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the voice processing method according to any one of claims 1 to 5.

10. An electronic device, wherein the electronic device comprises a processor, a memory and a bus; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to perform the speech processing method of any one of claims 1 to 5.