KR20120022101A

KR20120022101A - Noise reduction method and device in voice communication of IPTV

Info

Publication number: KR20120022101A
Application number: KR1020100085216A
Authority: KR
Inventors: 조정권
Original assignee: (주)제이유디지탈
Priority date: 2010-09-01
Filing date: 2010-09-01
Publication date: 2012-03-12

Abstract

PURPOSE: A noise reducing device and method for two-way television voice communication are provided to extract the target sound of a user. CONSTITUTION: An FFT (Fast Fourier Transform) module (210) converts the signal from the time domain to the frequency domain. An echo canceller module (220) detects whether double talk (bidirectional speech) is occurring and removes the echo. A noise cancelling module (230) estimates the noise section and cancels the noise. An IFFT (Inverse FFT) module (240) converts the signal from the frequency domain back to the time domain.

Description

Noise reduction method and device for two-way television voice communication

The present invention relates to voice signal processing technology and an associated hardware device for smooth two-way voice communication. When a microphone is mounted on a two-way television (TV) or a video conferencing system 3 to 5 m from the user's mouth, the speaker output sound and environmental noise are removed so that only the user's target sound is transmitted to the other party.

The microphone of a video conferencing system is typically placed within 1 to 2 m of the user's mouth to keep the signal-to-noise ratio of the user's target sound high relative to the speaker output sound that carries the other party's voice.

In the prior art, multiple microphone modules are arranged so that no microphone is too far from the user's mouth. A two-way TV can use multiple microphone modules or a microphone mounted on the remote control, but for convenience of management and use, various voice signal processing technologies have been applied so that the microphone module can be attached to the top of the screen. Because the noise and speaker output sound in the input signal were very large compared with the target sound, however, the signal-to-noise ratio was very poor.

The present invention has been devised to solve these problems. Its purpose is to develop an effective noise canceling algorithm that enables two-way voice communication even when the microphone is 3 to 5 m from the mouth, using a microphone array, a sound collecting housing with a special structure, a powerful echo canceller, and static noise canceling technology, and to provide a method of implementing a hardware system that can run it in real time.

To achieve this object, the voice signal processing technique of the present invention combines software with analog hardware: an echo canceller that reduces the speaker output sound picked up by the microphone; a noise canceller based on spectral subtraction that effectively removes static environmental noise; a hardware system for digital signal processing; a microphone array; a sound collecting housing with a special structure; and a microphone preamplifier with a low pass filter function. By emphasizing only the user's target sound before it is transmitted to the other party, the system enables smooth voice communication.

According to the present invention, the microphone of a two-way TV, videoconferencing system, or similar device, which must be used far from the user's mouth, extracts and transmits only the target sound. This enables smooth two-way communication and activates content in many fields such as communication, education, shopping, and entertainment.

FIG. 1 is a diagram illustrating a microphone sound collecting housing.
FIG. 2 is a software flowchart of the invention.
FIG. 3 is a block diagram of a microphone preamplifier including a summer and a low pass filter.
FIG. 4 is a diagram illustrating an example of two-way TV voice communication.
FIG. 5 is a block diagram of the signal flow and system of the echo canceller.
FIG. 6 is a block diagram of a single-channel speech enhancement technique.
FIG. 7 is a block diagram of a real-time hardware system.

Hereinafter, the theory, configuration and operation of the present invention will be described in detail.

In the flowchart shown in FIG. 2, the signal input from the microphone array contains the target sound, the speaker output sound, and environmental noise, as in the example of FIG. 4, and is converted to the frequency domain by a Fourier transform. The echo canceller detects double talk and, using the line-out signal taken just before the speaker output as a reference signal, removes the speaker output sound contained in the microphone input signal.

The single-channel speech enhancer then estimates the static noise section and removes static environmental noise with a Wiener filter, and the result is converted back to a time-domain speech signal by the inverse Fourier transform.
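The processing order of FIG. 2 can be sketched as a frame loop. This is a minimal illustration under stated assumptions: non-overlapping rectangular frames, and a naive O(N^2) DFT standing in for the FFT module 210 and IFFT module 240. The stage callbacks `echo_stage` and `noise_stage` are hypothetical hooks for modules 220 and 230, not part of the original design; a practical implementation would use a windowed, overlapped FFT.

```python
import cmath
import math
from typing import Callable, List, Sequence

Spectrum = List[complex]


def dft(frame: Sequence[float]) -> Spectrum:
    """Naive DFT standing in for the FFT module (time -> frequency)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: Sequence[complex]) -> List[float]:
    """Naive inverse DFT standing in for the IFFT module (frequency -> time)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def process_stream(mic: Sequence[float], ref: Sequence[float], frame_len: int,
                   echo_stage: Callable[[Spectrum, Spectrum], Spectrum],
                   noise_stage: Callable[[Spectrum], Spectrum]) -> List[float]:
    """FFT -> echo canceller (with line-out reference) -> noise canceller -> IFFT."""
    out: List[float] = []
    for start in range(0, len(mic) - frame_len + 1, frame_len):
        mic_spec = dft(mic[start:start + frame_len])   # microphone frame
        ref_spec = dft(ref[start:start + frame_len])   # line-out reference frame
        echo_free = echo_stage(mic_spec, ref_spec)
        out.extend(idft(noise_stage(echo_free)))
    return out
```

With identity stages the loop reproduces its input, and an echo stage that subtracts the reference spectrum cancels a microphone signal that is an exact copy of the reference.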

Microphone housing and preamp

If the microphone mounted on top of the TV is 3 to 5 m from the user's mouth, the microphone's sensitivity and amplification must be increased for the user's voice to be usable as valid data. At the same time, the TV's speaker output sound and the voices of people to the left and right of the TV, rather than the user in front of it, should be suppressed as much as possible to increase the clarity of the user's target sound. A digital signal processing technique using a microphone array could steer the directivity of the microphone input, but it requires one analog-to-digital conversion (ADC) channel per microphone, which in turn increases the computational load on the microprocessor. To reduce this cost and preserve the original sound, the present invention implements this function as a hardware device, using the microphone sound collecting housing shown in FIG. 1.

In the mid-frequency sound absorbing unit 110 and the high-frequency sound absorbing unit 120 of this housing, sound arriving from the left and right, rather than from the front of the microphone, is attenuated. A gain of about 2 to 5 dB can be obtained depending on the size of the housing.

The microphone input signals passing through the sound collecting housing are added by a summer 310 built from elements such as operational amplifiers, as shown in FIG. 3. Signals arriving from off-axis directions are attenuated when summed, so environmental noise can be suppressed. The summed signal is amplified to a valid data level by the microphone preamplifier 320, which includes a low pass filter 330 that passes mainly the voice band, and is then converted to a digital signal by the ADC. If the signal-to-noise ratio of the analog signal before conversion is low, even a high-performance digital signal processing algorithm has difficulty producing good results, so the sound collecting housing and the analog circuit at the microphone input stage are very important.
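Why summing attenuates off-axis sound can be illustrated with a plane-wave model. This is a sketch under assumptions not stated in the text: the four microphones lie on a straight line with a hypothetical 5 cm spacing, angle 0 means straight ahead of the TV, and the speed of sound is 343 m/s. Frontal sound reaches every microphone in phase and adds coherently; lateral sound arrives with inter-microphone delays, so the summed phasors partly cancel.

```python
import cmath
import math


def summed_array_gain(freq_hz: float, angle_deg: float, n_mics: int = 4,
                      spacing_m: float = 0.05, c: float = 343.0) -> float:
    """Normalized magnitude of the summer output for a plane wave.

    Each microphone sees the wave delayed by spacing*sin(angle)/c relative to
    its neighbour, so the summer adds phasors with a constant phase step.
    A value of 1.0 means fully coherent (frontal) summation.
    """
    phase_step = (2 * math.pi * freq_hz * spacing_m
                  * math.sin(math.radians(angle_deg)) / c)
    total = sum(cmath.exp(1j * m * phase_step) for m in range(n_mics))
    return abs(total) / n_mics


def to_db(gain: float) -> float:
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(gain)
```

Under these assumptions a 1 kHz source at 90 degrees comes out roughly 5 dB below a frontal source, while at 100 Hz the attenuation is negligible. That is consistent with the housing relying on dedicated mid- and high-frequency absorbing units, since a small array has little directivity at low frequencies.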

Echo canceller

Feedback and howling can be prevented by removing the TV speaker output sound (echo) that reaches the microphone before the signal is transmitted to the other party. In the example bidirectional communication system of FIG. 4, if the echo component is not effectively removed, communication is practically impossible: the other party's voice, played through the user's TV speaker, is sent back as an echo, and a closed loop forms between the microphone and the speaker. If the loudspeaker volume is high and the microphone's sensitivity and gain are high, howling occurs, which makes the equipment unusable and can seriously damage the amplifier and the speaker.

As shown in FIG. 5, the user's voice and the speaker output sound enter the microphone together. The line-out signal before the speaker output is used as a reference signal, and the coefficients of an adaptive filter are updated in the frequency domain so as to minimize the power of the signal obtained by subtracting the filtered reference signal from the microphone input signal. Double talk detection, which determines whether the near-end user is speaking, is performed by measuring the correlation between the microphone signal and the reference signal, and the echo is removed more efficiently by adjusting the gain of the output signal accordingly.
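The text detects double talk by measuring the correlation between the microphone and reference signals but does not give the exact measure. One common way to express that correlation per frequency bin is the magnitude-squared coherence; the sketch below uses it as an assumption, with hypothetical function names and a hypothetical threshold of 0.8.

```python
from typing import Sequence


def coherence(mic_bins: Sequence[complex], ref_bins: Sequence[complex]) -> float:
    """Magnitude-squared coherence of one frequency bin over a block of frames.

    Close to 1 when the microphone bin is a scaled copy of the reference
    (echo only); drops when uncorrelated near-end speech is added.
    """
    cross = sum(y * x.conjugate() for y, x in zip(mic_bins, ref_bins))
    ref_power = sum(abs(x) ** 2 for x in ref_bins)
    mic_power = sum(abs(y) ** 2 for y in mic_bins)
    denom = ref_power * mic_power
    return abs(cross) ** 2 / denom if denom > 0 else 0.0


def is_double_talk(mic_bins: Sequence[complex], ref_bins: Sequence[complex],
                   threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: low coherence means the near-end talker is active."""
    return coherence(mic_bins, ref_bins) < threshold
```

When the detector fires, an implementation would typically slow or freeze the adaptive filter update and adjust the output gain, as the text describes. Note that environmental noise also lowers coherence, so the threshold has to be tuned for the room.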

The other party's signal x(n), the user's speaker output signal y(n), the user's signal s(n), the environmental noise v(n), the error signal e(n), the adaptive filter coefficients, and the line-out (reference) signal before the speaker output after it has passed through the adaptive filter are related as shown in Equations 1 to 3 below.

[Equation 1]

[Equation 2]

[Equation 3]

The smoothed power of x(n) used in Equation 3 is given by Equation 4.

[Equation 4]

The echo canceller uses a normalized complex least mean squares (LMS) algorithm in the frequency domain.
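A minimal sketch of normalized complex LMS for a single frequency bin follows. It assumes one complex tap per bin (a simplification of a full frequency-domain adaptive filter), a hypothetical step size `mu`, and a reference power smoothed with a hypothetical coefficient `beta`, in the spirit of Equation 4. It is not the exact update of Equations 1 to 4, which are not reproduced in this text.

```python
from typing import List, Optional, Sequence, Tuple


def fd_nlms_bin(mic_bins: Sequence[complex], ref_bins: Sequence[complex],
                mu: float = 0.5, beta: float = 0.9,
                eps: float = 1e-10) -> Tuple[complex, List[complex]]:
    """Complex NLMS for one frequency bin, normalized by smoothed reference power.

    mic_bins[l]: microphone spectrum value for frame l
    ref_bins[l]: line-out reference spectrum value for frame l
    Returns the final filter weight and the echo-reduced error values.
    """
    w = 0j                          # adaptive filter coefficient for this bin
    power: Optional[float] = None   # IIR-smoothed |X|^2
    errors: List[complex] = []
    for y, x in zip(mic_bins, ref_bins):
        px = abs(x) ** 2
        power = px if power is None else beta * power + (1 - beta) * px
        e = y - w * x                                 # subtract estimated echo
        w += mu * e * x.conjugate() / (power + eps)   # normalized gradient step
        errors.append(e)
    return w, errors
```

For a fixed echo path and a unit-magnitude reference, `mu = 0.5` halves the coefficient error every frame, so the path is identified within a few tens of frames. During double talk the update would normally be paused so the near-end voice does not disturb the filter.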

Voice enhancer

A typical single-channel noise cancellation system operates in the frequency domain and estimates the speech by determining an attenuation or gain for each frequency component. It removes ambient noise by exploiting the fact that, in an input signal where voice and noise are mixed, the noise changes less than the voice.

A block diagram of the proposed single-channel microphone noise cancellation system is shown in FIG. 6. The system estimates the noise power spectrum D(k, l) from the magnitude of the frequency components Y(k, l) of the input signal y(t), in which noise is added to voice, and uses it to estimate the gain G(k, l). The gain is multiplied by the input magnitude spectrum (noise spectral subtraction), and the speech is synthesized with an Inverse Fast Fourier Transform (IFFT).

When a frame is estimated to belong to a noise section, the gain controller reduces the magnitude of the input signal. Without the gain controller, the residual component left after subtracting the noise frequency components would be output.

To estimate the noise section, the variations along the frequency axis and the time axis, which are statistical characteristics of the microphone input signal, are calculated. Thresholds are set by experimentally examining these variations in noise sections and target signal sections, and each variation is compared with its threshold to distinguish noise sections from target signal sections.

Given the power of each frequency component in the frequency domain, the average of these powers, and the total power of the frame, the normalized variation in the frequency domain, that is, the frequency flatness, is expressed by Equation 5.

[Equation 5]

Given the time-domain power within one frame, its average, and the total power of that frame, the normalized variation in the time domain is expressed by Equation 6.

[Equation 6]

When the values of Equations 7 and 8, obtained by infinite impulse response (IIR) averaging of Equations 5 and 6 respectively, are larger than experimentally obtained thresholds, the frame may be regarded as a target signal.

[Equation 7]

[Equation 8]

Here, the parameters are the IIR smoothing coefficient and the experimentally obtained thresholds for the two measures.
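The exact form of Equations 5 to 8 is not reproduced in this text, so the following is only a sketch under an assumed form: the normalized variation is taken to be the mean absolute deviation of the powers divided by their total, applied to per-bin powers (frequency flatness) and to per-subframe powers (time variation). The class name, the smoothing coefficient of 0.8, and both thresholds of 0.5 are hypothetical. The sketch treats a frame as target signal when either smoothed measure exceeds its threshold; requiring both is an equally plausible reading of the text.

```python
from typing import Sequence


def normalized_variation(powers: Sequence[float]) -> float:
    """Mean absolute deviation of the powers, normalized by their total.

    Near 0 for flat (noise-like) power distributions, larger for peaky
    (speech-like) ones. Assumed form of Equations 5 and 6.
    """
    total = sum(powers)
    if total <= 0:
        return 0.0
    mean = total / len(powers)
    return sum(abs(p - mean) for p in powers) / total


class VariationDetector:
    """IIR-smoothed variation measures compared with thresholds (Eqs. 7 and 8)."""

    def __init__(self, alpha: float = 0.8, freq_threshold: float = 0.5,
                 time_threshold: float = 0.5):
        self.alpha = alpha                    # IIR smoothing coefficient
        self.freq_threshold = freq_threshold  # experimental threshold (freq.)
        self.time_threshold = time_threshold  # experimental threshold (time)
        self.freq_avg = 0.0
        self.time_avg = 0.0

    def is_target(self, bin_powers: Sequence[float],
                  subframe_powers: Sequence[float]) -> bool:
        """Update both averages with one frame and classify it."""
        a = self.alpha
        self.freq_avg = a * self.freq_avg + (1 - a) * normalized_variation(bin_powers)
        self.time_avg = a * self.time_avg + (1 - a) * normalized_variation(subframe_powers)
        return self.freq_avg > self.freq_threshold or self.time_avg > self.time_threshold
```

Because of the IIR smoothing, a single peaky frame does not immediately flip the decision; it takes a couple of speech-like frames before the smoothed flatness crosses its threshold.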

As another parameter for estimating the target signal, the IIR average of the input signal power is calculated, and frames whose power is a certain multiple or more of this average are left out when the long-term average is updated. The method for treating a suddenly large signal that continues to be input as target sound is described below.

The IIR average power of the current frame is calculated as in Equation 9.

[Equation 9]

Here, the parameters are the IIR coefficient (a value between 0 and 1), the IIR average power of the current frame, and the power of the previous frame.

Then, when the IIR average is recalculated using only the power of frames below a certain multiple of this average, it is expressed by Equation 10.

[Equation 10]

Here, the parameters are the IIR smoothing coefficient (a value between 0 and 1), the long-term IIR average power up to the current frame, and the power of the previous frame.

The long-term average calculated in Equation 10 excludes frames with suddenly large input signals and therefore generally represents the frame power level of the noise components.

Therefore, as shown in Equation 11, if the power of the current frame of the microphone input signal is larger than a certain multiple of this long-term average, the frame can be regarded as the target signal and protected.

[Equation 11]

Here, the left-hand side is the power of the current frame, and c is a constant obtained experimentally: the multiple above which a frame can be regarded as the target signal.
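Equations 9 to 11 can be sketched as a small state machine. The sketch assumes that a frame is kept out of the long-term average when its power reaches `k` times the short-term average from the previous frame; the names and the values `alpha = 0.9`, `beta = 0.98`, `k = 2` and `c = 3` are hypothetical, since the text gives only their roles.

```python
class LongTermNoiseFloor:
    """Sketch of Equations 9 to 11 under stated assumptions.

    short_avg: IIR average power of recent frames (Equation 9)
    long_avg : long-term IIR average that skips frames whose power is at
               least k * short_avg, so it tracks the noise floor (Equation 10)
    A frame is protected as target signal when power > c * long_avg (Eq. 11).
    """

    def __init__(self, alpha: float = 0.9, beta: float = 0.98,
                 k: float = 2.0, c: float = 3.0):
        self.alpha, self.beta, self.k, self.c = alpha, beta, k, c
        self.short_avg = None
        self.long_avg = None

    def update(self, frame_power: float) -> bool:
        """Feed one frame's power; return True if it is regarded as target signal."""
        if self.short_avg is None:            # initialize on the first frame
            self.short_avg = self.long_avg = frame_power
            return False
        is_target = frame_power > self.c * self.long_avg            # Eq. 11
        if frame_power < self.k * self.short_avg:                    # skip bursts
            self.long_avg = (self.beta * self.long_avg
                             + (1 - self.beta) * frame_power)        # Eq. 10
        self.short_avg = (self.alpha * self.short_avg
                          + (1 - self.alpha) * frame_power)          # Eq. 9
        return is_target
```

In this sketch a single loud frame is flagged and leaves the noise floor untouched. If a loud signal persists, the short-term average rises until its frames start entering the long-term average, so the floor can follow a genuinely louder environment.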

The noise estimator of FIG. 6 takes the spectrum of frames determined to be noise sections by Equations 7, 8, and 11 as the noise spectrum.

The IIR-averaged signal-to-noise ratio is then calculated from the frame power of the noise section estimated using Equations 9 to 11 and the frame power of the input signal, as in Equation 12.

[Equation 12]

Here, the parameters are the power of the previous frame and the IIR smoothing coefficient (a value between 0 and 1), and abs denotes the absolute value operator.
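Equation 12 itself is not reproduced in this text, so the sketch below assumes one plausible form consistent with its description: an instantaneous ratio `abs(P / P_noise - 1)`, which uses the absolute-value operator the text mentions, smoothed by an IIR average. The function name, the coefficient `alpha = 0.9`, and the `floor` guard against division by zero are hypothetical.

```python
def smoothed_snr(prev_snr: float, frame_power: float, noise_power: float,
                 alpha: float = 0.9, floor: float = 1e-12) -> float:
    """IIR-averaged SNR estimate (assumed form of Equation 12).

    The instantaneous term abs(P / P_noise - 1) never goes negative, even
    when the frame power falls below the estimated noise power.
    """
    instantaneous = abs(frame_power / max(noise_power, floor) - 1.0)
    return alpha * prev_snr + (1.0 - alpha) * instantaneous
```

The same update can be run per frequency bin instead of per frame if bin-wise gains are wanted.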

Using the Wiener filter, which is commonly used in single-channel noise cancellation algorithms, and applying Equation 12, the weight multiplied by each frequency component of the input signal is expressed by Equation 13.

[Equation 13]

As a variant of the Wiener filter, the gain multiplied by each frequency component of the input signal in FIG. 6 is expressed by Equation 14.

[Equation 14]

Here, the parameter is the attenuation level of the noise component; the larger it is, the more the noise component is reduced.

To reduce sudden changes in gain, the gain is IIR-averaged and then multiplied by each frequency component of the microphone input signal, as shown in Equation 15.

[Equation 15]

Here, the smoothing coefficient is a constant between 0 and 1 that is determined experimentally.
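The gain stage of Equations 13 to 15 can be sketched as below. The classic Wiener weight SNR / (1 + SNR) is the standard form; the variant SNR / (SNR + attenuation) is only an assumption chosen to match the description of Equation 14 (a larger attenuation level removes more noise), and the smoothing constant `gamma = 0.8` is hypothetical.

```python
from typing import List, Sequence


def wiener_gain(snr: float) -> float:
    """Classic Wiener weight SNR / (1 + SNR) (assumed form of Equation 13)."""
    return snr / (1.0 + snr)


def parametric_wiener_gain(snr: float, attenuation: float) -> float:
    """Assumed Equation 14 variant; attenuation > 0, larger means stronger suppression."""
    return snr / (snr + attenuation)


def smooth_gain(prev_gain: float, gain: float, gamma: float = 0.8) -> float:
    """IIR average of the gain to avoid abrupt changes (Equation 15)."""
    return gamma * prev_gain + (1.0 - gamma) * gain


def apply_gains(spectrum: Sequence[complex], gains: Sequence[float]) -> List[complex]:
    """Multiply each frequency component by its smoothed gain."""
    return [g * y for g, y in zip(gains, spectrum)]
```

With `attenuation = 1` the variant reduces to the classic Wiener weight, which makes the attenuation level a single knob for trading residual noise against speech distortion.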

Hardware system for real time processing

FIG. 7 is a block diagram of an independent hardware system for operating a program implementing the software flowchart shown in FIG. 2 in real time.

A real-time processing board based on an ARM920T processor (or a general purpose processor) and a Philips UDA1341TS codec (or an AD (analog-to-digital) converter) was developed from the block diagram shown in FIG. 7.

The input signals from the four microphones are summed into one channel by an adder 700 composed of op-amps, amplified by a preamplifier 710 with a first-order low-pass circuit with a cutoff frequency of 17 kHz, and converted to digital values by the UDA1341TS stereo codec 720. The program, written in C and ARM920T assembly language and optimized, runs the algorithm of the flowchart in FIG. 2 on the ARM920T 740 to remove the static components of the input signal and the static noise of the environment in real time. The user-oriented hardware automatically stores the parameter values in use in the EEPROM 790 to maintain the optimal state adapted to the user environment. To fine-tune the functions for the user's environment, a button 730 is provided, and a connection device 750 can be connected to a PC or other device to exchange data (760). A copy protection security chip 770 protects the software copyright, and LEDs on the front panel 780 indicate the operating status of the program and the button state, making the equipment easier to use and control.
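As a worked example of the stated 17 kHz first-order low-pass, the magnitude response of a single-pole section is -10*log10(1 + (f/fc)^2) dB: about -0.17 dB at 3.4 kHz and -3 dB at the cutoff. The component values are not given in the text, so the 1 nF capacitor used below is hypothetical; for a single RC pole the cutoff is fc = 1 / (2*pi*R*C).

```python
import math


def first_order_lowpass_db(freq_hz: float, cutoff_hz: float = 17_000.0) -> float:
    """Magnitude response of a first-order low-pass in dB."""
    return -10.0 * math.log10(1.0 + (freq_hz / cutoff_hz) ** 2)


def resistor_for_cutoff(cutoff_hz: float, capacitance_f: float) -> float:
    """Resistance for a single-pole RC section, from fc = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * cutoff_hz * capacitance_f)
```

For a 17 kHz cutoff and 1 nF this gives roughly 9.4 kOhm. The voice band passes almost unattenuated, and the pole mainly limits high-frequency content ahead of the codec.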

The above description of a structure, hardware system, and signal processing algorithm designed to remove the speaker output sound and environmental noise included in the microphone input signal of an apparatus such as a two-way TV, in which the microphone is far from the user, has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form of the equations or drawings. Many modifications and variations are possible in light of the above teachings, and combinations of some of the equations and embodiments may be used. The scope of the invention is intended to be limited not by this detailed description, the figures, or the equations, but by the appended claims.

200: signal input module of the microphone array
210: Fast Fourier Transform (FFT) module for transforming from the time domain to the frequency domain
220: echo canceller module for detecting double talk (bidirectional speech) and removing the echo component
230: module for estimating the noise section and removing noise by spectral subtraction with a Wiener filter
240: IFFT (Inverse FFT) module for converting from the frequency domain to the time domain

Claims

1. A signal processing algorithm and hardware for a two-way TV or videoconferencing system in which the distance between the microphone and the user's mouth is 3 to 5 m, which removes from the microphone input signal the other party's voice that is output through the speaker and picked up together with the user's voice, suppresses surrounding environmental noise, and extracts, reinforces, and transmits the user's target sound to the other party to enable smooth two-way voice communication, in a hardware system comprising: a microphone housing having the structure devised in the present invention; a preamplifier section having a summer and a low pass filter function; a microprocessor and main memory unit for executing echo cancellation and speech enhancement programs; a data storage unit for storing the program and the drivers for the hardware equipment; a flash memory unit containing a firmware program for driving the electronic components and equipment at power-on; a security chip for copy protection of the program; and a codec for converting an analog signal into a digital signal.

2. A noise estimation method for removing static noise in the surrounding environment, comprising converting the microphone input signal into the frequency domain and estimating whether the signal of the current frame is noise or the target signal by comparing, with experimentally obtained thresholds, the variation of the frequency components (Equation 7), the variation in the time domain (Equation 8), the IIR average power of the current frame (Equation 9), the long-term IIR average power of the current frame (Equation 10), and the power of the current frame; and a bidirectional communication method and system characterized by the same.

3. A noise canceling method for subtracting the noise component estimated in claim 2 from the target signal, which computes the IIR average of the ratio of the input signal to the estimated noise signal as in Equation 12, computes the IIR average of the gain based on the Wiener filter of Equation 14, and multiplies each frequency component of the microphone input signal by this gain to minimize the noise component in the target signal section.

4. The method according to claim 1, comprising: using the line-out signal before the speaker output as a reference signal and removing it from the microphone input signal by Equation 3; an echo canceller based on an adaptive filter in a transform domain using the corresponding coefficients; and a two-way speech communication method that applies the static noise canceling technique by spectral subtraction described in claim 3 to the output sound of the echo canceller.

5. A method of suppressing environmental noise and giving maximum gain in the direction of the target sound using the microphone housing having the structure shown in FIG. 1 and the summer 310 of FIG. 3, and a two-way voice communication system using the same.

6. A voice communication method and system incorporating a microphone housing as shown in FIG. 1, a preamplifier comprising the summer and low pass filter of FIG. 3, the echo canceller of claim 4, and the single-channel noise canceller of claim 3.