US20050027526A1 - Audio signal processing for speech communication - Google Patents
- Publication number
- US20050027526A1 (application US10/934,059)
- Authority
- US
- United States
- Prior art keywords
- signal
- sound
- levels
- estimate
- intermittent component
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G3/00—Gain control in amplifiers or frequency changers
- H03G3/20—Automatic control
- H03G3/30—Automatic control in amplifiers having semiconductor devices
- H03G3/32—Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02085—Periodic noise
Landscapes
- Telephone Function (AREA)
- Mobile Radio Communication Systems (AREA)
- Telephonic Communication Services (AREA)
Description
- This application is a continuation application of and claims priority to U.S. application Ser. No. 09/851,399, filed on May 7, 2001.
- This invention relates to audio signal processing for speech communication.
- In typical speech communication over wire or wireless communication networks, ambient noise in the vicinity of a listener at one location can obscure speech received from a speaker at another location.
- FIG. 1 is a schematic of a communication path for speech.
- FIG. 2 is a schematic of the near-end device 101.
- FIG. 3 is a schematic of the RX-AVC module 150.
- FIG. 4 is a schematic of a method for storing information about frame energies.
- FIG. 5 is a graph of the amplitude of pop music sampled at 8 kHz.
- FIG. 6 is a graph of an auto-correlation function of the sound sample in FIG. 5.
- Referring to the example in FIG. 1, a far-end device 102 detects far-end sound 105 that can include speech. The sound 105 is converted to a signal 106, the far-end signal, which is transmitted to the near-end device 101, for example, by modulating a radio frequency signal, interfacing with a network such as the Internet, or sending a signal on a waveguide. The transmission of the signal 106 can also include combinations of known signal transmission modes, such as those that use electric, optical, microwave, infrared, and radio signals, and any number of intermediaries, such as switches, computer servers, and satellites.
- The near-end device 101 reproduces the far-end sound 105. The near-end device 101 also detects near-end sound that can include ambient noise 103. The near-end device 101 processes the signal 106 in response to the ambient noise 103 in order to render the far-end sound 105 more human-interpretable to a user of the near-end device 101.
- In the example depicted in FIG. 1, the near-end device 101 is a handheld telephone that receives the far-end signal 106 from the far-end device 102, which is a telephone at a remote location.
- Referring also to the example in FIG. 2, the near-end device 101 uses a microphone 112 to detect sound 120 on the near-end. An analog signal for the near-end sound 120 can be converted into a digital signal 128 by a processor, CODEC 130. The digital signal 128 is evaluated by a voice activity detector (VAD) 140, and by a receive signal automatic volume control (RX-AVC) module 150. The RX-AVC module 150 monitors the near-end signal 128 for particular components, e.g., using a periodicity detector 157. The RX-AVC module 150 can also have a noise estimator 156 for providing an estimate of noise in the signal. The noise estimator can be controlled by triggers from the VAD 140 and the periodicity detector 157. Values from the noise estimator 156 are used by a dynamic range controller (DRC) 155 to alter the far-end signal 106.
- The digital signal 128 for the near-end sound 120 can be encoded by the encoder 110 for transmission (TX) to the far-end device 102.
- The near-end device 101 receives the signal 106 for the far-end sound 105 at a receiver (RX). The signal 106 is decoded by the decoder 145 and analyzed by a receive path voice activity detector (RX-VAD) 162. The decoded signal 106 is modulated by the DRC module 155, e.g., to adjust the signal in response to noise estimates from the noise estimator 156 and flags from the RX-VAD 162. The adjusted signal is converted to an analog signal by CODEC 130 and rendered as sound by the speaker 170.
- Referring also to FIG. 3, the noise estimator 156 and periodicity detector 157 can be implemented using an RX-AVC processor 151. The RX-AVC processor 151 analyzes the signal for components that are other than a component of interest. Such components can include forms of ambient noise that are not detected by the VAD 140, for example, forms of noise which are not stationary or which are periodic such as music. The component of interest is typically human speech. The RX-AVC module 150 controls the level and dynamic range of the far-end sound 105 as a function of the detected noise 103, for example, by communicating an estimate of noise at the near-end 103, drc_noise_estimate, to the DRC 155.
- The RX-AVC processor 151 can store information about the near-end signal 128 for later analysis. For example, the processor 151 can be configured to execute a frame energy sampling routine that updates a static memory buffer 152 with information about the energy of each newly received signal frame (e.g., frames F1, F2, . . . , F200) for the near-end signal 128. The routine can overwrite information about frame energies that are outside of the averaging segment 210 with the new information and update a pointer P1 to indicate the location of the new information in the static memory 152.
- To reduce the demand on system resources, information about the frame energies in the averaging segment 210 can be stored in a packed form. Each frame energy is processed prior to storage in the static memory buffer 152.
- Referring to FIG. 4, information about the signal frame F2 is initially computed as a 32-bit value 410. Since very low frame energies may not be of interest in the context of RX-AVC module 150, and differentiation of high-level energies may not improve performance, 16 significant bits 420 are extracted from the 32-bit value 410 by clipping 402 and truncating 403 the excess bits. If the frame energy exceeds a certain threshold, the energy is stored as the maximum 16-bit value. For example, bits of the 32-bit value 410 to the right of the 16 significant bits are rounded. The result is a 16-bit value 420 that is indicative of the frame energy.
- In the example depicted in FIG. 4, the 16-bit value 420 is obtained from bits 27 to 12 of the 32-bit value 410. The location of the extracted 16-bit value 420 is tunable, e.g., such that in another case bits 25 to 10 are extracted, and so forth.
- Further reduction in bit size of the frame energy information can be obtained by computing the square root of the remaining 16-bit value 420 and storing it as an 8-bit value 425. This 8-bit value 425 can be packed with an 8-bit value 427 similarly obtained for an adjacent frame, e.g., F1. These values can be stored in static memory. For processing, the values can be retrieved from static memory 152 and unpacked. Then each unpacked 8-bit value 425 can be squared to obtain the 16-bit processed value 440.
- In other embodiments, the frame energies are stored for only a subset of signal frames, e.g., every second or every third frame. The extent of information stored can be selected according to the size of each signal frame. For example, if each frame corresponds to 5 ms, sufficient performance may be obtained by storing information for a series that consists of every second, third, or fourth frame.
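The packing scheme described above (extract 16 bits, take the square root, pack two 8-bit values per word) can be illustrated with a minimal Python sketch. The function names and the default bit position are hypothetical, and the sketch truncates the low-order bits rather than rounding them, which is an assumption; it is not presented as the implementation used in the RX-AVC module 150.

```python
import math

def extract_16_bits(energy32: int, low_bit: int = 12) -> int:
    """Keep 16 bits of a 32-bit frame energy, starting at low_bit.

    With low_bit=12 this keeps bits 27..12. Bits below low_bit are
    dropped (truncation, an assumption); energies that overflow the
    16-bit field are clipped to the maximum 16-bit value.
    """
    shifted = energy32 >> low_bit
    return min(shifted, 0xFFFF)

def pack_pair(e16_a: int, e16_b: int) -> int:
    """Square-root two 16-bit energies to 8 bits each and pack them in one word."""
    root_a = math.isqrt(e16_a)  # at most 255 because e16_a <= 65535
    root_b = math.isqrt(e16_b)
    return (root_a << 8) | root_b

def unpack_pair(word: int) -> tuple[int, int]:
    """Unpack two 8-bit roots and square them back to approximate 16-bit energies."""
    root_a = (word >> 8) & 0xFF
    root_b = word & 0xFF
    return root_a * root_a, root_b * root_b
```

The square root roughly halves the storage per frame at the cost of precision: an energy is recovered only to the nearest lower perfect square, which the description suggests is acceptable for this purpose.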
- The stored information about the signal is analyzed to determine the presence of a signal for an intermittent sound with regular periodicity, such as a drum beat in pop music. In some embodiments, the RX-AVC processor 151 uses an auto-correlation function 157 to detect such a periodic component not of interest that occurs simultaneously with human speech that is of interest.
- Typically, the auto-correlation function 157 is defined over the stored frame energies [equation not reproduced in this text], where N is the averaging segment size, the frame-energy term [equation not reproduced in this text] denotes the average sample energy for the frame frm, and s[n] is the level of a signal at a discrete time index within the frame. A 20 ms frame that includes information for sound sampled at 8 kHz has 160 time-indexed samples.
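The definitions referred to above are not reproduced in this text. One plausible form that is consistent with the surrounding description (N frames in the averaging segment, 160 samples per frame, and the normalized function y[i] = R[i]/R[0] used in the peak criteria) is sketched below. The symbol E[frm] for the frame energy, the summation limits, and the absence of mean removal are assumptions, not the original equations.

```latex
% Assumed reconstruction; E[frm] is a hypothetical symbol for the frame energy.
\begin{align}
  R[i]   &= \sum_{frm=0}^{N-1-i} E[frm]\, E[frm+i], \\
  E[frm] &= \frac{1}{160} \sum_{n=0}^{159} s[n]^{2}, \\
  y[i]   &= \frac{R[i]}{R[0]} .
\end{align}
```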
- For example, the algorithm uses auto-correlations of 20 ms frame energies over an averaging segment 210 that is 4 seconds in duration. The frame energies for the averaging segment 210 are stored in static memory 152, e.g., as discussed above. The auto-correlation function 157 assesses the correlation between frame energies in the averaging segment 210 that are separated by a fixed number of frames, the separation corresponding to a period. The function is typically limited to searching for correlations that have a periodicity of 0.25 to 1 seconds (i.e., corresponding to 1 to 4 Hz). The latter range of periodicities, which can be characteristic of some musical rhythms, is identified as the search window 220 in FIG. 5.
- The RX-AVC processor 151 evaluates peaks in the auto-correlation function 157 by the following exemplary criteria:
- a. y[max] > Threshold_1;
- b. [criterion involving Threshold_2 not reproduced in this text];
- c. y[max] − y[min] > Threshold_3;
- where y[i] is a normalized auto-correlation function (R[i]/R[0]);
- max = arg max_i {y[i]}, i = 13, . . . , 48; and
- min = arg min_i {y[i]}, i = 13, . . . , 48.
- Referring to FIG. 6, the peak height 630 that is evaluated with respect to Threshold_3 is depicted, as is the range 620 that is used in the evaluation of Threshold_2.
- Frame periodicities of 13 to 48 are analyzed in this example as these correspond to the 0.25 to 1 second periodicity described above if 20 ms frames are used.
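The correspondence between frame lags and beat periods can be checked with a short calculation, assuming 20 ms frames as stated. The lag bounds of 13 and 48 frames fall just inside the nominal 0.25 to 1 second window, and a 4 second averaging segment holds 200 frames, matching frames F1 to F200.

```latex
\begin{align}
  13 \times 20\ \text{ms} &= 0.26\ \text{s} \quad (\approx 3.85\ \text{Hz}), \\
  48 \times 20\ \text{ms} &= 0.96\ \text{s} \quad (\approx 1.04\ \text{Hz}), \\
  4\ \text{s} \,/\, 20\ \text{ms} &= 200\ \text{frames}.
\end{align}
```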
- The thresholds, Threshold_1, Threshold_2, and Threshold_3, can be determined empirically or can be set by other algorithms. For example, Threshold_1, Threshold_2, and Threshold_3 can be set to 0.70, 0.0625, and 0.25 respectively, as these parameters have been found to characterize the auto-correlation peaks of rhythmic music. Use of the auto-correlation function and tuning of the thresholds can facilitate detection of periodicities that are not perfectly regular. Hence, such detectable, imperfect periodicities are considered periodic herein.
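The peak evaluation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the auto-correlation is computed on raw (not mean-removed) frame energies, only criteria a and c are applied because criterion b is not reproduced in this text (so Threshold_2 is unused), and the function names are hypothetical.

```python
def normalized_autocorr(energies, max_lag):
    """Return y[i] = R[i] / R[0] for i = 0..max_lag over a list of frame energies."""
    n = len(energies)
    r0 = sum(e * e for e in energies)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(energies[k] * energies[k + i] for k in range(n - i)) / r0
        for i in range(max_lag + 1)
    ]

def detect_rhythm(energies, lag_lo=13, lag_hi=48,
                  threshold_1=0.70, threshold_3=0.25):
    """Apply criteria a and c to the search window; criterion b is omitted.

    Returns (detected, lag_of_peak_in_frames).
    """
    y = normalized_autocorr(energies, lag_hi)
    window = range(lag_lo, lag_hi + 1)
    i_max = max(window, key=lambda i: y[i])
    i_min = min(window, key=lambda i: y[i])
    detected = y[i_max] > threshold_1 and (y[i_max] - y[i_min]) > threshold_3
    return detected, i_max
```

For a 200-frame segment with an energy burst every 25 frames, the peak lag returned is 25 frames, i.e., a 0.5 second beat period with 20 ms frames. A constant-energy segment fails criterion c because the normalized auto-correlation then declines only gradually across the window.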
- The periodic signals detected by the RX-AVC processor 151 are periodic in the frequency range of about 0.3 Hz to 6 Hz, or about 1 Hz to 4 Hz, and do not correspond to musical or verbal pitch as would be detected in shorter time analysis. Such periodic signals can be produced by a musical instrument such as a percussion instrument. In addition, any musical instrument that produces a defined pitch can still be detected by the module if it is played in a rhythmic manner, e.g., a manner having repetitive noise bursts.
- Referring to the example in FIG. 5, a signal that includes pop music with a drum beat that has a period of 0.5 seconds was sampled at 8 kHz. The averaging segment 210 used by the module was 4 seconds in duration, and the auto-correlation function 157 searched for periodic signals in a search window of 0.25 to 1.0 seconds, i.e., between 1 Hz and 4 Hz. The peak of the auto-correlation function 157 indicates the beat period. The normalized auto-correlation function for the averaging segment 210 shown in FIG. 5 is graphed in FIG. 6. The peak of the function 610 is at 25 frames of 20 ms, which corresponds to a beat period of 0.5 seconds.
- When the RX-AVC processor 151 detects a periodic component to the signal as described above, the module 150 triggers a signal modulator to alter the signal in order to improve the perception and/or interpretation of a component of interest, e.g., human speech.
- In some embodiments, the modulator is the DRC 155. The DRC 155 can compress the dynamic range of the signal based on the level of noise, drc_noise_estimate, which is computed based on the VAD 140 and the RX-AVC 150. The level of noise can be sampled as set forth by the pseudocode in Table 1.
TABLE 1. Pseudo-code for Noise Determination
1. update_noise_flag1 = FALSE
2. If NOT (VAD_trigger) → update_noise_flag1 = TRUE
3. update_noise_flag2 = FALSE
4. If (rhythm_detect) → update_noise_flag2 = TRUE
5. If (update_noise_flag1 = TRUE) → update drc_noise_estimate with current_energy_estimate
6. Else If (tne_r_update_flag = TRUE) → update drc_noise_estimate with averaged_energy_estimate
- The VAD module 140 can be configured to evaluate each noise frame for non-periodic noise by detecting stationarity and non-tonality in the near-end signal 128 as an indication of random noise. Random noise can include Gaussian noise incurred during transmission. Typically, the VAD module 140 activates a trigger, VAD_trigger, when it perceives a signal of interest.
- When the VAD module 140 does not perceive a signal of interest, the VAD module 140 causes the noise estimator 156 to update the drc_noise_estimate value. For example, if the signal level is less than a certain threshold, or if the signal is stationary or non-tonal, the VAD indicator, VAD_trigger, is not activated. This state (NOT VAD_trigger) activates the update_noise_flag1 flag (Table 1, line 2). As a result, drc_noise_estimate is updated with the current energy estimate current_energy_estimate (Table 1, line 5). The noise level can be updated as follows:
drc_noise_estimate = α * drc_noise_estimate + (1 − α) * current_energy_estimate,
where α is a smoothing constant.
- The VAD module 140 may be unable to discriminate between a periodic signal component not of interest, such as rhythmic music, and a component of interest, such as speech. When a periodic signal component is detected, the RX-AVC processor 151 provides a second noise estimate that overrides the VAD noise estimate. For example, when the processor 151 detects a periodic component (Table 1, line 4), it triggers the update_noise_flag2, which causes the noise estimate drc_noise_estimate to be overwritten by averaged_energy_estimate, the averaged frame energies from the interval between two consecutive beats (Table 1, line 6). The frames that are used for this averaging can be from the middle of the averaging segment 210, e.g., two seconds prior to the decision instant. This value for the noise reflects the level of ambient noise caused by a periodic component such as music more accurately than the VAD noise estimate current_energy_estimate, which does not average energy levels across a full period of the periodic component.
- Different steps of the noise determination routine as set forth in Table 1 can be run with different frequencies. The RX-AVC processor 151, for example, can evaluate the averaging segment 210 at regular intervals of about 0.25 seconds. Relative to continuous cycling, such an evaluation frequency reduces the amount of processing time required without impairing detection. Each evaluation includes resetting the update_noise_flag2 (Table 1, line 3), and re-evaluating the updated averaging segment 210 for rhythm (Table 1, line 4). In contrast, the VAD 140 can evaluate each frame for noise.
- The above-described exemplary configuration can be used in a handheld telephone which enhances the reproduction of sound from a signal if it detects rhythmic music locally.
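The noise determination routine of Table 1, together with the smoothing update, can be sketched in Python as follows. This is an illustration only. It assumes that the flag tested in line 6 of Table 1 (tne_r_update_flag) plays the role of update_noise_flag2, as the accompanying description suggests; the class name, the default value of α, and the averaged_energy helper are hypothetical.

```python
from dataclasses import dataclass

def averaged_energy(energies, period_frames, start):
    """Average the frame energies over one beat period beginning at index start."""
    span = energies[start:start + period_frames]
    return sum(span) / len(span)

@dataclass
class NoiseEstimator:
    alpha: float = 0.9               # smoothing constant (hypothetical default)
    drc_noise_estimate: float = 0.0

    def update(self, vad_trigger, rhythm_detect,
               current_energy_estimate, averaged_energy_estimate):
        # Table 1, lines 1-2: the flag is set when the VAD sees no signal of interest.
        update_noise_flag1 = not vad_trigger
        # Table 1, lines 3-4: the flag is set when a periodic component is detected.
        update_noise_flag2 = bool(rhythm_detect)
        if update_noise_flag1:
            # Table 1, line 5: smoothed update with the current energy estimate.
            self.drc_noise_estimate = (
                self.alpha * self.drc_noise_estimate
                + (1.0 - self.alpha) * current_energy_estimate
            )
        elif update_noise_flag2:
            # Table 1, line 6 (assumed flag): overwrite with the beat-averaged energy.
            self.drc_noise_estimate = averaged_energy_estimate
        return self.drc_noise_estimate
```

In this sketch, the estimate is left unchanged when the VAD is triggered and no rhythm is detected, which corresponds to speech without periodic background noise.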
- In addition to those described above, a number of different embodiments can be used to process signals in response to locally detected sound in order to improve communications.
- In some embodiments, the noise determination routine can include estimating noise levels from intervals of the signal which include a periodic component, but which are free of a second component, e.g., human speech. Speech recognition algorithms can be interfaced with the RX-AVC 150 to identify such intervals.
- Further, a variety of ambient noises can be detected by the RX-AVC module 150, such as rhythmic music and other periodic background sounds.
- In other embodiments, the module can include a pitch detection routine. The module can be programmed or trained to discriminate between sounds that have a pitch and/or timbre of a voice and sounds that have a pitch and/or timbre of a musical instrument.
- Any of a variety of methods can be used to identify the periodic component. The methods can search for periodic or approximately periodic elements in the time domain or in the frequency domain of the signal. For example, Fourier transforms can be applied to the sequence of frame energies to identify recurring signals in the frequency domain.
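The frequency-domain search mentioned above can be sketched as follows. This is one possible illustration, not the exemplary configuration: it applies a plain discrete Fourier transform to the sequence of frame energies and reports a period when one spectral peak clearly dominates. The function names, the `peak_ratio` threshold, and the 0.9 tolerance used to prefer the fundamental over its harmonics are assumptions chosen for the example.

```python
import cmath
import math
from typing import List, Optional, Sequence


def energy_spectrum(frame_energies: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n//2 of the mean-removed frame energies."""
    n = len(frame_energies)
    mean = sum(frame_energies) / n
    centered = [e - mean for e in frame_energies]
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_period(frame_energies: Sequence[float],
                    peak_ratio: float = 4.0) -> Optional[float]:
    """Period, in frames, of the strongest recurring component, or None.

    A component counts as recurring only if its spectral peak is at least
    `peak_ratio` times the mean magnitude of the non-DC bins (an assumed,
    illustrative criterion).
    """
    n = len(frame_energies)
    if n < 4:
        return None
    candidates = energy_spectrum(frame_energies)[1:]  # skip the DC bin
    peak = max(candidates)
    mean_mag = sum(candidates) / len(candidates)
    if mean_mag == 0.0 or peak < peak_ratio * mean_mag:
        return None
    # Take the lowest bin that is close to the peak, so that a beat pattern
    # is reported at its fundamental rather than at one of its harmonics.
    k = next(i for i, m in enumerate(candidates, start=1) if m >= 0.9 * peak)
    return n / k
```

With 64 frames in which every eighth frame carries a loud beat, `dominant_period` returns a period of 8 frames, while a flat energy sequence returns `None`. The direct DFT is quadratic in the number of frames, which is acceptable for a short averaging segment; a fast Fourier transform would be the usual choice for longer sequences.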
- Any of a variety of methods can be used to make the far-end signal 106 more human-interpretable when it is rendered as sound. For example, the near-end device can be triggered to generate anti-noise, which comprises sound waves that cancel periodic components of the ambient noise.
- Further, the techniques may be implemented in hardware, software, or a combination of the two in order to analyze digital or analog signals.
- The techniques described here are also not limited to telephones, or the exemplary configuration described above; they may find applicability in any computing or processing environment for communications. For example, desktop computers linked to a computer network can be used to exchange sound communications that include human speech and ambient noise. Typically, each device may include a sound input device, such as a microphone, and a sound output device, such as a loudspeaker.
- Still other implementations are also within the scope of the claims.
Claims (34)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/934,059 US7149685B2 (en) | 2001-05-07 | 2004-09-03 | Audio signal processing for speech communication |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/851,399 US6820054B2 (en) | 2001-05-07 | 2001-05-07 | Audio signal processing for speech communication |
US10/934,059 US7149685B2 (en) | 2001-05-07 | 2004-09-03 | Audio signal processing for speech communication |
Related Parent Applications (1)
Application Number | Relation | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US09/851,399 | Continuation | US6820054B2 (en) | 2001-05-07 | 2001-05-07 | Audio signal processing for speech communication |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050027526A1 (en) | 2005-02-03 |
US7149685B2 | 2006-12-12 |
Family
ID=25310683
Family Applications (2)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US09/851,399 | Expired - Lifetime | US6820054B2 (en) | 2001-05-07 | 2001-05-07 | Audio signal processing for speech communication |
US10/934,059 | Expired - Fee Related | US7149685B2 (en) | 2001-05-07 | 2004-09-03 | Audio signal processing for speech communication |
Country Status (4)
Country | Link |
---|---|
US (2) | US6820054B2 (en) |
CN (1) | CN100490314C (en) |
MY (1) | MY131821A (en) |
WO (1) | WO2002091570A1 (en) |
- 2001-05-07: US, application US09/851,399, patent US6820054B2 (not active; Expired - Lifetime)
- 2002-05-02: CN, application CNB028094913A, patent CN100490314C (not active; Expired - Fee Related)
- 2002-05-02: WO, application PCT/US2002/014015, patent WO2002091570A1 (not active; Application Discontinuation)
- 2002-05-07: MY, application MYPI20021648A, patent MY131821A (status unknown)
- 2004-09-03: US, application US10/934,059, patent US7149685B2 (not active; Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
CN1507689A (en) | 2004-06-23 |
US6820054B2 (en) | 2004-11-16 |
US7149685B2 (en) | 2006-12-12 |
WO2002091570A1 (en) | 2002-11-14 |
US20030023433A1 (en) | 2003-01-30 |
MY131821A (en) | 2007-09-28 |
CN100490314C (en) | 2009-05-20 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ERELL, ADORAM; KLEINSTEIN, AVI; REEL/FRAME: 018044/0640. Effective date: 20010809 |
FPAY | Fee payment | Year of fee payment: 4 |
FPAY | Fee payment | Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.) |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20181212 |