EP2840570A1 - Enhanced estimation of at least one target signal - Google Patents

Enhanced estimation of at least one target signal

Info

Publication number
EP2840570A1
EP2840570A1 (application EP13181563.1A)
Authority
EP
European Patent Office
Prior art keywords
signal
phase
amplitude
estimation
discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP13181563.1A
Other languages
German (de)
French (fr)
Inventor
Pejman Mowlaee
Rahim Saeidi
Gernot Kubin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technische Universitaet Graz
Original Assignee
Technische Universitaet Graz
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technische Universitaet Graz filed Critical Technische Universitaet Graz
Priority to EP13181563.1A priority Critical patent/EP2840570A1/en
Priority to EP14753072.9A priority patent/EP3036739A1/en
Priority to PCT/EP2014/067667 priority patent/WO2015024940A1/en
Publication of EP2840570A1 publication Critical patent/EP2840570A1/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

Method for estimation of at least one signal of interest (s1(t), s1(n)) from at least one discrete-time signal (y(n)), said method comprising the steps of
a) transforming the at least one discrete-time signal (y(n)) into a frequency domain to obtain a complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ of the at least one discrete-time signal (y(n));
b) performing an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ to obtain an estimated amplitude spectrum of the at least one signal of interest (s1(t), s1(n));
c) performing a phase estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$, said phase estimation being an amplitude-aware phase estimation using an input signal (sin(n)) to obtain an estimated phase spectrum of the at least one signal of interest (s1(t), s1(n));
d) performing an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$, said amplitude estimation being a phase-aware amplitude estimation using the result of the phase estimation of step c) to obtain an enhanced complex spectrum $F\{\hat{s}_1(n)\}_{n=0}^{N-1}$ of the at least one signal of interest (s1(t), s1(n)).

Description

    Field of the invention and description of prior art
  • The present invention relates to a method for estimation of at least one signal of interest from at least one discrete-time signal. Furthermore, the invention relates to a device for carrying out a method according to the invention.
  • In many applications, signals of interest are corrupted by noise sources and/or other signals. Therefore, depending on the requirements of a given application, efforts have been made to reduce the level of noise and/or other signals to a tolerable level.
  • In case the signal of interest (target signal) is a continuous-time signal to be processed by digital data processing, the target signal is usually measured and transformed into a quantized discrete-time signal. Provided that the sampling rate and the quantization levels are chosen properly, the quantized discrete-time signal comprises the desired target signal and, as an undesirable effect, also noise sources and/or other signals.
  • Conventional phase-unaware amplitude estimation methods separate the target signal (signal of interest) from noise and/or other signals by applying a frequency-dependent gain function (mask) to the observed noisy amplitude spectrum. Examples of such gain functions are the Wiener filter (as a soft mask) and the binary mask. The noise reduction capability of conventional methods is limited since they only modify the amplitude or phase individually.
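  • For illustration, a minimal sketch of such a conventional phase-unaware enhancement is given below (Python with NumPy assumed; the noise power estimate and the spectral-subtraction-style signal power estimate are simplifying assumptions, not taken from the prior-art methods themselves). A Wiener-type gain modifies only the amplitude, while the noisy phase is copied unaltered.

```python
import numpy as np

def phase_unaware_wiener(Y, noise_psd, floor=1e-12):
    """Conventional phase-unaware enhancement of a noisy complex spectrum Y.

    Y         : complex spectrum of the noisy observation (per frequency bin)
    noise_psd : estimated noise power per frequency bin (same shape as Y)
    """
    noisy_power = np.abs(Y) ** 2
    # Crude estimate of the clean signal power (spectral subtraction style).
    signal_psd = np.maximum(noisy_power - noise_psd, floor)
    # Wiener gain acting as a frequency-dependent soft mask on the amplitude.
    gain = signal_psd / (signal_psd + noise_psd)
    # Only the amplitude is modified; the noisy phase is reused unaltered.
    return gain * np.abs(Y) * np.exp(1j * np.angle(Y))
```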
  • Summary of the invention
  • It is an object of the present invention to provide an enhanced method for the estimation of at least one target signal.
  • In a first aspect of the invention, this aim is achieved by means of the above-mentioned method, comprising the following steps:
    1. a) transforming the at least one discrete-time signal into a frequency domain to obtain a complex spectrum of the at least one discrete-time signal;
    2. b) performing an amplitude estimation on the complex spectrum to obtain an estimated amplitude spectrum of the at least one signal of interest;
    3. c) performing a phase estimation on the complex spectrum, said phase estimation being an amplitude-aware phase estimation using an input signal to obtain an estimated phase spectrum of the at least one signal of interest;
    4. d) performing an amplitude estimation on the complex spectrum, said amplitude estimation being a phase-aware amplitude estimation using the result of the phase estimation of step c) to obtain an enhanced complex spectrum of the at least one signal of interest.
  • By virtue of this approach according to the invention it is possible to perform accurate estimations of at least one target signal comprised in at least one discrete-time signal even under adverse conditions, e.g. highly correlated noise sources and/or other signals. For example, the signal of interest can be any target signal included in the at least one discrete-time signal. This approach according to the invention pushes the limits of the conventional methods by introducing interaction between the amplitude estimation and phase estimation stages.
  • Steps c) and d) according to the invention are based upon certain conditions. Amplitude-aware phase estimation according to step c) requires at least an estimation of the amplitude spectrum (signal magnitude spectrum) of the at least one signal of interest and preferably an estimation of the amplitude spectrum of the vector sum of all other sources. In particular, the amplitude-aware phase estimator has been derived from the non-patent literature "Phase estimation for signal reconstruction in single-channel speech separation" (P. Mowlaee, R. Saeidi, and R. Martin, in Proceedings of the International Conference on Spoken Language Processing, 2012, see appendix AP1) and "STFT phase improvement for single channel speech enhancement" (M. Krawczyk and T. Gerkmann, in International Workshop on Acoustic Signal Enhancement; Proceedings of IWAENC, 2012, pp. 1-4). The phase-aware amplitude estimator has been derived from the non-patent literature "On phase importance in parameter estimation in single-channel speech enhancement" (P. Mowlaee and R. Saeidi, in IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 7462-7466, see appendix AP2) and "MMSE-optimal spectral amplitude estimation given the STFT-phase" (T. Gerkmann and M. Krawczyk, IEEE Signal Processing Letters, vol. 20, no. 2, pp. 129-132, Feb. 2013).
  • The at least one discrete-time signal can be of any source or the interaction of sources, for example a noisy speech signal or the superposition of several speech and/or noise signals. The at least one discrete-time signal could be obtained by observation, measurement and/or calculation.
  • In a variant of the invention, the amplitude estimation on the complex spectrum in step b) is performed irrespective of the phase spectrum of the signal of interest. Such an amplitude estimation can be a "phase-unaware amplitude estimation", referring to any conventional amplitude estimation method which is performed irrespective of the phase spectrum of at least the at least one signal of interest.
  • Preferably, after step b) the result of the amplitude estimation of the preceding step b) is used as an input signal in step c). In particular, the result of the amplitude estimation is only used as an input signal in step c) if step c) follows in direct order to step b).
  • According to a development of the invention, the amplitude estimation on the complex spectrum in step b) is performed by a frequency-dependent time-frequency mask, in particular by Wiener filtering of the complex spectrum.
  • In a further development of the invention, after step d) the steps c) and d) are repeated iteratively wherein as input signal in step c) the result of the phase-aware amplitude estimation of the preceding step d) is used. Therefore, a loop is closed by a feedback from an output of a phase-aware amplitude estimator to an input of an amplitude-aware phase estimator. Previous iterative speech enhancement methods aimed at improving the spectral amplitude estimates only within the iterations. In these methods, neither a phase enhancement stage nor a combined synthesis-analysis stage was used within the feedback loop for the iterations. Instead, a noisy phase was exploited in signal reconstruction. No phase information was taken into account to update the signal parameters of an enhanced target signal. A synergistic effect in this closed loop according to the invention stems from the fact that better amplitude estimation assists the phase estimation and a better phase estimation assists the amplitude estimation. These improvements can be continued by alternating between the two estimators multiple times until a sufficient quality of the joint amplitude and phase estimates is obtained.
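  • A minimal sketch of this feedback loop is given below (Python with NumPy assumed). The functions phase_estimator and amplitude_estimator are placeholders standing in for the amplitude-aware phase estimator and the phase-aware amplitude estimator derived from the literature cited above; the initial phase-unaware amplitude estimate and the fixed iteration count are simplifying assumptions.

```python
import numpy as np

def iterative_enhancement(Y, noise_psd, phase_estimator, amplitude_estimator,
                          n_iter=6):
    """Alternate steps c) and d) on the complex spectrum Y of y(n).

    phase_estimator(Y, amplitude, noise_psd)  -> estimated phase spectrum
    amplitude_estimator(Y, phase, noise_psd)  -> estimated amplitude spectrum
    Both callables are placeholders for the estimators referenced above.
    """
    # Step b): initial, phase-unaware amplitude estimate (simplified here).
    amplitude = np.sqrt(np.maximum(np.abs(Y) ** 2 - noise_psd, 0.0))
    for _ in range(n_iter):
        # Step c): amplitude-aware phase estimation.
        phase = phase_estimator(Y, amplitude, noise_psd)
        # Step d): phase-aware amplitude estimation using the new phase.
        amplitude = amplitude_estimator(Y, phase, noise_psd)
    # Enhanced complex spectrum of the signal of interest.
    return amplitude * np.exp(1j * phase)
```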
  • In yet another development of the invention, the consistency between the phase and amplitude estimations of the enhanced complex spectrum (as the enhanced complex spectrum provides an input in step c)) of the at least one signal of interest is monitored according to the following comparison criterion:
    $F(X) = \mathrm{STFT}(\mathrm{STFT}^{-1}(X)) - X$, X being a matrix composed of a complex time-frequency representation of the enhanced complex spectrum, wherein at least one quality index ε(i) is established to measure the inconsistency of the complex time-frequency representations, denoted by D(i), obtained at each loop iteration and defined for the i-th loop iteration as
    $D^{(i)} = F(X^{(i)}) - F(X^{(i-1)})$,
    wherein the i-th quality index ε(i) is calculated as
    $\varepsilon^{(i)} = \frac{\lVert D^{(i)} \rVert}{\lVert F(X^{(i)}) \rVert}$,
    and the loop iterations are stopped at least when the quality index ε(i) falls below a pre-defined threshold ε_th. Establishing a quality index ε(i) and comparing it with a defined threshold ε_th makes it possible to measure the decrease of the inconsistency observed between the phase and amplitude estimates obtained in each iteration before feedback to the phase estimation. The iterations can therefore be stopped when the quality index ε(i) falls below the pre-defined threshold ε_th, allowing fast and efficient processing of the transformed signal.
  • Advantageously, the threshold is ε_th = 0.05, which is especially suited as a comparison criterion for the quality index ε(i).
  • According to another development of the invention the iterations are stopped at least after a predefined number of iterations, in particular after five, six or seven iterations. This allows limiting the number of iterations and therefore the computational effort. It is also possible to rely on the above-mentioned comparison criterion and to limit the number of iterations in case ε(i) does not fall below the threshold ε_th.
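  • A sketch of such a stopping rule is shown below (Python with SciPy assumed). The Frobenius norm is assumed as the matrix norm in the quality index, and the STFT window length is an arbitrary example value.

```python
import numpy as np
from scipy.signal import stft, istft

def inconsistency(X, nperseg=512):
    """F(X) = STFT(STFT^-1(X)) - X for a complex time-frequency matrix X."""
    _, x = istft(X, nperseg=nperseg)
    _, _, X_resynth = stft(x, nperseg=nperseg)
    # Truncate to a common shape in case of differing edge frames.
    rows = min(X.shape[0], X_resynth.shape[0])
    cols = min(X.shape[1], X_resynth.shape[1])
    return X_resynth[:rows, :cols] - X[:rows, :cols]

def stop_iterating(X_curr, X_prev, i, eps_th=0.05, max_iter=7, nperseg=512):
    """Stopping rule: stop when eps(i) < eps_th or after max_iter iterations."""
    F_curr = inconsistency(X_curr, nperseg)
    F_prev = inconsistency(X_prev, nperseg)
    # Quality index eps(i) = ||F(X_i) - F(X_{i-1})|| / ||F(X_i)|| (Frobenius).
    eps = np.linalg.norm(F_curr - F_prev) / np.linalg.norm(F_curr)
    return eps < eps_th or i >= max_iter
```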
  • In a variant of the invention the transformation method in step a) is a spectro-temporal transformation, in particular STFT, wavelet or sinusoidal signal modelling. For example, by replacing the STFT representation with a sinusoidal model, it is possible to greatly reduce the dimensionality of the signal features and hence the computational effort. On the other hand, replacing the STFT with other time-frequency transformations, including the wavelet or Wigner-Ville time-frequency representation for amplitude estimation, or the chirplet signal transformation and the complex wavelet transformation for representation of both amplitude and phase, makes it possible to obtain a non-uniform resolution for analyzing different frequency bands, which is advantageous when applied to audio or speech signals.
  • Preferably, the at least one discrete-time signal can be a bio-medical, radar, image or video signal. In this case, the complex time-frequency representation X can be either one- or multidimensional. The matrix X is typically composed of frames as rows and frequency bins as columns (there are often more rows than columns). For speech signals, its values span a wide dynamic range (about 80 dB). For bio-medical signals, the dynamic range is often much lower, as the signal is sparse in time-frequency.
  • Alternatively, the method according to the invention is especially suited if the at least one discrete-time signal is an audio signal.
  • In a further development of the invention the at least one discrete-time signal comprises at least one speech signal. The speech signal can be the target signal, as is the case in many everyday speech-related applications, in particular automatic speech recognition (ASR) applications. In a challenging scenario, the at least one discrete-time signal can comprise two or even more speech signals; the target signal is then represented by the one speech signal to be separated from the accompanying signals.
  • Furthermore, the at least one discrete-time signal can be derived from a single-channel signal. Single-channel signals are common in many applications that rely on a signal obtained by a single microphone (cell phones, headsets, ...), but they usually provide less information than multi-channel setups. Therefore, the requirements on signal enhancement are very high, especially in the case of single-channel speech separation (SCSS). Since the method according to the invention provides strongly enhanced target signals, it is exceptionally well suited to be applied to single-channel signals.
  • Alternatively, the at least one discrete-time signal can be derived from a multi-channel signal. The additional information provided by at least a second measurement device can then be processed to give a particularly accurate estimation of the at least one target signal.
  • Of course, the method according to the invention is also suited to estimate two or more target signals.
  • In a second aspect of the invention the aim to provide an enhanced method for the estimation of at least one target signal is achieved by means of a device for carrying out a method according to any of the preceding claims.
  • Brief description of the drawings
  • The specific features and advantages of the present invention will be better understood through the following description. In the following, the present invention is described in more detail with reference to exemplary embodiments (which are not to be construed as limitative) shown in the drawings, which show:
    • Fig. 1 a schematic block-diagram illustrating the object of the invention,
    • Fig. 2 an exemplary schematic block-diagram of a state of the art multi-sensor speech enhancement method,
    • Fig. 3 exemplary state of the art modifications of Fig. 2,
    • Fig. 4 an exemplary schematic block-diagram of a variant of the invention,
    • Fig. 5 an exemplary schematic block-diagram of another variant of the invention,
    • Fig. 6 a schematic block-diagram of the block "New Enhancement" according to the invention shown in fig. 4 and 5,
    • Fig. 7 a detailed schematic block-diagram of the stopping rule block shown in fig. 4,
    • Fig. 8 a schematic block-diagram of a typical single-channel separation algorithm based on amplitude estimation on a complex spectrum of a noisy signal described in appendix AP1 in detail, said amplitude estimation being performed phase-unaware,
    • Fig. 9 a schematic block-diagram of amplitude-aware phase estimation described in appendix AP1 in detail,
    • Fig. 10 two schematic block-diagrams of two different single-channel speech separation algorithms described in appendix AP2 in detail.
    Detailed description of the invention
  • Fig. 1 shows a schematic block-diagram illustrating the object of the invention. Given an exemplary continuous-time signal y(t) which includes for example two different signals s1(t) and s2(t), it is an object of the invention to estimate the clean signal ŝ1(n) and/or ŝ2(n). Assuming that the signal s1(t) is a signal of interest and the signal s2(t) represents for example interfering noise (the signal s2(t) could stem from any other source or from a superposition of sources), a typical approach to estimate the signal of interest s1(t) consists of transforming the continuous-time signal y(t) into a quantized discrete-time signal y(n) by applying an analog-to-digital converter 1 to the continuous-time signal y(t). As a next step, a signal estimation device 2 processes the discrete-time signal y(n) using a priori information to provide an estimate of at least the signal of interest ŝ1(n). In the given example an estimate of the signal ŝ2(n) representing noise is provided as well.
  • Fig. 2 shows an exemplary schematic block-diagram of a state of the art multi-sensor speech enhancement method (which can be applied by a signal estimation device 2 according to Fig. 1) to be applied on M discrete-time signals exploited from a number of M sensors, said speech enhancement method being composed of three stages, i.e. analysis, modification and synthesis. The analysis stage may consist of different signal representations including short-time Fourier transformation (STFT), sinusoidal modeling, polyphase filter banks, Mel-frequency cepstral analysis and/or any other suitable transformation applicable to at least one discrete-time signal. The discrete-time signals exploited from the number of M sensors are therefore transformed into a complex format providing amplitude and phase parts of the signals. Furthermore, the analysis stage is required to decompose the complex signals into a number of N different frequency channels, hence N x M samples are provided for the modification stage. The modification stage known from the state of the art can operate in two ways: a) amplitude enhancement, in which any frequency-dependent gain function serving as amplitude estimator (e.g. the Wiener filter as a common choice) is employed together with a noise estimator given either by a reference microphone or a noise tracking method, or b) phase enhancement, in which the noisy phase is often directly copied to synthesize the enhanced output signal. Finally, the synthesis stage is applied on the resulting N x M samples of the modification stage to reconstruct enhanced signals, in particular enhanced speech signals.
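  • For a single sensor, this analysis/modification/synthesis chain can be sketched as follows (Python with SciPy assumed; the STFT is just one admissible analysis transformation, and the modification stage is left as a generic callable):

```python
import numpy as np
from scipy.signal import stft, istft

def analysis_modification_synthesis(y, fs, modify, nperseg=512):
    """Single-sensor analysis/modification/synthesis chain.

    y      : discrete-time signal y(n)
    modify : callable taking (amplitude, phase) arrays and returning the
             modified (amplitude, phase), i.e. the modification stage
    """
    # Analysis: decompose y(n) into N frequency channels in complex format.
    _, _, Y = stft(y, fs=fs, nperseg=nperseg)
    amplitude, phase = np.abs(Y), np.angle(Y)
    # Modification: amplitude and/or phase enhancement (placeholder).
    amplitude, phase = modify(amplitude, phase)
    # Synthesis: reconstruct the enhanced time-domain signal.
    _, y_hat = istft(amplitude * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y_hat
```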
  • Fig. 3 shows exemplary state of the art modifications of the modification stage of Fig. 2 (if not stated otherwise in the description of the figures, same reference signs describe same features). The M discrete-time signals exploited from the number of M sensors are analyzed in block A, providing N x M samples in a complex format as described in Fig. 2. In block 3 the amplitude part contained in the complex format of the samples is exploited (block 4 exploits the phase part contained in the complex format of the samples). The samples are processed through amplitude or phase enhancement stages, wherein the amplitude enhancement stage is provided with a noise estimate, and finally synthesized in block S to provide an enhanced signal, in particular an enhanced speech signal. The modifications of the samples can be categorized in four different groups.
    • A first group provides an estimate for the clean speech spectral amplitude based on a noise estimate from a noise tracker or a reference sensor and a speech estimate using a decision-directed method (see US 2009/0163168 A1 and Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 6, pp. 1109-1121, Dec. 1984). The noisy phase is used unaltered when reconstructing a time-domain enhanced speech signal at the output. This group can be represented in Fig. 3 by an amplitude switch ASW being in a position P2 and a phase switch PSW being in a position P3.
    • A second group (ASW in position P2 and PSW in position P4) refers to phase-enhancement-only methods. For example, the non-patent literature (see appendices AP1 and AP2) suggests employing Griffin-Lim iterations to estimate the signal phase for signal reconstruction given the Wiener-filtered amplitude spectrum, using synthesis-analysis iterations.
    • A third group (ASW in position P1 and PSW in position P4) refers to phase-enhancement-only methods used with the noisy amplitude. The phase estimation often requires strong assumptions, namely knowledge of the exact onsets and fundamental frequency of the clean signals (in particular speech signals) and of the phase values of previous frames.
    • A fourth group (ASW in position P2 and PSW in position P4, but in contrast to the second group without iterations) refers to a method assuming that a clean spectral phase is available, with the spectral amplitude being estimated in a phase-aware way in an open-loop configuration.
  • Fig. 4 shows an exemplary schematic block-diagram of a variant of the invention. Block A and block S represent analysis and synthesis blocks as described in Fig. 2 and 3, wherein block A is provided with at least one discrete-time signal y(n). Block A transforms the at least one discrete-time signal y(n), for example by an N-point Fourier transform (any other time-frequency transformation providing amplitude and phase spectra suffices for the method according to the invention), into a frequency domain to obtain a complex spectrum, i.e.
    $F\{y(n)\}_{n=0}^{N-1} = \{Y(k)\,e^{j\phi_y(k)}\}_{n=0}^{N-1}$,
    where n = 0 ... N-1 with N the window size, and Y(k) and φ_y(k) are the k-th frequency components of the magnitude and phase spectrum of y(n), respectively; F{y(n)} is the complex spectrum of the at least one discrete-time signal y(n).
  • In contrast to the signal modification and enhancement methods described in Fig. 3, an enhanced signal modification method according to the invention is provided, which is described in detail in Fig. 6. A block "New Enhancement" is provided with a noise estimate, N x M samples, and, depending on the switching position of a loop switch LSW,
    • with an amplitude estimate of the amplitude part of the complex spectrum of at least one corresponding signal of interest s1(n) (preferably from noise and/or other signals s2(n) as well) provided by a conventional enhancement block C (loop switch LSW in position P1) or
    • with an output signal of the new enhancement method, which is looped back as an input signal sin(n) (loop switch LSW in position P2).
  • The conventional enhancement block C represents any phase-unaware amplitude estimator or phase-unaware amplitude estimation method (or any amplitude estimation method performed irrespective of the phase spectrum of at least the signal of interest s1(t)), which separates the signal of interest s1(n) from noise and/or other signals s2(n), for example by applying a frequency-dependent gain function (mask) to the observed noisy amplitude spectrum. Examples of such gain functions are the Wiener filter (as a soft mask) and the binary mask. The noise reduction capability of conventional methods is limited since they only modify the amplitude or phase individually. Preferably, both the block C and the block "New Enhancement" are provided with a noise estimate. Block C performs an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ to obtain an estimated amplitude spectrum of the at least one signal of interest s1(n) (preferably of the noise and/or other signals s2(n) as well).
  • Furthermore, Fig. 4 shows a block "stopping rule", which provides a criterion to stop the feedback loop. The block "New Enhancement" is first provided with the amplitude estimate of the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ from the conventional block C. The output of the block "New Enhancement" can be looped back as an input signal sin(n) (the input signal sin(n) can be in complex format) for the block "New Enhancement" in a following iteration. The block "New Enhancement" is described in more detail in Fig. 6.
  • Fig. 5 shows an exemplary schematic block-diagram of another variant of the invention, wherein the feedback loop differs from the variant shown in Fig. 4. Herein, the output of the block "New Enhancement" is synthesized in block S and analyzed in a following analysis block A before being looped back as an input signal sin(n) to the block "New Enhancement", provided that the loop switch LSW is in position P2. This allows monitoring the consistency between the phase and amplitude estimations of the enhanced complex spectrum of the at least one signal of interest according to the following comparison criterion:
    • $F(X) = \mathrm{STFT}(\mathrm{STFT}^{-1}(X)) - X$, X being a matrix composed of a complex time-frequency representation of the enhanced complex spectrum, wherein at least one quality index ε(i) is established to measure the inconsistency of the complex time-frequency representations, denoted by D(i), obtained by each loop iteration and defined for the i-th loop iteration as $D^{(i)} = F(X^{(i)}) - F(X^{(i-1)})$, wherein the i-th quality index ε(i) is calculated as $\varepsilon^{(i)} = \lVert D^{(i)} \rVert / \lVert F(X^{(i)}) \rVert$, and the loop iterations are stopped at least when the quality index ε(i) falls below a pre-defined threshold ε_th. The threshold ε_th is preferably ε_th = 0.05.
  • Fig. 6 shows a schematic block-diagram of the block "New Enhancement" according to the invention shown in Fig. 4 and 5. Herein, two blocks are shown processing the N x M samples described in the preceding figures. Generally, a block "amplitude-aware phase estimation" performs a phase estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$, said phase estimation being an amplitude-aware phase estimation using the input signal sin(n) to obtain an estimated phase spectrum of the at least one signal of interest s1(n), wherein the result of the phase-unaware amplitude estimation of the conventional enhancement block C (see Fig. 4 and 5) is used as said input signal sin(n). The block "amplitude-aware phase estimation" provides an enhanced phase estimation of the at least one signal of interest s1(t) (preferably an enhanced phase estimation of the noise or any other signal s2(t) as well) to a following block "phase-aware amplitude estimator". Within the block "phase-aware amplitude estimator", an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ is performed, said amplitude estimation being a phase-aware amplitude estimation using the result of the phase estimation of the block "amplitude-aware phase estimation" to obtain an enhanced complex spectrum $F\{\hat{s}_1(n)\}_{n=0}^{N-1}$ of the at least one signal of interest s1(n).
  • Fig. 7 shows a detailed schematic block-diagram of the block "stopping rule" shown in Fig. 4. A block "consistency check" is provided with
    • the estimated phase spectrum of the at least one signal of interest derived from the block "amplitude-aware phase estimation" (see Fig. 6) and with
    • the estimated amplitude spectrum of the at least one signal of interest derived from the block "phase-aware Amplitude Estimator" (see Fig. 6).
  • The consistency of the enhanced complex spectrum $F\{\hat{s}_1(n)\}_{n=0}^{N-1}$ can either be assumed to converge after a certain number of iterations (for example five, six or seven iterations), or an inconsistency criterion can be applied (for example the quality index ε(i) mentioned above), limiting the number of iterations.
  • Fig. 8 shows a schematic block-diagram of a typical single-channel separation algorithm based on amplitude estimation on a complex spectrum of a noisy signal, described in detail in appendix AP1, said amplitude estimation being performed phase-unaware. Herein, a signal y comprises two signals s1 and s2 to be separated, wherein the amplitude estimates Ŝ1 and Ŝ2 together with the noisy phase signal φ_y are applied to reconstruct the clean signals ŝ1 and ŝ2.
  • Fig. 9 shows a schematic block-diagram of amplitude-aware phase estimation. In contrast to Fig. 8, the signal reconstruction is provided with phase information corresponding to the signals s1 and s2, respectively. A minimum mean square error (MMSE) phase estimation block is shown, which is provided with the amplitude estimates Ŝ1 and Ŝ2 and the signal y, said phase estimation being amplitude-aware and providing phase signals φ̂1 and φ̂2. A detailed description of the algorithm is given in appendix AP1.
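  • The geometric constraint underlying such amplitude-aware phase estimation can be sketched as follows (Python with NumPy assumed). This is only the law-of-cosines candidate construction for y = s1 + s2; it is not the MMSE selection rule of appendix AP1, which resolves the remaining sign ambiguity.

```python
import numpy as np

def phase_candidates(Y, A1, A2, eps=1e-12):
    """Amplitude-aware phase candidates for two sources with y = s1 + s2.

    Y      : noisy complex spectrum (per frequency bin)
    A1, A2 : amplitude estimates of the two sources (same shape as Y)
    Returns two geometrically consistent (phi1, phi2) candidate pairs.
    """
    phi_y = np.angle(Y)
    R = np.maximum(np.abs(Y), eps)
    # Law of cosines: the amplitudes constrain each phase up to a sign.
    cos1 = np.clip((R**2 + A1**2 - A2**2) / (2.0 * R * np.maximum(A1, eps)), -1.0, 1.0)
    cos2 = np.clip((R**2 + A2**2 - A1**2) / (2.0 * R * np.maximum(A2, eps)), -1.0, 1.0)
    d1, d2 = np.arccos(cos1), np.arccos(cos2)
    # Consistent pairings place s1 and s2 on opposite sides of the phase of y.
    return (phi_y + d1, phi_y - d2), (phi_y - d1, phi_y + d2)
```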
  • Fig. 10 shows two schematic block-diagrams of two different single-channel speech separation algorithms. A typical method to estimate a clean speech amplitude (corresponding to Ŝ1 of Fig. 8 and 9) is shown in (a), wherein the amplitude estimation (within the block "Gain function") is not provided with any phase information. Within the scope of this specification, such an amplitude estimation is referred to as being phase-unaware. In contrast, Fig. 10 (b) provides an example of a phase-aware amplitude estimation (block "Gain function"), wherein the amplitude estimation is based at least on the magnitude spectrum Y of the signal y and the phase spectrum φ_x of a speech signal x. Taking into account the phase spectrum φ_x of the speech signal x when calculating the clean speech amplitude makes the amplitude estimation phase-aware. A detailed description of the algorithm is given in appendix AP2.
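  • A deliberately simple contrast between the two variants of Fig. 10 is sketched below (Python with NumPy assumed). The phase-aware variant merely projects the noisy spectrum onto the direction of the given phase spectrum φ_x; it illustrates the use of phase information in amplitude estimation but is not the estimator of appendix AP2.

```python
import numpy as np

def phase_unaware_amplitude(Y, gain):
    """Fig. 10 (a): a gain function applied to |Y| without any phase input."""
    return gain * np.abs(Y)

def phase_aware_amplitude(Y, phi_x):
    """Fig. 10 (b), simplified: project Y onto the direction of the phase
    spectrum phi_x of the speech signal x and clip negative values."""
    return np.maximum(np.real(Y * np.exp(-1j * phi_x)), 0.0)
```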
  • Of course, the terms phase-aware amplitude estimation and amplitude-aware phase estimation defined herein do not relate to speech signals only. In fact, phase-aware amplitude estimation and amplitude-aware phase estimation are applicable to a plurality of signals, and the speech signals described in appendices AP2 and AP1 just represent one utilization of phase-aware amplitude estimation and amplitude-aware phase estimation, respectively. Therefore, the invention is not limited to the examples given in this specification and can be adjusted in any manner known to a person skilled in the art.

Claims (15)

  1. Method for estimation of at least one signal of interest (s1(t), s1(n)) from at least one discrete-time signal (y(n)), said method comprising the steps of
    a) transforming the at least one discrete-time signal (y(n)) into a frequency domain to obtain a complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ of the at least one discrete-time signal (y(n));
    b) performing an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ to obtain an estimated amplitude spectrum of the at least one signal of interest (s1(t), s1(n));
    c) performing a phase estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$, said phase estimation being an amplitude-aware phase estimation using an input signal (sin(n)) to obtain an estimated phase spectrum of the at least one signal of interest (s1(t), s1(n));
    d) performing an amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$, said amplitude estimation being a phase-aware amplitude estimation using the result of the phase estimation of step c) to obtain an enhanced complex spectrum $F\{\hat{s}_1(n)\}_{n=0}^{N-1}$ of the at least one signal of interest (s1(t), s1(n)).
  2. Method of claim 1, wherein in step b) the amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ is performed irrespective of the phase spectrum of the signal of interest (s1(t), s1(n)).
  3. Method of claim 1 or 2, wherein after step b) the result of the amplitude estimation of the preceding step b) is used as an input signal (sin(n)) in step c).
  4. Method of any of the claims 1 to 3, wherein in step b) the amplitude estimation on the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$ is performed by a frequency-dependent time-frequency mask, in particular by Wiener filtering of the complex spectrum $F\{y(n)\}_{n=0}^{N-1}$.
  5. Method of any of the claims 1 to 4, wherein after step d) the steps c) and d) are repeated iteratively wherein as input signal (sin(n)) in step c) the result of the phase-aware amplitude estimation of the preceding step d) is used.
  6. Method of any of the claims 1 to 5, wherein the consistency between the phase and amplitude estimations of the enhanced complex spectrum $F\{\hat{s}_1(n)\}_{n=0}^{N-1}$ of the at least one signal of interest (s1(t), s1(n)) is monitored according to the following comparison criterion:
    $F(X) = \mathrm{STFT}(\mathrm{STFT}^{-1}(X)) - X$, X being a matrix composed of a complex time-frequency representation of the enhanced complex spectrum, wherein at least one quality index ε(i) is established to measure the inconsistency of the complex time-frequency representations, denoted by D(i), obtained by each loop iteration and defined for the i-th loop iteration as $D^{(i)} = F(X^{(i)}) - F(X^{(i-1)})$, wherein the i-th quality index ε(i) is calculated as $\varepsilon^{(i)} = \lVert D^{(i)} \rVert / \lVert F(X^{(i)}) \rVert$, and the loop iterations are stopped at least when the quality index ε(i) gets lower than a pre-defined threshold ε_th.
  7. Method of claim 6, wherein the threshold is ε_th = 0.05.
  8. Method of any of the claims 5 to 7, wherein the iterations are stopped at least after a predefined number of iterations, in particular after five, six or seven iterations.
  9. Method of any of the claims 1 to 8, wherein the transformation method in step a) is a spectro-temporal transformation, in particular STFT, Wavelet or sinusoidal signal modeling.
  10. Method of any of the claims 1 to 9, wherein the at least one discrete-time signal (y(n)) is a bio-medical, radar, image or video signal.
  11. Method of any of the claims 1 to 9, wherein the at least one discrete-time signal (y(n)) is an audio signal.
  12. Method of claim 11, wherein the at least one discrete-time signal (y(n)) comprises at least one speech signal.
  13. Method of claims 11 or 12, wherein the at least one discrete-time signal (y(n)) is derived from a single channel signal.
  14. Method of claims 11 or 12, wherein the at least one discrete-time signal (y(n)) is derived from a multi channel signal.
  15. Device for carrying out a method according to any of the preceding claims.
EP13181563.1A 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal Withdrawn EP2840570A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP13181563.1A EP2840570A1 (en) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal
EP14753072.9A EP3036739A1 (en) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal
PCT/EP2014/067667 WO2015024940A1 (en) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP13181563.1A EP2840570A1 (en) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal

Publications (1)

Publication Number Publication Date
EP2840570A1 true EP2840570A1 (en) 2015-02-25

Family

ID=49115345

Family Applications (2)

Application Number Title Priority Date Filing Date
EP13181563.1A Withdrawn EP2840570A1 (en) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal
EP14753072.9A Withdrawn EP3036739A1 (en) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP14753072.9A Withdrawn EP3036739A1 (en) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Country Status (2)

Country Link
EP (2) EP2840570A1 (en)
WO (1) WO2015024940A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903355A (en) * 2021-12-09 2022-01-07 北京世纪好未来教育科技有限公司 Voice acquisition method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7492814B1 (en) * 2005-06-09 2009-02-17 The U.S. Government As Represented By The Director Of The National Security Agency Method of removing noise and interference from signal using peak picking
US20090163168A1 (en) * 2005-04-26 2009-06-25 Aalborg Universitet Efficient initialization of iterative parameter estimation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090163168A1 (en) * 2005-04-26 2009-06-25 Aalborg Universitet Efficient initialization of iterative parameter estimation
US7492814B1 (en) * 2005-06-09 2009-02-17 The U.S. Government As Represented By The Director Of The National Security Agency Method of removing noise and interference from signal using peak picking

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
EPHRAIM Y ET AL: "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, USA, vol. ASSP-32, no. 6, 1 December 1984 (1984-12-01), pages 1109 - 1121, XP002435684, ISSN: 0096-3518, DOI: 10.1109/TASSP.1984.1164453 *
MOWLAEE P ET AL: "On phase importance in parameter estimation in single-channel speech enhancement", 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) IEEE PISCATAWAY, NJ, USA, 31 May 2013 (2013-05-31) - 31 May 2013 (2013-05-31), pages 7462 - 7466, XP002717793, ISBN: 978-1-4799-0356-6, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639113> *
PEJMAN MOWLAEE ET AL: "Phase estimation for signal reconstruction in single-channel speech separation", INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, 31 January 2013 (2013-01-31), XP055092414 *
TIMO GERKMANN ET AL: "MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase", IEEE SIGNAL PROCESSING LETTERS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 20, no. 2, 1 February 2013 (2013-02-01), pages 129 - 132, XP011482926, ISSN: 1070-9908, DOI: 10.1109/LSP.2012.2233470 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903355A (en) * 2021-12-09 2022-01-07 北京世纪好未来教育科技有限公司 Voice acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
EP3036739A1 (en) 2016-06-29
WO2015024940A1 (en) 2015-02-26

Similar Documents

Publication Publication Date Title
US9666183B2 (en) Deep neural net based filter prediction for audio event classification and extraction
JP3154487B2 (en) A method of spectral estimation to improve noise robustness in speech recognition
DE112014003337T5 (en) Speech signal separation and synthesis based on auditory scene analysis and speech modeling
JP2005518118A (en) Filter set for frequency analysis
AU2009203194A1 (en) Noise spectrum tracking in noisy acoustical signals
Vincent et al. Estimation of LF glottal source parameters based on an ARX model.
Ganapathy Multivariate autoregressive spectrogram modeling for noisy speech recognition
JP6348427B2 (en) Noise removal apparatus and noise removal program
Min et al. Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement
Do et al. Speech Separation in the Frequency Domain with Autoencoder.
Islam et al. Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask
Watanabe et al. Iterative sinusoidal-based partial phase reconstruction in single-channel source separation.
Benesty et al. On widely linear Wiener and tradeoff filters for noise reduction
Agcaer et al. Optimization of amplitude modulation features for low-resource acoustic scene classification
EP2840570A1 (en) Enhanced estimation of at least one target signal
Yoshioka et al. Dereverberation by using time-variant nature of speech production system
Bavkar et al. PCA based single channel speech enhancement method for highly noisy environment
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
Li et al. Multichannel identification and nonnegative equalization for dereverberation and noise reduction based on convolutive transfer function
Malek Blind compensation of memoryless nonlinear distortions in sparse signals
Do et al. A variational autoencoder approach for speech signal separation
Rassem et al. Restoring the missing features of the corrupted speech using linear interpolation methods
CN110491408B (en) Music signal underdetermined aliasing blind separation method based on sparse element analysis
Adrian et al. Synthesis of perceptually plausible multichannel noise signals controlled by real world statistical noise properties
Mallidi et al. Robust speaker recognition using spectro-temporal autoregressive models.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130823

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20150826